Advent of Computing - Episode 134 - Beyond the Punch

Episode Date: June 23, 2024

This episode I'm opening up my research vault to present some interesting pre-digital technology. Back before computers us humans used to write everything down on paper. Over time that led to some or...ganizational issues. By 1890 punch cards show up to solve one aspect of this problem, but that technology had its limitations. We will be looking at other paper-based approaches to data management, as I slowly try and explain a realization I've come to about the early history of hypertext.

Transcript
Starting point is 00:00:00 In 1968, Doug Engelbart stunned the world with a demonstration of a new program called NLS. That event has come to be known as the mother of all demos. In it, he unveils the mouse, a graphical user interface, networking, telecommuting, and hypertext. It struck as a bolt from the blue. Most of these technologies were completely new, or had never actually been implemented on a computer. For those in attendance, it must have been like stealing a glimpse of the future. But the demo didn't come out of a vacuum. If you've learned anything from my long-winded ramblings, it's that few things truly come from a vacuum. It took years
Starting point is 00:00:47 of research for Engelbart and his colleagues to reach this point. Engelbart himself was inspired by an earlier paper, Vannevar Bush's seminal As We May Think. The core idea of that paper was that better user interfaces needed to be developed, and that data needed to be organized as chains of connected information. We can go deeper still. In 1962, Engelbart publishes a paper titled Augmenting Human Intellect. It's the blueprint for NLS, which really makes it the blueprint for modern computing. If you listen to the show, then I'm sure you've heard this story before, but I'm going to tell it again because it's fundamental to one of the greatest puzzles in computing. In an appendix to this paper, Engelbart describes a cardstock-based hypertext system that he used during his research. He used the system while writing Augmenting Human Intellect. The system had links, connections between ideas.
Starting point is 00:01:47 It could be sorted, organized, and ideas could even be stitched together to form larger ideas. And the medium? It was the enigmatic, edge-notched card. It's a technology that, I think, may form a missing link between pre-digital ideas about data and the modern computer. Welcome back to Advent of Computing. I'm your host, Sean Haas, and boy am I happy to be back home. It's been about a month since I've actually sat in my office studio in front of a microphone, so I'm very, very pleased to be back in the saddle as it were. This is episode 134, Beyond the Punch. This time, we're dipping back into my
Starting point is 00:02:42 super-secret research. This is an episode that I've been picking away at during my long travels, and I finished up now that I've been home. It's something that I think is impactful, and it's something that's been puzzling me for a long time. There's a reason I named the podcast Advent of Computing and not Advent of Computer. The history of the computer is so much more than just the history of the electronic digital computer itself. There's a pile of context both around the computer and prior to the computer. Hence, I cover computing and not just computers.
Starting point is 00:03:20 This is one of those episodes where we'll be in the larger category of computing. Today, we won't even really touch a computer. Rather, we're going to be talking fundamentals. The technology behind the punch card is patented by Herman Hollerith in 1884. The punch card itself, in primitive form, appears in 1890 during that decade's US census. One view of computing puts this event as the start of the digital age. You can trace a line from Hollerith's punch cards up to modern day computers. But hey, you know me, it's Sean. I love to introduce complications. I love to ruin a nice story.
Starting point is 00:04:01 I personally believe in this larger tides and forces view of history, that over time, certain inexorable forces, certain ebbs and flows, lead to progress, or at least lead to change. Call it psychohistory if you're a big Asimov fan. One of the outcomes of this view of history is that you start seeing weird connections between events. One cluster that's fascinated me for a while occurs around the time of the punch card. There's this strange liminal time period between this punch card era and the first digital electronic computers. It spans about 50 or 60 years, from the 1880s up into the 1940s. During this period, there are a number of paper-based storage mediums very similar to the punch card. At least, they appear to be similar. Some are derived from the punch card, some are extensions to the card, but many are totally independent from
Starting point is 00:05:01 whatever Hollerith was doing. Today, we're going to be living in this liminal space, looking at pre-digital paper storage, and I swear, this is all going to start to make sense and come together as we go. The key connecting fiber here is data. That is the mantra of the day. Punch cards are the best-known paper data medium, but they aren't really a very good format. That was even the case back in this early time period. There are certain things that punch cards simply can't do. They're a tool that only works for a few very specific problems. We're going to see how other kinds of cards got around those limitations,
Starting point is 00:05:42 how other formats lead to different possibilities, and how all these mediums are connected. This is all leading up to a larger idea that's been brewing in my head. If you've listened to the show for a while, then you know I have some frankly wacky and out there ideas about early hypertext. Well, I think I've recently put together another part of that puzzle, but we're going to have to get there slowly. You're going to have to buckle in for this one, I think. The entire framing for this episode is going to hinge on a bit of an anachronism. Put another way, I'm going to be pulling a mean-spirited trick in order to make things easier to understand, to give us a nice little word instead of a sentence to explain things.
Starting point is 00:06:29 Today's anachronism is the information problem. It's the lens we're going to be using to examine everything this episode, so I gotta explain it. I talk about the information problem at length whenever I cover hypertext, so I'm just going to give a short explanation. That's all we'll need, after all. It's this idea that was first formulated by Vannevar Bush during the Manhattan Project, and then explained in 1945 in his article As We May Think. The core of the idea is that there will reach a point where there's so much information
Starting point is 00:07:04 that it becomes impossible for you to find what you're looking for. On its face, that sounds like an annoyance. It's like having too many channels on TV, right? Where this becomes an issue is in research. Imagine you're trying to do something totally new. Like, let's say, splitting the atom. You know, random example, unconnected to history. To get to that point, you have to figure out the current state of the art. You have to survey all
Starting point is 00:07:32 relevant works and figure out where things stand. Otherwise, you're going to go down dead ends that have already been examined, or reinvent all kinds of old technology. Reinventing the wheel is one thing. Reinventing a method of isolating isotopes in a centrifuge, well, that's another. That's a much more costly reinvention. Eventually, we'll reach a point where there is so much information, such a glut of data, that research will become impossible. It will take too much time to search for sources, to dig up the background needed. It will become impossible to find the shoulders of giants that you need to stand on.
Starting point is 00:08:12 Thus, progress will end. We'll enter this paradoxical dark age where the sheer volume of information will make any information meaningless. The information problem has been stated and formulated in many different ways over the years, but I'd really like to go back to Bush's version. It's simple, concise, and it falls nicely on the modern timeline. As We May Think, as I stated in the very beginning, has a direct influence on the development of hypertext in the middle of the 20th century. So we can use Bush as something like a key to understanding information management. I think this makes it acceptable for me to use this later lens to discuss earlier
Starting point is 00:08:51 developments. It also, I think, helps that Vannevar Bush wasn't a digital computer person. He wrote As We May Think before the first digital computers came to life, and he never worked with digital machines. So, a closer reading kind of shows that he's viewing the data problem here in a very old way. But that's a totally different discussion. We can start by applying this idea of the information problem to punch cards as they existed in 1890. The state of the art at this point was, well, it was rough. Hollerith had initially developed punch cards specifically for the US census. They were something of a special purpose medium, at least roughly speaking. In general, all punch cards are composed of a grid, with each position being either a hole or an intact region of cardstock.
Starting point is 00:09:49 A census card broke data up into categories, which were fit onto the card wherever there was space. You get these uneven groupings of numbers, a tiny male-female chunk, regions for states and languages spoken. The entire card is almost gerrymandered to find holes. This is fine for special purpose or custom applications, but the specialization has some knock-on effects. It means that equipment has to be specialized. Initially, this is really just the tabulator, the machine that reads the punch card and acts on its data. Census-era tabulators were basic. They simply incremented counters on each category as they read cards. Readings were taken down manually into paper ledgers. There's also the complication of sorters, but we'll get to that in a bit. Punch card technology does improve over time, very much so.
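As a rough illustration of the tabulating step described here, the following Python sketch models a card as a set of punched positions and a tabulator as a bank of counters. The field names and the card layout are invented for the example and do not reflect the real 1890 census card.

```python
from collections import Counter

# Hypothetical card layout: each card is the set of (field, value)
# positions that have been punched. All names here are illustrative.
cards = [
    {("sex", "male"), ("state", "NY")},
    {("sex", "female"), ("state", "NY")},
    {("sex", "male"), ("state", "PA")},
]


def tabulate(deck):
    """Advance one counter per punched position, one card at a time."""
    dials = Counter()
    for card in deck:
        for hole in card:
            dials[hole] += 1
    return dials


dials = tabulate(cards)
# In the census workflow these readings were copied by hand into ledgers.
print(dials[("sex", "male")], dials[("state", "NY")])
```

Note that the counters only ever go up and that nothing about the order of the deck matters, which is part of why this kind of machine answers counting questions and little else.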
Starting point is 00:10:48 But this is the primordial state of the medium. In order to use a punch card for something other than census data, a custom card format has to be designed. A custom tabulator has to be wired. I think it's clear to see the downsides here, so I'm not going to dwell on them. Rather, let's look at the utility. The first huge advantage here is the unambiguous representation of data. This had existed in earlier eras, perhaps most notably in the prehistoric punch cards used in the automatic looms of the 18th century and
Starting point is 00:11:19 spools of music used by player pianos. All these formats allow for good and proper digital data encoding. A 1 is a 1, a 0 is a 0, with no wiggle room. That's one huge thing to love about these mediums. But there is something crucial about these earlier card formats. Loom cards and piano rolls are missing a critical part of the digital equation. Their data isn't fully discrete. Loom cards were chained together to form a pattern. That meant you couldn't grab a single card out of the chain. You would lose context and destroy the chain in the process. The same is true for a piano roll. To pull out a note, you'd have to take scissors to the thing, and you'd be left with,
Starting point is 00:12:06 you know, maybe the hole to play a single chord. It might be a nice chord, but pulling out that discrete component would ruin the song. Punch cards, on the other hand, are totally discrete. Each card can stand alone as an indivisible grouping of data. That has its pros and its cons. On the positive side, you can actually have a smallest unit of data. That means you can shuffle things around without destroying your dataset. You can have unordered information. You can actually do something like manipulate your dataset. You can have it stored and sorted in different ways, making groupings of similar data, or you could even pluck out a single card. But this is a double-edged sword. Since punch cards
Starting point is 00:12:51 are discrete, there's no inherent order or context to their data. That's fine for something like a census. It's great for raw datasets. It breaks down when we get to more complex information, to digested information, or information you intend to handle more than once or twice. In that sense, punch cards are a great medium for raw information or for long-term storage. They can be a fantastic medium to work with or to express datasets in. But you don't want to live in the world of punch cards. It robs your data of context, of a whole other dimension of information. By the 1900 census, Hollerith had refined his design. He designed a general-purpose encoding scheme, which meant all cards for any task would represent data in the same way. That would solve one major issue with the medium,
Starting point is 00:13:46 but the context issue still remained. A partial solution was completed in 1901, an automatic card sorter. Now, there were ways to sort cards earlier than that, but they hinged on running them through a tabulator, which would automatically open a trap door on a sorting box that you could then drop a card into. It was fully manual. The new automated sorter was a device that sorted cards into categories based on their content. You would put in a stack of unsorted cards, they'd be fed through a hopper, through drums and brushes, and then eventually be sorted into a number of pockets on the far side of the machine. In this way, you could, say, automatically pull all census cards for males,
Starting point is 00:14:31 or sort roughly by age. This helps, but it's only a partial solution. It's also a clumsy solution. A card sorter, at least in this design, only lets you categorize and order your cards. It also exists outside the medium. By that I mean, punch cards don't have any inherent way to order or group themselves. There is no part of the card's encoding that accounts for order or context. You have to have this separate machine that knows how punch cards are encoded, how data is stored, how to read them, and how to move them around. You also need a way to configure that machine. You need all this extra artifice to get your data into a meaningful context. In this way, punch cards solve the information problem, but only really a sliver
Starting point is 00:15:24 of it. They're able to take a large amount of data and find a way to quickly answer questions about that data. How many men were living in New York in 1890? What was the median age of a factory worker in Pittsburgh? But there are all kinds of questions, all kinds of aspects to the information problem that they can't solve. Crucially, this is not a modern critique. Folk took note of this limitation as early as the 1890s. There were attempted solutions in that very decade. Many of these modifications and expansions came from the world of accounting.
Starting point is 00:16:02 When you strip away the entire facade, all the technical details, tricks, and twists, computing has always been about automation. Programming languages were developed to automate away the task of programming. Computers themselves are developed to automate away tedious mathematics. Punch cards were developed to automate away record-keeping and simple data processing tasks. This simple reality leads to kind of a funny outcome. Many advances are actually made on the shoulders of very boring problems. Automation is created out of sheer boredom with a task. The information problem is a prime example.
Starting point is 00:16:41 Hypertext is developed because record-keeping and data retrieval, literal librarian work, is too repetitive and too boring to do by hand. Then it should come as no surprise that we can find exciting developments by searching for boring, repetitive work. Accounting fits this mold to a T. There are two boring aspects of accounting in this period that were ripe for automation. The first is the mathematical side of the equation. The discipline required long and tedious sums, which were already somewhat automated by mechanical calculators. Punch cards further automated this process and were a huge boon to the industry. They allowed information to be stored and then summed at a later date.
Starting point is 00:17:34 But punch cards and early machines couldn't really categorize data. In 1901, Hollerith did introduce the horizontal sorter, but that was a very limited machine. It could sort cards one column at a time. It was a workable solution, but with many caveats. Thus, there was room for improvement, room for automation. One of the accountants to take up this dangerous task was one Charles H. Talmadge. Born in 1857, Talmadge would never make it to the digital revolution. He would, however, live through the first bits of data automation. An accountant by trade, Talmadge would first come into contact
Starting point is 00:18:12 with automation in the 1890s. In 1893, he co-founded the Automatic Timestamp Company. The company's first product was, as you may be able to guess, an automatic timestamp. It was a very simple clockwork device that could stamp the current date and time on a slip of paper. By 1900, Talmadge is out on his own. It's from here that things get interesting. In 1906, he files a patent for a primitive computer. Now, I did say this episode would be computer-free, and I still mean it. This isn't a programmable computer, or any computer in the true sense. It's a so-called accounting mechanism. On the surface, this sounds pretty out of place. And yeah, honestly, this is one of those times where we just seem to get a huge jump in
Starting point is 00:19:06 technology. Talmadge just kind of shows up in 1906 with this almost-computer. What makes this almost-sensical, to me, is some interesting context explained in the book Before the Computer by James Cortada. In that text, Cortada explains that automation technology in this period should be seen as a collection of similar mechanisms. Stick with me here because I think this argument actually pulls a lot of interesting ideas together. To quickly paraphrase Cortada's argument, it all starts with simple mechanisms like typewriters and, yes, even automatic timestamp machines. To make something like a typewriter requires a certain level of mechanical know-how, both on the design and manufacturing side. You need gears and levers and cast parts with certain tolerances to them. That means once you get into these automation machines, you fall onto a slippery slope.
Starting point is 00:20:05 Adding machines and typewriters use very similar parts. At least, you can make all the same parts using all the same manufacturing equipment. Tabulators just require a few electronics and relays on top of existing mechanical parts. So many times, a company would make all kinds of automation machines, typewriters, adding machines, tabulators, and even these almost computers. So, back to the specific accounting mechanism. This is a machine that's very much in this mold of mechanical automation devices. It's a machine to automate double-entry bookkeeping, a very particular form of accounting.
Starting point is 00:20:46 It has a set of memory cells, each of which contains a number. That number represents an account in, say, a bank. Those cells each have mechanisms for adding and subtracting from that register. A teller runs a transaction from a central console. For, say, a transfer between accounts, they select a source and a receiving account, then input the transfer amount. The mechanism physically connects the corresponding memory elements, then grinds some gears to complete the transfer. It even makes a little receipt slip for you. The accounting mechanism isn't programmable. It's not really
Starting point is 00:21:25 configurable either, but it handles inputs and outputs. It stores a state. It has addressable memory. And there are state transitions. It's super, super close to a computer. What's really weird about the whole thing, at least to me, is that Talmadge even shows memory as a grid of cells. Many mechanical machines just kind of kept storage around, since there were physical limitations to keep in mind. But the block diagram that Talmadge has in his patent actually looks pretty modern. You can look at it and point out memory and I.O. channels very easily, something that is common on a block diagram for a computer. Something interesting to note here is that the accounting mechanism uses internal storage only. At no point does it pop out a
Starting point is 00:22:18 punch card or some kind of data medium. Inputs are only possible via a little keyboard. Its internal state is simply kept consistent on rotors and gears, not stored on reams of cards. The output can be a little receipt, but that's not a data storage medium. This may seem strange, and it is strange in the context of computing, but this isn't entirely a computer. It's meant as an automation machine. In that sense, it's basically a highly specialized adding machine with built-in memory. The idea here was that an accounting machine could be used to replace ledgers.
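To make the "state, addressable memory, and state transitions" description concrete, here is a toy Python model of such a machine. It is only a sketch of the behavior described in the episode, not a reconstruction of the patent: the class name, method names, and the use of whole-number amounts are all assumptions made for illustration.

```python
class AccountingMechanism:
    """Toy model: a fixed bank of numbered cells plus a transfer operation."""

    def __init__(self, cell_count):
        # State lives only inside the machine; nothing is written to cards.
        self.cells = [0] * cell_count

    def add(self, account, amount):
        self.cells[account] += amount

    def subtract(self, account, amount):
        self.cells[account] -= amount

    def transfer(self, source, receiver, amount):
        """Debit one cell, credit another, and hand back a receipt slip."""
        self.subtract(source, amount)
        self.add(receiver, amount)
        return f"transfer {amount} from cell {source} to cell {receiver}"


machine = AccountingMechanism(4)
machine.add(0, 100)                    # keyed in at the console
receipt = machine.transfer(0, 2, 30)   # pick source, receiver, and amount
print(receipt)
print(machine.cells)
```

A transfer never changes the sum of all cells, which is the double-entry property the mechanism exists to enforce. There is no way to load new behavior into the model, which mirrors the point that the machine holds state but is not programmable.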
Starting point is 00:23:01 Good old-fashioned pen-and-paper accounting requires that kind of long-term storage. Data stays in a ledger forever. Therefore, account totals would have to stay inside the accounting machine. A bank would be able to drop this in place, maybe inside a vault, and use it to fully automate a large swath of their accounting needs. It would stay in place, running transfers forever. In that capacity, this is less and less like a computer, and much closer to something like factory automation machinery. But Talmadge didn't stop there. In 1908, the same year of this mystical accounting machine, he filed another patent. This was very cryptically titled Indexing and
Starting point is 00:23:41 Assorting Means. If you know me, then you can probably guess how excited that title makes me. This patent is one of the earliest examples of an edge-notched card. It's also this wild mashup of technologies. The actual filing is simple. Talmadge's cards are a modified punch card that includes space on their edges where notches can be cut and a registry hole in one of the corners. Data could be added to the edge of the card as notches, but crucially, this wasn't normal data. This was metadata, or indexing information. It was just little bits of information used to describe what kind of data was punched on the front of the card, or, as Talmadge called it in the patent, category data.
Starting point is 00:24:33 This notching allowed for two cool tricks. The first is grouping. It's possible, through a pretty fast process, to select all cards that fit a certain category. This was done using either two needles or a needle and some kind of edge on a table. So let's say you have a stack of cards where each card represents a task. Completed tasks have a notch in some position and incomplete tasks are notchless. To select completed tasks, you set one needle down on a flat surface, or take some kind of little protrusion on a flat surface. You then take your stack of cards, line up the lower needle with the position of the completion notch, and push down. Cards with a notch will fall all the way down to the table surface, since the notch fits around the lower needle. Cards without the notch will stay on top of the needle.
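The single needle pass described here can be sketched in a few lines of Python. This is a minimal model, assuming each card carries a set of notched edge positions; the position number and the task names are made up for the example.

```python
def needle_pass(stack, position):
    """Split a stack by whether each card is notched at `position`.

    Notched cards drop past the needle to the table; cards with intact
    cardstock at that position stay resting on top of the needle.
    """
    dropped = [card for card in stack if position in card["notches"]]
    lifted = [card for card in stack if position not in card["notches"]]
    return dropped, lifted


COMPLETED = 0  # hypothetical edge position meaning "task completed"

stack = [
    {"task": "file ledger", "notches": {COMPLETED}},
    {"task": "audit March", "notches": set()},
    {"task": "mail receipts", "notches": {COMPLETED}},
]

done, pending = needle_pass(stack, COMPLETED)
print([card["task"] for card in done])
print([card["task"] for card in pending])
```

The whole stack is split in one motion, and the relative order of cards within each half is preserved.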
Starting point is 00:25:33 The upper corner of the cards, where the single registry hole is, will now be accessible. You can take your second needle, thread it through the registry hole, and pull out all the cards without notches, all cards that were pushed up by the lower needle. Then you are left with two stacks of cards. One for complete tasks, notch and all, and the other for incomplete tasks. By using this operation, you can index and organize data. The second trick is in maintaining that organization. If you group cards by category,
Starting point is 00:26:14 then you will see, visually, when a card is not in the right place. When cards are organized, all the notches line up nicely. At a glance, you can tell if a pile of cards is organized or not, and you can see which cards are not in the right spot. The question is, why would you want to do this? Why would you want these manually sorted cards? Well, that gets a little complicated to answer. Let's say you're a Hollerith shop. Your office is decked out with the latest tabulators, punch pantographs, and horizontal sorters. It's a very expensive setup, but it's the state of the art. The main things you
Starting point is 00:26:46 could gain by implementing Talmadge's notches all come down to categorization and convenience. Automated sorters can only handle so many categories, and they can only categorize by data. Notches on the card's edge don't necessarily need to duplicate data on the card's face. They can be pure metadata, information that only describes information. Maybe you have a field that you want to sort by, but you never want to actually tabulate. That could be the case for any number of reasons. Card space was limited, so you couldn't cram every possible data point onto a card. You might want to organize cards by, say, collection month, but not want to waste precious hole space on that data point.
Starting point is 00:27:33 This notching system also allows for fast checks of organization. Is your stack of cards sorted? Using Hollerith's system, you can't check. You have to throw them into a sorter and run the full operation again. Each card had to be read one after another, but with notches, well, that's a different story. Metadata was on the card's edge, so you can actually see the information on the edge of a stack of cards. You can quickly check for organization. The same schema also supports ad hoc sorts. Maybe you want to very quickly try something out without tying up hardware. Just grab your needles and
Starting point is 00:28:12 get to work. In that way, more exploratory analysis was possible than what you could do using normal punch cards. One final feature is selection. Most of these other features have been improvements to the usual punch card workflow. Pure selection, however, is totally new. I already explained how that operation works. It's the whole two-needle boogie. You could use that to manually sort or create groups of cards, but you could just as easily use it to select a single card. This really depends on what data you're encoding on the card's edge. So let's say you have notches for collection month, tax status, sex, and employment status. You could, with a short series of operations, select all records for working men that are delinquent on their taxes.
Starting point is 00:29:06 You could further look for just those deadbeats that were identified in March. You could do the same with normal punch cards, at least in theory. You would have to set up the sorter to carry out four different categorizations. It would take a long time, but it was doable. The key difference here is that notching approaches random access. You can actually pluck something from a stack using that card's characteristics. At least up to a point, large datasets get physically problematic. But with punch cards, at least on their own, there's no random access. They're closer to sequential access. You have to actually scan through a pile of cards one at a time.
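The multi-notch selection in this example behaves like a bitmask query, which a short Python sketch can show. The encoding below is an assumption for illustration only: each characteristic is reduced to a single notch position, whereas a real card would need several positions to encode something like a collection month.

```python
from enum import IntFlag


class Notch(IntFlag):
    """Hypothetical edge positions, one bit per notch."""
    MARCH = 1
    DELINQUENT = 2
    MALE = 4
    EMPLOYED = 8


def select(stack, wanted):
    """Keep cards notched at every position in `wanted`.

    Physically this is one needle pass per position; here it is one
    mask comparison per card.
    """
    return [card for card in stack if (card["edge"] & wanted) == wanted]


stack = [
    {"name": "A", "edge": Notch.MALE | Notch.EMPLOYED | Notch.DELINQUENT | Notch.MARCH},
    {"name": "B", "edge": Notch.MALE | Notch.EMPLOYED},
    {"name": "C", "edge": Notch.EMPLOYED | Notch.DELINQUENT | Notch.MARCH},
    {"name": "D", "edge": Notch.MALE | Notch.EMPLOYED | Notch.DELINQUENT},
]

working_delinquent_men = select(stack, Notch.MALE | Notch.EMPLOYED | Notch.DELINQUENT)
identified_in_march = select(working_delinquent_men, Notch.MARCH)
print([card["name"] for card in working_delinquent_men])
print([card["name"] for card in identified_in_march])
```

The query is expressed in terms of what a card is, not where it sits in the deck, which is the sense in which notching approaches random access. The Python loop still visits every card; the physical needle gets its speed from acting on the whole stack at once.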
Starting point is 00:29:55 That is automated by a machine, but that's still slow and very limited. Talmadge would continue to file patents up through the 19-teens. These were for increasingly complex accounting machines, which began to incorporate punch cards as data input and storage media. However, the notch would never return. For Talmadge, this may have just been an idle idea, but there's something fascinating hiding there. With that, I think we have a jumping off point to go further back and look at technology that's contemporary to the earliest punch cards.
Starting point is 00:30:30 Now, we're going to be talking about some cards that, while contemporary, are perhaps unrelated. This episode is, at least vaguely, focused on the punch card, but we're going to be talking around the punch card. Allow me to explain that a little bit. Punch cards as a medium don't appear in a vacuum. This is a mistake that I often see when discussing computers. It's easy to put the start of the digital age at the feet of either Hollerith or maybe Babbage, depending on how you view data or math. One thing I've learned time and time again is that the origins of computing aren't where you expect them.
Starting point is 00:31:09 Humanity's problems are very old at this point, and our solutions are equally ancient. In the 1890s, Paul Otlet, a Belgian lawyer, started work on a project called the Mundaneum. This was planned to be a collection and accounting of all human knowledge. To make this effective, he would have to solve the information problem. Put another way, he needed a way to organize data effectively. He needed some kind of way to select and retrieve information from a limitless amount of data points. His solution was to keep records for all information on note cards, and to organize those cards using a filing system of his own design. Each card had a unique identifier, something like the Dewey Decimal System,
Starting point is 00:31:58 but much, much more sophisticated. Otlet designed the system using the latest technology available to him, and some of the most modern ideas in data management. In this capacity, Otlet was facing the same problem as Hollerith. Both had a mountain of data that had to be dealt with quickly and efficiently. However, the slightly different requirements of each problem led to vastly different solutions. Hollerith focused on a system for accounting and tabulating data, the ability to turn piles of records into meaningful information. It's a transformative process. Otlet focused on selecting and indexing, the ability to find a single data point and
Starting point is 00:32:43 select that. We can see these two schools of thought, and really these two types of problems, echoing through the history of computing. The earliest machines that come out of the Second World War are all focused on answering concrete data questions. Under what conditions will atomic fusion or fission occur? How do I need to aim my artillery piece given certain conditions? What is the content of this encrypted message? Later systems, more augmentative systems, exist to answer very different questions. Vannevar Bush describes these types of associative systems as being used for research.
Starting point is 00:33:21 Something like a vast automatic library, a la Paul Otlet's vision. Doug Engelbart and Ted Nelson expand on this, describing massive systems for organizing and personalizing new information. The modern internet is the direct outgrowth of this school of questions. How do I take an unlimited amount of information and retrieve one idea? Talmadge's cards are interesting because they show how the second path, the path of indexing, can coexist with the data path. But Talmadge was not the first to automate this indexing process. For that, we need to step back from the punch card itself and look at its low-tech cousin. That is, of course, the index card. Otlet wasn't the only one to use and abuse index cards.
Starting point is 00:34:15 The banking, accounting, and insurance industries were all early adopters of this format. Once again, progress shows up where there are boring problems to solve. That said, I think this makes a lot of sense. Accounting is a very data-heavy industry, as are banking and insurance. Outside of research, these would be the places most likely to encounter the information problem. We see the exact same pattern during the digital revolution. Machines appear in research and then in these data-rich industries. This is where we find our next key player. Henry Stamford was one of the few to brave the dangerous waters of life insurance. His family immigrated from Ireland to America shortly after his birth in 1847. So once again, we're dealing
Starting point is 00:35:05 with the same rough generation of people here. As a young man, he served for a brief time in the Union Army during the Civil War. Stamford wouldn't see combat. Instead, he worked as a paymaster's clerk for just over three years. In this capacity, he would have spent his days processing payroll for soldiers and handling accounting for the unit he was attached to. It was while working with the army that he would have learned his way around a ledger book. Once discharged, Stamford's experience made it relatively easy for him to jump from payroll
Starting point is 00:35:39 to insurance. He got a job with New York Life, at the time one of the largest insurance firms in the country. Over the years, he worked up the ranks, eventually becoming a supervising accountant in 1893. The information problem comes in all sizes, and Stamford would come into contact with it at a small office in New York. In some ways, keeping an insurance company running is like working on a continuous census. Every insurance policy represents a person, and each policy has certain data points that need to be tracked. Date of birth, smoking vs. non-smoking, even facts like when the policy's payment is due.
Starting point is 00:36:18 Up until a certain point, it's fine to keep all that data in a form like loose-leaf sheets of paper, or squirreled away as index cards in drawers. But like we saw with the US Census, there comes a time when plain paper just won't cut it. In 1896, Stamford files a patent for a new storage medium. Now, we have to keep in mind that 1896 is the date of filing. The idea could have been older, or it could have been in use for a number of years before Stamford put it down on paper. Crucially here, we don't know if Stamford's bright idea was inspired by or in reaction to
Starting point is 00:36:58 the punch card and the 1890 census. So, what was the bright idea? The patent is simply titled information card. It's a new kind of index card. But actually, it's kind of two kinds of index cards and matching drawers to use them. The trick comes down, once again, to very carefully punched holes. We're still in this weird world of patents. If you aren't familiar, then let me explain a few things. Reading patents kind of sucks. They are super verbose.
Starting point is 00:37:31 They use very structured and almost ceremonial language. And I personally find them very grating to read. This means that their information is 100% correct, but not entirely useful. As such, I'm going to start us off with the simple card and then attempt to discuss the more complex one. The other thing to note about patents is you don't have to prove that the patent works. You just have to prove that there isn't a patent for similar technology. So, just because a patent exists doesn't necessarily mean that that technology is even real. With that aside, let me explain the first card.
Starting point is 00:38:14 In short, it's something like a tabbed index card. You know how you can get those little index cards with the tabs on the top so you can flip to the right spot in a box? These cards look like that. The improvement, the automation feature, is that those tabs each have a punch in their center. Perhaps you can see where this is going. These tabs are used to encode metadata. The example Stamford uses is the month a record was generated in, so I'm going to stick to that month tab idea. That means each card has room for up to 12 tabs on the top. If you have a card that represents data from January, you would have one tab in the first position, for instance. The cards are paired up with a special drawer that has notches on its top edge. Those notches line up with the 12 possible tab positions on your set of cards.
Starting point is 00:39:12 The patent even shows little labels by those notches. A selection, then, is as simple as threading a long needle through the proper notch and into the card's tabs. Pull up, get cards, repeat as desired. That already gives us a lot to discuss. If we drop the whole physical constraints for a second, this is a very neat and elegant solution to the information problem. It lets you select and sort data from an unordered set of cards, and it does it pretty easily. It's even more convenient than Talmadge's cards. The other huge benefit here is the card's face itself. Punch cards, however you slice or dice them, are a very data-poor medium. Here I mean poor in a very specific way. You can store numbers
Starting point is 00:39:59 or data encoded as numbers, and you can only store a small amount of data on a punch card. We're talking tens of bytes. An index card, on the other hand, is a very rich data medium, as in, it can have all kinds of information on its face. We're talking numbers, drawings, the sky's the limit. You can even do wild, freeform things like pasting photos or news clippings onto a card. You could have a card that's just a collage or an idea board with little images you like. The trade-off here is that automation aspect. You can't read an index card with a tabulator. The formatting isn't standardized in that sense.
Starting point is 00:40:44 It's just freeform information. I propose that we look at this on a sliding scale. On one end, you have fully free expression and fully manual operation. On the other end, you have rigid structured data and fully automatic operation. One side isn't better than the other. Rather, each side has its uses, and we can come up with examples of things that would fall somewhere in between those extremes. But let's bring back in those physical constraints. Because this is really where we see the issue with Stamford's idea.
Starting point is 00:41:18 The first is the tabs themselves. These cards have to be a very custom shape. They need to have extra protrusions on the top. That could lead to some weird production considerations. Maybe you just make blank cards with all possible tabs on their perimeter, then the end user has to cut off the tabs they don't want. That's fine, but it seems kind of wasteful and time-consuming to set up. The other option is to have sticky tabs that adhere to a card, but that could lead to registry issues if the user doesn't place the tab in just the right spot. This is, actually, another little secret about the punch card's success. Registry, or the lining up of holes is a huge issue for all these card formats.
Starting point is 00:42:07 Punch cards are a very simple shape, just a rectangle with a single corner cut at an angle. That registry cut makes it easy to keep cards facing the same way. Punch registry, actually making sure that each punch lines up, has always been handled by some type of machine or jig. Once again, there is this intentional dependence on automation. You have to have very specific and precise machinery, made by Hollerith, to make punch cards actually work. For Stamford's tabbed cards to function, you would also need some solution to these registry issues. You would need something like a jig or a simple machine for making holes and cutting tabs, even if that machinery only existed in a factory somewhere. Not terrible, but something to take
Starting point is 00:42:56 note of. Now that's the first of Stamford's cards. The second card described in the patent is a little more complex. It uses internal punches that are each different lengths, as in ovals. The card has this weird line of ovals at the top and/or bottom. How does this help? What does this even do? Well, it allows for the same kind of selection operation as the tabs do, but in its own kind of unique way. This is the more confusing option than the tabbed version, but it ends in the same place, so you can tune out from this part if you want to. Okay, here's how I think it works. On one side of the card is a line of holes. We start in one corner with a fully circular hole, then the next hole over is a little longer,
Starting point is 00:43:52 and the next is longer still until you reach the middle of the card. That's where you have the longest hole, a very stretched oval. Then the holes start to shrink once again, and we reach the far corner where we have a normal circle. These are bottom aligned, so the bottom of each hole lines up with its neighbor. This pattern appears on the top and bottom of each card. These cards are paired with a box that has matching oval holes cut along its bottom and notches on the top that line up at the top of the upper ovals. The patent is a
Starting point is 00:44:27 little vague about how this works, and the diagrams don't really help. The trick here is, as far as I can tell, leaving some ovals as circles when you punch up a card. That would suck in practice. It's super finicky to do, but let's just go with it. That's the data encoding, is if you have an oval or a circle. To do a selection, you first insert a so-called pivot pin. This is a needle that would go through one of the corner holes. Those holes act to ensure registry and provide a pivot point. To select a class of cards, you thread a needle into the bottom of an oval and pull up. Any card with an oval in that slot would let the needle rise up with no resistance.
Starting point is 00:45:12 Once again, it's more convoluted and less convenient, but it is a selection operation for unsorted cards. Now, we know precious little about Stamford outside of what I've just said here. Around 1906, there's a lawsuit about tabbed index cards that involves Stamford. He subsequently dies in 1918. That said, what we do have is fascinating on its own. We have a mostly manual medium. It has the flexibility of blank paper, while also sporting a dash of automation. Maybe you can see the shape I'm starting to sketch out. All these cards we've discussed have a mix of automatic and manual features, some mix of flexibility and rigid standardization.
Starting point is 00:46:01 They all solve very similar problems, but they do so using different approaches. Maybe my earlier explanation of a spectrum isn't enough. Maybe we should be thinking of this as a big two-dimensional plot, with one axis for automated vs. manual operation, and another for structured vs. free-formed data. You may have noticed there's still runtime in this episode. That's because there's one more card I want to cover, and it should land on a weird point in that 2D graph. This next part is one of those stories that I just can't resist. Allow me to introduce you to William B. Hargrave. Socialist, adventurer, librettist, real estate speculator, accountant, and inventor extraordinaire. Hargrave is one of those wild people in history that I wish we knew more about.
Starting point is 00:46:58 He fell into roughly the same generation as the rest of the card inventors we've discussed, born sometime around the Civil War. The first concrete record we get is actually a newspaper article about one of Hargrave's operas. In 1897, he writes an opera titled Merry Students. It's put to music and then performed in his hometown of Colfax, Washington. It was, according to this newspaper, a roaring success. Now, already, kind of a weird story. Somehow this gets him into accounting and then he files some patents, right? Well, not yet. First, he takes a trip. The next year, 1898, he leaves for Alaska to prospect for gold.
Starting point is 00:47:45 He was on the tail end of the Klondike gold rush. This is where we actually get a splash of color and personality. So far, the only writings we've had from people this episode are from patents. Those don't really give us much color to the story. While in the Klondike, Hargrave becomes friends, lifelong friends actually, with Jack London, the author. London called Hargrave either Bert or Kid, which made finding this connection a little more difficult. I actually only recently put together this part of the story. After Jack London's death, his wife, Charmian, wrote his biography.
Starting point is 00:48:26 That book talks about Hargrave in some detail, including direct quotes from letters and recollection. Let me pull in a small quote just to illustrate something here. From The Book of Jack London, Volume 1, an excerpt from a letter written by Hargrave, quote, There were not many of us that winter in the little mining camp on the Yukon, but the isolated group of cabins housed some lovable and adventurous souls. I will tell you about them because it was about them that Jack London wrote, and because there is hardly one of them whom he has not immortalized in his writings. End quote. First, I think this gives a small feel for Hargrave's way with words.
Starting point is 00:49:14 That kind of makes the whole opera thing make sense. He's hanging out with poets and authors, and he himself is an author. But second, he claims that London wrote about him and his fellow adventurers. It's likely that there's much more here than I know of, and I'm going to need to read at least a few of London's books about the Klondike to try and figure this out. What I do know is that Hargrave's time in Alaska was short. The entire party had been fighting scurvy for most of their expedition. Fresh food was hard to come by, so one after another, folk fell ill. Hargrave was the first to become too sick from scurvy to continue prospecting.
Starting point is 00:49:57 Before the year was out, he was shipped back to Washington to recover. Upon his return, Hargrave became deputy county auditor. This is the first time we have any record of Hargrave as a clerical worker. And honestly, it seems to come out of left field. But I guess he found a career path. In 1901, he would join up with the Whitman Abstract Company, also in Colfax, as trustee and eventual assistant manager. There, he would have been handling accounting and filing work, day in and day out, while still writing operas. In 1903, he files his first patent for a device that can remove facial wrinkles. And he keeps his day job, so I can only imagine that that doesn't actually go anywhere.
Starting point is 00:50:49 In 1908, he runs for state office under the Socialist Party. I've seen some reports that he became a socialist after meeting Jack London. But Hargrave was actually a registered Republican as late as 1901. I think that and the letters with Charmian London point to some continued communication between the two. Then, in 1911, he files a patent titled Filing and Indexing Appliance. It's also noted in the society papers of the time that he took at least one trip in relation to this new filing invention. Although, that could have simply been a trip to a lawyer to actually deal with the paperwork. Once again, the trail is thin, but the context around this is wild. So, what is this patent? Simply put, it's an honest-to-goodness edge-notched card.
Starting point is 00:51:39 We've reached my area of expertise and the connecting point up to hypertext. But, and here's the crucial part, there are caveats to this technology. After World War I, in the 1920s, we start seeing real edge-notched cards, mass-produced cards that can be used to index data. These are the same designs that Doug Engelbart uses decades later when he's developing Augmenting Human Intellect. Their indexing is fully manual. The only tools you need are a hand punch and a needle. They function off the same principles we've been discussing all episode.
Starting point is 00:52:16 Metadata encoded on the side of the card that is then used for selection, sorting, and connections. Hargrave's cards are the first that follow the more modern design. His patent describes cards that are perforated around the entire perimeter. Data is encoded by cutting those perforations into notches. Selection is done by threading a needle into one of those positions and then catching any cards with full holes, while rejecting any cards with notches. With that, you can select, you can sort, you can do all kinds of wild indexing tricks. The interesting caveat is that Hargrave's cards are mechanically automated. At least,
Starting point is 00:53:00 a little bit. Hargrave, like Stamford, pairs his cards with a special cabinet. This newer cabinet, however, is much more complex. The patent describes this box that has a compartment that can slide to the right. On the face of the box are two rows of holes. On the patent, they're labeled as A through Z, but Hargrave assures us you can use these holes to encode any data you want. One row is firmly attached to the stationary side of the box, all the way on the left. The other row is attached on the right to the sliding side. There's also a trick to indexing that makes this contraption work. The left and right sides of the cards have to be cut as an inversion of the other. Put another way, if you notch out a G on the left-hand side,
Starting point is 00:53:55 you have to have a closed hole in the same position on the right side. They have to match. Selection works using two needles. One is inserted into the left-hand hole, and the other is inserted into the corresponding hole on the right. Once inserted, you slide out the drawer. The right-hand needle holds firm to any closed holes, while the left-hand side releases any cards notched in that position.
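One way to make the two-needle trick concrete is a small Python sketch. This is only a model of the selection as described in the episode, not a reconstruction of the patent. The names `Card`, `make_card`, and `slide_select` are hypothetical, the A through Z positions follow the labels mentioned above, and treating a card that breaks the inversion rule as an error is an assumption.

```python
from dataclasses import dataclass

# Hole positions, following the A-Z labels mentioned for the patent drawing.
POSITIONS = frozenset("ABCDEFGHIJKLMNOPQRSTUVWXYZ")


@dataclass(frozen=True)
class Card:
    label: str
    left_notched: frozenset   # positions cut open on the left edge
    right_notched: frozenset  # positions cut open on the right edge


def make_card(label, codes):
    """Hypothetical helper: notch `codes` on the left and everything else
    on the right, so the two edges are inversions of each other."""
    left = frozenset(codes)
    return Card(label, left, POSITIONS - left)


def slide_select(cards, position):
    """Model one pull of the drawer with needles at `position`.

    Returns (pulled, stayed): labels that travel with the sliding side
    and labels that remain on the stationary side.
    """
    pulled, stayed = [], []
    for card in cards:
        if card.right_notched != POSITIONS - card.left_notched:
            # Assumption: a mis-cut card is treated as an error here.
            raise ValueError(f"{card.label}: edges are not inversions")
        if position in card.left_notched:
            # Left needle lets go; the closed right hole holds on.
            pulled.append(card.label)
        else:
            # Closed left hole holds; the right notch slips off.
            stayed.append(card.label)
    return pulled, stayed


# Made-up example records, indexed by initial.
cards = [make_card("Garcia", "G"), make_card("Hughes", "H"), make_card("Grant", "G")]
print(slide_select(cards, "G"))
```

In this model, pulling on G carries Garcia and Grant out with the sliding side and leaves Hughes behind, without anyone touching an individual card.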
Starting point is 00:54:27 That means that, in theory, selection is handled without any fiddly manual operation. You can then perform a second selection. The top and bottom edge of the cards are also perforated. You can actually encode another index on the top. When the cards are slid out, a frilled edge is revealed, making registry a breeze. Hargrave describes using this system to encode the initial of a first name on the left and right sides of the card, and the initial of the second name on the top edge of the card. This would actually make selecting a card by name very quick. But don't worry, the bottom edge is also used for encoding. Hargrave was not one to waste space. This part is particularly interesting in context, so check it out. The bottom edge provides registry marks to make sure cards are inserted into the drawer
Starting point is 00:55:21 correctly. It can also be used to encode a separate index, what I often think of as a rejection index. Hargrave describes inserting and leaving rods in the bottom of the drawer that align with a set of notches on the bottom edge of cards. That way, you can only insert cards that have the right data encoded on their bottom edge. You could encode, say, a year on this edge. Then you would set up a number of drawers to only accept cards punched for certain years. That would be useful for something like tax or accounting documents. You could have a cabinet for 1911 records that would only accept cards that were indexed as being from 1911.
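The rejection index can be sketched in a few lines of Python. Again, this is a rough model under stated assumptions, not the patent's design: the `Drawer` class and the `YEAR_POSITION` table are hypothetical, and the episode does not say how a year would actually be laid out on the bottom edge, so one position per year is assumed here.

```python
class Drawer:
    """A drawer with fixed rods along its bottom (hypothetical model)."""

    def __init__(self, rods):
        self.rods = frozenset(rods)  # bottom-edge positions blocked by rods
        self.cards = []

    def insert(self, label, bottom_notches):
        """Accept the card only if it is notched at every rod position."""
        if self.rods <= frozenset(bottom_notches):
            self.cards.append(label)
            return True
        return False  # a rod hits uncut card stock, so the card won't seat


# Assumption: one bottom-edge position per year.
YEAR_POSITION = {1910: 0, 1911: 1, 1912: 2}

drawer_1911 = Drawer(rods=[YEAR_POSITION[1911]])
print(drawer_1911.insert("policy 17", [YEAR_POSITION[1911]]))  # True
print(drawer_1911.insert("policy 18", [YEAR_POSITION[1910]]))  # False
```

One limit of this simple model: a card notched at extra positions would still fit, so the drawer rejects cards that lack the right notch rather than cards that carry the wrong one.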
Starting point is 00:56:17 There are a number of other patents from roughly this time period, the 1890s to 1910s, for very similar rejection indexes. They work in the same way. You have a filing cabinet with a series of rods in the bottom, and you notch cards to align with those rods. Cards that don't have the right pattern of notches can't be placed in the cabinet. These patents are also filed by accountants and insurance agents. Once again, there's this fascinating and rich history of data management that exists totally outside of any digital lineage. Hargrave's system represents this middle ground between automated and manual operations. You're still heavily dependent on this machine, but the machine is, well, it's almost too simple to call a machine. It's a very, very simplistic device that aids in automation. It also lands at this weird middle ground between free expression and strict standard.
Starting point is 00:57:07 The face of these cards is simply blank paper. Hargrave even says in the patent that the cards can be anything, as long as they're punched up correctly. He claims the system could even work with folders, which, I mean, fair, I think it could, I've just never seen edge-notched folders. But maybe they're hiding somewhere waiting to be discovered. The data stored in these cards is totally freeform, completely generalized. The edges, the metadata, have one hard and fast rule. The left and right hand indexes have to be inversions of one another. If you don't match a
Starting point is 00:57:46 slot with a hole, then the entire system is ruined. You can actually jam the cabinet shut if you don't follow the rules. What we get is a mix of features that I think are pretty congruent when taken together. That's all I've been able to dig up in relation to Hargrave's cards, but there is a wild coda to this story. Once again, Hargrave is just a fascinating figure that I wish we knew more about. Here's a clipping from a 1929 article in the Pittsburgh Sun-Telegraph. Gustave Davidson, poet, author, and explorer, today came back from the South Sea Isles with the tale of how his two scientific confreres, W.B. Hargrave of Colfax, Washington, and P.E. Haskovich, former resident of Paris, France, disappeared.
Starting point is 00:58:36 They left the island of Raivavae in a native catamaran boat last April, bound for Tubuai and Papeete, said Davidson. End quote. In 1929, Hargrave left for French Polynesia. He and two companions set up camp on a remote island, made contact with the locals, and started surveying the flora and fauna. According to Davidson, the only surviving member of that expedition, Hargrave and Haskovich set out on a very makeshift boat constructed from two canoes and a small hut. They were subsequently never seen again. Most of the articles that covered the disappearance claimed the duo were eaten by sharks,
Starting point is 00:59:27 but there was never any evidence of what happened to the two. Alright, that does it for this episode. But where do we stand in the larger story? We've seen three distinct and wacky card filing systems. Talmadge's cards show us how folk were trying to extend and work past the limitations of the punch card. A simple set of indexing notches allowed for quick and easy selection and sorting. Looking at Talmadge's patent alone, however, could lead to some faulty assumptions. It's simple to look at the progression of computers, identify related technologies, and assume a similar progression.
Starting point is 01:00:10 Computers went from glorified calculators, number crunchers really, to data management systems. It's a total change of character. Number crunching machines can answer totally different questions than data management machines can answer. They use different tools and different techniques, but at the end of the day, both types of machines offer solutions to the information problem. Computers made that jump over the latter half of the century, moving from systems like ENIAC and large Fortran machines into hypertext and document systems, and eventually the internet. So it would be pretty slick to see a similar transition in pre-digital technology. Talmadge's cards seem to point towards that hypothesis, right? But there are counterexamples to that. Otlet's Mundaneum was a large-scale
Starting point is 01:01:00 data management system that was conceived of prior to the adoption of punch cards, prior to the number-crunching medium. Stamford's card system represents a contemporary of the punch card, but it takes a fully free-form approach. It's a data management system, not a data-crunching system. Hargrave's cards present a similar picture. They are freeform, with some automation and some strict enforcement of data. But crucially, these were developed well after the punch card. Hargrave's cards don't attempt to improve or update the punch card. They don't even use any of the technology developed by Hollerith. I think it's especially telling that these card systems were, in large part, developed by accountants
Starting point is 01:01:50 and clerical workers. These were the exact people punch cards were developed for, yet some reached for a different technology altogether. So here's the payoff. Here's what's been bugging me, and the conclusion that I've been starting to reach. I think it's pretty clear that these indexing systems existed separately from punch cards. They're related technologies, for sure, but only in the most tenuous sense. They all solve the information problem, and they use paper, but the connection stops around there. Call them cousins, perhaps. Automated data indexing, a la the edge-notched card, is its own tradition. Digital computers first develop out of data-crunching traditions, of machines like punch-card tabulators or analog computers. As machines become more complex, the other lineage, the
Starting point is 01:02:47 lineage of indexing and real data management, gets folded in. I like this idea because it solves a few issues I've had. As We May Think is cited as this revolutionary text in the history of computing, but that article doesn't even talk about numbers or math or computation at all. It talks about automating a library. It goes on to inspire a younger generation to create hypertext. It's not that As We May Think shows a next stage in the evolution of computers, but rather it's a different path, one that eventually converges with the digital computer. Thanks for listening to Advent of Computing. I'll be back in two weeks' time with another piece of computing's past. And hey, if you like the show, there are a few ways you
Starting point is 01:03:36 can support it. If you know anyone else who'd be interested in the history of computing, please take a minute to share the show with them. You can also rate and review the show on Apple Podcasts and Spotify. If you want to support the show directly, you can sign up on Patreon. Thank you.
