Advent of Computing - Episode 107 - The Mundaneum, Part I

Episode Date: May 7, 2023

The Internet is the closest we've come to a universal store of all human knowledge. However, it's not the first pass at this lofty goal. In this episode (and the next) we are looking at the Mundaneum, a project started in the 1890s to address the information problem. How is it connected to the larger story of hypertext? And how can this older project inform our views on the information problem? Selected sources: https://www.ideals.illinois.edu/items/4184 -- Selected Essays of Paul Otlet

Transcript
Starting point is 00:00:00 How would you describe the internet in the most basic terms? You have to drop all the technical mumbo-jumbo, since, well, that's not really very basic. Drop all the corporate garbage that's clogging up bandwidth, too, since, well, that defies description. How would you describe just the basic internet itself? What's the network striving for? What's the point? I think we'll all have different answers depending on how we actually use the net. It's become a very personal place. Maybe you use it to connect with friends. Maybe it's a source of entertainment. Maybe it's something you use for business. Personally, I see it as a tool. I'd describe the internet as the sum total of all human knowledge, indexed and organized
Starting point is 00:00:48 for easy access. Call me a bit of a dreamer if you want. All hypertext systems have been developed with this dream at least somewhat in mind. I think Ted Nelson's work is probably the best example to pull here. His rendition of hypertext is this thing called Project Xanadu. It's a giant evolving vision for a world-spanning network that can be used to organize all human knowledge. Nelson's steps towards Xanadu haven't always been so world-spanning, though. One of these steps, a system called HES, was only ever used for editing
Starting point is 00:01:26 documents. While a bit more small-scale, HES did have all the technical features and tools needed for large-scale data organization. The internet follows very much in these older footsteps. It doesn't have every scrap of human knowledge. I know that firsthand. I can do most of my research for the show using the web, but sometimes I have to look elsewhere. That said, the internet has all the tooling and technology for this greater task. It just needs some time and we'll get there, I'm sure. I want to note something here. I've described the internet's intent separate from its technology. The sum total of all human knowledge and all that jazz. There's nothing there about networking. There's nothing there about computers. The
Starting point is 00:02:19 internet isn't even the first pass at this lofty goal. Take Nelson as a prime example. But he wasn't even the first to hit upon the idea. We can go further back. This has been a goal humans have been chasing for centuries. Maybe even longer than that. Welcome back to Advent of Computing. I'm your host, Sean Haas, and this is episode 107, The Mundaneum, Part 1. We're finally getting back into my main area of research. That is, weird pre-internet hypertext things. Today we're going to be looking into something I know almost nothing about, and something I'd wager most listeners also know nothing about.
Starting point is 00:03:12 One of the big realizations that I've hit upon is that hypertext isn't anything new. Humans have always had problems managing information. The digital age, well, that's just supercharged those problems. More data, more trouble, right? Sequential storage has been a particular bugaboo for generations of researchers. The human mind just doesn't work that way. We're not sequential machines. So there's this mismatch between trying to form cogent thoughts while also dealing with this sequentially ordered data. We want to wander, flip from idea to idea, and form connections in our heads.
Starting point is 00:03:54 Modern hypertext systems have gone a long way towards providing a more humanistic way to handle data. That said, the issue was identified long before Ted Nelson coined the term hypertext. Many will point to Vannevar Bush as the originator of hypertext. In his 1945 article, As We May Think, Bush describes the existing problem around data management, then proposes that structuring information as trails of associated data may be a workable solution.
Starting point is 00:04:27 While Bush's work was groundbreaking, he wasn't the first person to discuss this type of association-based storage. He wasn't the first to describe hypertext, or even describe the information problem. There were earlier researchers that reached very similar conclusions. One among them was Paul Otlet, the mind behind the Mundaneum. Otlet becomes active in the field of data management near the end of the 19th century. He believed that sequential data storage was unnatural and that information should be stored in small chunks. Those chunks of ideas needed to be organized and cross-referenced with each other to form a useful storage and retrieval system.
Starting point is 00:05:08 He was writing well before the creation of computers, the advent of computing, so to speak, and well before Vannevar Bush ever dreamed of a link. Not only did Otlet write about his grand plans, he also set them into action. Otlet and his co-conspirator, Henri La Fontaine, started building a grand library in 1895. This library, called the Mundaneum, was planned to store all of human knowledge on little slips of cardstock. It would be accessible anywhere in the world via wire. You could simply call up the Mundaneum, place a query, and have it answered almost instantly. This sounds a lot like the internet, and also sounds surprisingly similar to Ted Nelson's Xanadu. And thus, we've reached the end of my practical knowledge of the Mundaneum. I think you can see why I'd be so interested in expanding my horizons a little here.
Starting point is 00:06:06 Otlet's work sounds like it fits snugly into the larger hypertext canon. We have links, data retrieval, and remote access. We even have little slips of cardstock, my storage medium of choice. Going into this, I have a lot of questions to answer. The main one is this. How does Otlet's work actually fit into the larger story of hypertext? What are the links here, and where do they lead? Beyond that, I want to figure out how close the Mundaneum was to a full implementation. Did Otlet conceive of links in a similar way to the Memex or NLS or Xanadu?
Starting point is 00:06:47 Or am I just barking up the wrong tree? Plus, I have one extra inquiry to make. The Mundaneum started being planned in the 1890s. That's the same era in which the punch card and edge-notched card are developed. Otlet and La Fontaine are using cardstock, so what do they have to say about the medium's contemporary uses? But, of course, this is just part one. This part will mainly be background to the Mundaneum itself. We're going to be dealing with the run-up and the development of the organization methods used in the Mundaneum.
Starting point is 00:07:25 The big problem here is that we're in a pre-digital era. We don't really get many touchstones, so I'm going to have to do some set-dressing from scratch, so to speak. Next episode, we'll be talking about the Mundaneum proper and its own torrid history. Now, before we get started, I have my usual plug to make for Notes on Computer History. This is that semi-academic publication that I've been trying to put together. As it's been standing, we're really close. As the editor-in-chief,
Starting point is 00:08:05 I want to get one or two more articles before I go to publish the first issue. Now, if you want to contribute, if you've ever wanted to write about the history of computing, then please get in touch. You can find information about how to submit at history.computer. Now, with that said, let's get into the show itself. First of all, I want to be upfront about a problem with studying Otlet. At least, it's a problem for my anglophone self. Otlet was born and lived most of his life in Belgium. Most of his works were written in French. Now, this gets a little more complicated. There are two main languages used in Belgium, French and Dutch.
Starting point is 00:08:45 As a result, the majority of information about Otlet is written in one of those two languages. Translations do exist, and that's kind of the only reason I can cover Otlet at all, but I'm sure a certain fire and subtlety is lost in the move over to English. There's also the fact that I'm working out of a lot of texts with titles like Selected Works or Selected Essays, so if I miss something, it's not for lack of trying. Anyway, Paul Otlet was born in Brussels in 1868. That should tell you a thing or two about the period we're dealing with. He earned a law degree in 1890, so he's technically a lawyer, but he didn't stay in practice long. His true love
Starting point is 00:09:34 was bibliography. That may sound a little weird to the modern ear. A bibliography is just a listing of sources, right? It's those things that you had to put at the end of grade school reports. Well, the word actually means more than that. Bibliography is its own entire field of research. It's primarily concerned with how information is handled, but this is in the archaic sense. We aren't talking about data scientists churning through databases. Rather, we should imagine something like a turbo-powered archivist. A bibliographer studies sources and how they're organized, preserved, and accessed. In the 19th century,
Starting point is 00:10:18 that meant, by and large, the study of books, hence, biblio. It's at this point that I should tell you, Otlet actually won me over pretty quickly. I did say there might be some fire lost in the translations, but a lot's still there. I've mentioned it on the show before, but I have mild dyslexia. I can't really spell worth a darn, I read at a glacial pace, and I tend to get lost in texts pretty easily. There are two things that aggravate this for me. Stress and books. I think the stress part is pretty easy to understand.
Starting point is 00:10:58 When I'm under pressure or trying to rush through text, it gets all garbled up. Because of that, I was never really able to cram for tests, for instance. Somehow I still passed college, but I think that might be more luck than anything. Books, however, those introduce a special fun factor to my life. I can usually get through technical stuff fine because pages tend to be broken up into sections. A table, diagram, picture, or inset creates this easy-to-see division in the text. For me, it doesn't matter if the division is meaningful. The break in the page is enough to make the text easier to read. The reason for this is, well, I can get lost in a page of text kind of easily. I'll reach the
Starting point is 00:11:47 end of a line, then think I'm going to the next only to realize, maybe minutes later, that I'm in the wrong place. I usually don't catch this right away because, well, sometimes sentences don't come together in my head when I'm reading them, so I just think, oh, that's a normal garbled sentence. That's just me having a problem reading. But then I realize that I've actually missed something pretty big or I've reread a whole paragraph. So what do my issues reading have to do with Paul Otlet, you may ask? Well, you see, we agree on something very fundamental. Books suck. His argument, however, is slightly different from mine. It's also a little more well-informed.
Starting point is 00:12:29 The way Otlet saw things, books were pretty unnatural. They're a poor way to store information. A book, and here we are talking non-fiction, could contain many separate pieces of information. Hundreds or thousands of facts and ideas may be spread over its pages. How do you access this trove of knowledge? Well, you just kind of have to read the whole thing. Otlet took his first pass at explaining his issue with books in 1892 in an essay titled, kind of underwhelmingly,
Starting point is 00:13:04 Something About Bibliography. That is the entire title. It sounds more like an offbeat indie rom-com than a takedown of contemporary information management systems. This is also where I started to see something of a pattern. I've been reading Otlet's work out of a book called International Organization and Dissemination of Knowledge, Selected Essays of Paul Otlet. It's a pile of translated essays that were compiled in 1990. There are ways to get digital copies which I'd recommend tracking down because, as near as I can tell, this is the easiest way to read up on Otlet. And really,
Starting point is 00:13:53 I'd recommend doing so. Even as early as 1892, Otlet is presenting some ideas that sound uncannily similar to later trains of thought. The general argument of Something About Bibliography is that there's an issue in the social sciences. That problem is that there's simply too much information out there and no established way to deal with all of it. Otlet makes this argument by way of comparison. At the time, there were some disciplines that were working on this research problem. He points in particular to other sciences like physics, chemistry, and biology. Researchers in those fields just had a system down. To quote, Each of their discoveries, each new contribution to the advancement of their
Starting point is 00:14:45 science, seems to be recorded immediately and to become for everyone the point of departure for future research. Thus, does an admirable coherence of work exist among chemists, physicists, and biologists. The latest arrival among this army of investigators can immediately find a useful job to do without having to tarry long with already completed work. End quote. Now, this is a pretty broad-sweeping generalization. Otlet will back this up later, so let's just go with it.
Starting point is 00:15:19 He's claiming that in these hard sciences, there are certain ways that researchers manage past data. Why does this matter? Well, here's where we hit the uncanny part. Otlet is talking about the information problem. Researchers have to stand on the shoulders of giants. That's the only way we can move forward to new discoveries and new works. The fact that we've developed ways to preserve
Starting point is 00:15:45 what we figure out, well, that really puts us above every other species on the planet. Communication, collaboration, and the ability to pass down knowledge have proven to be hugely advantageous for humans. But what if there's too much knowledge to deal with? That could lead to duplication of work. In the classic formulation of the information problem, the sheer glut of information causes this breakdown of good research. There's just too many facts and figures for any one person to remember, and the crucial information is buried in way too many places for the intrepid researcher to ever hope to scour.
Starting point is 00:16:26 Thus, they end up reinventing the wheel. That's a net zero, since somewhere, we already have the instructions to make a wheel, it's just that no one can find them right now. And that sucks. Now, that's the formulation that we see discussed by Vannevar Bush in the late 1940s. But Otlet has a subtly different argument. Massive information is one problem, as is access to that information. If you flat out can't access data, then that has the same effect as having too much data to search through.
Starting point is 00:17:05 If you aren't allowed to read the paper that describes the wheel, then you may end up inadvertently reinventing it. We've actually seen this on the show before. Independent invention is one of my favorite topics to cover because, frankly, it's kind of fun. It's also kind of funny. The computer mouse is a really good example here. It was actually independently invented at least twice. The story we all know is about Doug Engelbart and Bill English. In 1963, Engelbart designed the mouse to be used with his new graphical computer system, the online system, or NLS. English actually built the thing. This is the same style of mouse we use today, with some minor modifications. Technically speaking, though, this wasn't the first mouse. In 1946, a very similar device, a big trackball, was created by Ralph Benjamin, who, at the time,
Starting point is 00:18:03 was working for the British Royal Navy. This trackball worked off the same principle as Engelbart and English's mouse. It tracked X and Y movements independently and then sent that data off to a computer to move a pointer. However, Benjamin's trackball was used in fire control systems, as in artillery fire. That's a very sensitive application, so the trackball remained classified for decades. If Engelbart knew about this earlier work, he wouldn't have had to create a new mouse from scratch. This isn't a case of the expected information problem. Rather, Engelbart straight up couldn't have known about the earlier trackball. Access itself was restricted. In this case, it was a very active type of restriction, but this can happen for more passive reasons as well. Language
Starting point is 00:19:01 barriers are one example that I run into quite often in my research. Take this very episode. The fact that Otlet didn't write in English means that I have to just hope the important papers have been translated. I couldn't do much original research here because, well, I don't speak French. I can't really look at his French notes and figure things out for myself. In Otlet's analysis, data has to flow freely. It needs to be freely and easily accessible. That's actually quite a bit different than Bush's viewpoint.
Starting point is 00:19:36 In As We May Think, Bush doesn't talk about anything like democratization of information. Instead, he presents solutions for so-called information workers, for people working in research only. There's something a good deal more open to Otlet's outlook. Okay, so I promised some hate on books, and I've kind of hit this tangent, so let me circle back a little bit. Otlet also argues that books suck because they're slow and difficult to change. His example, which I think is very fitting for this period, is the venerable encyclopedia, at this time spelled with a little EA symbol. This is the quintessential collection of facts and figures.
Starting point is 00:20:24 Everything is usually organized in some hopefully reasonable fashion, and they're meant to be searched for single chunks of data. On the surface, that sounds good, right? It's a book, but it's built for pulling out single ideas. Well, here's where you run into problems. What happens if the data stored in that encyclopedia is out of date? Or, horror of horrors, what if one of the many facts stored within is simply wrong?
Starting point is 00:20:57 I know, I'm clutching my pearls as we speak. One solution for the encyclopedist is to release regular editions. In theory, that's a really nice solution, but the reality of evil books makes this a nightmare. For one, that's pretty cost prohibitive. Encyclopedias are usually big, multi-tome affairs. A publisher needs a lot of money up front to print and bind thousands upon thousands of pages. The consumer then needs to buy a whole new encyclopedia. That's not even including the countless hours needed to edit and typeset the tomes. You don't want to be releasing new editions just for minor changes. Another solution is to run supplements.
Starting point is 00:21:47 These are additions or corrections to the encyclopedia that usually stand somewhat on their own. I gotta be a little cagey here since there are some weird exceptions. I think at one point World Book or someone sent out stickers to correct their encyclopedias in place. The more common practice, though, was to ship out extra volumes between totally new editions. For instance, in 1801, Encyclopedia Britannica produced a two-volume supplement to the third edition of their encyclopedia. It was, straight up, two more volumes to put alongside your encyclopedia.
Starting point is 00:22:26 It contained new articles which the third edition lacked. In a fun twist, there was also a second improved edition to the supplement of the third edition of the encyclopedia. That's a little bit ridiculous to say and even really think about. Now, these supplements are cool in theory, but they introduce another problem. You now have extra volumes that fall outside the encyclopedia's organization scheme. The supplemental articles exist outside the normal indexing of the other volumes. So, you have this just dumb situation where you have to search the normal volumes separately
Starting point is 00:23:06 from the supplements. You're breaking the schema and making data harder to access. Otlet had a proposed fix for the issue. And, once again, this comes with some familiar aspects and some new ones. Over time, hypertext-like systems tend to adopt the cool technology of the era. Bush's Memex made use of fancy microfilm technology. Engelbart's NLS adopted remote terminals and timesharing. Otlet adopted the note card. His solution to the information problem? Well, it was all about little cards. To quote,
Starting point is 00:23:46 For written works, a rearrangement of their contents, not along the lines of the special plan for a particular book, but according to the genus and species appropriate to each element, does not make for any loss of substance. This systemic recording of facts, statistical data, and interpretations of them in the final analysis will be work undertaken by only a few individuals, the creation of a kind of artificial brain by means of cards containing actual information or simply notes of references. End quote. Here's the root of Otlet's solution. It doesn't actually matter what order you store data in. In fact, it doesn't even matter if you store all your data in the same place.
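As a rough sketch of the card scheme in that quote, here's a little Python. The Card fields, the subject labels, and the "Book A" style references are all made up for illustration. This is just one way to picture cards that carry a fact plus a note of where it came from, filed by what they say instead of by which book they were pulled from.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Card:
    fact: str      # one small chunk of information
    subject: str   # the heading the card is filed under (hypothetical labels)
    source: str    # note of reference back to the original work (hypothetical)


def file_cards(cards):
    """Group cards by subject, ignoring which book each one came from."""
    drawers = defaultdict(list)
    for card in cards:
        drawers[card.subject].append(card)
    return dict(drawers)


cards = [
    Card("The 3x5 card was standardized in 1876.", "libraries", "Book A, p. 12"),
    Card("Brussels is the capital of Belgium.", "geography", "Book A, p. 40"),
    Card("Card catalogs hold metadata about books.", "libraries", "Book B, p. 3"),
]

drawers = file_cards(cards)

# Everything under one subject, no matter which book it was extracted from.
for card in drawers["libraries"]:
    print(card.fact, "<-", card.source)
```

Two cards from different books end up in the same drawer, and each one still points back to its source, so nothing is lost by pulling the books apart.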
Starting point is 00:24:39 You can totally just rip a book apart and cast its contents to the seven winds, and it'll still stay useful, at least as long as you know where all of its pages have gone. The basic idea here is to extract each useful fact and copy it down to a note card. The card then gets a reference to keep track of where the fact came from. Then you file the card away by its contents, not by its origins. In this way, an encyclopedia might be reduced to cards of hundreds or thousands of categories, even though each card was derived from the same source. On its own, this is an interesting concept. I think it gets all the more interesting if we zoom out to the larger context. In the 1890s, and really in any age prior to the widespread adoption of computers,
Starting point is 00:25:27 libraries were organized using card catalogs. This is the status quo that Otlet was working off of. A card catalog is something of a capricious thing. They're these giant chests of drawers that each contain countless little note cards. Each card has information about some book in the library. It's just a small chunk of metadata. Data about your data, so to speak. Here's another hint at the antiquity we're dealing with.
Starting point is 00:25:59 At least, relatively speaking. When Otlet is writing Something About Bibliography, the 3x5 card is new. It was standardized in 1876 for use in libraries specifically. So, while it seems really basic and obvious today, the 3x5 note card was a slick new piece of technology. As they were mass-produced and the proper shelves were created, the card catalog became a mainstay of libraries the world over. It's this very architecture of these 5-inch wide cards and 5-inch wide drawers that Otlet would soon be able to leverage. What was stored on these traditional cards? Well, not very much. Systems varied from library to library.
Starting point is 00:26:52 There wasn't a well-established standard quite yet. But we do have some common denominators to work with. The core pieces of information were the book's title, author, and where it was located in the library's stacks. That last part could be, well, it could be just about anything, since, once again, there was a fundamental lack of standardization. It might be a shelf number. It might be some pirate map explanation of how to reach the text. Sometimes catalogs included categorical or keyword information, but that was also a little unstandardized. You might literally just pull out a card that said
Starting point is 00:27:34 Sean Haas, book about computers. It's on aisle five. It's above the microwave. It's over by the sconce in the wall. Not the most useful thing in the world. Now, the final trick here was how you organized your card catalog. This seems to be the part that really riled Otlet up. One method was to organize everything by author name or by title, you know, alphabetical style. That, frankly, isn't very useful. It's good for keeping inventory. At a glance, you can see how many books written by Sean Haas are kept in your special vaults. But when it comes to research, well, this type of organization is kind of useless. You can't easily answer a question using this type of catalog. You run into this dumb circular issue where, to answer a question, you need to first know the author who
Starting point is 00:28:32 can answer that question, or the title of the book that holds the right information. That precludes coming in cold. One solution is to organize by subject matter. But then, well, you get into the tricky subject of how to structure categories. You have to think about, well, something about bibliography. There were isolated and, once again, non-standardized ways to build up topic-based catalogs. The standard solution, or the one we're used to today, comes about in 1876, the Dewey Decimal System. Now, this is a surprisingly fraught topic in and of itself. I don't want to get too deep into the weeds here, since, well, I want to focus us in on the Dewey Decimal System as a first stab at classifying
Starting point is 00:29:27 information. Suffice to say, there are some concerns about Melville Dewey's character that are reflected in the system itself. But there are also just some issues with the system in general. It's a bad system. If you want some more background on Melville, then Behind the Bastards did a pretty good episode on his life and times, and that links out to a lot of resources about him. I'm not going to give an exhaustive list, since, once again, it's a little outside of the scope of today's discussion. The Dewey Decimal System, or Dewey Decimal Classification, sometimes just put as DDC, it was intended as a way to streamline
Starting point is 00:30:06 the organization of books. At the time, it was customary to just keep everything in a pretty rigid order. You'd describe a book's location by its aisle, shelf, and location within the shelf. That's the good way to do it. To cast this into digital terms, think of a memory address. In old school libraries, you'd look up a book by its physical location. Aisle 5, top shelf on the left, fifth book from the end. It works, but if shelves are moved around or books are swapped, you have to redo your whole catalog. Dewey's proposed system replaces this physical indexing with a relative index. Instead of identifying a book by its physical location on a shelf, the DDC identifies the text by its topic. This is done using a hierarchical index of categories.
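To make the memory address comparison a bit more concrete, here's a small Python sketch. The titles, aisles, and shelf positions are invented, and 193 is just the German philosophy class number that comes up in this episode. Treat it as an illustration of physical versus relative indexing, not a model of any real library's catalog.

```python
# Physical indexing: the catalog records exactly where each book sits.
# (Titles and locations here are invented for illustration.)
physical_catalog = {
    "A book on German philosophy": ("aisle 5", "top shelf", 5),
    "Another book on German philosophy": ("aisle 5", "top shelf", 6),
}

# Relative indexing: the catalog records a topic code per book, and one
# small table records where each topic currently lives.
relative_catalog = {
    "A book on German philosophy": "193",
    "Another book on German philosophy": "193",
}
shelf_for_class = {"193": "aisle 5"}


def locate(title):
    """Find a book's aisle by going through its class number."""
    return shelf_for_class[relative_catalog[title]]


def move_aisle_physical(catalog, old_aisle, new_aisle):
    """Moving an aisle means rewriting every affected catalog entry."""
    changed = 0
    for title, (aisle, shelf, position) in list(catalog.items()):
        if aisle == old_aisle:
            catalog[title] = (new_aisle, shelf, position)
            changed += 1
    return changed


# The same move under both schemes.
entries_rewritten = move_aisle_physical(physical_catalog, "aisle 5", "aisle 9")
shelf_for_class["193"] = "aisle 9"  # the relative scheme needs one update

print(entries_rewritten)
print(locate("A book on German philosophy"))
```

The physical catalog has to be touched once per book, while the relative one only changes the single line that says where class 193 lives. The per-book cards stay exactly as they were.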
Starting point is 00:31:02 Initially, this is nine broad classes that span from religion to language to science to history. Each class is then subdivided into subclasses, which are further subdivided a third time. You end up with a decimal number that can describe any book. In practice, this looks like a three-digit number followed by a decimal place and some more numbers. The numbers after the decimal are used to further describe the book, but they're little things. You take these numbers from a table of keywords, and they're pretty general terms. For example, 193 is the classification for German philosophy. Adding .9 to it makes it a geographic text. So 193.9 might be a book of maps discussing German philosophy. There are some huge advantages to this system.
Starting point is 00:31:59 I mean, there's a reason why the Dewey Decimal System is still used in many libraries today. By structuring your catalog around these categories, you can freely shuffle around shelves and books. As long as your shelves still have a class label, then your catalog is valid. The 193.9 books will always be in a clearly labeled location, even if shelves are moved around. The Dewey Decimal System can also be used for blind research. Since everything is organized by category, you don't need to know author names in order to find books anymore. If you want some maps on the origins of different philosophies in Germany, just look up class 1, subclass 9, subclass 3, add the dot 9, and you're good to go. Instead of scouring hundreds of books, you need to look up the right decimal in the table and then consult the card catalog. There are some downsides to this system, however. The biggest one
Starting point is 00:32:53 is a profound lack of flexibility. You see, the DDC has fixed categories. Melville describes the initial set of categories in his 1876 pamphlet on the organizational system. That set of classes was derived from his work at the library at Amherst College. That alone represents some issues, the main one being a limit in original scope. These classes were kind of just picked out by Dewey himself. He did go view other libraries to work up his system, but ultimately Amherst was the only library that he reorganized around this new classification. That said, there's a more pernicious issue here. It's the whole base 10 thing. There are only 10 of the largest classes, what Dewey called libraries. Under those are 10 classes, under which are 10 other classes. You get, at most,
Starting point is 00:33:54 a thousand possible classifications. Dewey chose how those original classes were allocated. This is where some personal bias mixes in with the inherent limitations of the system. Let's take 400s, for instance. That's the library of philology, the study of language. You have 10 classes in the 400s. The first two are general, consisting of books on philology itself and comparative philology. The next classes, the 420s to the 480s, are devoted to specific languages. Dewey gives us space for English, German, French, Italian, Spanish, Latin, and Greek. Under each of those classes are 10 more specific categories, plus all the categories you can place after the decimal. So if you're looking for a book about the history of Greek dialects in
Starting point is 00:34:53 certain regions, then the Dewey Decimal System has you covered. It works really well for that specific type of query. Now, you might notice there is something missing here. There are over 7,000 languages spoken around the world today. You can pare that down a little bit if you want to have more strict criteria, but you're still dealing with a lot of different languages. Dewey provides classes for seven. Languages outside of those seven fall into the 490s, other languages. That gives us, and this is directly from Dewey's 1876 pamphlet, Chinese, Egyptian, Semitic, Indian, Iranian, Celtic, Slavic, Scandinavian, and other. That's still missing a lot of stuff. One fun implication here is that most languages actually fall under philology other languages
Starting point is 00:35:57 other. That includes languages like Japanese, which is the first language for 123 million people, or Portuguese with 236 million. Another implication is that texts on other languages can't be categorized with the same nuance as texts on English or Latin. The Dewey Decimal System just works better for books about the English language than it does for books about Japanese. So, we have to ask the question, why are the 400s structured like this? Part of this issue comes down to the limitations in the system itself. Due to how the system is structured, as these hierarchical groups of 10, there is a limited amount of space in each category. The system has flaws deep in its initial design. While it provides a framework for categorizing information, that framework is rigid and very limited.
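To put rough numbers on that space limit, here is a minimal Python sketch. It assumes the layout of the 400s described above: 400 and 410 are general, each named language gets one division, and 490 is the catch-all. The helper name `division_for` is made up for illustration and is not anything from Dewey's pamphlet.

```python
# Three decimal digits give at most 10 * 10 * 10 classifications.
DIGITS = 3
total_classes = 10 ** DIGITS

# Assumed layout of the 400s (philology): 400 and 410 are general,
# 420-480 each hold one named language, and 490 is "other languages".
NAMED_LANGUAGES = ["English", "German", "French", "Italian",
                   "Spanish", "Latin", "Greek"]
DIVISIONS = {420 + 10 * i: name for i, name in enumerate(NAMED_LANGUAGES)}
OTHER_LANGUAGES = 490


def division_for(language: str) -> int:
    """Return the 400s division a language would land in (hypothetical helper)."""
    for number, name in DIVISIONS.items():
        if name == language:
            return number
    return OTHER_LANGUAGES


print(total_classes)             # 1000
print(division_for("Greek"))     # 480
print(division_for("Japanese"))  # 490
```

Seven languages each get a division of their own, and every other language on Earth shares the single 490 division. That is the rigidity problem in miniature.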
Starting point is 00:36:59 The categories are also predetermined. That rigidity introduces a little snag. The system is something of a trap for bias. You have to be careful when picking your categories to make sure you're covering, well, everything. Dewey didn't really take that sort of long view. Instead, it seems to me that he just picked categories based off what he was interested in, and what he saw in libraries in American colleges in the 1870s. As such, we get a focus on very Anglo-American texts. The same problem I outlined in the 400s also shows up in other regions of the system.
Starting point is 00:37:41 Another especially notable case is the 200s, theology. That entire library focuses on Christianity, except for the 290s, which are devoted to, quote, "non-Christian religions." Hinduism, with over 1 billion adherents, falls into the gap of 299, theology, non-Christian religions, other. So, let's hope you don't want to do any research on that particular belief system. By contrast, the 220s are used entirely for different types of Bibles. There's also this weird issue with fiction that I haven't seen discussed in a lot of other places. The DDC only really works for non-fiction works. There is a section for literature, but that's kind of meant as a place for classical types of work. It's organized by language and broad genre, but it doesn't work super well for contemporary fiction. Sci-fi, for instance, would all fit
Starting point is 00:38:47 in something like 829, I think. At least, that's the category for other English language literature. My local library uses the Dewey Decimal System, but only for nonfiction. They just shelve fiction books by author name. I think that's the case for a lot of libraries. The way I see it, the Dewey Decimal System represents a transitionary method for organizing information. It's very much a first pass at improving libraries, which later bibliographers could improve upon. But that didn't really happen, at least not in the Anglosphere. The Dewey Decimal
Starting point is 00:39:28 System is widely adopted and just kind of becomes the default, especially in the States. That's a whole other conversation that, once again, is better addressed elsewhere. How does this all link back to Otlet? Dewey and Otlet were, in a manner of speaking, addressing a similar problem. They were both concerned with the organization of knowledge. The connection between the two stories comes in 1895. That year, Paul Otlet and Henri La Fontaine devised the Universal Bibliographic Repertory. I haven't mentioned La Fontaine yet, so this is probably as good a time as any to add the next character in this play. La Fontaine was another lawyer and resident of Brussels, and this is where I hit a little
Starting point is 00:40:18 wrinkle in this part of the story. You see, La Fontaine is best remembered for his peace activism. He was a champion for pacifism in the lead-up to the First World War and only became a more ardent defender of peace after the Treaty of Versailles. His efforts won him the Nobel Peace Prize in 1913. As a result, most of his bibliographic work is seen as something of a side thing that he did with Otlet. Otlet and La Fontaine met in 1891 and became very good friends. In the coming years, the two would collaborate both on bibliography and activism. As near as I can tell, Otlet wrote mainly about bibliography, while La Fontaine published more about pacifism. This might be a little colored by the language barrier and the translations I have access to.
Starting point is 00:41:11 I bring this up as a bit of a disclaimer. It's hard to tell what ideas came from Otlet alone, from La Fontaine, or from their collaboration. A lot of publications by Otlet in this period are either co-authored by La Fontaine or heavily cite works by La Fontaine that aren't always translated. In 1892, the two were hired by the Society of Social and Political Sciences to handle various bibliographic tasks. It seems like their main job was to find and catalog sources that the society would be interested in. This is where the story all comes together. Over a number of years, Paul and Henri deal with a huge number of sources, and they encounter the information problem in a whole new way. During the early days of this task, Otlet writes Something About Bibliography.
Starting point is 00:42:12 As the years drag on, the duo moves from the general complaints and ideas put forth in that paper to something more concrete. The bridge here is the Dewey Decimal System. During this period, Otlet and La Fontaine were actively looking for better ways to organize sources. They ran across a pamphlet on the Dewey Decimal System that was written in English. Otlet liked what he saw and thought that, with a few tweaks, it could be pretty slick. He also figured that the system would see widespread adoption in Europe if it weren't for the pesky language barrier. So, Otlet wrote to Dewey asking permission to translate and adapt the decimal system. The response? Sure, just as long as you don't publish derivative works in English. This is the backdrop that leads to Otlet and La Fontaine's next big
Starting point is 00:43:07 leap. It's important to stop here and really address something crucial. The Dewey Decimal System would only serve as inspiration to the duo. From this point forward, Paul and Henri developed their own system, but they didn't base any of the technical details on Melvil's earlier work. They took the idea of organization by numerical categories and ran with it. The system they developed was initially called the UBR. It's also important to note the initials are a little funky here. The system is usually shortened to RBU, which are the initials for the French version of the name. So RBU, when expanded and translated to English,
Starting point is 00:43:53 is the Universal Bibliographic Repertory. I just want to make that clear so I can use the initials without confusion. While parts of the RBU were inspired by Melvil, it's actually a very different system. It was intended for a very different purpose. The Dewey Decimal System exists in a restrictive context. It's meant as a way to organize libraries and their catalogs. In that sense, it's only ever meant to be used for a specific collection of books. Really, it was only meant for old-school American libraries. The RBU, by contrast, was designed for organizing the sum total of human knowledge. This, of course, has some pretty major implications. Otlet laid out these details in an 1895
Starting point is 00:44:47 publication. It's clear from the beginning that the system has a far larger scope than anything ol' Melvil was working on. First off, and I can't stress this enough, Otlet meant the RBU to be a total catalog of all human knowledge. That means that it doesn't just organize a single library, but all libraries. It stores all information. A card in the system might lead you to a shelf nearby, or it might lead you to a collection on another continent run by people that speak a language you don't even know. The RBU was also planned to be a non-centralized tool. Otlet writes about the RBU existing in multiple locations. Each location would have a complete repertory that would be updated on a regular basis. These catalogs would
Starting point is 00:45:41 be filed in duplicate, with one copy organized by author and another by category. Finally, and this is a pretty cool idea, the RBU itself would be a source of data. Otlet doesn't elaborate on this a lot, not yet, but he mentions that the RBU could be used as a, quote, "basis for intellectual statistics." By browsing the catalog, you could work up a report on, say, the distribution of types of texts by language. This is cool because, oh, it gives us a way to do meta-analysis at a grand scale. That's all pretty different from the boring, closed-minded Dewey Decimal System. That said, there are similarities. The RBU was also designed with
Starting point is 00:46:34 the concept of libraries. It has its own broad topic categories. But there is a twist from Otlet. "By means of division of work, a new body which is distinct from all the others will henceforth be especially entrusted with the classification of written documents. The classification must be developed by specialists and not any more by those of whom universal knowledge is demanded." Under the Dewey Decimal System, categories are determined centrally. Originally, this was by Melvil himself. Otlet envisioned a separation between libraries. The idea was that categorization would be handled by experts that knew the material. Otlet actually talked about this at length all the way back in Something About Bibliography. This was a detail I kind of glossed over. In explaining how information should be categorized, he came up with this interesting idea.
Starting point is 00:47:34 Make authors declare where their text fits in a library. The idea was that the originators of information would say how it should be organized. That either simplifies or totally removes the need for a central governing body. At most, you might need a review panel. This bottom-up approach is a really refreshing take on the information problem, and it's something that sounds really modern to my ear. In As We May Think, Bush talks about data as being organized by those who use the data. Links are made by the reader, not the author.
Starting point is 00:48:15 A text could conceivably come pre-linked, but the main mode of the Memex was this reader-initiated linking. At least, that's what Bush spent the most time discussing. Both of these approaches represent a type of data democratization. In Bush's case, he's putting the power in the hands of the reader, in the hands of the information worker. A Memex user creates links that they find useful. Otlet is putting that same power in the hands of authors and researchers. The categorization should be handled by those most familiar with the material. Links and categories are different concepts, but I'd argue it's useful to look at these ideas as related. They're both ways to look at organizing information.
Starting point is 00:49:08 So, how was this lofty plan carried out? Now, I know I just spent a lot of time talking about how the Dewey Decimal System kind of sucked and how Otlet was reaching beyond the bounds of Melvil's work. But the RBU is initially an augmentation atop the decimal system. But the keyword here is initially. Why is Otlet using a system that's known to be flawed? Well, it comes back to one of the core principles behind the RBU. It needed to be available as soon as possible. Otlet viewed the information problem as a pressing one, so it was okay to cut some corners to get a quick solution. But he wasn't using the Dewey Decimal System wholesale.
Starting point is 00:49:51 There's this interesting addition that I have to discuss. Each card filed away in the RBU contained its decimal categorization, plus what Otlet called a series number. I'd call it a serial number, but once again, potato, potato. Time and time again, Otlet stresses that the RBU has to work for an international audience, so it can't rely on any one language. That's one of the things that drew him to the decimal system to begin with. Numbers are universal.
Starting point is 00:50:26 But there's this interesting meta-issue that shows up when using the decimal system. It's that shuffle stuff I keep coming back to. What do you do if you need to recategorize a text? What if a book moves from one decimal class to another? You have to have some way to discuss the book's metadata. That's where the serial number comes into play. Each card was given a unique number. This number was one of those incrementing fields that resets every year. The year of the series was added at the end of the number. The first book that was cataloged in the RBU for 1886 would be 1/1886, for instance. This may sound like a really small addition, but it allows for a neat trick. As Otlet explains, if you needed to move a card around in the RBU, you only had to
Starting point is 00:51:20 reference its serial number. You could send out a telegram telling someone to move 1/1886 from the 534 class to 536. That can be formulated in a very unambiguous way, despite any language barrier. You just need some numbers and an arrow. Done. This little addition of serial numbers gives a way to discuss changes in the RBU. It also means that something like a link could, in theory, be formulated. You could have a card reference another card. But I'm getting ahead of myself. There's one more small step before we get into the next big leap.
Starting point is 00:52:06 That's the IOB, the International Organization of Bibliography. This was a group founded by Otlet and La Fontaine to manage the creation of the universal repertory. But once again, the plan here was somewhat decentralized. The point of the IOB was to be something of a guiding council, coordinating the larger repertory and its organization schemes. Now, like I said, the RBU starts off using the Dewey Decimal System. That's quickly adapted into something very different. So what exactly are Otlet and La Fontaine's grander schemes? In a word, it involves syntax. The same year that Otlet starts publishing about the RBU, he laid down a tract titled On the Structure of Classification Numbers. This came after there
Starting point is 00:53:00 was time for fellow bibliographers to discuss the future of the repertory. The article starts with this, "When the Brussels Conference adopted Mr. Dewey's classification as a whole, it did not intend to proclaim that the classification was to be considered perfect in every respect. The conference did agree that it was sufficiently developed to be used as the preliminary basis for the universal bibliographic repertory, and was convinced that its principles were such as to assure its future development." This is going to be the foundation for what's to come. This is also where I need to throw in a bit of anachronism. The system discussed in this paper is going to be further refined and built
Starting point is 00:53:46 into the Universal Decimal Classification, or UDC. That's what it's called today. At the time, Otlet wasn't using those exact words. However, I'm going to be using the later UDC name to save space. It would suck to always refer to this as the system that will one day become the UDC. The lazy way to look at the UDC would be as a refined decimal system. However, that misses a crucial point. Otlet and his colleagues liked the idea of the Dewey Decimal System, but not so much its implementation. There are also the ideological differences between the two systems. The UDC, like the repertory it managed, was meant to be universal. So, if you're thinking that the UDC is just going to be Dewey's system on steroids, well, perish that thought. Now, to be fair, this is an easy mistake to make. I made this very assumption
Starting point is 00:54:47 when I first saw the UDC codes. Each book is identified by a string of numbers and punctuation. In effect, this looks like a weird Dewey decimal number. Initially, we're even going to be using Dewey decimal classes, but don't be fooled. The UDC is something much, much more sophisticated. You see, Otlet's new system has syntax. You could call it an international language for discussing books. In fact, that's just how Otlet describes the UDC. Instead of simple classifications, the UDC provides a full language for describing the topic of a text. So, enough beating around the bush. Let's get to the syntax of the UDC. Exciting, right? Well, you're just gonna have to trust me on this one. Here, I'm gonna shamelessly steal from Otlet's examples.
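The serial number trick is easy to sketch in code. What follows is a minimal Python illustration, not a reconstruction of Otlet's actual procedure. The `Card` and `Repertory` names, and the choice to write the serial as number, slash, year, are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Card:
    serial: str         # running number, then year, e.g. "1/1886"
    decimal_class: str  # decimal classification the card is filed under


class Repertory:
    """Toy card catalog keyed by serial number (hypothetical design)."""

    def __init__(self) -> None:
        self._cards = {}     # serial -> Card
        self._counters = {}  # year -> last running number issued

    def file(self, year: int, decimal_class: str) -> Card:
        # The running number restarts at 1 for each new year.
        number = self._counters.get(year, 0) + 1
        self._counters[year] = number
        card = Card(serial=f"{number}/{year}", decimal_class=decimal_class)
        self._cards[card.serial] = card
        return card

    def move(self, serial: str, new_class: str) -> None:
        # Only the serial number and the target class are needed,
        # so the instruction contains no natural language at all.
        self._cards[serial].decimal_class = new_class

    def lookup(self, serial: str) -> Card:
        return self._cards[serial]


rbu = Repertory()
first = rbu.file(1886, "534")
rbu.move("1/1886", "536")
print(first.serial, rbu.lookup("1/1886").decimal_class)  # 1/1886 536
```

Because the serial never changes when the class does, it works as a stable identifier. That is also what would let one card point at another.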
Starting point is 00:55:46 UDC codes are composed of numbers taken from tables of traits. Different traits each take on different syntax in the number. The most common and most simple is a straight-up Dewey decimal number. Early versions of the UDC still use Dewey's classes, so you can get a three-digit number corresponding to the general class of a book. You can have something starting with 709 for a book on the history of art, for instance. From there, we diverge from Dewey. One of the trait tables is for country. These are codes that correspond to some country, regardless of context. For instance, 42 is England. That doesn't mean the history of England or texts from England.
Starting point is 00:56:37 The code just means England with no context. How you combine that code with other codes, that's the powerful part. One example that Otlet gives is simply putting the place code in parentheses after a class. 709 with 42 in parentheses, written 709(42), would mean the history of art in England. With this alone, we get some really subtle and useful results. By separating out two different types of categorization, you can better describe a text. You don't have to have a specific category for the history of art in each country. You can compose categories on the fly. This immediately solves a lot of the structural issues with the Dewey Decimal System.
Starting point is 00:57:26 This also allows for multiple types of grouping. Using this schema, you can not only group all texts on the history of art, you can also group all texts concerning England. Multiply the Dewey Decimal classes by every country you can code for, and you now have thousands or maybe tens of thousands of ways to discuss, group, and organize data. I want to keep something in the front of your mind while we dive deeper. This is all happening in the 1890s. This is prior to the advent of computing. At least, this is way before there are any recognizable and functioning computers. This is before we have machines that can handle this type of mass data. There's only one somewhat analogous system I can think of in this era, but it's not even a good fit. Of course, here I'm referring to Herman Hollerith's punch cards. I really want to draw comparisons,
Starting point is 00:58:26 but there just aren't any. The only connection is that Otlet and Hollerith were both dealing with large amounts of information using slips of paper. In fact, I'd actually take the UDC over punch cards any day. Let me flesh out some things to explain my choice here. The country coding thing I mentioned comes from an auxiliary table. In the UDC, these are tables of context-free keywords. In the place table, England is coded as number 42. There are also tables for chronological information, geography, geology, and political divisions. That's not an exhaustive list because
Starting point is 00:59:06 there are quite a few tables, and tables have also changed over time. Just to give a neat example of how granular this gets, let me point you towards Table H, the general place table. This has codes for relative directions. Center, north, south, east, west, that type of thing. Otlet provides a few ways of composing numbers from these tables. One simple one, which is actually inspired by some of Dewey's later works, is just to use the table letter followed by the code number. We can apply that to our earlier example to get something more refined. 709(42)H5 would mean books on the history of art in the southeast of England. Add D4 to the end and now you're coded for the Early Middle Ages. This can continue until you
Starting point is 01:00:01 have a code that's so refined it describes a single book. There's a final syntactic flourish that makes the UDC even more useful. The colon. Simply put, a colon in a UDC number represents relation. This means you can code for texts of one category that relate to another. So we can make a much further refinement. 27 codes for lakes in the geography table. So, if we want books on English art depicting lakes, well, that's easy. That's 709(42):27, the history of art in England relating to lakes.
Starting point is 01:00:43 This is still in a pretty rough state in the 1890s. Otlet and his colleagues are still hashing out the finer details of the syntax, but I think it's clear to see that there's a massive leap occurring here. The UDC represents a way of creating new classes as needed. Its auxiliary tables, combined with simple classes, allow for a staggering amount of flexibility. And, best of all, this is still numeric. Well, numbers plus letters, but that still works here. You can sort everything in a natural and reasonable way. It's all predictable. If you went looking for one of our weird books
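Composing a class with a place code can be shown in a few lines of Python. This is a sketch under stated assumptions. Only 709 (history of art) and 42 (England) come from the examples above. The codes "999" and "99" and the card titles are invented placeholders, and `compose` is a hypothetical helper name.

```python
from collections import defaultdict
from typing import Optional


def compose(class_code: str, place_code: Optional[str] = None) -> str:
    """Join a class and an optional place code, e.g. 709 + 42 -> 709(42)."""
    if place_code is None:
        return class_code
    return f"{class_code}({place_code})"


# (title, class code, place code); "999" and "99" are made-up placeholders.
cards = [
    ("Card A", "709", "42"),
    ("Card B", "709", "99"),
    ("Card C", "999", "42"),
]

# Because class and place are separate pieces, the same cards can be
# grouped along either axis without defining any new categories.
by_class = defaultdict(list)
by_place = defaultdict(list)
for title, class_code, place_code in cards:
    by_class[class_code].append(title)
    by_place[place_code].append(title)

print(compose("709", "42"))  # 709(42)
print(by_class["709"])       # ['Card A', 'Card B']
print(by_place["42"])        # ['Card A', 'Card C']
```

The point of the design is that nobody has to pre-allocate a slot for "history of art in England". The combined code is built at filing time from two independent tables.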
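Since these codes have a syntax, they can be parsed. Here is a small Python sketch of a parser for the pieces discussed so far: a leading class, a place in parentheses, lettered table codes, and a colon relation. The exact written form is an assumption, since the codes are only spoken aloud here, and the real 1890s notation may well have differed. `ParsedCode` and `parse` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class ParsedCode(NamedTuple):
    main_class: str
    place: Optional[str]
    auxiliaries: list
    relation: Optional[str]


# Assumed written form: class, (place), lettered table codes, :relation
PATTERN = re.compile(
    r"^(?P<main>\d+)"           # leading decimal class, e.g. 709
    r"(?:\((?P<place>\d+)\))?"  # optional place code in parentheses
    r"(?P<aux>(?:[A-Z]\d+)*)"   # zero or more table-letter codes, e.g. H5D4
    r"(?::(?P<rel>\d+))?$"      # optional colon relation, e.g. :27
)


def parse(code: str) -> ParsedCode:
    """Split a code into its parts, or raise ValueError if it doesn't fit."""
    match = PATTERN.match(code)
    if match is None:
        raise ValueError(f"not a code this sketch understands: {code!r}")
    return ParsedCode(
        main_class=match.group("main"),
        place=match.group("place"),
        auxiliaries=re.findall(r"[A-Z]\d+", match.group("aux")),
        relation=match.group("rel"),
    )


print(parse("709(42):27"))
print(parse("709(42)H5D4"))
```

Every part after the leading class is optional, so a bare 709 parses just as well as the fully refined version.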
Starting point is 01:01:26 on the history of art, you just need to look for the leading number and then find the right sub-numbers. Once you have a code figured out, you just have to consult some organized cards. This is why I'd take the UDC over punch cards any day. Punch cards are only good for tabulating and storing data. Small chunks of data at that. Punch cards on their own don't offer a means of organizing that data, just a way to represent and hold it. Add in some machinery and you can sort and tabulate your numbers. That's nice, but it's not an overarching organizational scheme. What happens if you have a cabinet full of cards and you need to find one specific card? What if you need some census data on some dude named Sean who lives in some unspecified location? You have to first establish some way to organize your cards,
Starting point is 01:02:19 but breaking them up into smaller groups could make it harder to tabulate your data. You get all these trade-offs because of mechanization and just the realities of the medium. This was a known issue with punch cards since really early on. It's time for a quick deep cut here. There's this 1908 patent for indexing and assorting means. It describes putting notches on the edge of punch cards so those cards can be organized without having to look at their faces. This organization could be done by hand, by eye, or by machine. The main point of this patent
Starting point is 01:02:58 is that it's hard and time-consuming to sort through punch cards. By adding easily accessed metadata, those cards become a lot more useful. In other words, the patent proposed to add classification information to each card. By contrast, the UDC offers something that punch cards simply can't. It's all about classification. It's all about storing and finding data in a quick and reliable way. The UDC provides an entire language to describe information. That's more powerful than any medium alone. I guess it's a corollary to my sign that I tap all the time. Hardware is useless without software, and data is useless without organization. Alright, I'm going to cut us off for now. I think the stage is set for us to look at the Mundaneum proper in the next episode. The UDC will be the heart of the Mundaneum, but it's only part of the larger puzzle.
Starting point is 01:04:07 That said, we don't have to look at the UDC as a mere part of a larger system. The UDC is a revolutionary technology in and of itself. Otlet and his co-conspirators developed a system for actually discussing books, a language-agnostic way of describing texts. That alone is wildly powerful. That will be shown time and time again as we discuss its application next episode. As I close out, I want to leave you with something to dwell on between episodes. This is one of those questions that actually keeps me up at night. How is the UDC any different from digital forms of hypertext? How does this very physical technology actually differ in a substantive way from something like Xanadu or NLS or HTML? If we strip away all the
Starting point is 01:05:02 physical aspects, all the digital artifice, and just look at the ideas, I think you may find something interesting. But no, I'll leave that as an exercise to the listener. Thanks for listening to Advent of Computing. I'll be back in two weeks' time with the conclusion of my Mundaneum series. And hey, if you like the show, there are a few ways you can support me. If you know someone else who'd be interested in the history of computing, then please take a minute to share the episode with them.
Starting point is 01:05:34 You can also rate and review the show on Apple Podcasts. And if you want to be a super fan, you can support the show directly through Advent of Computing merch or signing up as a patron on Patreon. Patrons get early access to episodes, polls for the direction of the show, and bonus episodes. You can find links to everything on my website, adventofcomputing.com. If you have any comments or suggestions for a future episode, then go ahead and shoot me a tweet. I'm at Advent of Comp on Twitter. And as always, have a great rest of your day.
