Advent of Computing - Episode 40 - Spam, Email, and Best Intentions

Episode Date: October 4, 2020

Spam emails are a fact of modern life. Who hasn't been sent annoying and sometimes cryptic messages from unidentified addresses? To understand where spam comes from we need to look at the origins of e...mail itself. Email has had a long and strange history, so too have some of its most dubious uses. Like the show? Then why not head over and support me on Patreon. Perks include early access to future episodes, and stickers: https://www.patreon.com/adventofcomputing

Transcript
Starting point is 00:00:00 Stop me if this one hits a little bit too close to home. Imagine for a moment, you're minding your own business. Maybe you're trying to focus on work, when all of a sudden your phone beeps into life. Now, it's probably nothing, but you decide to reach down and check just in case it's actually an important message. You see one new email. The subject reads, URGENT. LAST CHANCE. You're surprised the message is even in all caps. That has to be important, right? But once you open the email, you realize much to your disgust, it's just an advertisement. You've become one of the latest victims of spam. We all use email on a daily basis. Spam is just one of the unintended
Starting point is 00:00:46 consequences of that technology. Email is used for nearly everything, from billing notifications to catching up with friends, sharing small files to mailing lists. It's a really interesting case of technology that was just such a good idea it was bound to happen. In fact, there are many possible origins for the first email system. And when you get down to it, email is just a really natural response to a mundane need. It also happens to be a great example of using some of the most cutting-edge technology to do something as simple as sending out an all-caps ad. In the 21st century, email spam is kind of just a fact of life. As long as there's someone selling something or promoting something, the messages will keep flowing.
Starting point is 00:01:32 In the physical world, we have the equivalent of junk mail. And as the internet became more ubiquitous, it was really just a matter of time before these same tactics made their way into the digital world. But how did that jump occur? Well, to answer that question, we need to look at how email itself came to be. Welcome back to Advent of Computing. I'm your host, Sean Haas, and this is episode 40, Spam, Email, and Best Intentions. We're now in October, and that means it's once again time for Spook Month here on the podcast. That means that this month's episodes will be at least an
Starting point is 00:02:16 attempt at scrounging up something close to digital ghost stories. Last year, I offered two slightly on-theme episodes, one covering the early history of computer viruses and another on the game Colossal Cave Adventure. Now, the operative issue at hand is that computers aren't actually all that scary, but they can certainly be strange and frustrating. So to kick things off, we're taking a dive into one of the more frustrating aspects of the internet. That's email. The fact of the matter is that email was a wonderful idea. But as more people hopped onto the platform,
Starting point is 00:02:51 it became pretty easy to abuse. Spam mail is just one prevalent issue. Viruses are very easily spread over email. Scams also thrive in the medium. Who hasn't gotten an email that offers you millions of dollars if you just pay some nominal pesky bank fees? Literally everyone and their mother has an email account. So there are plenty of targets for these types of messages. It's some actually scary stuff. But stepping away from all the modern issues, how did email actually come into being? Well, that's where things veer into the strange and slightly complicated. Broadly speaking, there are two phases to the story. Localized mail and eventually networked mail. The idea of email is really simple. It's just a means to send messages from one user to another. Something similar has been
Starting point is 00:03:43 invented, used, and then lost countless times throughout the development of computers. So teasing out the real origin of email gets a little bit dubious. It wouldn't take shape into what we know until ARPANET steps in, and from there it would spread as a core part of the growing internet. So let's take a closer look at email, from its contested origins and parallel development to its eventual standardization. And along the way, we'll try to find out just why junk mail came into the digital world. And if we're lucky, maybe even why junk mail stuck around for so long. Pinning down the first email system is actually really hard. I'd be willing to bet that it's impossible to get a 100% definitive
Starting point is 00:04:26 answer. The reason for that is that there's no single origin to point to. Email and email-like systems were developed in isolation for years, on totally unrelated computers by totally unconnected people. The idea of sending digital messages is just so natural that it seems to just pop up on its own. That being said, we can examine some of the earliest mailing systems to see where the idea started to take more of an official shape. The whole concept of electronic mail starts to form just after the creation of timesharing systems. So that's sometime in the early 1960s. These systems allowed for a single computer's resources to be shared between multiple users.
Starting point is 00:05:07 For the first time, more than one person could actually use a computer at once, something that was previously impossible. When you have a system that can only ever be used by one person, sending messages around doesn't make all that much sense. I mean, what are you going to do, leave a note for the next person to turn on the computer? You don't need special programs for that. But once you get to this point of shared infrastructure, the possibility becomes a lot more enticing.
Starting point is 00:05:34 Suddenly there's more than one person using the same computer. Maybe even enough users to hold a conversation. The other factor is how these early timeshare systems were actually used. To log into the system, you would first sit down at a personal terminal which was connected up to the computer. Some of these terminals were wired in more or less directly, but they didn't have to be. Most often, users were connected over a telephone line using a modem. Thanks to the robust phone grid in the States, that meant that a user could log in from basically anywhere in the country. Work was no longer constrained to a single server room. Instead, you could work from basically anywhere. This represented a total shift in how computers
Starting point is 00:06:16 could be used, and there'd be some big consequences. One of the first of these multi-user systems was MIT's Compatible Time-Sharing System, or simply put, CTSS. Completed in 1961, CTSS allowed for up to 30 users to connect to a shared IBM 7094 at once. While all the software that made CTSS is impressive, I just want to focus on one small aspect of the system, and that's how it handled user accounts. One of the big problems in any timesharing system is how to keep one user's data and programs from interfering with those of another. This becomes really, really important when multiple programs are running simultaneously.
Starting point is 00:06:58 If one program can interfere with another, then conceivably the whole mainframe could go down. This specific problem is usually solved using memory protection. It's basically tight controls over how a program can actually read and write to the mainframe's memory. Now, a parallel problem exists for files. While one user messing around with another user's files may not harm the computer much, it could lead to some more personalized disasters. I know if someone wrote over my files, I wouldn't be very pleased to say the least. Just like with other
Starting point is 00:07:31 resources, the solution was to break storage up into chunks and control access to each chunk. In CTSS, each user had a personal directory tied to their account, and with each account being password protected, only that user could access that directory. This worked pretty well for keeping files safely isolated. Runaway programs couldn't tamper with someone else's files, and prying eyes couldn't easily snoop on data. But as CTSS gained more users and became much more complex, this system ran into some weird issues. What do you do, for instance, if someone wants to share data with another user? To us today, that sounds pretty normal. We pass
Starting point is 00:08:12 around files all the time. But in the early 60s, this was totally new. Programmers were just starting to figure out how to make cohabitation on a computer possible. So even something as mundane as sharing a file, well, that was breaking new ground. CTSS implemented this in two different ways. The first and most simple was the concept of common files. The 1963 Programmer Manual described common files like this, quote, to allow convenient cooperation between programmers, such as students and classes or group projects, there's a feature which makes it possible to have files common to several different Essentially, common files were stored in a directory that anyone could access. Any user could read from common files, make new files, or even run programs stored in this shared directory. It made sharing data between programs really easy,
Starting point is 00:09:06 and it made collaboration between programmers very possible. The feature itself is simple, but it paved the way for much more. Just by having a shared digital space, it became possible to make larger community-managed datasets, or for teams of programmers to work together without actually having to be near one another. However, common files weren't a catch-all. All this data was still public.
Starting point is 00:09:31 If you had a CTSS account, you could have access to every common file. The second solution filled the gap by allowing users to securely or privately share files. On CTSS, this feature was called a linked file and allowed one user to share a file to any number of other users. Once linked, that file could be read only by users it was linked with, but could only be written to by its initial owner. So unlike common files, a linked file was relatively safe from outsiders. But it still had a problem. Linked files were only useful for sending data on a one-way trip. Now, everything I've outlined so far were official features of CTSS, things that were designed and built into the system's core.
Starting point is 00:10:16 But with so many people sharing a computer, interesting behavior and conventions start to form. Sometime in the mid-1960s, Tom Van Vleck, a programmer on the CTSS project, described a particularly strange custom forming around these shared files. Quoting from Van Vleck, quote, this new ability encouraged users to share information in new ways. When geographically separated, CTSS users wanted to pass messages to each other. They sometimes created files with names like toTom and put them in common file directories. The recipient could log into CTSS later from any terminal and look for the file and print it out if it was there.
Starting point is 00:11:00 CTSS users were doing more than just programming on the mainframe. They were starting to communicate at long distances. And really, it makes a lot of sense that this would happen. Humans like to talk, and common files were a really easy way to share data. The feature was already being used for programmed data, so why not adapt it for more personal information? But there were some issues with this ad hoc message passing. Throwing messages into common directories was really just like posting a letter on a bulletin board.
Starting point is 00:11:32 Anyone who passes by can read it, or even scribble on their own note. As long as everyone agrees to play nice, then that's fine. But with an increasingly growing system, the chance of interference was there. And what would happen if there was more than one person named Tom on the server at once? At the same time that users were casually sending around messages, a more serious need was starting to rise. CTSS was a big system, with a lot of different programmers contributing software. So as with any complicated project, documentation was very key to its success. Source code had to be documented, new commands had to be documented, and someone had
Starting point is 00:12:12 to manage all that pile of documentation. The process at MIT was pretty standard. The author of any new software on the system was required to write up the proper docs on the new code. That rough draft was then sent to an editor to, well, you know, edit the rough draft and eventually enshrine it into the authoritative manual. Sometimes, depending on what was submitted, the editor would need to get back in touch with the author for clarifying questions or critiques. And even once a new command was documented and put into the system, it was important to keep a dialogue open. Users are really good at finding bugs in code, so having some way to handle user feedback
Starting point is 00:12:54 is essential to the health of a project as large as CTSS. But with users now separated out geographically, there were some logistical problems in handling feedback. The first attempt to address these problems came in the form of a memo titled Programming Staff Note, Proposed Minimum System Documentation. Now, the actual note doesn't have a date on it, which I don't like. If you put a document anywhere, you should have a date on it for future reference by someone like me. The best guess I can give is that it was written either in late 64 or maybe early 65. The bulk of the memo is dealing with the improving of documentation workflow.
Starting point is 00:13:37 One of the improvements proposed is a little program called Mail. Quoting from the docs, quote, a new command should be written to allow a user to send a private message to another user, which may be delivered at the recipient's convenience. This will be useful for the system to notify a user that all or some of his files have been backed up. It can also be useful for users to send authors any criticisms, end quote. The initial purpose for mail is twofold, get user feedback to programmers and send alerts to users while they're offline. Just some interesting things to note here, even in this rough draft state, we have a proposed robot sending emails to users. That's been around a lot longer than one might hope.
Starting point is 00:14:26 Anyway, there are a few little details that I want to tease out of that description. Firstly, mail was planned as a way for users to get messages, quote, at the recipient's convenience. As in, a way for users to be sent messages without being logged into the system. That's not new on CTSS. Users were already doing that with common files. The other interesting point is that the memo explicitly says that these are private messages. That's the part the common files were missing, and a sorely needed feature for a practical messaging system. But work on mail wouldn't start immediately. In the spring of 1965, Tom Van Vleck and Noel Morris had just joined the CTSS project. The two ran across the mail memo pretty soon after joining. But they didn't find anything else about the command on the system.
Starting point is 00:15:17 Quoting Van Vleck, When we read the PSN document about the proposed CTSS mail command, we asked, Where is it? And we were told there very much available to write up some new code. So by summer, the mail command was added to CTSS, and users were finally able to send secure private messages. And as with a lot of technology, the devil really is in the details. Even in 1965, we start to see the bones of what would become email. When you get down to it, mail was really just a codification and improvement of the existing common file conventions. But it does so in a bit of an interesting way. Earlier I mentioned that CTSS had another way to share files between users, the so-called
Starting point is 00:16:08 linked files. It seems like that could have been one way to implement a message passing system. But Morris and Van Vleck didn't go that route. The issue with using linked files is that they only work for one-way communication. So to get a conversation going, you'd have to set up a series of multiple linked files. That's why common files were a much more popular option. Everyone can read and write to them. No setup needed. Mail did things just a little bit differently. To send a message, you only had to type out who the message was going to and then the message to send them. Of course, it wasn't that simple. The message had to be the
Starting point is 00:16:46 contents of a file, so in practice, you first had to draft a message, save it to disk, and then send it. The recipient field was also a little bit more obtuse. You couldn't just say to Tom anymore. In CTSS, each user had a programmer number and belonged to some problem number. Think of it like a user ID and a group ID they belong to. To send a message, you had to know both of those magic numbers. The upshot was that this combination of numbers was totally unique. You can have more than one Tom, but Van Vleck's address of M14162962, well, that's a little longer, but totally unambiguous. Another handy feature was that Mail could accept a list of recipients. You could specify those when you ran the command or store the list in a file.
Starting point is 00:17:34 In this way, you could message all of your friends at once, or even set up a crude mailing list. That all sounds pretty familiar, but Mail had a little bit of a trick hidden up its sleeve that we don't have today. The program also accepts wildcards for recipients. For a normal user, this meant that you could shoot off a message to everyone in your group. But if you were a more privileged user, then you could send a message to everyone on CTSS. Once a message was actually sent, it showed up, where else, but in the recipient's mailbox. I mean that very literally. Mail would drop all messages into a file called mailbox in a user's
Starting point is 00:18:14 private directory. Each message started with the date and time it was sent and then the user sending it, followed by the actual message body. If you already had messages in your mailbox, then the new messages would just be appended on. And, as an added feature, a user could even turn off their mailbox, by just setting the file as read-only. The final piece of the puzzle was to alert users when they got a new message. On login, the system would check if the mailbox file had any data in it. And if so, it flashed a quick, you have mailbox. That way, a user was less likely to miss an important communique. In the coming years, mail would become a staple of life on CTSS.
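To make the mechanics described above a little more concrete, here is a minimal Python sketch of a CTSS-style local mail command. It is only an illustration under stated assumptions, not a reconstruction of the original program: the directory layout, the header format, the notice text, and every function name (`user_dir`, `expand`, `send_mail`, `turn_off_mailbox`, `login_notice`) are hypothetical, and the privilege check for system-wide wildcards is left out entirely.

```python
import os
import stat
from datetime import datetime

MAILBOX = "mailbox"  # file name taken from the description above


def user_dir(root, problem, programmer):
    # Hypothetical layout: one private directory per (problem, programmer) pair.
    return os.path.join(root, f"{problem}_{programmer}")


def expand(users, problem, programmer):
    """Resolve an address against the known users; "*" acts as a wildcard."""
    return [
        (p, g) for (p, g) in users
        if problem in ("*", p) and programmer in ("*", g)
    ]


def send_mail(root, users, sender, to, body, now=None):
    """Append the message to each matching mailbox; return who received it."""
    stamp = (now or datetime.now()).strftime("%m/%d/%y %H:%M")
    # Date and time first, then the sender, then the message body.
    text = f"{stamp} FROM {sender[0]} {sender[1]}\n{body.rstrip()}\n"
    delivered = []
    for problem, programmer in expand(users, *to):
        directory = user_dir(root, problem, programmer)
        os.makedirs(directory, exist_ok=True)
        path = os.path.join(directory, MAILBOX)
        # A mailbox without the owner-write bit counts as "turned off".
        if os.path.exists(path) and not os.stat(path).st_mode & stat.S_IWUSR:
            continue
        with open(path, "a") as mailbox:  # new messages are appended
            mailbox.write(text)
        delivered.append((problem, programmer))
    return delivered


def turn_off_mailbox(root, problem, programmer):
    """Mark the mailbox file read-only so that no new mail is delivered."""
    directory = user_dir(root, problem, programmer)
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, MAILBOX)
    if not os.path.exists(path):
        open(path, "w").close()
    os.chmod(path, stat.S_IRUSR)


def login_notice(root, problem, programmer):
    """Return the login-time notice if the mailbox file has any data in it."""
    path = os.path.join(user_dir(root, problem, programmer), MAILBOX)
    has_mail = os.path.exists(path) and os.path.getsize(path) > 0
    return "YOU HAVE MAILBOX" if has_mail else ""
```

In this sketch, sending to `("M1416", "*")` reaches everyone in one group, and `("*", "*")` reaches every known user. The design choice worth noticing is that the whole system is just files and permissions: delivery is an append, opting out is a permission bit, and the login check is a file-size test.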
Starting point is 00:18:55 It deftly solved the more official problem of user feedback and documentation management. But more importantly, it gave users a way to privately communicate. This kind of digital chit-chat may seem unimportant, but systems like mail marked a fundamental change in how computers were used. It made very early digital communications possible, and suddenly a computer was more than just a machine for running programs. Morris and Van Vleck aren't necessarily the creators of email. There are a few major caveats to keep in mind. If you want to be pedantic, then at this point, it wasn't being called email. But more importantly, there were a slew of roughly similar systems being developed around that same time. Anywhere timesharing was in use, some kind of messaging
Starting point is 00:19:45 system was bound to be developed. But the MIT rendition has become one of the more well-known and influential. We can even see its impact on more modern mail systems. For instance, modern Unix-like systems handle their messages almost the same way. The biggest caveat here is actually shared by mail and all its contemporaries. It's missing one core feature that makes email, well, email. None of these programs could actually send messages to another computer. You were stuck talking with users on the same mainframe. In the 1960s, that wasn't really a big deal. Networking was still in its very earliest days. The cutting-edge networks included maybe a handful of computers at most. The idea of sending messages between
Starting point is 00:20:32 mainframes did come up, but no one really saw it as necessary. Just like how timesharing made these early message-passing systems inevitable, it would take a new technology to trigger the final push towards email. It would take the ARPANET. There is so much that we could say about ARPANET. It was the direct predecessor to the modern internet, so a lot of the technology that we use today was codified on the earlier network. The project started in 1966, with the first part of the network going online just before the end of the decade. ARPANET was a massive project. Most of the core management and funding came from the US government, but the grunt work was actually done by a huge cast of universities and contractors. There had been other networks, both in the United States and abroad, but ARPANET was the first to really catch on, and a big part of its success was thanks to its design. The fine details of exactly how a network
Starting point is 00:21:30 works don't really matter that much for us today, but I think it's worth having a passing understanding at least. ARPANET worked so well because it was designed as a distributed network. There's no one hub of traffic that everyone has to connect to. Instead, ARPANET, and now the internet, is arranged sort of like a web of routers. ARPANET used a special purpose machine called an Interface Message Processor to manage all traffic on the network. When you made a request to some server, these IMPs handled finding the best route to your final destination. Sometimes it's right next door, but most likely your request has to hop from one IMP off to another until it eventually reaches its final recipient.
Starting point is 00:22:14 To make everything work out safely, each computer on the network has a unique hostname, something like Sean's Big Expensive Mainframe. When you boil it down, the network really just wires up a bunch of computers. Simple, right? But getting there was a huge undertaking. One of the key players in its development was BBN, a research and development firm that had major ties to early development of computers. In the 1960s, their research would become crucial to ARPANET. You can see it right in the network's infrastructure. The IMPs that handled all traffic on the network were designed and built by BBN. And one programmer at BBN by the name of Ray Tomlinson would develop the first networked email. How Tomlinson got involved with ARPANET is actually a bit of an interesting story in itself.
Starting point is 00:23:05 Ray almost made it through college without running into a computer. In the early 60s, he was enrolled in the electrical engineering program at Rensselaer Polytechnic Institute. Part of his studies included time working at IBM's main office. But he wasn't working on IBM's computers. Instead, he was working with analog test equipment. It sounds like some sort of thing close to intern grunt work. Tomlinson really enjoyed his time at IBM, but he could only spend so much time at the office before being drawn into the more digital realm. Quoting Tomlinson, After about two or three years of this, I finally saw a computer down the hall that was available for engineers to use, and I decided to learn to program it.
Starting point is 00:23:48 End quote. For Ray, that was the moment he was bit by the bug. It would be an experience that stuck with him for years, a small introduction to the larger world of computing. In 1963, he moved on to grad school at MIT, still on the electrical engineering track. In grad school, he started to specialize in analog speech synthesis. At the time, computers hadn't yet found their voice, so the field was still fully analog. The prevailing notion was that computers didn't really have the processing power to handle human speech. But Tomlinson, well, he didn't really buy that excuse. He decided early on that his
Starting point is 00:24:26 thesis project would be a computer-controlled speech synthesizer. The design called for a computer handling most of the work, and controlling a complex analog device to actually generate the final sound. However, a problem soon appeared. Tomlinson was a lot more interested in working on his thesis project than actually going to grad school, so much so that he stopped attending classes in favor of programming. His thesis advisor was, understandably, a little bit concerned at this. The advisor recommended that he could pick up some side work for BBN in their speech lab. Maybe some consulting would serve as a more healthy outlet for this young obsessive programmer. And it seemed to have worked. By 1965, Tomlinson earned his master's degree and soon became a full employee at BBN. Initially,
Starting point is 00:25:18 he would continue his work on speech synthesis, as well as research into human-machine interactions. But a new project was looming large on the horizon, and as BBN started work on ARPANET, Ray came along for the ride. Early on, ARPANET was, to put it bluntly, totally unrecognizable. The modern internet is all about shipping around large amounts of data, and moving any kind of data. But things started out very small and very restricted. Once packets and test messages were reliably moving through ARPANET, the first big use was
Starting point is 00:25:52 remote computer access. Using the network, researchers were able to log into any other computer connected. While useful, there were bigger plans for ARPANET, and those plans would actually take a lot of work to see through. At BBN, Tomlinson had shifted from speech synthesis work into ARPANET-related programming. More specifically, Ray worked on the team responsible for Tenex, that's BBN's own timesharing system. Tenex was actually pretty well represented on the new network, no doubt thanks to BBN's hand inside ARPANET. So it should come as no surprise that a lot of new network-specific software was being
Starting point is 00:26:31 developed specifically for Tenex. It was a platform positioned right at the center of a lot of big changes. One of the biggest challenges faced at the time was figuring out how to get the most out of ARPANET. Researchers now had access to an ever-expanding network of mainframes, but for the time being, all they could actually do was connect up and log into some far-off machine. Everyone knew that more was possible, but no one had quite gotten to the more part quite yet. That's where Tomlinson enters the picture. The next logical step after remote
Starting point is 00:27:06 login was to devise a way to transfer files around the network. This would open up a lot of possibilities. Imagine something like CTSS's common files but shared between hundreds of mainframes and tens of thousands of users. The fast and easy transfer of data between machines would be a very big deal. It was just a matter of working out the little details. Luckily, Tenex proved a great playground for Ray to turn idle musings into actually working code. Quote, I wrote a little program to open a file here, open the other file in the other place, and send files back and forth, and that was called Copynet, and it was
Starting point is 00:27:45 pretty simple-minded. You know, one, you put out a string, this is what the file name is that you're writing to, and the other one intercepts that and in turn opens the file at the other end, and then it just streams the data through. End quote. Now, saying Copynet was simple may be the wrong term. Rudimentary or inflexible is a little bit better. Tomlinson's program was only possible as long as both sender and receiver were on a Tenex system. For a large swath of ARPANET, there wasn't much of a problem there. You can just shuttle data around between similar enough mainframes. But if you're running CTSS or Multics, then you're kind of out of luck. That being said, Copynet was a big step towards more codified file transfer protocols.
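The exchange Tomlinson describes, announce a file name and then stream the data, can be sketched in a few lines of Python. This is a loose illustration only and not Copynet's actual protocol: the newline-terminated name, the use of an ordinary socket connection, and the function names `send_file` and `receive_file` are all assumptions made for the example.

```python
import os
import socket


def send_file(conn, local_path, remote_name):
    """Announce the destination file name, then stream the file's bytes."""
    # Step one: put out a string saying which file is being written to.
    conn.sendall(remote_name.encode("ascii") + b"\n")
    # Step two: stream the data through in chunks.
    with open(local_path, "rb") as source:
        while chunk := source.read(4096):
            conn.sendall(chunk)
    conn.shutdown(socket.SHUT_WR)  # tell the other end there is no more data


def receive_file(conn, dest_dir):
    """Read the announced name, open that file, and write the stream into it."""
    stream = conn.makefile("rb")
    name = stream.readline().rstrip(b"\n").decode("ascii")
    # Keep only the base name so the sender cannot escape dest_dir.
    path = os.path.join(dest_dir, os.path.basename(name))
    with open(path, "wb") as dest:
        while chunk := stream.read(4096):
            dest.write(chunk)
    return path
```

Both ends have to agree on this tiny convention in advance, which mirrors the limitation mentioned above: a scheme like this only works between machines that share the same assumptions about file names and data.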
Starting point is 00:28:33 But Copynet wasn't the only part of the equation. It was a means to move data around. But what exactly could that data be? The final push that led Ray down the path to email came in a pretty strange form. In the wacky world of ARPANET and eventually internet administrivia, there aren't really memos. Instead, there are requests for comments, aka RFCs. These are essentially, well, they're just memos with a fancier name. Each has a sequential number, which makes them really easy to reference. The actual content of RFCs is something like an open letter to those managing the ARPANET.
Starting point is 00:29:15 They can range from short notes to full-on specifications to the more bizarre. Allow me to introduce the poetically named RFC-196, a mailbox protocol. Now, despite the name, this is not a proposed email system. Oh, but it's so close I can't ignore it. Reading from the introduction, quote, the purpose of this protocol is to provide at each site a standard mechanism to receive sequential files for immediate or deferred printing or other uses. The files for deferred printing would probably be stored on intermediate disk files. It later continues, quote, Multiple mailboxes, 128, are allowed at each site and are identified as described below. The default mailbox number is zero for use with the standard mail printer.
Starting point is 00:30:09 End quote. Published in 1971, just scant years after the ARPANET starts, RFC-196 is a proposal for a physical mail protocol. Now, this should sound like some high strangeness to anyone following along. It lays out a protocol for sending sequential text-based data over a network, storing it in some digital mailbox, and then finally printing out the message. Yes, printing out the message on a printer, presumably to eventually be put in some physical mailbox in someone's office. Now, all joking aside, RFC-196 is a really interesting document for a few reasons.
Starting point is 00:30:54 It actually proposes a full email system. Sort of, at least. The idea is that you could send a message over the ARPANET to a receiving server and that message would then be placed in some file awaiting use. It's just that the use for RFC-196 is printing that onto paper. The proposal is a wonderful example of researchers coming to grips with totally new technology and the strange paths they sometimes take. What's one use for computers? Well, printing. So why not let remote users do that over the ARPANET? It's this strange mix of state-of-the-art technology and bland practicality that I find really interesting. Instead of using ARPANET for something radically new, 196 describes a way to use it as an upgrade to existing
Starting point is 00:31:46 practices. Luckily, I'm not the only person puzzled by this RFC. Tomlinson was just as perplexed. It was clear that the proposal was hitting on something really important. Of course people want to send messages over ARPANET. But the design just wasn't fully formed. Interested in the prospect, Ray started to deconstruct the proposal and see if the rough idea was workable. Under 196's prescription, each mainframe would have up to 128 mailboxes, with the number on the box acting somewhat analogously to an address. Sure, that can work, but it also kind of sucks. The RFC was looking at mail more on the level of a mainframe, but you don't send mail to a computer,
Starting point is 00:32:35 you send it to a person. One of the big changes that Ray made was scrapping the whole idea of numbered mailboxes. Instead, each user on every server would have their own address. And luckily, there was already some precedent for him to work off of. As I mentioned earlier, CTSS's mail was just one example of a mainframe mail system. Most timesharing operating systems had some version of mail. On Tenex, that program was called SendMessage. While largely the same as the earlier CTSS system, there were some changes that had occurred over the years. One big one was usernames. By 1971, most operating systems had actual human-readable usernames. Sure, somewhere in the code each user still had an ID number, but you also had the option to use a more friendly name.
Starting point is 00:33:27 Instead of logging in as user number M12598, you were much more likely to just log in as Sean. That simple change was visible throughout these timesharing systems. Local mail clients now readily accepted user names instead of ID numbers, making it much easier to send a colleague a quick message. But there was a caveat. Usernames still had to be unique. So you may not be able to have your first name as your username, but you could have a recognizable one. But just a username will only get you so far.
Starting point is 00:34:02 On a single server, that was fine. But by 1971, there were around a dozen mainframes on ARPANET. So how can you distinguish between a Sean at UCLA and that one other Sean over at MIT? Once again, Tomlinson was able to draw from existing standards to make his life a little bit easier. Every mainframe on the ARPANET had a unique host name, basically an authoritative identifier for the computer. That way, if you wanted to connect to the installation at UCLA, you could just call up UCLA. No fuss or crossed wires needed. Combine local usernames with mainframe hostname, and you get a unique and totally unambiguous address. One final piece was missing from this puzzle, though.
Starting point is 00:34:46 How should those two names actually be combined? Well, for Ray, that was a no-brainer. It had to be the at sign, the curly A in a circle. Quoting from Tomlinson, the purpose of the at sign in English was to indicate a unit price. For example, 10 items at $1.95. I used the at sign to indicate that a user was at some other host rather than being local, end quote. It also helped that, at least on Tenex,
Starting point is 00:35:16 the at sign didn't have some other special meaning. Putting that all together, you end up with a really readable address. To email Tomlinson, you might type out ray at bbn. It's simple, you can read it aloud, and it's totally unambiguous. Today, it's just second nature to see email addresses written out in this format. It makes good sense. And frankly, that's because it's a good format. What Tomlinson did here is something that a lot of
Starting point is 00:35:45 programmers have learned and mastered. Instead of spinning something totally new, he combined existing technology and convention. Nothing in an email address was new in 1971. But putting all those pieces together led to something that was actually pretty new. Incidentally, the same user at host convention would be adopted by many other protocols on ARPANET. When it came time to implement the software side of things, Tomlinson took a very similar tack. He already had access to SendMessage, and he already had access to copynet, which he had just written up. Mail on Tenex, and really mail on most systems, was just thrown into a file. And if you
Starting point is 00:36:27 can transfer files around on a network, then it's a very small step to transferring mail. Tomlinson sent the first networked email before 1971 was out. That first message didn't have far to travel. Quoting from Tomlinson, the first message was sent between two machines that were literally side by side. The only physical connection they had, aside from the floor they sat on, was through the ARPANET. I sent a number of test messages to myself from one machine to the other. The test messages were entirely forgettable, and I have, therefore, forgotten them. End quote.
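To make the idea concrete, here is a minimal sketch in Python of the two pieces described above: a user at host address, and mail that is just text appended to a file. This is an illustration under stated assumptions, not a reconstruction of Tomlinson's actual code. The function names, the `.mbox` file layout, and the host labels `bbn-a` and `bbn-b` are all hypothetical, and the "network" here is nothing more than one directory per host.

```python
from pathlib import Path
import tempfile


def parse_address(address, local_host):
    """Split 'user@host' into (user, host); a bare 'user' is treated as local."""
    user, sep, host = address.partition("@")
    if not user or (sep and not host):
        raise ValueError(f"malformed address: {address!r}")
    return user, (host if sep else local_host)


def deliver(mail_root, address, body, local_host="bbn-a"):
    """Append a message to the recipient's mailbox file and return its path.

    Assumption: each host is modeled as a directory under mail_root, and each
    user's mailbox is a plain text file inside it.
    """
    user, host = parse_address(address, local_host)
    mailbox = Path(mail_root) / host / f"{user}.mbox"
    mailbox.parent.mkdir(parents=True, exist_ok=True)
    with mailbox.open("a", encoding="utf-8") as handle:
        handle.write(body.rstrip("\n") + "\n")
    return mailbox


# Usage: send a test message from one (pretend) machine to the other.
with tempfile.TemporaryDirectory() as root:
    box = deliver(root, "ray@bbn-b", "an entirely forgettable test message")
    print(box.read_text(encoding="utf-8"), end="")
```

The point of the sketch is how little is new: the address is two existing names joined by one character, and delivery is an append to a file that happens to live on another machine.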
Starting point is 00:37:02 While we don't know the actual content of the first email, we know where things went after 1971. The following year, a new version of Tenex was finalized and shipped out. Notably, it included Tomlinson's new networked SendMessage. Email was adopted at a breakneck pace. Sure, it may not have been groundbreaking technologically speaking, but it was a great use of existing systems. But more than any of that, it filled a very important need. Just like on CTSS, the ability to communicate was sorely missing from the early days of ARPANET. Email brought the human factor a lot deeper into the mix. Now you could do more than program or crunch data on the network.
Starting point is 00:37:45 It was a sign of what the internet would become years later. In 1973, a new RFC came out, number 561, titled Standardizing Network Mail Headers. The paper represented the first enshrined email standard, based off Tomlinson's earlier work. That same year, email accounted for roughly 75% of traffic on ARPANET. The fact was, people plain liked sending emails. It made work easier. It opened up a new line of communications. Everyone from the newest intern to even the Queen of England was firing off messages. Email had emerged, and it would stick with us into the modern day. I guess that leads to the final question. Where did it all go wrong? How did we go from happy programmers chatting away to a morass of spam mail and ads? Maybe unsurprisingly,
Starting point is 00:38:41 spam appeared in inboxes almost immediately. To see the first spam message, we need to head back to MIT and CTSS. One of the interesting little features of Morris and Van Vleck's mail program was that it could send out mass messages. Like I mentioned earlier, the average user could specify asterisks in the programmer number field, and that would send out a message to everyone within your group. But for a privileged user, someone such as Van Vleck, you could specify an asterisk as the recipient group. In other words, if you had a privileged account on CTSS, you could send a message out to everyone connected to the mainframe. Normally, this kind of mass mail would be used for system alerts, information
Starting point is 00:39:26 about downtime, or other important and hopefully not very frequent administrative messages. No one wants to get a load of mail in their inbox, so it's the kind of feature that would have to be used sparingly and appropriately. But despite the restraint on the side of CTSS's administrators, the feature was still there. It opened up a tempting possibility for anyone with a privileged account. It was only a matter of time before something happened. In 1971, every CTSS user was greeted by an alert that they had mail in their mailbox. Upon checking, they all saw the same message.
Starting point is 00:40:03 As Van Vleck recalled, it started something like this, quote, there is no way to peace. Peace is the way, end quote. Of course, as with all unsolicited mail, it's in all caps. The message was sent by Peter Boss, a programmer on the CTSS team who happened to have the right access to fully unlock the power of the mail command. When Van Vleck found out, he was not pleased. By this point, he was running the system programmer group on CTSS, so the misuse happened on his watch. Van Vleck eventually confronted Boss about the violation. Quote, I pointed out to him that this was inappropriate and possibly unwelcome. And he said, but this is important. End quote. Now, while this anecdote is a little sparse,
Starting point is 00:40:54 I think there is something interesting here. Morris and Van Vleck had apparently worried about the possibility of something like this happening. But really, there's only so much they could do. The ability to send mail to every user at once was useful to the team, so removing it wasn't really an option. And even today, it's a little bit hard to tell spam from meaningful mail. The other thing that really strikes me is Boss's response. To him, informing his fellow users on the way of peace was important. It sounds like he had the best intentions possible. He wasn't looking at his message as junk mail.
Starting point is 00:41:35 Unlike today, where we have programs that handle sending spam, this early period had a discernibly human touch to it. But this was still just on a single machine. What about over on ARPANET? Well, jump forward a few years, and we get into some interesting territory. In 1975, RFC-706 was published, titled On the Junk Mail Problem. Now, this one falls more into the bucket of short memo. The entire thing's only a page long.
Starting point is 00:42:08 And surprisingly, 706 is concerned with a totally different kind of spam than you might expect. Quoting from the RFC, In the ARPA network host IMP interface protocol, there is no mechanism for the host to selectively refuse messages. This means that a host which desires to receive some particular messages must read all messages addressed to it. Such a host could be sent many messages by a malfunctioning host. This would constitute a denial of service to the normal users of this host. End quote. Ah, back to the lovely prose of RFCs. The takeaway here is that the junk mail problem isn't an issue of mass unsolicited mail. It comes down to the technical issue that seems to have popped up on ARPANET. If a server were to receive enough email traffic, possibly due to some faulty code,
Starting point is 00:42:56 then the computer could be rendered useless. More than anything, this RFC is talking about an early form of a DDoS attack. Once again, the sources are, sadly, pretty sparse, but from reading the memo, it sounds like the RFC is referencing some specific malfunction that was observed on the network. A day-to-day error isn't something that always makes it into the historical record. Anyway, that's just my best guess. What's important here is that 706 shows programmers on ARPANET were aware of a problem in the works. The possibility of spam, or at least overwhelming amounts of mail, was there. It's just that the recognizable form was yet to develop. But don't worry.
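As a rough illustration of the missing mechanism that RFC 706 describes, here is a small Python sketch of a host that can selectively refuse messages from a single misbehaving source. To be clear about the assumptions: the `Host` class, the `max_per_source` threshold, and the counting policy are all hypothetical choices made for this example. The RFC itself only notes that hosts had no way to refuse messages; it does not prescribe this design.

```python
from collections import Counter


class Host:
    """Toy model of a host that refuses traffic from a flooding source."""

    def __init__(self, max_per_source=3):
        # Hypothetical policy: accept at most this many messages per source.
        self.max_per_source = max_per_source
        self.seen = Counter()   # messages counted per source
        self.refused = set()    # sources we no longer read from
        self.inbox = []         # accepted (source, message) pairs

    def receive(self, source, message):
        """Return True if the message was accepted, False if refused."""
        if source in self.refused:
            return False
        self.seen[source] += 1
        if self.seen[source] > self.max_per_source:
            # The source has exceeded its allowance, so stop reading from it.
            self.refused.add(source)
            return False
        self.inbox.append((source, message))
        return True


# Usage: a malfunctioning host floods us while a normal user sends one note.
host = Host(max_per_source=2)
for _ in range(5):
    host.receive("malfunctioning-host", "garbage")
host.receive("normal-user", "hello")
print(len(host.inbox))
```

Without something like the `refused` set, every one of those messages would have to be read, which is exactly the denial of service the memo warns about.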
Starting point is 00:43:41 It wouldn't take that long. In 1978, we get our first verifiable spam email sent over a network. And, well, once again, we stray back into the realm of failed best intentions. 78 would be a big year for email, but was also a big year for Digital Equipment Corporation. A new model of the DEC System 20 was coming out, an incremental improvement but still an exciting new offering. All that was left to do was drum up some sales for the new mainframe. In 1978, the person for the job was Gary Thuerk, a DEC salesman. One tried-and-true method for selling new mainframes was through demos. An announcement for the product would be sent out. A time and place would be set, and then crowds of interested people would come see the new computer in action.
Starting point is 00:44:33 Once the audience was sufficiently wowed, it was a lot easier to move units. But the system wasn't perfect. It actually took a lot of planning to make a demo work out well. In 1978, it was common to invite people either by mail, through ads, or by calling them directly. These were pretty well-worn tactics by this point in history, and the way Thuerk saw it, it was time for a change. Now, Gary knew roughly who he wanted to invite. If everyone came, it would be a really big crowd. The problem was
Starting point is 00:45:06 sending out letters to every prospective attendee would be expensive. Inviting them by phone wouldn't be much better. Tracking down so many phone numbers and then placing each call would take a long time. There was an immediately better solution that Thuerk decided to turn to. Email. DEC did have mainframes connected up to the ARPANET, so sending out a message would be relatively easy. But there was one other key to Thuerk's plan, one that we can't exploit today. Back in the early days of ARPANET, there were actually printed email directories, kind of like a digital phone book. Today, the very notion of that is laughable. But the network was small enough that a directory of all active users could be compiled relatively easily.
Starting point is 00:45:52 So Thuerk grabbed a copy and formed a plan. He had two demos lined up, one in LA and one in San Mateo. Leafing through his handy ARPANET directory, he found the section for West Coast users. Those would be the recipients of his email invitation. It came out to nearly 400 email addresses, all printed on physical paper. Thuerk handed off the task of typing each address to one of his co-workers, and he set about work drafting the actual email. On May 1st, 1978, the message went out. And things degraded from there. The first problem was that DEC's email client wasn't actually built to handle so many email addresses. It may have been possible if
Starting point is 00:46:32 all the addresses were entered into a file to form a mailing list. But they were each input to the program over the course of a few days of work. So things go a little, well, strange. After the first 320 emails, the recipient field overflowed, putting the balance of email addresses into the body of the message. So the lucky ones who actually received the email, well, it started off with a long list of failed addresses. The following day, Thuerk would correct the problem and send out the email to all the overflowed addresses. Reading past the accidental garbage, the body of the email started with, quote, Digital will be giving a product presentation of the newest members of the DEC System 20 family. The DEC System 2020, 2020T, 2060, and 2060T. The DEC System 20 family of computers has evolved from the Tenex operating system and the DEC System 10 computer architecture.
Starting point is 00:47:32 Both the DEC System 2060T and 2020T offer full ARPANET support. And it continues on like that for quite a while. As is custom, it's in all caps. The caps thing is a little strange to me. Some terminals or mainframes only handled capital letters back in the day, but this is 1978. I can't tell if the cruise control here is intentional or accidental. That part aside, the email is undeniably an ad for DEC's latest and greatest hardware. As far as I'm aware, this was the first time a large company tried marketing something over the ARPANET.
Starting point is 00:48:12 And it turns out that not many people were happy about that. Complaints were immediate, but things would escalate and get pretty out of control over the course of the coming days. This ad was a totally new and innovative use for the network, and it seemed the Thuerk ad actually struck a nerve. There were two main complaints from users. Firstly, it was unsolicited junk mail, and secondly, it was way too long of a message to be sending via email. The latter actually led to some more technical problems. The aforementioned RFC-706 had never been implemented, so there was no way to deny incoming large emails. The message arrived and got pushed down to disk immediately.
Starting point is 00:48:58 And with the state of storage at the time, this led to some problems. Thuerk would later recall, quote, Some people said they didn't want the email and that it ate up all of the free space on their computer. It used up all the disk space on a professor's computer at the University of Utah, end quote. It seems that the fears referenced in RFC 706 were coming true, courtesy of a mass mailer ad. There were some who defended Thuerk's spam mail, but that wouldn't last for long. One early defender of note was none other than Richard Stallman, the future founder of the Free Software Foundation. Now, he was not one of the original recipients of the spam. But as with all things, Stallman has a certain style.
Starting point is 00:49:42 He has to decide and form an opinion as quickly as possible. So in a vacuum of any actual information, Stallman had this to say, quote, I didn't receive the DEC message, but I can't imagine I would have been bothered if I had. I get tons of uninteresting mail and system announcements about babies being born, etc. At least a demo might have been interesting, end quote. But even RMS quickly changed his mind. A day after coming to Thuerk's defense, Stallman got wise to what was up, quoting again, well, Jeff forwarded me a copy of the DEC message and I eat my words. I sure would have minded it. Nobody should be allowed to send a message with a header that long, no matter what it's about. End quote. Well, I think it's a pretty
Starting point is 00:50:32 accurate example of how a lot of ARPANET users felt. They objected both to the form and the function of Thuerk's email. Oh, but the backlash actually gets worse than some angry programmers and academics. You see, there's a little nitpicky problem here. The ARPANET was a government-funded project. Most of the money came from tax dollars. As such, there were some pretty strict rules about its use. All traffic had to be related to government communications, military work, or research. An ad for the new DEC System 20 was none of those. Due to either lack of knowledge or lack of care, Thuerk had actually committed a federal crime. And it wouldn't take long for news to travel. Scant days after the message was sent out, the government was on the case. According to
Starting point is 00:51:26 one Major Raymond Czahor, quote, this was a flagrant violation of the use of the ARPANET as the network is to be used for official U.S. government business only. Appropriate action is being taken to preclude its occurrence again, end quote. Once again, just as a note, that message against spam is also in all caps. Anyway, some angry and threatening calls were made to Thuerk's boss, but despite the strong language, the government actually didn't do much else. The repercussions for DEC were surprisingly light. At the time, it was customary for ARPANET users to self-regulate the network, and in the wake of Thuerk's spam, that self-regulation had to be tightened. It would take years for more formal anti-spam measures to be put in place. But I think there's a bigger takeaway here. Spam mail makes a lot of people really angry, and it has since day one. So why does it persist? By 1978,
Starting point is 00:52:27 we can already see two big reasons for this. Firstly, spam was very successful. It was a great way to get your message out. The response to Peter Boss's peace mail isn't that well documented, but using CTSS, he was undoubtedly able to get more people thinking about pacifism. In Thuerk's case, his spam ad made the upcoming DEC System 20 demo very successful. Sure, some people were turned off, but others were actually interested. By his own account, the controversial email netted DEC tens of millions of dollars in sales. The other reason spam keeps flowing comes down to the people sending it, and this point matters a little less now that everything in the world is automated, but
Starting point is 00:53:10 bear with me. Thuerk didn't have any malicious intent when he drafted his inaugural spam mail. Later interviews make it very clear he genuinely thought that people would be interested in seeing the upcoming demos. He would probably put it in different words, but share the same sentiment as Peter Boss. Why did he hit send? Well, because to him, the message was important. All right, that does it for this episode. As we've seen, email is one of those ideas that was just too good for programmers to pass up. Communication between users is such a basic need that digital mail systems crop up as soon as multi-user systems appear.
Starting point is 00:53:55 The specific catalyst early on was time-sharing operating systems. But anything that allowed for multiple users on shared infrastructure would have probably led to the same result. And once ARPANET gets going, it doesn't take long for email to adapt. By the end of the 70s, email was a staple on the network, and from there, its survival was guaranteed. And with email's existence, be it local or over a network, it was only a matter of time before unsolicited messages started to flow. However, I think it's fair to say that early mail spam had a distinctive flavor to it, if you'll pardon the pun. In the more recent era of junk mail, things are just more annoying and much more malicious. But early on, it was different. Email itself was created with the best of intentions, and surprisingly so, spam was the same way.
Starting point is 00:54:49 Thanks for listening to Advent of Computing. I'll be back in two weeks' time with another episode. And since it is Spook Month, I'm hoping to have another slightly frustrating or spooky episode on the docket. And if you like the show, there are now a few ways you can support it. If you know someone else who's interested in computing's past, then why not share the show with them? You can rate and review on Apple Podcasts. And if you want to be a super fan,
Starting point is 00:55:15 then you can now support the show through Advent of Computing merch or signing up as a patron on Patreon. Patrons get early access to episodes, polls for the direction of the show, and some bonus content. You can find links to everything on my website, adventofcomputing.com. If you have any comments or suggestions for a future episode, then shoot me a tweet. I'm at Advent of Comp on Twitter. And as always, have a great rest of your day.
