Advent of Computing - Episode 4 - Unix for the People, Part 1
Episode Date: May 20, 2019

Many people have never even heard of Unix, an operating system first released in the early 1970s. But that doesn't change the fact that all of the internet, and nearly every computer or smart device you interact with, is based on some variant of Unix. So, how was such an important project created, and how did it revolutionize computing? Today we will dive into the story leading up to Unix: time-sharing computers in the 1960s. This is really just the background for part 2, where we will discuss the creation and rise of Unix itself. However, the history of early multi-user computers is itself deeply interesting and impactful on the evolution of computing.
Transcript
How would you describe our modern, digitally connected world?
I know, that's a really big question.
One way that we could try to talk about it would be to look at the hardware and software that forms the backbone of the internet.
The issue with that, though, is that you're missing a lot of the hard work and incredible amount of intent and foresight that went into its creation.
You could also try to describe it in terms of how people use the internet,
but now you're missing all the cool hardware and software
while also creating an unending soulless list of uses.
I think the best way I've heard it described is as a, quote,
system around which a fellowship could form.
But, you see, that quote wasn't originally intended to explain the best points of modern technology.
In fact, I'd say that in terms of the computing era, it's not even from the modern age.
That quote was first used by Dennis Ritchie,
a programmer working at Bell Laboratories, to describe a project he started working on in 1970.
That project would go on to transcend simple software,
and along the way, revolutionize computing in countless ways that we still benefit from today.
That project was a little something called Unix.
Welcome back to Advent of Computing.
This is episode 4, Unix for the People, part 1.
I'm your host, Sean Haas.
Now, I tried to get this down to a single episode,
but the reality is, the story of Unix is on such a grand scale that this has to be at least two parts.
Now, most people have probably never even heard of Unix, or its proverbial heir, Linux.
Usually, discussion of this kind of thing is relegated just to computer programmers
and hyper-enthusiasts.
That being said, I can guarantee that you're currently using technology pioneered by, or
more likely directly descended from, Unix.
If you're listening to this podcast, which, you know, I guess you'd have to be to hear my voice,
then you've downloaded it from a server that's running some kind of Unix.
Essentially, the entire internet is run off Unix and Linux servers of some variety.
If you own an Apple product like a MacBook or iPhone,
then you may be surprised to hear that the software they run is also Unix-based.
The exact same goes for Android.
I should probably back off a little at this point
and explain some things before diving too far off the deep end.
Unix is an operating system.
That's the cool software that runs directly on a computer, manages its resources, and allows other programs to run on top of it.
I also mentioned Linux, and only mentioning it in passing is really a disservice.
Essentially, Linux is a more modern descendant of Unix, which has become increasingly popular in the last 20 or so years.
For right now, suffice it to say that when I say Linux, I'm just talking about another type of Unix. I'll get more into
the specifics on Linux later on, probably in episode 2. So, let's start this off. Today,
I'm going to tell you the story of Unix, how it's special, how it came to be, and how it changed the
world. But to do that, we'll have to go all the way back
to the 60s and the early days of computer adoption. To understand why Unix matters,
we need to first talk about where it evolved from. And the best part to start that discussion
is with the state of computers and their operating systems in the 1960s. Obviously,
computers of that era are almost unrecognizable to a modern user. The only real
computers at this point are mainframes. These are the behemoths that take up entire rooms.
They stored data on reel-to-reel tapes and could only be interfaced with using punch cards and panels
covered in switches. Tasks had to be submitted to computers as what were called jobs. They could
only be run one at a time. You'd
have to submit these by taking a stack of punch cards to someone who would work as an actual human
scheduler and plot out when your job could run and when you could get the results. That being said
though, the 60s were really when computers started to change. Part of that change came from the
adoption of what's known as a computer terminal.
To put it simply, a terminal is a device that uses a keyboard to input data into a computer
and provides some kind of way for the computer to return outputs. Originally, modified electric
typewriters filled this role. The problem with this idea is pretty clear. Since paper is used
for the output, you can't really edit anything
once you input it, and you burn through mountains of paper extremely quickly. However, the upshot
was that for the first time, users could work with a computer somewhat interactively, meaning that they
could tell the computer something to do and then see the results relatively quickly.
This was a start towards more usable systems, but more than anything, this just brought computers up to the
next big hurdle. You see, while a single person working at a computer alone can accomplish a lot,
if you could somehow share a computer between many people at the same time, you could get even more done. The only issue with that is that
it turns out to be a really hard task to easily share a computer between multiple users.
The basic problem boils down to this. How do you share a single computer between a lot of users
at the same time without everything ending up as a mess? The idea of timesharing had been floating around since the
1950s. In this scheme, the computer would just switch between each user, but it would have to
do it really, really quickly. Ideally, this would mean that each user would think that they have
access to a whole computer. This also means that each user would essentially only have access to a much less powerful system.
To put it another way, timesharing lets you take a large, expensive mainframe that only one person can use
and break it up into a set of smaller, more personal computers that have less power.
Now, this may sound like a pretty simple solution,
but the reality is that implementing timesharing
on a system is no small feat.
It ends up working something like this.
Each user has to have their own terminal for input and output.
From that terminal, they can run programs and then read the results from the main system.
Which user is using the computer and what their program is currently doing on the system is called a state, and the computer has to be able to switch quickly between each
of those states, pausing, storing the current state, and then recalling the next state that
needs to be run.
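To make that switching idea concrete, here's a minimal sketch in Python. This is purely illustrative, with made-up names, and nothing like the actual code of the era: each user's state is stored when their time slice ends and recalled when their turn comes around again.

```python
# Hypothetical sketch of round-robin switching in a time-sharing system.
# Each "state" records which user is active and where their program left off.

class UserState:
    def __init__(self, user):
        self.user = user
        self.program_counter = 0  # where the user's program left off

def run_timeshare(states, slices):
    """Switch between user states round-robin, one time slice each."""
    log = []
    for i in range(slices):
        state = states[i % len(states)]  # recall the next state
        state.program_counter += 1       # "run" it for one slice
        log.append(state.user)           # then pause and store it
    return log

# Three users sharing one machine for six time slices:
users = [UserState(u) for u in "ABC"]
print(run_timeshare(users, 6))  # → ['A', 'B', 'C', 'A', 'B', 'C']
```

If the switching is fast enough, each user sees a steady stream of their own slices and never notices the others.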
This was the problem that was facing MIT programmers during 1961.
Armed with an IBM 7090 mainframe, one of the top of the line systems for the time,
a group of around eight programmers set to work trying to tackle the problem of timesharing.
What they created was first called the Experimental Timesharing System,
but later changed its name to the Compatible Timesharing System, or CTSS. Really, you have to love the early names of these
computer projects. So how did MIT go about implementing timesharing?
Basically, their solution worked something like this. Each user had their own terminal and two
dedicated tape drives. Now, we have to keep in mind that these aren't cassettes. We're talking
about full-fledged reel-to-reel drives.
A single one of these drives is the size of a cabinet.
They're about five, six feet tall, and they're by no means a cheap device to add.
One of these drives was dedicated to the user's private directory of files.
The entire other drive was used for storing the user's state data.
On each switch, CTSS would write down the computer's current state to one of the user's tape drives and then read in the next state from
the next user's tape. The reason to store state data on tape was due to the small amount of memory
that the IBM 7090 had, only 32,768 36-bit words. Also, hard drives were still rare and not yet a practical option, and tapes were slow,
but they were the only real option for long-term storage of data. Now, the solution did work,
at least in the strictest sense of the word, but it was slowed by the tape drives. Every context
swap required a complete read and write to a tape. Another problem with CTSS just came down to the
human interface, which was essentially still a glorified typewriter. The final glaring issue
was that CTSS could only share between a small number of users at once, with 30 as the system's
maximum load. So overall, while CTSS was very limited, it was a breathtaking proof of concept for the idea of a multi-user timesharing system.
While in use for a number of years, CTSS was by no means the final evolution of timesharing.
Improvements would just have to be made.
And to do that, a much larger and more ambitious project would have to be undertaken.
This came in 1963 from Project MAC, an ARPA-funded research project located at MIT.
I know, despite the name Project MAC, it has nothing to do with the much later Apple project.
MAC here stands for Multiple Access Computing, an acronym that sums up the project surprisingly well.
To quote from the initial proposal, MAC's main goal was,
The evolutionary development of a computer system easily and independently accessible
to a large number of people and truly flexible and responsive to individual needs.
An essential part of this objective is the development of improved input, output, and display equipment, of programming aids, of public files and subroutines, and of the overall operational organization of the system.
Ugh, that's dense early 60s computer writing at its best. Essentially, Project Mac was planning to push computing to its next stage,
that being interactive systems that could be personally used and interacted with.
The system that came out of this project would end up being called Multics.
So, how was Project Mac able to more fully realize timesharing?
Well, Project Mac started Multics much in the same way that any
big undertaking would begin, by forming a team to deal with it. After a considerable search,
Mac ended up settling on General Electric for the hardware side of things. Yes, in the 60s,
GE made mainframes as well as washing machines. Go figure. Part of this choice, however, was due to the fact
that GE's current mainframes, the 600 series, just plain had more features and outperformed
the competition, including IBM, in a lot of regards. The other reason for this choice was that
GE was willing to work closely with the rest of the team, offering both hardware and software development support. The other team member was AT&T's Bell Laboratories. I had some issues finding solid
information on why exactly Bell Labs was chosen as a partner in this project over other companies,
but from what I've read, I can make a few educated guesses. Firstly, at the time, Bell Labs was well positioned as a
frontrunner in the computing research field. In fact, one of the first binary calculators was
developed at Bell, a major stepping stone towards modern systems. They also had a hand in the
invention of the transistor, a device which would become increasingly core to the technology that
powers computers. And since Multics was going to be the next big bleeding edge advancement,
it makes sense that Bell would want a cut of the cake.
My other speculation is that since part of the goal for Multics
was to create a computing-as-a-utility type of system,
and Bell did have connections to AT&T's telephony infrastructure,
then it would make sense that you'd want to pair the
infrastructure for distributing that load and data transfer with the actual software that would be
behind it. So, by 1964, a team was assembled, GE, MIT, and Bell. Multics also had a targeted platform,
the GE645 mainframe. Now, to explain why Mac chose this particular
mainframe, we need to get a little deeper into the specification of Multics. I'll
try to keep the technobabble to a minimum, but there are some important things here that
I need to explain.
The first big part of the Multics specification is how it handles memory. These features are
segmented memory, virtual memory, and the idea
of sharing memory between multiple processors. Segmented memory is probably the easiest to
explain. It just means that the computer's memory is broken up into smaller chunks,
and each program is given a chunk to work with. The point of this is to keep programs from
accidentally doing something to one another's memory.
This is done by enforcing strict control.
You see, in Multics, a computer program is only able to access data within its own chunks.
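That strict control could be sketched in a few lines of Python. The segment layout and function names here are invented for illustration, not Multics terminology:

```python
# Sketch of segmented memory protection: each program owns one chunk of
# memory (a base address plus a length), and any access that falls
# outside that chunk is refused before it can touch another program.

def check_access(segment, address):
    base, length = segment
    if not (base <= address < base + length):
        raise MemoryError(f"access to {address} outside segment")
    return address - base  # offset within the program's own chunk

prog_a = (0, 100)               # program A owns addresses 0 through 99
print(check_access(prog_a, 42))  # → 42, a legal access
# check_access(prog_a, 150) would raise MemoryError instead
```

The point is that the check happens on every access, so a buggy program fails loudly instead of silently scribbling over its neighbor.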
Now, virtual memory is a little more complicated.
The best way I can think to describe it is as an abstraction to how memory is accessed. Basically, with virtual memory,
the program just knows that it's accessing some kind of memory. The virtual memory system takes
care of deciding if this is accessing actual RAM, data stored on disk, or really any other type of
information. This is important because it allows a computer to switch between keeping data in memory or on disk, which is really a requirement for time sharing.
The final piece is sharing memory between multiple processors.
This one is really just as simple as it sounds.
If a computer has multiple processors, then this just means that you're letting all of them access the same pool
of shared memory. Outside of the memory design, another large part of Multics was its security.
The reason to focus on this is twofold. Firstly, the primary funder of Multics was ARPA,
a government agency, so it stands to reason that if the feds ever ended up using Multics,
they'd want it to be really safe. The second reason is more on the practical side. A time-sharing
system as ambitious as Multics would have to have a lot of users on it at once, so a lot of care
would need to be taken to keep users from messing up each other's work or even taking down the entire computer. How Multics kept things secure spread over the entire spectrum of the system.
The most obvious part of this is that each user had their own password-secured account.
A user also only had access to their own files and was unable to alter files used internally
by Multics itself.
Each user account also had a privilege level,
which was used to control the amount of access any one user could have to the underlying computer.
For instance, one account may only be able to make and edit text files, while another could do
system maintenance tasks, or even more. But the security implementation went deeper into the
system in ways that no user would normally see.
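Those user-account privilege levels from a moment ago boil down to a simple comparison. The level numbers and operation names below are invented for illustration:

```python
# Hypothetical privilege check: each account has a numeric level, and
# each operation requires a minimum level before it is allowed to run.

LEVELS = {"edit_text": 1, "system_maintenance": 3}

def allowed(user_level, operation):
    return user_level >= LEVELS[operation]

print(allowed(1, "edit_text"))           # → True
print(allowed(1, "system_maintenance"))  # → False
```

An ordinary account clears the bar for editing its own text files but gets turned away from maintenance tasks.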
Part of this connects with the memory system design.
Virtual and segmented memory makes Multics able to isolate each program from one another.
This made it nearly impossible for a runaway program to affect other user software.
Even deeper into the implementation, Multics could restrict certain
CPU and hardware operations to either the most privileged user levels or even completely lock
out certain features of the underlying computer. All of these features and more were carefully
chosen and designed to make the timesharing implementation in Multics more robust and usable
than the earlier CTSS.
Once everything was planned out, the Project Mac team was able to pretty quickly choose
what computer to design Multics on.
The GE645, still in development during the planning stages of Multics, was able to support
pretty much all of Project Mac's requirements in hardware.
Since the 645 supported things like segmented and virtual
memory in hardware, then Multics became instantly easier to program and faster to run on that
mainframe. Part of that is just because hardware circuitry tends to be faster
and more efficient than trying to implement the same solution in software while fighting against the
machine's restrictions. So something that I've been alluding to but haven't exactly addressed yet is how Multics
was programmed. Mainly, I was just trying to put off this part since it gets more into the planning
stages of software development, and that's something I like to avoid as much as I can when
I'm not working my day job. To boil it down, Multics was developed in two main stages.
You see, Multics was programmed to a software specification.
The first part of this process was to create that spec.
This just means that Project Mac sat down and planned out what features Multics would
have and how those features would be implemented before any programming started. The second phase
was actually programming the darn thing. The advantage of this approach is that Multics was
already designed before a single line of code was ever written. This also made it easier to find a
suitable mainframe and team when it came time to implement that specification. So after all this
talk of Multics spec, there is still one more important
factor to discuss, that being what Multics was programmed in. You see, Multics was one of the
first operating systems to be programmed in a high-level language, that being PL/I. There's a
lot to unpack there. First of all, what exactly is a high-level language? So, generally speaking, there are two main types of programming languages, high-level and low-level.
Low-level languages are ones that are very close to the machine code that runs on the computer's actual CPU.
This includes things like assembly languages.
Low-level languages tend to be more difficult to program in, but give a developer extreme control over what the computer is doing.
Conversely, high-level languages are more akin to English, at least superficially.
They require something called a compiler, that's a program that's able to translate
the source code that a developer writes into something that can run on a computer.
High-level languages are easier to write in than low-level languages,
and they offer a lot more in the way of flexibility and features
than a lower-level language does.
This comes at a cost, in this case control.
In a low-level language, each line you write
exactly translates into a single operation on the CPU.
But in a high-level language, the compiler has to decide
how best to implement your code as a series of instructions. This can make programs written in
languages like C++ and Java run more slowly than a program written in a low-level language.
Now, if that's a lot of gibberish to you, that's fine. Just remember this. A high-level language is faster to program in but slower to run, whereas a low-level
language is slower to program but faster when it finally runs.
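You can see the tradeoff in miniature with a toy Python example: one high-level line hides the many explicit steps that low-level code would make you spell out yourself.

```python
# One high-level line can hide many machine operations.
# A sum of squares in a single expression:
total = sum(x * x for x in [1, 2, 3])

# The compiler (or interpreter) turns that into many low-level steps
# (load, multiply, add, loop, ...), roughly like this explicit version,
# which is closer to what low-level code forces you to write by hand:
explicit = 0
for x in [1, 2, 3]:
    explicit = explicit + x * x

print(total, explicit)  # → 14 14
```

Both produce the same answer; the difference is who does the bookkeeping, you or the language.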
So why is this diversion important to Multics, and why is it important that Multics was written
in a high level language?
Mainly it's because it just flat out hadn't been done before.
There was an
earlier operating system that used a high-level language, but Multics was the first large
and long-lasting project to take this route. A lot of people didn't even think it would
be possible to make a fully-fledged operating system in anything but assembly language.
By using a high-level language, the Multics team could focus more on the actual implementation of the operating system instead of focusing on the underlying computer.
This also future-proofed their work to an extent, since pretty much every operating system that follows ends up being written in a high-level language as well.
But, that's not to say that choosing PL/I made Multics write itself, far from it.
In fact, finishing the system ended
up being a Herculean feat. Part of this was due to the fact that they were so early in the adoption
of PL/I that part of the Multics team ended up having to write their own specification and
compiler for the language. Originally, they had planned on using a subcontractor to do this for them,
but as the years rolled on, the subcontractor never delivered, so part of Multics' resources
had to be diverted to just writing a compiler. The compiler, though, wasn't the only part of
the project that got delayed. The GE645, the proverbial heart of Multics, was also delivered a lot later than the team would have
hoped for. It wasn't until 1967 that both the mainframe and compiler were in hand. That's nearly
four years after the project started. And while a prototype of Multics would first boot in December
of 67, it would take another two years for bugs and performance problems to be ironed out.
That was two more years of time lost than Project Mac had accounted for.
If this was a normal project, then setbacks like this may have been fine.
But we can't forget that Multics was a government-funded endeavor.
This meant that the longer it dragged on without a finished system,
the more tax dollars had to be spent, and the harder that became to justify.
Alright, so I want to try and wrap this episode up before it gets too much longer.
So there's all the groundwork to start our conversation of Unix in part 2 of this episode.
Now, the Multics team would end up delivering on their completed system around October of 1969, but only after significant setbacks. Seeing the possible failure of the
system and the seemingly endless delays, Bell Labs pulled out of the project in early 69.
But this isn't the last we hear from Bell. Far from it, in fact. Their programmers would go on to work
on their own timesharing system that would be released later on into the 70s. Those programmers
include a few names that you should be familiar with if you've ever used much C. Ken Thompson
and Dennis Ritchie. If you don't know those names, then don't worry about it. I'll be telling you a
lot more about them in the next episode. Thanks for listening to Advent of Computing. I'll be back in two weeks' time with
part two of the Epic of Unix. In the meantime, if you like the show, please take a second to
share it with your friends. As always, you can rate and review on iTunes. If you have any comments
or suggestions for a future show, go ahead and shoot me a tweet. I'm at Advent of Comp on Twitter.
And as always, have a great rest of your day.