Advent of Computing - Episode 159 - The Intel 286: A Legacy Trap

Starting point is 00:00:00 Pop quiz. What did Grace Hopper think was the most dangerous phrase in the English language? The answer is, we've always done it this way. That one phrase has the power to, well, maybe move mountains is the wrong analogy here, but perhaps the power to stagnate entire ecosystems. If you're in the biz, then it won't take too long before you encounter the full power of this kind of sentiment. They're called legacy systems. This is what happens when there's some old software or even hardware that, despite its age and weariness, is still in everyday use. They're systems that have always been this way.

Starting point is 00:00:46 Support for these systems comes in a number of forms. They are a living system after all, and any system, even an old one, needs to be maintained. Bugs will be found, new features will perhaps need to be added, or parts of the system may even need to be removed entirely. That's all normal for really any long-running project. What makes legacy systems scary are the constraints that come along with them.

Starting point is 00:01:16 You have to work with the existing system. Any change you make has to maintain how the system operates. Any new feature has to be compatible with older parts of the system. One of the biggest examples is IBM. Their modern mainframes are designed in such a way that you can still run code from the 1960s. That's because many of their clients are running ancient systems, some written in COBOL. These are things like banking software. To replace that software, you'd have to bring the whole system down and then back up again,

Starting point is 00:01:54 which could interrupt bank transfers. That would be catastrophic for these kinds of companies. IBM provides support for these legacy systems. They provide compatibility. And their clients, well, they have programmers that still know COBOL and are left to work on this ancient code. It pays pretty well, but it does trap you in a sort of box. This phenomenon happens on a smaller scale just about anywhere you find software.

Starting point is 00:02:25 Want to make a new user interface? Well, it has to have these five buttons that we've always had, and it has to be able to talk to this old software that we've used since 1993. We've always done it that way, and we always will, we always must. If you look hard enough, you can find this curse buried deep in every computer. And Intel, it seems, has suffered from a particularly pernicious formulation of the legacy curse. Welcome back to Advent of Computing. I'm your host, Sean Haas, and this is episode 159, the 286, the legacy trap. It's time for a bit of a return to form, and what better topic than Intel?

Starting point is 00:03:19 For newer listeners, a few years ago I did this long-running series leading up to the Intel 8086, perhaps one of the most important microprocessors ever created and one of my personal favorites. Most modern computers are still compatible with the 8086, a chip first sold in 1978. The funny thing is that the 8086 itself was based off designs from 1969. Following Intel's early processors paints this strange and very direct line from the design of a terminal in 1969 all the way to the modern day. It's a neat story of how legacy can become a double-edged sword. That series was planned to end with the 8086.

Starting point is 00:04:08 However, I've been getting quite a few requests to keep going, and who am I to say no? So today we'll be looking at the next chip in the x86 lineage, the Intel 80286. As always, Intel's naming conventions are just not very good. Most, however, simply call this chip the 286. This is actually one of the few topics on the show that I'm directly nostalgic for. I learned to program on a 286 computer.

Starting point is 00:04:43 I have a bit of a soft spot for the old chip. That said, I don't actually know that much about its history. The jump to, well, 2 introduced a number of huge changes. The name for those huge changes is protected mode. At the flip of a digital switch, the 286 turns into a much more powerful processor. It can access more memory, it can handle process control, it even gets virtual memory.

Starting point is 00:05:12 It just becomes a much more capable chip. And, as is the season this year, this means the 286 can run normal versions of Unix. No caveats, no tricks, just Unix. But why do you have to flip a switch to access the full powers of this processor? Why is all this potential hidden in a special mode? It's that double-edged sword. It's the curse of the day. The 286 is the first real heir of the 8086's legacy. That can be a horrible weight to carry. This episode, we'll look at how Intel moved from their first huge success to the follow-up. Why was the 286 developed?

Starting point is 00:06:00 How was compatibility with the older chip handled? This is Intel after all, so it kind of has to be compatible, right? And ultimately, did that compatibility hold the 286 back? Before we get into the episode proper, I have my quick announcement corner. This August, I'm going to be in Mountain View for the Vintage Computer Fest West. I'm going to be speaking about some emulation software I'm working on and just having a lovely weekend. So if you're interested, I highly recommend turning out.

Starting point is 00:06:36 I'd love to meet all of you. I have a great time every time I get to meet listeners. Again, that's VCF West. It's August 1st this year in Mountain View. You can find all the details at VCFed.org. That's VCFed.org. And now, on to the show. It came as a bit of a shock to me that the story of the 286 just isn't that well documented. The chip seems to be overshadowed by the 8086 before it and the 386 that comes after it.

Starting point is 00:07:15 What I'm going to present is what I've been able to put together from a few oral histories, some papers, and a couple of magazine articles. So I guess that's all to say, this is a pretty normal episode of Advent of Computing. The first thing we have to understand is that the 286 itself was also a stopgap. In 1975, Intel had started their wildly ambitious IAPX 432 project.

Starting point is 00:07:44 It was a totally new chip that was designed to function a lot more like a mainframe than a microprocessor. It was built to run object-oriented code almost natively. It was an amazing machine. And it was more than a little cursed. Maybe curse is the wrong word. You could call it ill-conceived, or ahead of its time depending on how you're feeling. The project was famously delayed over and over again.

Starting point is 00:08:17 During those delays, the 8086 was conceived of as a stopgap, something to put to market to buy time while the 432 was completed. Intel still had to make money to funnel into this amazing new design. And it seems that the 286 was conceived for a very similar purpose. To quote John Crawford speaking during a CHM oral history panel, quote, the 432 was a very ambitious project that Intel was very firmly committed to, and unfortunately, it was also late and had slipped pretty significantly. So we had a number of gap fillers that were thrown into the breach. This was a little

Starting point is 00:08:58 before my time, but my understanding is that the 8086 microprocessor was the first of those gap fillers, followed by the 8088, the 186, and the 286." Intel was betting the entire house on this new next-generation chip, but they still had to make rent. The IAPX 432 was the entire future. Everything else was secondary. You can even see this mindset in the 80286's early name, the IAPX 286.

Starting point is 00:09:37 This chip is meant to keep the lights on as the real project wraps up. But we all know how that went. The 432 project fell apart. When the chip was launched, it was slow, expensive, and not nearly as powerful as anyone had hoped. It sounded great on paper, but the reality was far from expectation. No one wanted the new chip. That, coupled with the surprise success of the IBM PC, meant everyone wanted x86 chips. Everyone wanted chips that they could put in a new PC compatible. Thus, these gap fillers became hot commodities. But I'm getting ahead of myself.

Starting point is 00:10:28 This all happens well after the beginnings of the 286 project. So then when exactly did this new chip enter development? Well, that turns out to be hard to say. It has to be after 1978 because that's when the two principal designers were hired by Intel. It has to be before 1982 because that's when the chip launched to market. And we know that the project was delayed at least a year, something I'll get back to later. So this pushes the start date back in time to maybe 1981, but my gut says probably 79 or late 78. Whatever the exact year, what we do know is the 286 project started before the IBM PC.

Starting point is 00:11:19 And that leads to something kind of wacky. The concept of the PC doesn't exist. There's no push for a specific microcomputer that's gonna use every single one of these chips. Intel has a very small relationship with IBM, but it's nothing like what would happen once the PC launches.

Starting point is 00:11:41 So their priorities are just plain different. But even without the PC, there are certain pushes and threats that are shaping what Intel decides to do. The name of that threat, or at least the big one, is Zilog. Zilog was spawned from the very Intel engineers that created the 8080. They had gone off, formed Zilog, and started selling the Z80, which is basically an improved 8080 CPU. During that debacle, Zilog actually started to outsell Intel.

Starting point is 00:12:18 And as the 8-bit era drew to a close, both companies were trying to make the jump to 16-bit. Zilog's Z8000 came out in 1979, and it looked like a real contender. And remember, this is before the PC, it's before the x86 architecture blew up at all. Jim Slager, one engineer on the 286 project, explained an added pressure in his oral history. Quote, I quickly learned that there was this ominous dark object hanging over Intel, which was threatening to crush it, called the Zilog MMU." An MMU, a memory management unit, is what

Starting point is 00:13:08 separates primitive computers from much more sophisticated machines. It's a circuit that remaps memory usually according to some configuration or table and it can be used to protect certain regions of memory from tampering. Those two features make rapid and stable multitasking possible. It allows a computer to isolate running processes and even move code as it's executing. Operating systems like Unix, classically Unix, need an MMU to really work. Zilog producing an MMU would be trouble. In theory, the new IAPX432 would just blow this out of the water, it has every feature

Starting point is 00:13:53 imaginable, but that chip didn't exist. So we end up putting more pressure on these stop gaps. This is all to say nothing of Motorola, the other competitor in the space. Their 16-bit offering, the 68000, also debuted in 1979. Externally, it was a 16-bit machine, but on the inside, it had 32-bit features. It was just more sophisticated than what Intel was selling. So Intel needed to bring something to market and they needed a certain it factor. It was decided that that would be an MMU. The solution ends up being the 286.

Starting point is 00:14:38 But its design, well, that gets weird. Let's do the top-level features and then we can dig into the details because that's where we start to see the real, real interesting bits of the story. The first is that the 286 had a built-in MMU. This would immediately blow Zilog out of the water. Most other processors in the era used external MMU chips. You'd get your processor, wire it up to this big black box that says MMU, and then your memory would plug in to the MMU.

Starting point is 00:15:19 But the 286, well, it was memory ready from the get-go. This meant you could run Unix on it, as mentioned, or maybe even something more sophisticated. That is, of course, assuming there's anything more sophisticated and understated than Unix. Next comes memory protection, although this does go hand in hand with the MMU. The 286 had four different execution privilege levels, called rings. This way you could control what a program could actually access on the chip and in memory. It's very fancy and a very important trick for multitasking. Then we get to the aspect of performance.

Starting point is 00:16:03 The 286 was fast when compared to the 8086. It was at least twice as fast. One big reason for this was a radically different design approach. The 8086 had, in some spots, reused much older circuit designs cribbed from earlier Intel chips, but the 286 was completely new. This includes a feature that I just have never talked about on the show, CPU pipelining. This allowed the 286 to operate quickly even when using older, slower forms of memory. The 286 also had a huge memory space. It could address 16 megabytes of memory.

Starting point is 00:16:49 That's 16 times more than the 8086. And finally, the 286 was 100% backwards compatible. But that comes with some implications. You could take code for the 8008, the first 8-bit microprocessor, and use a series of translation tools and a little bit of binary compatibility to get it to run on the 286. It'd take a few steps, but it would be possible. That works because in large part, the 286 is just an expansion of the much, much earlier chip. This part, the compatibility, will be the source of many woes.

Starting point is 00:17:32 Much of the strangeness of the 286 will stem directly from this one design constraint. Now, as for detail, well, this is where we actually get a paper trail that's not nice corporate pamphlets. Slager was one of the designers of the 286, and it turns out he liked to write about it. At least a little bit. In 1984, he authored and co-authored three papers in ACM discussing the design of the 286. It's thanks to those that we have some insight into its design as viewed by its designers. The first of those papers has

Starting point is 00:18:12 an ambitious title. It's called Microprocessor Employs Mainframe Performance Design Techniques. That name alone sounds really exciting, right? It opens like this, quote, During the infancy of integrated circuit technology, designers still grappled with the problem of getting enough transistors on a reasonably sized die to enable them to design more than fundamental types of logic chips. Little more than a decade later, designers no longer are faced with that problem. On the contrary, with the 100,000 and more transistors that we can now economically integrate, their task becomes one of deciding whether to add ancillary functions on the same die as a

Starting point is 00:18:58 microprocessor or use those transistors to embellish and speed up the microprocessor's activities." The message here is kind of clear, right? We've now hit this period where we can make bigger and better chips. There's space to grow, so let's use it. But there's something subtle in this context. Slager is speaking of this period where the IAPX 432 is still under development. That's the big project. And that project was purposely pushing the boundaries of what was possible on a microprocessor. Slager, on the other hand, isn't talking about pushing the edge of what's possible, rather working within the grand space that's now within that boundary.

Starting point is 00:19:51 So how does Slager propose to hit all of those top line goals with this wide open field of silicon? I think we need to start fleshing in some detail, and I'm going to start with something that we haven't talked about before and I've kind of been avoiding. It's on my list of possibly confusing topics. That's pipelining. There are two ways to speed up a computer.

Starting point is 00:20:22 You can crank up the computer's clock so that it takes more steps each second. Or you can design the machine such that each step it takes is more efficient. These kind of go hand in hand. As steps become more efficient, they take less time and so you can take those steps more quickly. But in general, you want to be aiming for efficiency first. If you take an inefficient chip and just crank the clock speed up, it's going to hit a limit where it can't hold on.

Starting point is 00:20:56 One method to increase efficiency is called instruction pipelining. Now, I will admit, this is one of those concepts that I've always avoided because I assumed it was super complicated. But it turns out it's actually not. The core idea is that you want to have the computer doing something at all times. It turns out that you can break the work of a computer into a few smaller steps. To process an instruction, you have to first read that instruction from memory, decode what the instruction means, execute the instruction, and then write

Starting point is 00:21:31 the results back to memory. In a pipeline machine, you have separate circuits that handle each of those steps. Then you sequence those steps so that something is always occurring. You're trying to always have the silicon working. While an instruction is being executed, you're already reading in and decoding the next instruction. While data is being written to memory, a new instruction is just beginning to be executed. This represents a huge gain in efficiency. The computer is always preparing for the next step. The 286 breaks its processing down into four cycles, which it calls fetch, decode, execute, and address an MMU. That last phase is compounded because addressing

Starting point is 00:22:21 and the tasks of the memory management unit are very closely coupled. To make this all possible, the processor is actually four separate circuits that talk to each other with an internal protocol. Now technically, this wasn't new for Intel. The 8086 is pipelined, it's just not a very sophisticated pipeline. That chip uses two components, one to load instructions and one to execute them. You load, you do. You get, you do. The 286 just takes a more well-developed approach here.
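If you want to see why that matters, here's a rough sketch in Python. This is a toy model, not a description of the real silicon: the four stage names come from the list above, but the assumption that every stage takes exactly one cycle, and the function names themselves, are mine.

```python
# Toy model of a four-stage pipeline. Stage names follow the episode;
# the one-cycle-per-stage timing is an assumption made for illustration.
STAGES = ["fetch", "decode", "execute", "address"]


def sequential_cycles(n_instructions, n_stages=len(STAGES)):
    """Cycles needed when each instruction finishes before the next starts."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions, n_stages=len(STAGES)):
    """Cycles needed when a new instruction enters the pipeline every cycle."""
    if n_instructions == 0:
        return 0
    # The first instruction needs n_stages cycles to pass through; every
    # later instruction completes one cycle after the one before it.
    return n_stages + (n_instructions - 1)


def pipeline_diagram(n_instructions, n_stages=len(STAGES)):
    """For each cycle, list which instruction sits in each stage (None = idle)."""
    rows = []
    for cycle in range(pipelined_cycles(n_instructions, n_stages)):
        row = []
        for stage in range(n_stages):
            instr = cycle - stage  # instruction that entered `stage` cycles ago
            row.append(instr if 0 <= instr < n_instructions else None)
        rows.append(row)
    return rows


if __name__ == "__main__":
    print("one at a time:", sequential_cycles(10), "cycles")
    print("pipelined:    ", pipelined_cycles(10), "cycles")
    for cycle, row in enumerate(pipeline_diagram(4)):
        print(cycle, dict(zip(STAGES, row)))
```

In this idealized model, 10 instructions take 40 cycles one at a time and 13 cycles pipelined. A real chip won't hit that ideal, but it shows where the gain comes from: once the pipeline fills up, every stage has something to do on every cycle.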

Starting point is 00:22:58 Let's follow the flow of an instruction and meet each part of the 286 along the way. Come along! An instruction is first fetched by the bus unit. That starts off the wondrous cycle. The bus unit has all the circuits needed to communicate with external memory and devices, and a little bit of memory of its own. That internal memory is used as a buffer for this trick called prefetch. The 286 actually tries to load a few instructions at a time and throws those all into a buffer. According to Slager, this actually works really well. The idea is that in most cases, when you execute one instruction, you'll then execute the next instruction in memory. Most code is sequential. You want to be running instruction 1, and then probably 2, and then

Starting point is 00:23:53 most likely 3, and so on. But this can get messed up if you have a jump instruction. That moves execution to some other place in memory. When that happens, the buffer has to be cleared, and new instructions have to be prefetched. That little jump issue could be a problem. However, Slager can hit back with actual numbers here. During the design of the 286, an analysis of existing 8086 software was carried out. It was found that control transfers, those are jumps, calls, branches, and interrupts, only account for an average of 15% of a program. So the prefetch will have the next few instructions correctly buffered and ready 85% of the time.
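That 85% is just the flip side of the 15%. As a sketch, here's one way you could measure it from a trace of executed addresses. The trace is invented, and I'm assuming every instruction is one address wide, which real x86 code is not, so treat this as an illustration of the idea and not of Intel's actual analysis.

```python
def prefetch_hit_rate(trace):
    """Fraction of steps where the next instruction executed was also the
    next one in memory, so a sequential prefetch buffer already held it.

    `trace` is a list of instruction addresses in execution order.
    Assumes one-address-wide instructions (a simplification).
    """
    if len(trace) < 2:
        return 1.0
    steps = list(zip(trace, trace[1:]))
    hits = sum(1 for prev, cur in steps if cur == prev + 1)
    return hits / len(steps)


if __name__ == "__main__":
    # Four straight-line runs of code joined by three jumps:
    # 21 instructions, 20 steps between them, 3 of which are jumps.
    trace = (list(range(0, 6)) + list(range(100, 105))
             + list(range(40, 45)) + list(range(200, 205)))
    print(f"prefetch was right {prefetch_hit_rate(trace):.0%} of the time")
```

In this made-up trace, 3 of the 20 steps are jumps, so the buffer has the right instruction waiting 17 times out of 20.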

Starting point is 00:24:43 That's pretty good. This is one example of the good that can come from Intel's legacy. The fact that there's so much software floating around for a compatible chip means you can do rigorous analysis like this. You can actually see how people are using the processor, what real code is actually like. That data was used to help design the 286. After the bus unit, an instruction is grabbed by the instruction unit. Specifically, the instruction unit is getting data from the bus unit's prefetch queue.

Starting point is 00:25:19 All these units talk with queues because it lets them stack up work. Once the instruction unit is done working with one instruction, it can immediately turn around and grab the next one. That again, keeps the wheel spinning at all times. The instruction unit's job is to decode instructions. The deeper machinations of the 286 expect a very specific format, one that's different than the machine code stored in memory. Once everything's well formatted, it gets pushed

Starting point is 00:25:53 into, surprise surprise, another queue. That queue is read by the execution unit. That's where things get complicated. Now, this unit is where all the bits that make a computer a computer actually live. It contains the arithmetic logic unit, the ALU, that's the bit that does math and logic operations. It has registers, the tiny chunks of internal storage that are used for immediate operations. It has all the circuits for controls and signaling. It's again, where computing actually happens. The internals here are all pretty standard, so not super worth dissecting.

Starting point is 00:26:41 What I want to push us towards is the final unit, the address unit. This comes into play whenever an instruction references an address in memory. For this, let's consider writes, because Slager spills the most ink on that, so that's easy for me to steal. A memory write is usually a simple operation. You tell a computer something along the lines of, put this value at this address in memory. In any given program, most of your instructions

Starting point is 00:27:12 will be memory writes or memory reads, because really programming's about manipulating memory. When you add an MMU to the mix, however, memory becomes abstracted. You now have two types of addresses to worry about, a physical address and a logical address. A physical address is the actual physical location on the memory chip. This is the address that the memory device speaks in. The logical address, also called the virtual address, is an address mediated by the MMU. It's the address that your program will speak in.

Starting point is 00:27:50 When your program writes to memory, it's speaking virtually. The MMU takes that virtual address and turns it into a physical address that your memory device will respect. To accomplish that mapping, the MMU has to do a bit of logic and a bit of math. An interesting detail here, and an example of how the 286 makes use of all its silicon,

Starting point is 00:28:14 is how that math is done. The MMU actually has its own special-purpose math circuits built into it. There is at least an adder and a subtractor etched into this physical part of the chip. That means the 286 actually does have some duplicated math circuits. For writes, the whole process is, again, queued up. At least a little. During a write operation, the execution unit will ask the address unit if that write is allowed. It will say, hey, here's the address, here's where it's coming from, is this okay?

Starting point is 00:28:53 Am I violating any rules? If the write is allowed, then the execution unit just says, great, and continues working. And the address unit is left to actually perform that memory operation. The overall effect with these four units is that the 286 can operate much, much more efficiently than the 8086. Its very structure is built so that the chip is almost never idle. It can also function pretty well with slow memory, because it's constantly pulling in and filling up its own buffer. The design of the pipeline means that execution just isn't bottlenecked by slow memory devices. Again, really neat design.
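To make the address unit's job a bit more concrete, here's a minimal sketch of that check-then-translate step. The table layout, the names, and the order of the checks are all assumptions for illustration. The only things taken from the episode are the idea of mapping a logical address to a physical one through a table, the 16 megabyte address space, and the fact that the MMU has its own adder.

```python
from dataclasses import dataclass

PHYSICAL_SPACE = 16 * 1024 * 1024  # 16 megabytes of physical memory


class AccessDenied(Exception):
    """Raised when a logical address falls outside its mapping."""


@dataclass(frozen=True)
class Mapping:
    """Hypothetical table entry describing one chunk of memory."""
    base: int    # physical address where the chunk starts
    length: int  # number of bytes the program may touch


def translate(table, entry, offset):
    """Turn a logical address (table entry + offset) into a physical address."""
    mapping = table[entry]
    # The rule check: is the offset inside the chunk this entry describes?
    if not 0 <= offset < mapping.length:
        raise AccessDenied(f"offset {offset:#x} is outside entry {entry}")
    # The math: a single addition, the kind of job a dedicated adder can do.
    physical = mapping.base + offset
    if physical >= PHYSICAL_SPACE:
        raise AccessDenied(f"physical address {physical:#x} does not exist")
    return physical


if __name__ == "__main__":
    table = {1: Mapping(base=0x100000, length=0x1000)}
    print(hex(translate(table, 1, 0x10)))
    try:
        translate(table, 1, 0x2000)
    except AccessDenied as err:
        print("denied:", err)
```

The program only ever deals in the entry and the offset. Where that chunk actually lives in physical memory is the table's business, which is what lets an operating system move things around without the program noticing.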

Starting point is 00:29:40 It seems, however, that the finer details are where things start to fall apart, at least according to some. Lucio Lanza was an engineer at Intel during this period. He was heavily involved in the 80186, something of a smaller sibling to the 80286. In an oral history with the Computer History Museum, he pointed out some of the 286's shortcomings. Quote, The 286 was considered extremely heavy and extremely not agile. Things that people don't realize, the 286 was designed inspired by Multics. So the 286 structures were inspired by Multics. When Unix was winning, Intel was so behind that they were inspired by Multics.

Starting point is 00:30:30 That was the one we were really looking at. It was terrible, terrible architecture." Multics is the indirect predecessor to Unix. When Ken Thompson and Dennis Ritchie start Unix, well, it's actually because they had lost access to a computer that was running Multics. The name Unix is even a joke about the earlier system. Lanza is basically saying that Intel is completely behind the ball. Multics never really took off because it was huge and lumbering, but the agile Unix was a big hit. Now we have to ask, is any of this true? Is the 286 inspired by Multics, and if so, what

Starting point is 00:31:13 does that even mean? Something crucial to understand about the 286 is that it is, in large part, an example of hardware that's been shaped by the requirements of software. The addition of an MMU is specifically to make the 286 a better platform for multitasking. What Lanza is pointing to are the specifics of how the MMU is implemented. The biggest point here is the concept of protection. This has always been core to multitasking ever since the 1950s.

Starting point is 00:31:49 When a program starts up, it needs some memory. It will need a chunk of memory for its actual code, and a chunk of memory for any variables or data it will play with. The operating system is responsible for handing out that memory in a process called memory allocation. A program has to stay within its allocated memory or stuff gets bad. What makes this difficult is you can't actually assume that a program will play nice. Software can and usually does contain bugs.

Starting point is 00:32:22 That can cause a program to write to a random location in memory, well outside its allocated bounds. Software can also be written in malice, specifically to snoop on or mess with other programs. You could even conceivably have a bug in a program that can be used to malicious ends. Think of the whole class of buffer overflow exploits that plague the internet.

Starting point is 00:32:48 Heartbleed is probably one of the more recent high profile examples where a bug was used to mess up memory. These kinds of issues can be acceptable if you're the only one using your computer and if you're only running one program at a time. But once you get to more complex systems with multiple users and multiple programs, then a memory violation can wreak havoc.

Starting point is 00:33:15 It's actually very easy to crash the whole system by just writing randomly to a location in memory. Hence, it's crucial to have some form of protection. The only way to do that efficiently is at the hardware level with an MMU, something that stands between a rogue program and your precious, precious memory. The 286 implements protection using a method called segmentation.

Starting point is 00:33:44 You break up memory into a series of segments. You can mark segments as code or data segments. Code segments are only meant to hold code, and therefore, treated as read-only. Data segments are meant for data. They are read-write. A location in memory is also tagged with a privilege level. If a program tries to break segmentation rules, tries to write out of its segment,

Starting point is 00:34:12 or tries to write into the wrong privilege level, then that's a segmentation fault. Execution stops, and the computer starts running a special error handler that, in theory, will deal with the incident. But what's all this about privilege level? Well, this is where we reach the Rings of Protection. And that's literally what they're called.
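Here's a small sketch of those segment rules for writes. It models only what was just described: code segments are read-only, data segments are read-write, and a write outside the segment is a fault that gets handed to an error handler. The dictionary layout and the function names are hypothetical, and privilege levels are left out on purpose.

```python
class SegmentationFault(Exception):
    """Raised when a program breaks the rules of a segment."""


def check_write(segment, offset):
    """Raise SegmentationFault if writing at `offset` would break the rules.

    `segment` is a hypothetical record: {"kind": "code" or "data", "size": bytes}.
    """
    if segment["kind"] == "code":
        raise SegmentationFault("code segments are read-only")
    if not 0 <= offset < segment["size"]:
        raise SegmentationFault("write lands outside the segment")


def attempt_write(segment, offset, handler):
    """Try a write; on a fault, stop and hand control to the error handler."""
    try:
        check_write(segment, offset)
    except SegmentationFault as fault:
        return handler(fault)
    return "ok"


if __name__ == "__main__":
    data = {"kind": "data", "size": 256}
    code = {"kind": "code", "size": 256}

    def report(fault):
        return f"fault: {fault}"

    print(attempt_write(data, 10, report))
    print(attempt_write(data, 300, report))
    print(attempt_write(code, 10, report))
```

The useful part is that the offending program never gets to decide what happens next. The check fails, execution stops, and the handler takes over.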

Starting point is 00:34:39 Rings of Protection! I uh, I don't know what to say. Computer people are just kinda nerds. On the 286, all of these low-level memory features are controlled via these data structures. They're specific tables and specific data formats that the chip understands and can read. It then uses those tables to configure things like segmentation. All details of a segment are stored in this thing called the segment byte, which is one

Starting point is 00:35:14 of these special structures. For tasking, the 286 uses this thing called the task state segment. It holds information about the current state of a task, including what segments it can access and the task's privilege level. These levels are sometimes called, as I said, rings or rings of protection. Okay, I'll stop. You get the idea. The most powerful ring of all is ring 0. Then it goes down from there. Ring 1, ring 2, and ring 3 for a total of 4 rings of protection. Ring 0 is meant for the operating system itself.

Starting point is 00:36:00 A program can only access memory segments that are assigned to higher ring numbers. Thus, a user program is relegated to ring 3 and is unable to touch the operating system over in ring 0. This goes hand in hand with stating which segments a program can access. Intel intended ring 0 to always be for the operating system itself, Ring 3 for programs, and Rings 1 and 2 for different types of drivers. However, in practice, only Rings 0 and 3 are ever really used. One reason for this is another neat feature, privileged instructions. There are certain instructions that only privileged rings can execute. Those are things like the HALT instruction, which stops the computer.

Starting point is 00:36:57 But the issue here is that Ring 3 is the only unprivileged ring. So the difference between rings 0, 1, and 2 all comes down to memory access. That's how the 286 handles protection. So to check out Lanza's claim, we must ask, how does this compare to Multics? Well, it's the same. It's basically identical. Multics breaks memory into segments. Each segment has an associated privilege level called a ring of protection. A task's ring is determined by the ring level of the segment that it lives in. Rings start at zero and go up from there, and a program can only access rings that are at their own level or higher.
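That ring rule is simple enough to write down as a one-line comparison. This sketch only models the memory access rule as described here, a task can reach segments at its own ring number or a higher one. The constant and function names are mine, and it ignores everything else a real chip checks.

```python
RINGS = (0, 1, 2, 3)  # ring 0 is the most privileged, ring 3 the least


def can_access(task_ring, segment_ring):
    """A task may touch a segment at its own ring number or a higher one."""
    return segment_ring >= task_ring


def access_table():
    """Build a {task_ring: [reachable segment rings]} summary of the rule."""
    return {task: [seg for seg in RINGS if can_access(task, seg)]
            for task in RINGS}


if __name__ == "__main__":
    for task, reachable in access_table().items():
        print(f"task in ring {task} can reach segments in rings {reachable}")
```

Run it and the shape of the scheme shows up right away: ring 0 can reach everything, ring 3 can only reach ring 3, and rings 1 and 2 sit in between.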

Starting point is 00:37:45 Violating access rules leads to a fault, a segmentation fault. Settings and mappings for these segments are even stored in little formatted bytes, just like on the 286. The canonical paper that describes this is Robert Graham's Protection in an Information Processing Utility. If you open up that paper and then hold it side by side with Intel's introduction to the IAPX 286, you'll see two identical diagrams. Each of these papers has a diagram of the rings of protection. The diagram in Graham's paper is hand-drawn, and the one

Starting point is 00:38:27 by Intel has little annotations. But they're the same diagram. So yeah, it's fair to say the 286 appears to be heavily influenced by Multics. That's something that I really didn't expect to be the case. And it sounds kind of weird, right? It's almost like Intel was designing a chip to run Multics, an operating system that in this period no one was really using. But this isn't just a superficial thing. In fact, Intel admits to the influence. In a 1983 paper, Jack Klippanoff, who is also someone who worked on the 286

Starting point is 00:39:08 project, gives this neat detail, quote, One quickly comes to the conclusion that a traditional two-level user supervisor protection architecture is not adequate for supporting secure, high-performing systems, instead a Multics-like ring architecture was chosen for the IAPX286." It even cites a paper on the implementation of security and segmentation in Multics. His argument is that to support large software systems, you need some protection mechanism and you have to have a relatively fine-grained one. The solution is just to pull from work already done on the Multics project. Is there a direct connection here? I'm not entirely sure.

Starting point is 00:39:56 The development of Multics is very well documented. We have a list of everyone who worked on the Multics project, and I have a paltry list I've compiled of people that may have been on the 286 project. The two lists don't overlap, but there are probably some Intel employees I'm missing. It wouldn't surprise me if someone sends me an email saying, hey, I worked on both projects. All right, so we have this new chip that's taking cues from an older system. But what makes that bad? A lot of projects are influenced by what's come before.

Starting point is 00:40:34 That's how we can make progress. What's the issue here? Well, it comes down in part to implementation. Again, that's the boogeyman this episode. Multics is a software system. So there's a lot of flexibility. It's meant to run on many different kinds of hardware and specifically smooth over hardware features

Starting point is 00:40:57 to provide a uniform and reasonable interface for the programmer. In Multics, a segment can be any size. On the hardware level, there are some caveats and there are many ways that hardware implements segmentation, but Multics hides that. The 286 is hardware. That means it has a certain inbuilt rigidity. This is the level where we reach rough spots. Segments on the 286 are limited to 64 kilobytes. When you set up a task, you're working with, at any one time, four segments.

Starting point is 00:41:37 Only one of those can be for code, so right away you have a hard limit on the complexity of a program. That kinda sucks. But we must remember, all these cool memory features only come into play in protected mode. Remember that thing? So far, everything I've described is pretty far outside what the 8086 was capable of. How then, does the 286 maintain compatibility with this earlier chip? This is where we see the poison of legacy creep in. The 286 had to be able to run 8086 code. It was decided that the 286 would initially load

Starting point is 00:42:19 into a compatibility mode. When you apply power to the chip, it starts off by acting like the older processor. To use the MMU and all the protection features therein, you had to switch into protected mode. Now officially this is called protected virtual address mode, but everyone just calls it protected mode. And retroactively this bootup mode has been called real mode. So why have the switch at all? Why not just have the 286 boot up with protection ready? That's one of those nasty questions that's hard to answer. I haven't found an interview or a paper that gives an explicit reason,

Starting point is 00:43:02 but you know me, I can rationalize all day. The way I see it, it all comes down to memory. That is the heart of a computer, after all. And it's the 286's big change over its predecessor. Basically, introducing all these changes in how the chip handles memory would have made the 8086 programs that existed incompatible. By having these two modes it means that the 286 can be 100% backwards compatible and then you can flip it over to have a more advanced and better design. So what exactly makes protected mode incompatible? Well let's take interrupts as an example.

Starting point is 00:43:45 What is an interrupt, you may ask? Well, welcome back to Computer Science Corner with Sean. An interrupt is a special type of routine that the processor knows about and has a special way of handling. On the 8086, they all have a number, and when they're triggered, the processor stops what it's doing, executes the interrupt handler routine, and then returns back to normal operation. An example is for keyboard input. On the PC, when you press a key, that triggers interrupt 9. The handling routines for interrupts are defined by a table, so to write some code to handle

Starting point is 00:44:23 keyboard input, you'd set the 9th entry in the interrupt table to point to your keyboard handling program. Then every key press would execute the handler. On the 8086, this table is very simple. It's a list of addresses. That works, but it's very limited in what it can do. The 286 in protected mode has a much more complex interrupt table. It's significantly more flexible. It can manage different types of interrupts, tasks, and even exceptions. It would have been difficult to just tack that onto the existing 8086 environment. So it makes more sense to have a separate mode with these separate tables and separate data structures. Interrupt handling is just one of many

Starting point is 00:45:13 of these major changes. This is, in theory, a reasonable solution. The 286 can be set to act like a fast 8086, if that's all you need. That's the compatibility side of the equation. Its real power is when you switch to the new mode, to protected mode. But that's optional. And even then, it's still a very similar chip to Intel's old stuff. You can still run a lot of your 8086 code in protected mode.

Starting point is 00:45:43 It's just that the memory is handled differently. But this leads to a strange problem. It's the true curse of legacy. To get the most out of your new chip, you have to switch to protected mode. But what if you just never did that? Everything gets much more complicated once the IBM PC hits the market. The PC is launched in 1981, and by 1986, clone sales overshoot the volume of original IBM hardware. I'd say that's the year where computing became a PC kind of world. Somewhere in the middle, the IBM AT is released, which rocks the newer 286.

Starting point is 00:46:33 There was a lot of very interesting development around different operating systems for the PC in this period. However, that's on the fringes. Very few people are running Xenix or Coherent or MP/M-86 or UCSD Pascal or Idris. The PC is a DOS system, plain and simple. When you get a PC, it's either running PC-DOS or MS-DOS, which are both actually the same thing. This is all backed up by a little something called the BIOS, the Basic Input Output System.

Starting point is 00:47:12 It's this chunk of code, basically a library, that's bundled into every PC. Crucially, the BIOS has nothing to do with Intel. This is not part of the 8086 in any way, shape or form. This is something that comes from IBM, is written by IBM, and is put into the PC when it's designed and built. The BIOS gets wrapped up with Intel as the PC becomes a bigger platform. Pretty quickly there are companies that make 100% compatible versions of the BIOS written in-house.

Starting point is 00:47:49 That's what ends up enabling clone manufacturers to make IBM compatible PCs. It's a mix of the 8086 CPU and this chunk of code called the BIOS. The BIOS provides handy functions for getting data from the keyboard, outputting to the screen, reading from the disk drive, playing sound, printing, really anything a programmer could ever want to do with a PC. And it's actually pretty easy to use as far as low-level hardware goes. The BIOS functions are implemented as interrupts. This makes use of a nice little feature where on the 8086, you can trigger an interrupt using software. So you can call up an interrupt just like a function.

Starting point is 00:48:34 You just use a number, not a name. Interrupt 10h, that's h for hexadecimal, is responsible for all screen functions. To print to the screen on an IBM PC, you set up some parameters and then you trigger interrupt 10h. Many programs for the PC use BIOS for just about everything. DOS doesn't provide a lot of useful things to the programmer, but it does add one feature. It provides a few functions for string handling and file management. It's called the DOS API. These are also bound to an interrupt, in this case, Interrupt 21h. Again, in hexadecimal, that's kind of the convention on the PC. Most PC software is DOS software.
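
The "interrupt as a function call" idea can be sketched in a few lines of Python. This is a toy model, not BIOS code. It assumes the standard real-mode layout, where the table starts at address 0 and each entry takes 4 bytes (a 16-bit offset and a 16-bit segment). The `InterruptTable` class and `toy_video_service` handler are hypothetical stand-ins, and the register convention the handler uses is invented for the demo.

```python
IVT_ENTRY_SIZE = 4      # each real-mode vector: 2-byte offset plus 2-byte segment
IVT_ENTRIES = 256


def vector_address(number: int) -> int:
    """Physical address of a real-mode interrupt vector (table starts at 0)."""
    if not 0 <= number < IVT_ENTRIES:
        raise ValueError(f"interrupt number out of range: {number:#x}")
    return number * IVT_ENTRY_SIZE


class InterruptTable:
    """Toy stand-in for the table: interrupt number -> handler routine."""

    def __init__(self):
        self._handlers = {}

    def set_vector(self, number, handler):
        vector_address(number)  # reuse the range check
        self._handlers[number] = handler

    def trigger(self, number, **registers):
        # A software interrupt looks up the handler by number and runs it,
        # much like calling a function through a table of pointers.
        if number not in self._handlers:
            raise LookupError(f"no handler installed for interrupt {number:#x}")
        return self._handlers[number](**registers)


def toy_video_service(ah, al):
    # Hypothetical handler: function 0x0E in AH means "print the character in AL".
    if ah == 0x0E:
        return chr(al)
    raise NotImplementedError(f"video function {ah:#x} not modeled")


if __name__ == "__main__":
    table = InterruptTable()
    table.set_vector(0x10, toy_video_service)
    print(hex(vector_address(0x09)))                    # where the keyboard vector lives
    print(table.trigger(0x10, ah=0x0E, al=ord("A")))    # "call" interrupt 10h
```

Because callers only know the number, anything that moves or reshapes the table breaks every program that depends on it, which is the compatibility problem the episode goes on to describe.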

Starting point is 00:49:28 Early on there are a few exceptions, the notable ones being programs that boot up on their own without DOS ever loading. The version of Colossal Cave Adventure that came, well not bundled, but as a release title for the IBM PC was one such example. But hey, DOS doesn't really give you that much anyway. What's really more important is that the BIOS is used heavily by software on the PC. DOS's library is used to a lesser extent, but it's still a factor.

Starting point is 00:50:10 This made compatibility on the PC a totally different discussion than compatibility just for the 8086. For a computer to be PC compatible, it needed to be able to execute 8086 machine code, and it needed to have the same BIOS interface and interrupt handling. There are a few more details, but that's the biggie. And what made this worse is all of a sudden, the 8086 is the PC chip. How does this impact the new IAPX286? Simply put, it kills it. One quirk of protected mode is that it uses a different mechanism for managing interrupts.

Starting point is 00:50:49 To register an interrupt, you have to use a different table structure and it's located in a different part of memory. That means that on a PC, well, on an IBM AT or 100% compatible, when you switch into protected mode, you can no longer use the BIOS and you can no longer use DOS's API. You can work around this.

Starting point is 00:51:12 The BIOS only provides convenient ways to do things. You can write to the screen without using the BIOS. Same goes for disk drives, printing, and talking to the keyboard. But that assumes you're the one writing the software. A lot of PC software relied on the BIOS. So while the 286 could run the code, it wouldn't do anything. It couldn't provide the same environment as the original PC. Thus, almost every PC program would fail if you switched into protected mode. Software

Starting point is 00:51:48 would have to be written specifically, at least on PC platforms, to make use of protected mode's features. There was also the matter of memory protection. You know, the big important part of the 286, the actual change in the system. The alternative to BIOS is to write directly to memory and interface with hardware directly. To put text on the PC's screen, for instance, there is a specific region in memory that you write data to. Under protected mode, users' programs are supposed to run in Ring 3, which has limited hardware and memory access. So, it's entirely possible that a program that didn't rely on BIOS would immediately cause a segmentation fault, because it would just start getting nasty with RAM, and the 286 does not let you get nasty with RAM. This is made even worse by the fact that once you're switched into protected mode, there's

Starting point is 00:52:48 no going back. The 286 has to be reset on the hardware level to revert to real mode. You can't just pop over to protected mode to do some nice memory-intensive gaming and then jump back to real mode to run Lotus 1-2-3. That did not work. The reason for this makes sense on paper. It's a security consideration. If you could initiate a software reversion to real mode, a rogue process could take over the entire system. That would defeat the purpose of protected mode completely. That theoretical

Starting point is 00:53:27 swap between real and protected modes would also get pretty sloppy, since you're going from virtually mapped memory to normal physical addresses. The memory space wouldn't necessarily even make sense to the processor anymore. It would later be discovered that you could break back into real mode, but it was done via a kind of slow and kind of gnarly hack. Notably, early versions of OS/2 would use this trick. It wasn't intended behavior to say the least. The fact is, you can't think of protected mode as a superset of real mode. It's not like protected mode is just 8086 plus some other cool neat features. Once you switch modes, you're on a pretty different platform. How memory is handled changes fundamentally.
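
A short Python sketch can show why the same segment value means something different in each mode. It is a simplified model under stated assumptions: real mode computes segment times 16 plus offset, while protected mode treats the segment register as a selector whose upper 13 bits index a descriptor table holding a base and a limit. Privilege checks, the local descriptor table, and 1 MB wraparound are left out. The names `Descriptor`, `real_mode_address`, and `protected_mode_address` are hypothetical, and 0xB800 is used as the example because it is the segment commonly used for the PC's color text screen (a detail not given in the episode).

```python
from dataclasses import dataclass


class SegmentationFault(Exception):
    """Raised when a protected-mode access fails a table or limit check."""


def real_mode_address(segment: int, offset: int) -> int:
    # Real mode: the segment value is just a number shifted left by 4 bits.
    return ((segment & 0xFFFF) << 4) + (offset & 0xFFFF)


@dataclass(frozen=True)
class Descriptor:
    base: int   # where the segment starts in physical memory
    limit: int  # highest valid offset, at most 0xFFFF on the 286


def protected_mode_address(table, selector: int, offset: int) -> int:
    # Protected mode: the low 3 bits of a selector are flags, the rest is an index.
    index = selector >> 3
    if index == 0 or index >= len(table):
        raise SegmentationFault(f"selector {selector:#06x} names no usable descriptor")
    descriptor = table[index]
    if offset > descriptor.limit:
        raise SegmentationFault(f"offset {offset:#06x} is past the segment limit")
    return descriptor.base + offset


if __name__ == "__main__":
    # Entry 0 is left unused; entry 1 is a 4 KB segment placed above 1 MB.
    table = [None, Descriptor(base=0x100000, limit=0x0FFF)]
    print(hex(real_mode_address(0xB800, 0x0000)))
    print(hex(protected_mode_address(table, 0x0008, 0x0010)))
    try:
        protected_mode_address(table, 0xB800, 0x0000)  # a real-mode habit, reused
    except SegmentationFault as fault:
        print("fault:", fault)
```

A program that loads 0xB800 into a segment register works in the first model and faults in the second unless the operating system has deliberately set up a matching descriptor, which is one way to see why protected mode is not a superset of real mode.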

Starting point is 00:54:17 And say it with me, memory is the heart of a computer. So the machine has a change of heart to deal with. This does, however, introduce some new possibilities. To make full use of the 286's power, you need to write new software for the 286. And that did happen. We've actually seen a number of operating systems specifically written with that chip in mind.

Starting point is 00:54:46 The ones that come to mind are, of course, Unix related. Xenix, Microsoft's own Unix, gets a port to the 286. Coherent, a Unix clone, has a 286 version. There are some more out-there disk operating systems that use protected mode also. Plus, eventually we do get Windows/286. But the big problem is, DOS never takes advantage of the 286. It stays as a fully real mode operating system. x86 processors are the PC chip, and the PC is a DOS platform.

Starting point is 00:55:28 Now, if you read any of Intel's documentation, it becomes pretty clear that they had a very specific vision for what kind of operating systems should be running on the 286. It's Multics. Okay, they want to run Multics on the chip. The chip isn't just inspired by Multics, but it really is built out in such a way that, I don't know, I feel like they wanted Multics or something very similar to run on the 286. By that, I mean they wanted a multitasking, multi-user operating system.

Starting point is 00:56:02 They wanted a system that separated processes into four privilege levels, and they wanted you to store task information and event information in very specific ways. It gets pretty detailed. There are even documents that explain how to implement an operating system on the 286. Was there some secret operating system inside Intel that this chip's actually designed for? Well, in fact, yes. Or at least maybe. There was an operating system inside Intel at the time. It was called RMX. It's a real-time operating system. It's multitasking, and I don't know a whole lot about it. RMX is one of

Starting point is 00:56:46 those things where we have a pile of manuals but not a lot of ancillary information. What I do know is RMX or IRMX as it's sometimes called was modular, object-oriented, multitasking, and its documents show a kind of suspiciously familiar ring-shaped diagram describing its systems. It actually sounds similar to S1 now that I think about it. But again, in practice, 286s were used for DOS, a single-tasking real-mode operating system that didn't have any concept of the chip's new features. We

Starting point is 00:57:25 can easily say why DOS was dominant here. DOS was the first default operating system for the PC. As the platform spread, the credo became compatibility. New PCs were sold as 100% IBM compatibles. That means they ran DOS. Once a substantial library of software was written for DOS, its market share was kind of assured. It's one of those feedback loops. Users use DOS because of all the software and hardware support. Programmers program for DOS because of all the users. The larger question here is why didn't DOS take advantage of the 286's new features?

Starting point is 00:58:08 That should be a slam dunk, right? We've already talked about all the issues with incompatibility, but you should be able to rectify those with software, right? Well, maybe not. I'm kind of of two minds here. I've been sitting with this for a while, and I've spent a lot of time programming for DOS and programming the PC at pretty low levels, and I'm a little vexed.

Starting point is 00:58:34 The crux will be getting real mode programs to run in protected mode, right? In theory, you could do this with some operating system tricks. But DOS was never really an operating system. DOS provides so little to the programmer. I know when I write for DOS, I only use its API to maybe read files and then to exit the program at the end. For everything else, I either go to the BIOS or just access hardware directly.

Starting point is 00:59:05 The BIOS thing you can solve. You can make new entries for each interrupt that would run in protected mode, do the whole new interrupt table and everything. But hardware access, I think that's the real issue here. There were some operating systems that could supposedly pull this trick off. But it seems those only worked for some programs and they had strange issues. Take OS/2 as the example here. That operating system used a hack to drop back to real mode so it could run DOS programs by just running DOS. When you're in a situation where you need to hack your way into a feature, well, that's a bad situation.

Starting point is 00:59:48 I think this may be an indication that it wasn't actually feasible to run DOS programs in protected mode. But what if we look away from the world of the PC? Well, things don't look much better. The simple fact is that the 286 was kind of viewed as a hobbled chip. It was constrained by Intel's legacy. This becomes crystal clear when we look at the rest of the CPU market. Zilog didn't actually end up being the big baddie for Intel.

Starting point is 01:00:18 Instead, it was Motorola and their 68000 series of chips. I'm gonna pull in a strange source, but I think it paints the picture pretty well. And I'm pretty sure this is sponsored content from an old magazine, but whatever, it gives us something to talk about. In 1985, Byte ran an interview with the president of Stride Micro, a company that manufactured workstations. They actually made pretty interesting machines,

Starting point is 01:00:47 but maybe that's a topic for another episode. One of these interviews asked the simple question, quote, when designing and building the Stride 400 micro computers, why did you select the MC68000 Motorola processor over the new Intel IAPX286? That's a super important question here. If you were planning a new computer in this time period, and the 286 was an option, why

Starting point is 01:01:16 should you choose that chip? What was Intel actually selling here? Stride ends up going with the 68000 for a number of reasons. Here are a few excerpts from those reasons. Quote, The 68000 is at least one generation ahead of the 286 in terms of microprocessor design. He continues, In my view, the 286 was so steeped in its own history that the architecture suffered critically. In reality, today's 286 is little more than an 8086 with a memory management unit tacked

Starting point is 01:01:53 on. He continues again, The 8086 design was based on the 8080, which was an extension to the world's first 8-bit processor, the 8008. Strange as it may seem, the brand new 286 has, as a subset, the registers from a processor designed back in 1972! Intel's motive was compatibility with current software. Motorola simply wanted to build the best chip possible." I think this is the curse laid bare.

Starting point is 01:02:28 Motorola's chip, and many other chips for that matter, weren't built for compatibility. They could be new. Motorola could be groundbreaking. It could have new features without drawbacks. Intel chose perhaps a less risky path. This was a stopgap after all. By maintaining compatibility, you keep courting the same users. But that meant the 286 had to work within designs from the early 70s. The other portent of doom

Starting point is 01:03:00 for the 286 was Intel themselves. In 1985, Intel launches the 80386, the next generation of the x86 architecture. Let's just peek at the timeline here, shall we? In 84, the IBM AT launches, which is the first major computer to use the 286. In 1986, the Compaq DeskPro 386 is on the market. You can make a number of arguments here, but the way I see it, we get two years where the 286 is Intel's top-of-the-line PC chip. It's quickly replaced by the 386. It's just not top of the line for long. We can even see this in much

Starting point is 01:03:47 later sources. After all, most of the interviews and oral histories that mention the 286 only do so in passing. It's part of the story but really kind of a stepping stone. It's the follow-up for the 8086 and the setup for the 386. Alright that does it for our episode on the IAPX286 or the 286. For all its importance, I've found it's a strangely overlooked chip. There just isn't all that much out there on its development history, unlike its two closer relatives. As such, this is one of those stories that might change in the future if more information comes to light. What we do have is an echo of the 8086's story. This is

Starting point is 01:04:40 in a period where Intel is still betting the farm on their 432. That was going to be their new and groundbreaking chip. It was going to be the future. It was going to break the curse. As such, they just need a little something to stay in the market. Something to stop the gap. And they end up having to make multiple stop gaps. The 286 carried with it the technical legacy and the strategic legacy of the 8086. It had to be

Starting point is 01:05:13 compatible with older software. It had to carry the torch. Intel's strategy of compatibility was pretty hard set because, well, the IAPX432 was gonna fix all of that. The curse would be lifted. They can make one more compatible processor and then it's off to the future. Then we're safe. But even within that there was still room for innovation. The 286 introduced memory management, something that would make the x86 family of chips much more capable. But in doing so, it also introduced a new mode of operation, a new layer of complexity. This is another spot where we can see issues beginning to compound. Each compatible processor has to be compatible with what came before. So as the IAPX432 fails and the x86 family becomes ascendant, things get wacky.

Starting point is 01:06:11 Every change made along the way has to be enshrined. That's how we end up with processors getting etched in 2025 that still support real mode. Thanks for listening to Advent of Computing.
