The Science of Everything Podcast - Episode 95: How Computers Work Part V - Assembly Language and the Operating System

Episode Date: February 13, 2018

In this the fifth episode of the series 'how computers work', I begin with a summary of some of the major methods of improving the performance of the central processing unit, including pipelining, cac...he memory, branch prediction, and parallel processing. Following a brief introduction to assembly language and its relationship to the machine code, I then discuss the operating system and how it interfaces with the hardware to manage program memory, system calls, input/output, and processes.

Transcript
Starting point is 00:00:34 You're listening to The Science of Everything podcast episode 95, How Computers Work Part 5, Assembly Language and the Operating System. I'm your host, James Fodor. So in this episode, we're going to look at how we build more complex programs, building upon the basis of simple instructions set up in the instruction set architecture that we talked about in the previous episode. So in doing so, we'll talk about the assembly language, which is a symbolic representation of machine code.
Starting point is 00:01:04 And I'll also talk about the operating system, which is responsible for managing multiple programs running on the system at once. Also, I'll discuss how the operating system interfaces with the hardware to perform input and output, which is obviously a very important aspect of interacting with a computer. Before we get to that, though, I'm going to talk a little bit about some of the performance improvements that can be made to the processor architectures that I discussed in the previous episode. So the recommended pre-listening for this episode is, unsurprisingly, the previous one, part four of our How Computers Work series. And again, I strongly recommend that because a lot of what I talk about here
Starting point is 00:01:43 won't make sense without that background. So, that being said, let's begin and talk about some of the improvements that we can use to improve the performance of our processor architecture. So in the previous episode, in discussing processor architecture, I only talked about the absolute basics that you need to get a processor to work that's Turing complete and capable of running very basic programs, and capable of running in principle any program, but in practice it would be so slow that it would run only fairly basic programs well.
Starting point is 00:02:11 So in this section, I want to talk about some of the improvements that have been made and alterations and additions to processor architectures over the decades in order to increase their performance. And specifically I'm going to talk about pipelining, cache memory, branch prediction, out-of-order execution, and parallel processing. So let's begin by talking about pipelining. Pipelining is an attempt to ensure that every part of the processor is kept busy by executing some instruction at any given time. The idea is to divide all of the incoming instructions up into a series of sequential steps, hence the name pipeline,
Starting point is 00:02:47 and ensure that different parts of the processor are performing some parts of a given instruction at a given time. So, pipelining is a form of parallel processing that occurs within a single processor at a given time. So one way of thinking about this is in terms of the fetch, decode, execute, store cycle that I talked about in the previous episode. If you design your processor correctly, you can construct it so that different components of the processor are used and only those parts of the processor are used in order to carry out one of those functions. So a single part of the processor is involved with fetching the next instruction, a different part of it is involved with decoding the current instruction,
Starting point is 00:03:29 a different part again is involved with executing the current instruction and so on. Obviously, that's only a simplification. In practice, the pipeline is going to be more complicated than that, and the exact task that each stage in the pipeline performs may be less clearly defined, as long as it works in terms of the instruction set architecture. But the basic idea here is to ensure that we don't leave parts of the processor idle, because that's a waste. In the previous design that I discussed in the previous episode,
Starting point is 00:03:57 the entire processor is dedicated to performing one task at one time, and that means that usually only some parts of it are going to be active or operational at a given time, say only the parts involved with loading a single register onto the bus, for example, or only the parts involved with loading the next instruction from the memory into the instruction register. The rest of it is all idle. The idea of pipelining is to ensure that all parts of the processor, or at least most parts of the processor, are active at once by dividing up each instruction into even smaller stages, which can be carried out in parallel. So what you can do is that the processor can be loading one instruction at the same time as it's currently executing a different instruction, and at the same time as it's storing the results of yet a different instruction. So different parts of the processor are actually processing different instructions in parallel with each other. And in this way, we can keep all parts of the processor active and get more processing done in the same amount of time. An obvious problem occurs here when one instruction is dependent upon the results of a previous
Starting point is 00:05:00 instruction that hasn't yet been completed during the pipelining. So if I have a conditional branch, for example, and I try and execute that before I've actually processed the comparison that I need to perform in order to decide whether to take that branch or not, then obviously that's not going to work. And that can lead to what's called bubbles in the pipeline where essentially one part of the processing has to stall in order to wait for the results of some previous operation that hasn't been finished yet. So that can slow it down. And it also requires extra circuitry obviously to set up a pipeline. Nevertheless, pipelining can lead
Starting point is 00:05:33 to significant improvements in processor performance. The simple example that I gave in terms of separating out the fetch, decode, execute and store segments would only give a four-stage pipeline. But high-end processors these days may have 15 or 20 stages in a pipeline, so they're very highly pipelined, which means that they can perform many instructions in parallel just by ensuring that different bits of the instruction are performed by different parts of the processor. So the next performance improvement that we're going to talk about is called cache memory or caching.
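As a brief aside on pipelining before moving on, the benefit can be put into rough numbers. The sketch below is only an idealised model and rests on assumptions not made in the episode: every stage takes exactly one clock cycle, a new instruction can enter the pipeline every cycle, and any bubbles are simply added on as extra stall cycles. The function names are made up for illustration.

```python
def cycles_unpipelined(num_instructions: int, num_stages: int) -> int:
    """Each instruction passes through every stage before the next one starts."""
    return num_instructions * num_stages


def cycles_pipelined(num_instructions: int, num_stages: int, stall_cycles: int = 0) -> int:
    """Idealised pipeline: the first instruction takes num_stages cycles to
    fill the pipeline, then one instruction finishes per cycle. Bubbles are
    modelled crudely as extra cycles added on top (an assumption)."""
    if num_instructions == 0:
        return 0
    return num_stages + (num_instructions - 1) + stall_cycles


# The four-stage fetch, decode, execute, store example with 1000 instructions.
n, k = 1000, 4
print("unpipelined:", cycles_unpipelined(n, k))
print("pipelined, no stalls:", cycles_pipelined(n, k))
print("pipelined, 50 stall cycles:", cycles_pipelined(n, k, stall_cycles=50))
```

Under these assumptions the pipelined processor approaches one finished instruction per cycle, so the speed-up tends towards the number of stages, and every bubble eats directly into that gain.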
Starting point is 00:06:03 Now, to understand the benefit of caching, we first need to understand the difference between the implementation of main memory and registers that I talked about. So in the previous couple of episodes, I've discussed registers as being comprised of individual flip-flops, each of which stores a single bit, which in turn are implemented using transistors, and a single register is a bunch of flip-flops connected together that stores one word, and I said that main memory is made up of a bunch of transistors in a big long array. Now, that is not quite true, because the main memory is usually not implemented using transistors in this way. The type of memory that I described as being implemented by flip-flops is called SRAM.
Starting point is 00:06:45 It uses flip-flops based on transistors. It is the basis for registers in the processor. So registers in the processor do work that way. The main memory by contrast is usually implemented using DRAM, where D stands for dynamic. DRAM stores each bit as charge on a tiny capacitor. The big advantage of DRAM is that the memory cells are structurally very simple, even simpler than SRAM cells, which are reasonably simple. So a single DRAM memory cell only needs one transistor and one capacitor to store a single bit.
Starting point is 00:07:17 The bit is stored on the capacitor, essentially, as whether the charge stored there is high or low, and the transistor controls access to it. So if I want to read from or write to that capacitor, I just have to turn on that switch, and if I want to not read from that capacitor, the switch stays off. Now that contrasts with SRAM, which requires four to six transistors per bit. DRAM, dynamic RAM, the type that's used in main memory, is structurally simpler and therefore cheaper and higher density than SRAM. The reason it's called dynamic RAM is because the charge slowly leaks off of the
Starting point is 00:07:53 capacitors, meaning that they have to be regularly refreshed, which is what the D stands for, the need for dynamic refreshing of the data. However, because it only relies on transistors and not capacitors, SRAM is faster than DRAM. So the type of RAM used in registers is more expensive but faster, whereas the type of RAM used in main memory, that is DRAM, is cheaper but slower. Note that both SRAM and DRAM, so both these types of RAM that I've just been talking about, are volatile. That means that they require a continual source of power in order to retain their information. Flash memory, which is the type of memory that's used on USB flash drives and also solid-state drives, is different again from either SRAM or DRAM.
Starting point is 00:08:35 It's a non-volatile type of memory that does not require a continual power source. And it has its own pros and cons, but I'm not going to really talk about that further here. So, having that background in the types of RAM, we can now understand the benefit of caching. A cache is a small segment of memory located on or near the processor chip, which can store recently used or anticipated data, and which can be accessed faster than DRAM in the main memory. It's faster for two reasons. Firstly, SRAM is intrinsically faster than DRAM,
Starting point is 00:09:08 because it only involves transistors rather than a capacitor, which is accessed more slowly. The second reason that the caches are faster is because they're located closer to the processor, or sometimes on the processor chip itself, as opposed to some distance away in main memory. This means signals can be sent and received faster. A third advantage of caching is that a given region of cache memory will often have its own special method of access or data transfer to the processor, or to other registers in the processor. It won't have to go through the main memory bus, which is subject to a
Starting point is 00:09:43 bus bottleneck, which essentially means that if you're trying to send lots of information through the bus at a given time, some of the information has to wait for the previous information to be transferred before it, and so there can be a bottleneck to the amount that you can transfer in a given period of time if you're trying to transfer a lot of data. The cache is not subject to that because it uses its own transfer lines. Modern processors usually have several levels of cache, typically three. They're called level 1, 2 and 3, and the lower the number, the closer the cache is to the processor core, and therefore the quicker it is to access, but the smaller its capacity. In order to ensure that the caches are kept stocked with data
Starting point is 00:10:23 that's going to be relevant to the current program that's running, there are sophisticated algorithms, in both hardware and software, that are implemented to try to ensure that the correct data is loaded onto the caches. The idea is to ensure that as much as possible of the relevant data that the program is going to need to load from memory is already loaded onto the caches before the program asks for it, so that when the program needs it, it can just load it from the caches and thereby get the data more quickly than it would if it had to go all the way over to main memory to get it and rely on the slower DRAM
Starting point is 00:10:54 instead of the faster SRAM of the cache. If the processor tries but fails to find the data it needs in one of the caches, this is referred to as a cache miss, and it means that the processor has to go and look in main memory and maybe even a hard drive in order to find the data that it needs. So the real key here is being able to implement algorithms which are accurate at essentially guessing what type of information the program is going to need. Obviously, one simple way of doing that is just
Starting point is 00:11:25 to keep information that's recently been used in the caches because the processor might need it again. But there are much more sophisticated things that can be done as well, like keeping track of what the same program used last time it was run and loading all that information, for example. But we won't go into the details of how those algorithms work because it's a bit beyond the scope of what we're looking at here. But just suffice it to say that providing these caches can significantly speed up memory access times and thus help the program to run faster. The next performance improvement that we're going to look at is called branch prediction.
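Before moving on, the simplest caching strategy mentioned above, keeping recently used data around, can be sketched in a few lines. This is a toy model rather than a description of any real hardware: the class name `TinyCache`, the capacity of four entries, and the "evict the least recently used entry" policy are all assumptions made for illustration, and real caches work on multi-byte lines rather than single addresses.

```python
from collections import OrderedDict


class TinyCache:
    """Toy cache that keeps the most recently used addresses (hypothetical model)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.lines = OrderedDict()  # address -> value, least recently used first
        self.hits = 0
        self.misses = 0

    def read(self, address: int, main_memory: dict) -> int:
        if address in self.lines:
            # Cache hit: serve the value and mark it as recently used.
            self.hits += 1
            self.lines.move_to_end(address)
            return self.lines[address]
        # Cache miss: fall back to the (slower) main memory and keep a copy.
        self.misses += 1
        value = main_memory[address]
        self.lines[address] = value
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used entry
        return value


main_memory = {addr: addr * 10 for addr in range(16)}
cache = TinyCache(capacity=4)
for addr in [0, 1, 2, 0, 1, 2, 0, 1, 2, 9]:
    cache.read(addr, main_memory)
print("hits:", cache.hits, "misses:", cache.misses)
```

Because the access pattern keeps returning to the same three addresses, only the first touch of each address has to go to main memory. A program that jumped around memory unpredictably would see far more misses with the same cache.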
Starting point is 00:11:53 So just before I talked about pipelining, the process of dividing a single instruction up into small sections, each of which can be executed by a different part of the processor. But I also mentioned a potential problem with pipelining, in that conditional branches cannot be executed until the conditional part of it has been computed as to which way it goes. So if I try to execute a conditional branch before that condition has been computed, I won't be able to do that. And also any code that comes afterwards, I won't be able to execute either because I don't know which way the branch is going. I don't know whether I should execute the code in place A or place B, because I haven't decided yet which way I'm going to go in the branch. One performance improvement that has been implemented to try to get around this problem is called branch prediction.
Starting point is 00:12:40 Basically, the processor tries to guess which way the branch is going to go and just starts executing that code as if the branch has already been computed. If the guess is correct, then all is well and good. It just keeps computing the instructions along the branch that it's been doing as it had been, and you get an increase in performance because you were able to start performing the next instructions down that branch before you actually figured out that that's definitely what you needed. So correct branch predictions are able to increase performance. But what about if your prediction is incorrect? What about if it guessed wrong and you start executing the wrong instructions?
Starting point is 00:13:13 Well, then the processor has to have a mechanism for halting what it's doing, reloading the correct instructions and starting to execute those, and also undoing all of the instructions that it had executed in the meantime before it determined that its guess was wrong. So obviously that takes time and slows down the processor. So the key is whether you can get branch prediction to be accurate enough so that the number of wrong guesses is not so large that this wasted time in undoing instructions outweighs the benefit of increased performance when the branch prediction is correct. If you can get branch prediction to be accurate most of the time, then having to undo instructions only occasionally will not be too much of a disadvantage, and most of the time you'll be getting the benefit of the increased performance with the correct branch predictions. So that's all well and good, but how can we just guess what the correct branch to take is?
Starting point is 00:14:06 I mean, how can you know that? The whole point of a program is that you compute some result that you need, you know, take two variables and compare which is larger, for example, and branch on the basis of that. There's supposed to be some process for determining that. How can you just sort of guess which way to go? That doesn't seem to make very much sense. The answer is that branch predictions are difficult to do, and so branch predictors in modern processors are very complicated,
Starting point is 00:14:31 and often have their own dedicated hardware and/or software that try to get the most accurate predictions possible, using very sophisticated algorithms, and we're not going to go into all of the details of that. But I'll just give you a few hints as to the sorts of techniques that can be used to do this. One technique is to simply keep track of all of the instructions that the program, the current program, has performed in the past
Starting point is 00:14:57 and see which way they branched previously. So I just keep track of every conditional branch that this program has taken, and in one place I keep the address, for example, of where the instruction was in memory, and then next to that I keep a record of whether the branch was taken or not taken. I could even keep a record not just of the previous instance, but of multiple instances, and see where they all went, and on that basis I can predict whether it's likely to be taken or not taken, on the assumption that the program's probably going to behave similarly this time as it did last time that it was run. And that usually works pretty well. For example, very common in programming is to use control loops that I've talked about in previous episodes. And generally those are iterated many times. I mean, that's the whole point of a loop, right? You go through it many times before it's exited.
Starting point is 00:15:44 So if I want to repeat a loop a hundred times and then only exit at the very end of that loop, that means that some conditional branch that determines whether I exit the loop or not is going to choose one way, that is going to not exit the loop for 100 times, and only on the 101st time does it finally exit the loop. So a branch prediction that simply always kept predicting that we're not going to exit the loop and we're going to go back and execute the same code would be right essentially 99 times out of 100. So that would work pretty well.
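The loop example above can be simulated directly. The sketch below is a deliberately crude predictor of the kind just described: it keeps a table from branch address to the last outcome and predicts that the branch will do the same thing again. The function name, the branch address `0x40`, and the choice to predict "taken" for a branch that has never been seen are all assumptions for illustration. Here "taken" means "go back round the loop".

```python
def simulate(trace, default_prediction=True):
    """Run a last-outcome predictor over a trace of (address, taken) pairs.

    Returns (number of correct predictions, total predictions).
    """
    history = {}  # branch address -> outcome seen last time
    correct = 0
    total = 0
    for address, taken in trace:
        prediction = history.get(address, default_prediction)
        if prediction == taken:
            correct += 1
        total += 1
        history[address] = taken  # remember what actually happened
    return correct, total


LOOP_BRANCH = 0x40  # hypothetical address of the loop's conditional branch
# The branch goes back round the loop 100 times, then falls through once.
loop_trace = [(LOOP_BRANCH, True)] * 100 + [(LOOP_BRANCH, False)]

correct, total = simulate(loop_trace)
print(f"{correct} of {total} predictions correct")
```

In this model the only wrong guess is the final one, when the loop exits, which is in the same ballpark as the roughly 99-in-100 figure above. A branch that alternates between taken and not taken would defeat this simple scheme almost completely, which is one reason real predictors keep a longer history.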
Starting point is 00:16:17 There would be that one instruction at the very end where we'd take the wrong path and have to undo those instructions. But because we'd perform the 99 instructions before much quicker, that's probably going to be worth the price to pay. And that's only a very crude conditional branch prediction that just predicts that we'll do the same thing that we've always done in the past. The more sophisticated ways actually keep track of some abstract state that the program is in and predict that such and such a change in the program as a result of some instructions being executed changes the program state from A to B, for example, and that when the program's in state
Starting point is 00:16:51 B, it's likely to choose branch option two, whereas when the program's in state A, it's likely to choose branch option one. And so on the basis of that, it can determine what the likely branch will be. So you can get more and more complicated in terms of the branch prediction algorithms that you implement, to try to be more and more accurate in predicting which way the branching is going to go. And again, the whole point of that is so that we can correctly predict the results of conditional branching operations, and so that we can just start executing the next instructions before we need them and thus accelerate the rate at which the program is being executed in the computer. So branch prediction is a very useful technique for being able
Starting point is 00:17:31 to increase the performance. Now the next performance improvement technique that I'm going to discuss is called out-of-order execution. Now if you thought that branch prediction sounded a bit odd, out-of-order execution must sound downright silly because how could it make any sense to execute a computer program out of order. The whole point of a computer program is that you carry out a set of instructions in the right order to achieve the desired results. How can you just decide to do them in a different order? Well, it turns out that you can actually do that in many cases, and the reason you would often want to is because when the program needs to access memory or input-output devices, that often takes a very long time. And in that time, it doesn't make sense for the
Starting point is 00:18:13 processor to be sitting around doing nothing. It should continue to execute whatever operations it can while it's waiting for the results to come back. So just to illustrate how important this is, processor clock cycles, which remember I've talked about, are generated by the oscillating crystal that then sends out the series of square waves that drives changes in the internal state of the computer, including loading the next instruction
Starting point is 00:18:39 and bringing about the control bits that ensure that each instruction is carried out. Computer clock cycles these days are a nanosecond or less in length. So there are a few billion clock ticks in every second. Input output operations, on the other hand, might take microseconds or even milliseconds, depending on the device in question.
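To put that gap into numbers, here is a small back-of-the-envelope calculation. It assumes a round figure of one nanosecond per clock cycle, and the two example I/O latencies (10 microseconds and 10 milliseconds) are illustrative values chosen for the arithmetic, not measurements of any particular device.

```python
CYCLE_NS = 1                 # assumed clock period: one nanosecond
MICROSECOND_NS = 1_000       # nanoseconds in a microsecond
MILLISECOND_NS = 1_000_000   # nanoseconds in a millisecond


def cycles_spent_waiting(io_time_ns: int, cycle_ns: int = CYCLE_NS) -> int:
    """Number of whole clock cycles that pass during one I/O operation."""
    return io_time_ns // cycle_ns


examples = [
    ("10 microsecond I/O operation", 10 * MICROSECOND_NS),
    ("10 millisecond hard drive access", 10 * MILLISECOND_NS),
]
for label, io_ns in examples:
    print(f"{label}: {cycles_spent_waiting(io_ns):,} clock cycles")
```

Under these assumptions the faster operation costs ten thousand cycles and the slower one ten million, which is why leaving the processor idle for the whole wait is so wasteful.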
Starting point is 00:19:00 Hard drives, for example, are particularly slow, especially if the head has to move to a different part of the disk to read that area. So if the processor were to just sit around doing nothing for the entirety of the I/O operation, that would mean it was sitting around
Starting point is 00:19:16 for thousands or even millions of clock cycles doing nothing. That's like a person sitting around for days or weeks doing nothing while they're waiting for the microwave to finish or something like that. It's obviously absurd. It's just as absurd to keep the processor waiting all that time and not doing anything when it could be using all those clock ticks to do something useful. So instead of waiting around for all of those cycles, the CPU can be made to do something useful and start executing instructions that it will need later on. A very common thing to do is simply to load a different program into the processor and execute that for a while until the other one's I/O comes back. But you can use out-of-order execution to execute additional instructions even in the
Starting point is 00:19:53 same program that the program will need later on. And once again, of course, there are going to be sophisticated algorithms and hardware implementations to ensure that that's done in a sensible way and that you can put the pieces back together after the results come back. But nevertheless, it's an important technique for being able to increase performance. The final performance improvement that I want to discuss is parallel processing. So traditional computer architectures have a single processor that executes all the instructions in a linear fashion, one at a time, one after the other.
Starting point is 00:20:23 I've already talked about pipelining as a simple method of parallel processing in which multiple instructions are executed at the same time in a single processor. However, these days, when people talk about parallel processing, they mean it taken to the next step, where you actually have multiple processors or what are called cores inside the same machine.
Starting point is 00:20:42 So most processors these days on desktops will be multi-core processors, which means that on a single chip, you will actually have essentially multiple CPUs. So each of them will have its own ALU, arithmetic logic unit, it will have its own program counter and instruction register and set of registers and its own main bus and all of those other key components of the processor. And there will be a bunch of those on the single chip. Usually each core will have some of its own cache memory, and then there'll also be some shared cache memory that all of the cores share amongst each other.
Starting point is 00:21:16 And they'll probably also share things like a memory controller and input-output circuitry. So the idea of this type of parallel processing and multi-core processing is that some of the circuitry is shared between the cores and some of it is unique to each core so that they can carry out their own processes largely separately from each other. These days most laptops and desktops have two, four, or up to eight physical cores on each die, which is essentially the word for the physical silicon chip on which all of the circuitry is inscribed. So really, modern computers have multiple processors and not
Starting point is 00:21:51 just a single one. Parallel processing in this fashion is much more difficult to write programs for than traditional linear processing. And how much of a benefit it actually gives depends on the type of task to be performed. Some types of programs lend themselves well to lots of parallel processing, whereas other types of programs are inherently linear and don't. So you can't just assume that
Starting point is 00:22:20 if you add two extra processors, say you start with a machine with two cores and you add two more cores, you're just going to double your performance on everything. For some programs you may be able to do that, but many will only see a small or somewhere-in-between performance improvement, depending on the type of code and how well that code has been optimized for parallel processing. Another thing you can do with multiple processors though is simply to run different programs on each of the cores. So that's obviously very useful if you're multitasking, as most people do these days, especially on desktops and laptops, running multiple programs at once. So you can have one program running on one core and a different one running on a different core. And there will be complicated
Starting point is 00:22:51 hardware and software mechanisms to determine how much processing task is done by one core and how much by another core and how to share out the jobs appropriately between them. But obviously having that extra capacity is going to help improve performance at least to some degree, but subject to the constraint that I mentioned that it does depend on how the software is written and what type of problem you're trying to solve. So that concludes all of the different performance improvements to the micro-architecture of the processor that I wanted to discuss. And at this point then, we're at the stage where we've explained the basics of how a processor can be put together from underlying logic components so that it can execute the instructions given in a particular instruction set architecture.
Starting point is 00:23:32 And we can therefore write programs in machine code. That is, each line of machine code is just a series of bits of zeros and ones that corresponds to a single instruction in the instruction set architecture. A big long list of these lines of machine code corresponds to a program in machine code that we can then load into memory and get our processor to start executing. That's the stage we're at now. However, in order to really interact with our program in any way with this setup,
Starting point is 00:24:02 we have to write the program directly in machine code as zeros and ones and load that into memory so that the processor can execute it. That's extremely cumbersome. In the early days people did use to code in machine code, but it's basically not done anymore because it's extremely cumbersome. You have to look up all of the exact sequences of zeros and ones that you need to specify op codes and memory addresses and things like that. It just takes forever. To make it easier, an abstraction known as assembly language was developed. So assembly language is basically a symbolic form of machine language. Each instruction or each line in an assembly
Starting point is 00:24:36 language program corresponds to a single machine instruction. So there's a one-to-one correspondence, line-to-line, from assembly program to machine code. Each instruction set architecture will have its own assembly language, just as each instruction set architecture will have its own machine code, and the assembly language should correspond directly to the machine code in that given instruction set architecture. There are of course many commonalities across instruction sets with regard to how the assembly language works, but they will be different depending on exactly what instructions are in that given ISA. So in an assembly program, each line corresponds to one machine instruction, as I said before.
Starting point is 00:25:09 However, instead of each line or each instruction in the assembly program consisting of zeros and ones, as it does in the machine code, in the assembly program each line consists of a few symbols. So each op code in the instruction set architecture gets its own symbolic form. For example, a move instruction might get the symbolic form MOV, for move, or an addition instruction might get ADD, for add, or a jump might get JMP or JUMP or something like that. It just depends on the assembly language. So you'll give a symbolic name to the op code, and you'll also give symbolic names to the operands, which could refer to registers. So, for example, if I wanted to load something to the program counter register, I might use a symbolic name PC for the program counter or IR for the instruction register,
Starting point is 00:25:54 or just register A, or something like that. Also, I can use symbolic variables, which must be defined elsewhere in the program, which refer to a set location in memory. So the big advantage of programming in assembler is that I don't have to memorize bit sequences for op codes. I can use the symbolic shortcuts, which are much easier to remember, like MOV and ADD. But the other advantage is that I can use symbolic variables to hold a value that's computed during the program. So in machine code, you can't do this.
Starting point is 00:26:22 You can only specify a register or a memory location or a constant value, like a set number or string, because the machine code has to know exactly where it's getting the data from, or exactly what the data is. I can't define symbolic variables there. There's no way of the machine understanding that. But in assembly language, I can do that. I can just define the variable I, for example. And when the assembler goes through and puts the program together, it will just assign that variable. It will notice that that's a variable that it doesn't know,
Starting point is 00:26:49 and assign that a particular space in memory. And then the memory handling is done by the assembler. The assembler is the program that converts the assembly program into machine code. The memory handling of where the variable is stored is handled automatically by the assembler. So I don't even have to worry about exactly what the memory address of the given variable is. The assembler does that for me. Because usually when I'm programming, I don't care exactly what the memory address of a given variable is.
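To make the assembler's job concrete, here is a minimal sketch of a toy assembler. Everything specific in it is invented for illustration and does not correspond to any real instruction set: the mnemonics and their opcode numbers, the 16-bit instruction format (8 bits of opcode followed by 8 bits of operand), and the rule that variables are placed from address 0x80 upwards. It uses one common structure, which is to first collect every symbolic name into a table and then generate the machine code using that table.

```python
# Hypothetical opcode table for a made-up instruction set.
OPCODES = {"LOAD": 0x01, "ADD": 0x02, "STORE": 0x03, "JMP": 0x04, "HALT": 0xFF}
DATA_BASE = 0x80  # assumed start of the region where variables are placed


def assemble(source: str):
    """Translate toy assembly into lines of 16-bit machine code (as bit strings)."""
    lines = [line.split() for line in source.splitlines() if line.strip()]

    # First pass: give every symbolic operand its own memory address.
    symbols = {}
    for parts in lines:
        for operand in parts[1:]:
            if not operand.isdigit() and operand not in symbols:
                symbols[operand] = DATA_BASE + len(symbols)

    # Second pass: one line of machine code per line of assembly.
    machine_code = []
    for parts in lines:
        opcode = OPCODES[parts[0]]
        if len(parts) > 1:
            operand = parts[1]
            value = int(operand) if operand.isdigit() else symbols[operand]
        else:
            value = 0  # instructions without an operand get a zero field
        machine_code.append(f"{opcode:08b}{value:08b}")
    return machine_code, symbols


program = """
LOAD i
ADD j
STORE i
HALT
"""
code, table = assemble(program)
for asm_line, bits in zip(program.split("\n")[1:], code):
    print(f"{asm_line:<10} {bits}")
print(table)
```

The programmer only ever writes `i` and `j`; the addresses 0x80 and 0x81 are chosen by the assembler and appear only in the generated bits, and each line of assembly produces exactly one line of machine code.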
Starting point is 00:27:13 I just need to keep the given information somewhere. So that's a huge advantage of assembler programming over machine language programming. In addition to machine level instructions, assembly programs also include what are called pseudo-instructions. So these are instructions to the assembler rather than to the processor. Pseudo-instructions are not actually included in the final machine code. They're taken out because they don't correspond to any of the instructions in the instruction set architecture. However, they are useful for telling the assembler what to do as it's producing the final machine code. So, for example, you can use pseudo-instructions to define symbolic variables, as I mentioned before.
Starting point is 00:27:48 You can also use pseudo-instructions to start a macro, which is kind of like a function, a bit of code that you want to reuse over and over again, to make comments in your code, so you can help understand what it means, and also to include code from other files. So a macro is, as I mentioned, simply a bit of code that you want to reuse. So you write a bit of code in assembly, you give it a name and define it as a macro, and then you can just write the name of the macro and use a pseudo-instruction to include it somewhere else in your program, so you don't have to type out the code every time you want to use that code over again.
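As a rough sketch of what macro handling might look like, the Python below records the body of each macro, strips comments, and pastes the body back in wherever the macro's name appears. The directive names `.macro` and `.endmacro`, the `;` comment marker, and the convention that a bare macro name invokes it are assumptions chosen for this example; real assemblers each have their own syntax.

```python
# Hypothetical macro expander. Pseudo-instructions (.macro/.endmacro) and
# comments never reach the output; only ordinary instructions do.

def expand_macros(lines):
    macros = {}          # macro name -> list of instructions in its body
    output = []          # the expanded program
    current_name = None  # name of the macro being defined, if any
    for line in lines:
        stripped = line.split(";", 1)[0].strip()  # drop comments
        if not stripped:
            continue
        if stripped.startswith(".macro "):
            current_name = stripped.split()[1]
            macros[current_name] = []
        elif stripped == ".endmacro":
            current_name = None
        elif current_name is not None:
            macros[current_name].append(stripped)
        elif stripped in macros:
            output.extend(macros[stripped])  # paste the macro body here
        else:
            output.append(stripped)
    return output


source = [
    "; add register B into A twice",
    ".macro ADD_TWICE",
    "ADD A, B",
    "ADD A, B",
    ".endmacro",
    "MOV A, 1",
    "ADD_TWICE",
    "MOV B, A   ; copy the result",
    "ADD_TWICE",
]
expanded = expand_macros(source)
print(expanded)
```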
Starting point is 00:28:19 So pseudo-instructions don't get included in the final machine code, but tell the assembler what to do in producing that machine code: to include a macro here, or to delete this comment over there, because obviously comments aren't put into the final code, they're just for human use, or to include code from a file over there, or to define a variable, that sort of thing. Now, the fact that we use symbols in assembly programs means that the assembly process has to occur in two stages. In the first pass over the program, a table of symbols is built up, so that all variables and macro definitions and everything else are defined in a table that is stored in memory that the assembler can access. Now,
Starting point is 00:28:57 then during the second pass over the program, the machine code is actually generated using all of the values from the symbol table. So, for example, suppose I use a variable I a bunch of times in my assembly program. And then I have a pseudo-instruction telling the assembler to define I as a variable and initially set its value to 1, say. So the assembler will see that and then put that in its table of symbols. And then when it's generating the machine code, every time it sees that variable I, it replaces it with the address in memory that it's generated for that particular variable. If I want to include code from separate assembly files, separate programs that are written in different files, into the one program,
Starting point is 00:29:42 I can do that using a program called a linker, which may be incorporated into an assembler program. And this just combines multiple assembly programs into a single binary executable file of machine code. A binary executable file is just a file of machine code that consists of zeros and ones that can directly be loaded into the instruction register of the processor and executed. It's the only type of file that the processor itself is directly able to understand. Everything else must ultimately be converted into a binary executable file, that is, a machine code file, in order to be executed on the processor. But anyway, a linker is able to combine different assembly programs into the single binary executable, and really that's just a process of copying and pasting, but also there's a need to relocate addresses,
Starting point is 00:30:24 because if I have two different assembly programs written separately, they might be trying to assign memory to the same place. So one variable in memory location 10 is defined in one program, and a different variable in memory location 10 might be defined in the other program. The linker will separate those out automatically so that they're not trying to point to the same address, they're not trying to use the same address space for different purposes. So all of that sort of thing is handled by assemblers and linkers so that my final binary executable only has a set of consistent references
Starting point is 00:30:56 to absolute locations in either registers or main memory. And as an assembly programmer, I don't actually have to worry for the most part about exactly where those memory addresses are. I just use the symbolic variables, and the assembler and linker take care of all of that for me. So we've now reached the stage where I'm able to write a program that I can convert into machine code using my assembler, and I'm able to write that program in a more accessible way using symbolic variables and symbolic shortcuts for op codes and so forth, which is much easier to do than machine code. But there are still some big limitations with the design of the system that I've discussed up till now. And the three big limitations are as follows.
Starting point is 00:31:39 Firstly, memory and data addresses in our code are fixed at the time of assembly. I might not have to define what they are when I'm writing my program, but they still have to be defined and put into the code at assembly time. That means that I can't just load things into memory as I would like. I have to make sure that I'm not going to overwrite a value that's being used by my program, which the program expects to be able to use when it runs, because remember, the specific addresses must be hard-coded into the binary executable of that program. If I wanted to run multiple programs on that computer, I would have to keep track of which parts of memory each of them is trying to access. If I only wanted to run one program at a time, it wouldn't be a big issue, but trying to run multiple processes is going to be a big problem.
Starting point is 00:32:23 And that leads into the second limitation, that the computer I've described can only execute one process or one program at a given time. In order to run multiple programs at the same time, I'm going to have to make some changes. The third big limitation, and it's probably the most significant, is that at the moment, the only way I can interact with a program, the only way I can tell my computer what to do, is by writing the program and loading it into memory. After the program's been assembled and loaded into memory, the program will just run and it will perform its computations and do whatever it does, but there'll be no way for me to interact with it. I won't be able to send it instructions, to give it values or to change its execution in any way,
Starting point is 00:33:01 nor will I be able to read the results of the program in any way other than directly looking at the values from the memory. In order to interact with my program, I'm going to need some mechanism for input and output, which is called I-O. That's written I-slash-O, just short for input and output. So I-O is obviously a huge part of what we think about in terms of computers. When people talk about using a computer, what they actually mean
Starting point is 00:33:24 is giving instructions via the mouse and the keyboard or the touchscreen, that's giving input, and then receiving output in the form of images on the monitor or sound from the speakers or whatever else. Computers are useful even if you can't do I.O. in that way. Maybe the only type of I.O. you want is numbers that can be punched in a punch card, for example. That's what the early computers were used for, basically just number crunching that was output on punch cards, which is still a form of output, but a fairly crude form. These days, however, we want to interact dynamically with our programs. We want to change what they do during their execution, and we want them to produce meaningful output in the process of their execution, not just at the very end once they've finished running, which is again what was done in the early days, when a program was usually just used to perform a specific scientific or numerical calculation.
Starting point is 00:34:13 So in order to do that, we have to have a fairly sophisticated mechanism set up for dealing with I.O. and also for running multiple programs at once and managing the memory. So these tasks are largely the responsibility of the operating system, and that's why I'm now going to spend the rest of this episode talking about the operating system and how it works. The operating system does interface directly with the hardware, so some of this will also include a discussion of particular hardware mechanisms that are responsible for implementing some of these as well. But the focus will be on the operating system and how it interacts with that hardware. So as I mentioned, the very earliest digital computers didn't have any operating system or any means for dynamic input and output. Input was just in the form of punch cards, which are cardboard sheets,
Starting point is 00:34:57 with holes in them or no holes in them to represent binary numbers. A hole could be a 1 or a 0, and a lack of a hole would be the other one. And the output of the program would be delivered as punches in the punch card as well. That's very crude, and it limited the machine to a single process at a time. It was only in the 1960s that people began to write programs that addressed some of these issues and that formed the basis of the first operating systems. Today the key functions of memory management, process management and I.O. are all handled by the operating system in combination with specialized hardware that the operating system has access to. When most people think of an operating system, they think of the user interface and the set of tools and programs that are part of that user interface.
Starting point is 00:35:37 However, that's only a small part of what the operating system is and is able to do. That aspect of the operating system is actually just the graphical user interface. That's often abbreviated as G-U-I, or "gooey". The very first computers didn't have user interfaces really of any sort. Later on they introduced command line interfaces, and later on still, graphical user interfaces, which is what is mainly used by most consumers and users today. But the graphical user interface is only a small part of the operating system. In fact, the very first operating systems for personal computers, like MS-DOS, which some older users may have used, were command line-based,
Starting point is 00:36:14 and in fact, interestingly, the earliest versions of Windows were largely just GUI shells that were built on top of DOS. So Windows versions 1, 2, 3, right up to, I think, 98, were all essentially GUI shells, increasingly sophisticated shells, that were built on top of Microsoft DOS, which was the actual operating system that ran the computer. These days, Microsoft Windows doesn't run on DOS anymore and it's its own integrated system, but it's interesting to understand the history of how that developed. So when we're talking about the operating system here, we're not primarily talking about the user interface side of it.
Starting point is 00:36:48 We're talking actually about the more underlying part of it, which is called the kernel. In fact, modern processors run in two modes, user mode and kernel mode. User mode is the standard mode, and that's the mode that the processor runs in when it's running regular applications. So, you know, most of the time your processor is in user mode. In user mode, the processor is forbidden from accessing any memory outside that specifically assigned to the program that it's currently running. So there'll be a particular window of memory, or maybe multiple windows, that it's allowed to access. If it tries to send any addresses on the address bus outside of that memory, it will trigger an error, because it's not allowed to do that. In user mode, the processor will also be forbidden from engaging in input and output operations.
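The memory restriction can be pictured with a small Python sketch. In a real machine this check is done by hardware on every access; here a function stands in for it. The window boundaries, the exception name `MemoryAccessError`, and the `kernel_mode` flag are assumptions made up for this illustration.

```python
# Hypothetical model of the user-mode memory check: an address is allowed
# only if it falls inside one of the windows assigned to the running program.

class MemoryAccessError(Exception):
    """Raised when user-mode code touches memory it was not assigned."""


def check_access(address, allowed_windows, kernel_mode=False):
    """Return the address if the access is permitted, otherwise raise."""
    if kernel_mode:
        return address  # kernel mode may access any address
    for start, end in allowed_windows:
        if start <= address < end:  # each window is [start, end)
            return address
    raise MemoryAccessError(f"address {address} is outside the program's memory")


windows = [(100, 200), (400, 450)]  # two windows assigned to one program
print(check_access(150, windows))   # inside the first window
print(check_access(420, windows))   # inside the second window
```

Calling `check_access(250, windows)` raises `MemoryAccessError`, which plays the role of the error the hardware would trigger.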
Starting point is 00:37:34 That might sound a bit odd because surely user programs want to engage in input and output all the time, don't they? Well, that's true, but the point of dividing a processor's functionality into user and kernel mode is to ensure that input and output and any additional memory accesses are only done in kernel mode by the operating system, and that if a program wants to engage in I.O. operations especially, it has to ask the operating system to go into kernel mode on its behalf. So this is an example of what I mean by the operating system as software working in combination with the hardware, because the processor design actually incorporates hardware to physically distinguish user and kernel modes. There'll be an enable bit that flips the processor into kernel mode, and certain operations that are part of the instruction set architecture, like I.O. operations
Starting point is 00:38:21 for example, will only be able to be performed when that particular bit is flipped and the processor is running in kernel mode. So that's the hardware side, but the software side is that the operating system manages when that occurs. The purpose of having these two separate modes, even though it involves a little bit of extra hardware and software processing to get things going, is to ensure that a single process cannot monopolize the system. Because, again, an operating system is useful when you want to run multiple programs on a system at the same time. If you only ever run a single program, then it doesn't really matter. And when I talk about a program, I really should use the word process, because that's more accurate. If you're using a Windows machine and press
Starting point is 00:38:59 Control-Alt-Delete and go to Task Manager, if you're using a modern version of Windows, you'll probably come up with just a list of the active programs that you're running, so a web browser, for example, or Microsoft Word, or whatever you've got open. But if you click the little arrow that says more details, and wait a few seconds, you'll come up with a list that's much longer. And this is the list that I'm talking about, of the actual processes that are currently running on the machine. Programs are basically user programs, applications that you open, like a web browser, like a game, like Microsoft Word, something like that. Those are the things that initially appeared.
Starting point is 00:39:34 But those are not actually all of the processes that your machine is running. When you click on the more details arrow and bring up that longer list, those are all of the processes that your machine is running. And my machine currently is running, I don't even know how many processes, maybe 100 or maybe not that many, I don't know, 50 or something, but way more than just the handful of applications that I'm actually running. So all of those processes need their own time on the CPU. They'll have their own machine code that's executed separately from the others. And so my processor has to allocate resources fairly between all of these separate processes,
Starting point is 00:40:08 and there are dozens of them. And your machine will have dozens of them as well. Even simple machines these days like smartphones, although not so simple these days, but compared to a desktop, they're at least smaller in capacity. will have dozens of processes running, doing all sorts of background tasks and running the OS and managing your internet connection and who knows what else,
Starting point is 00:40:26 that you don't even realize are happening. So all of those dozens of processes need to be managed and need to have their own sections of memory allocated to them and the system resources shared equitably between them. And that's what the operating system does. That's one of the things the operating system does. And so in order to ensure that that can be done properly, there needs to be a distinction made between
Starting point is 00:40:46 what the processor can do when it's just running a regular process and what the operating system can do when it's trying to allocate memory and resources between those processes. So in order to ensure that a single process doesn't monopolize all of the processing power and just hog the processor to itself, or hog all the memory to itself, or overwrite memory that another process needs, only the operating system is allowed to have full access to the machine, and the other programs are locked into user mode. The operating system is stored in a special region of the main memory. The CPU always boots up in kernel mode when you first turn it on, and then it will begin by loading up the operating system. There is a sequence of instructions called the BIOS that is stored on ROM chips,
Starting point is 00:41:32 read-only memory chips inside your computer, which have sets of instructions to tell the computer what to do when you first turn it on, and it will just always automatically begin by carrying out those instructions, and that essentially will tell it to load the operating system. Once it's loaded up the operating system into its special protected region of memory and got the operating system running, the operating system will in turn start by booting up programs or processes. Whenever your operating system boots up, it'll always start running processes, even if you haven't asked it to open any programs, because there's always all that background stuff that's happening. As soon as it starts running these programs,
Starting point is 00:42:09 just before executing the first instruction, the CPU will be flipped over to user mode. So there'll be some sort of enable wire that's disabled, and no longer will it be in the privileged kernel mode. It will be in user mode. And this means that certain instructions won't be able to be performed anymore. So as soon as your computer starts executing a regular process, that is not the operating system, it's already in user mode, and so it's limited in terms of what it can do.
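Here is a minimal Python sketch of that mode bit, under some loudly stated assumptions: the class `ToyCPU`, the opcode names `IN` and `OUT` standing for privileged I.O. instructions, and the exception `PrivilegeError` are all invented for illustration and do not correspond to any real processor.

```python
# Hypothetical model of a processor with a single mode bit. It boots in
# kernel mode, and privileged instructions are refused once the bit is
# cleared to enter user mode.

class PrivilegeError(Exception):
    """Raised when a privileged instruction is attempted in user mode."""


PRIVILEGED_OPCODES = {"IN", "OUT"}  # assumed names for I.O. instructions


class ToyCPU:
    def __init__(self):
        self.kernel_mode = True  # the CPU boots in kernel mode

    def enter_user_mode(self):
        self.kernel_mode = False  # done just before a regular process runs

    def execute(self, opcode):
        if opcode in PRIVILEGED_OPCODES and not self.kernel_mode:
            raise PrivilegeError(f"{opcode} is not allowed in user mode")
        return f"executed {opcode}"


cpu = ToyCPU()
print(cpu.execute("OUT"))   # allowed: still in kernel mode after boot
cpu.enter_user_mode()
print(cpu.execute("ADD"))   # ordinary instructions still work in user mode
```

After `enter_user_mode()`, calling `cpu.execute("OUT")` raises `PrivilegeError`, which is the sketch's stand-in for the hardware refusing the instruction.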
Starting point is 00:42:38 And there's usually only one way, at least fundamentally one way, of getting the machine back into kernel mode. Obviously, a process can't just opt to go back into kernel mode, because that would defeat the whole purpose of separating user from kernel mode. Basically, the way you get back into kernel mode is by a process handing control back to the operating system. Only the operating system is allowed to operate the machine in kernel mode. Basically, the way this is implemented is that the operating system is stored in a special region of memory. So if I jump to a location in that special region of memory,
Starting point is 00:43:06 that means that the currently active process is yielding up control to the operating system. And so only by jumping to those special memory locations is kernel mode allowed to be re-enabled. Now, if a currently running process wants to get the operating system to do something on its behalf, especially an I.O. operation, or if it wants to get some more memory to use, then it has to make what's called a system call. That basically just means asking the operating system to do something on its behalf. So the way this works is that the process first puts some data on some particular registers, and then it switches over to kernel mode and simultaneously yields the system up to the operating system. So it does that by placing the appropriate value in the program counter,
Starting point is 00:43:55 so that in the next instruction, the machine will jump to executing some particular set of instructions that belongs to the operating system, and that then enables the processor to be shifted back to kernel mode because it's now running operating system code. Now, once that happens, a particular section of code stored in memory called the trap handler will receive the call that's just been made by the process and read the values that have been placed on those registers. On the basis of
Starting point is 00:44:25 the values that it reads in those registers, it will determine what type of system call is being made. So that's why the process needs to put the values on those registers prior to making the system call, because it has to store that information for the operating system to read. So basically it leaves a message saying, here's what I want you to do, and then hands control of the system back to the operating system. Then the trap handler, it's called the trap handler, I'm not really sure why, but that's just the name of the software, starts running, and as it's executing its code, it starts looking at the registers where the message has been left by the process, and on the basis of whatever data is stored there,
Starting point is 00:45:01 it executes the correct instructions, memory management, I.O. or whatever it is. Once that instruction has been completed, the processor is returned to user mode and control is handed back to whatever process the operating system hands the control back to. So a system call just works essentially by leaving the instructions for the operating system and handing control back to the operating system. The operating system reads the instructions that have been left, executes the appropriate instructions that only it can perform and that the program can't do for itself, and then it hands control back to the process once it's done. Again, many of these system calls will be input-output, which is going to happen a lot in many
Starting point is 00:45:42 programs which have to write data to the screen, have to accept input from the mouse and keyboard and so on. So a lot of system calls are going to have to happen there, and that's going to involve input and output operations. So let me explain now how that takes place. A simple way of performing I.O. is to use what's called address mapping. Basically this means that each peripheral device, so including my keyboard, hard drive, monitor, mouse, etc., is mapped to a unique region of the memory address space. Remember, I talked in previous episodes about how memory consists of a series of registers, each of which has its own unique address consisting of a series of zeros and ones,
Starting point is 00:46:20 and when I put the relevant sequence of bits on the address bus, that and only that register in memory will be activated, and I can read off the data onto the data bus back to the processor. Basically, the same thing happens when the processor is doing I.O., except those particular memory addresses, instead of being mapped to the actual memory, will be mapped instead to my I.O. devices. And that's possible because if you have a 64-bit processor, you'll have way more memory addresses than you could ever need to address all of your RAM. So there'll be plenty left over that you can use to address additional peripheral devices.
Starting point is 00:46:59 So done this way, getting input is then as simple as reading data from a particular region of memory. It's just that the data is no longer stored in main memory with all the other stuff. It's stored somewhere else, that somewhere else being in some sort of memory controller that corresponds to that particular device. Often peripheral devices have their own set of registers that temporarily store data that's either just been inputted or has just been received from the processor and is yet to be outputted. And so in order to perform I.O., all I have to do is either read from or write to those temporary registers, and then the hardware specific to whatever the peripheral device is will take care of the rest.
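The idea of address mapping can be sketched in a few lines of Python. This is only a toy model: the RAM size, the address `0xF000` chosen for the keyboard's data register, and the class name `ToyBus` are assumptions for illustration, and a real bus does this selection in hardware rather than with `if` statements.

```python
# Hypothetical memory-mapped I.O.: most addresses select main memory, but
# one assumed address selects a register belonging to the keyboard.

RAM_SIZE = 1024          # assumed size of main memory, in words
KEYBOARD_DATA = 0xF000   # assumed address mapped to the keyboard register


class ToyBus:
    def __init__(self):
        self.ram = [0] * RAM_SIZE
        self.keyboard_register = 0  # last key code stored by the device

    def read(self, address):
        if 0 <= address < RAM_SIZE:
            return self.ram[address]       # main memory responds
        if address == KEYBOARD_DATA:
            return self.keyboard_register  # the keyboard responds instead
        raise ValueError(f"no device responds to address {address:#x}")

    def write(self, address, value):
        if 0 <= address < RAM_SIZE:
            self.ram[address] = value
        else:
            raise ValueError(f"address {address:#x} is not writable here")


bus = ToyBus()
bus.write(10, 42)
bus.keyboard_register = ord("Q")  # pretend the Q key was just pressed
print(bus.read(10))               # an ordinary memory read
print(chr(bus.read(KEYBOARD_DATA)))  # the same kind of read, served by the keyboard
```

From the processor's side both reads look identical; only the address decides which piece of hardware answers.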
Starting point is 00:47:36 So, for example, if I press a key on my keyboard, that creates a brief electrical contact as a result of the mechanical operation, and that sends a signal to some memory controller, either in the keyboard or in the computer itself, depending on how it's designed, which keeps track of the last key that was just pressed somewhere in the memory that corresponds to it. Depending on which key was pressed down, maybe it's the Q key, or maybe it's Enter, whatever, it keeps track of that, or maybe the last bunch of keys that have been pressed, in special registers there. In order to read from the keyboard,
Starting point is 00:48:10 all the processor has to do is assert the correct address on the address bus, which is continually being read by the relevant memory controller, in this case that of the keyboard. And if I've sent an address that corresponds to one of the registers that the keyboard uses, then the main memory will ignore the address because it doesn't correspond to any of the addresses that it contains, but the keyboard memory management unit will pick up that address because it corresponds to one of its registers, and the relevant register will be enabled and the data in that register placed on the data bus and then taken back to the CPU. So in this way, I.O. is just exactly like a memory access. It's just that instead of being stored in main memory, the relevant data is stored with the particular
Starting point is 00:48:53 memory controllers of the particular device in question. Now, there's one additional complication with respect to input. Output's a bit easier because the processor will just decide when it sends data that will be outputted by some output device. But input's different because the processor never knows when input is going to be received, because I can press keys on my keyboard or click the mouse at any time. So the processor has to be, at least potentially, always ready to receive that input if and when it comes. And this is implemented by procedures or special signals which are known as interrupts.
Starting point is 00:49:26 Interrupts are sent from my peripheral device controller to the processor, essentially telling the processor, hey, there's some signals coming in here that you might want to read. Depending on their priority, the processor may decide to address them immediately, or it may wait and finish what it's doing and then address them. I'll talk about this in a bit more detail once I get to discussing the interrupt handler, which is actually a specific segment of hardware that processes these interrupt signals. But just be aware at this point that the way that the processor deals with the fact that input could come at any time is by using these
Starting point is 00:49:57 interrupts. So the processor could take the input immediately and read it from those registers via the data bus onto the processor and then do with it whatever it needs to, or it might decide to wait and just leave it on those device registers in the memory control unit for the particular peripheral device and read it at its leisure. Now, I'm not going to discuss the details of the hardware behind mice and keyboards and monitors and so on and how they work. I feel that that's a little bit beyond the already very ambitious scope of this series of episodes, but all we need to understand for these purposes here is that you can think of the peripheral device as consisting of this memory management
Starting point is 00:50:36 unit, which has the necessary circuitry to recognize its mapped addresses, and also to be able to read from and write to the relevant address and data buses that transfer data to and from the processor. That memory control unit and its connection to the bus is how the peripheral communicates with the CPU. After that, it's purely an issue of how the particular hardware is set up on the device in question, whether that be how the device recognizes which key I've pressed by the sending of special signals from each key to the registers that store that, or how the mouse works in terms of generating and storing a particular location on the screen where the mouse is currently located and keeping track of when I've clicked the mouse, or how the
Starting point is 00:51:20 information of each pixel is sent to my LCD monitor and displayed as an intensity of different colors. All of those things depend upon the underlying hardware of the peripheral device, and so are different for all of the different devices. But the basic principle is that the information that's either to be read from or output to the peripheral device in question is just stored on a bunch of registers, so special memory in the memory controller of that device, and is read from and written to just using ordinary memory accesses, using the process of memory mapping. So now that we understand how memory address mapping works, let's turn to the interrupt handler and discuss that a little bit more. So there are actually different ways of handling
Starting point is 00:52:00 I.O., and an interrupt handler is only one of them, but I think it's one of the easier ones to understand, so that's the one we'll focus on here. So, interrupt handlers can be implemented using either special dedicated hardware or using a special program in a dedicated location in the operating system memory. It doesn't really matter which way you think of it, because functionally they can be designed to perform the same thing. So, here's how it works. When the computer receives an interrupt for a given I.O. operation, say, for example, I press a key on the keyboard or click the mouse, then a special signal will be asserted on what's called the status bus. So remember we had an address bus which carries addresses and a data bus which carries actual data from those addresses. There's often another bus in computers called a status bus which carries information about the status of particular devices. And in this case, the relevant thing that it can carry is interrupt information from given devices. So my keyboard, for example, might send a signal on the status bus saying, hey, I've got some input here. And each of those different input signals, and other interrupts as well, like errors, for example, will carry their own priority rating.
Starting point is 00:53:09 And what the interrupt handler does is, it's constantly receiving all of these interrupts from the status bus. But as the interrupts are constantly coming in, the interrupt handler is constantly comparing them and saying, well, are any of these priorities high enough relative to the process that's running at the moment and relative to each other? If an interrupt with high enough priority comes in, so if a high priority interrupt line is asserted, then the interrupt handler will send a signal to the CPU telling it to stop what it's doing and jump to the interrupt handler. So this is very much like a system call. Remember a system call is when the program that's currently running asks the operating system to do something for it.
Starting point is 00:53:50 So for I.O., for example, it jumps to a particular place in the memory and starts running the trap handler program, which then processes the particular operation that the program has asked the operating system to perform. This is similar, except instead of the program asking the operating system to do something for it, it's actually the input device directly, like my keyboard or the mouse, that's sending the interrupt signal along the status bus that tells the interrupt handler to take over from whatever the process was doing and carry out its instructions. So if the priority of the interrupt signal is high enough, the interrupt handler will send a signal to the CPU to say, hey, stop what you're doing
Starting point is 00:54:28 and jump over to me, and it jumps over to the relevant code section. The program counter, again, is loaded with the right address to jump over to the relevant region of memory that holds the interrupt handler. And here the interrupt handler will collect the information, so it will read whatever data has been read in by, say, the keyboard, so it will read from those special registers that have been address-mapped to the input device, and start running a program to execute whatever processing needs to be done to handle that input data. So that might just be, for example, saving the input into some other region of memory that can be accessed by programs.
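A rough Python sketch of this priority-based handling follows. It is only one way the behaviour could be modelled: the class `ToyInterruptController`, the numeric priorities, and the `input_buffer` list standing for "some other region of memory" are assumptions invented for the example. Python's standard `heapq` module is used to keep the highest-priority pending interrupt at the front.

```python
import heapq

# Hypothetical interrupt controller: devices raise interrupts with a
# priority, and only those that outrank the running process are serviced.


class ToyInterruptController:
    def __init__(self):
        self.pending = []       # heap of (-priority, arrival order, data)
        self.input_buffer = []  # where serviced input is copied for programs
        self._arrivals = 0

    def raise_interrupt(self, priority, data):
        # Negate the priority so the largest priority is popped first.
        heapq.heappush(self.pending, (-priority, self._arrivals, data))
        self._arrivals += 1

    def service(self, running_priority):
        """Handle every pending interrupt that outranks the running process."""
        handled = 0
        while self.pending and -self.pending[0][0] > running_priority:
            _, _, data = heapq.heappop(self.pending)  # clear the interrupt
            self.input_buffer.append(data)            # save the input
            handled += 1
        return handled


controller = ToyInterruptController()
controller.raise_interrupt(priority=5, data="key Q")
controller.raise_interrupt(priority=1, data="mouse click")
print(controller.service(running_priority=3))  # only the key press is handled
print(controller.input_buffer)
print(controller.service(running_priority=0))  # later, the click is handled too
```

Popping the entry from the heap plays the role of turning the interrupt signal off so that it is not triggered again.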
Starting point is 00:55:04 So if I trigger the interrupt handle by pressing a key, the associated program inside the interrupt handle, or the sequence of instructions there will ensure that the information of which key that was is copied somewhere in the computer's memory, where it can be accessed by other programs. Once the interrupt handler has finished handling that particular request, it will turn off the interrupt signal that it just dealt with, so it's not immediately or not immediately re-triggered again, obviously that would be a bad idea, and then hand the system back to the process that whatever was running before, the interrupt was triggered. I mentioned that the interrupt handler decides, is constantly sort of keeping track of all of the
Starting point is 00:55:43 signals that it's receiving and deciding whether they're high enough priority to jump in and take over control of the processor. If it doesn't decide that, the interrupts will just sit there for a while until the interrupt handler decides, yeah, it's time that they should be dealt with. So the processor won't necessarily deal with the input that you give it instantaneously. The input might wait for a while to be processed if the processor is doing something that the interrupt handler regards as being more important. Now, normally this doesn't make a huge difference because my processor is running, you know, ticking over billions of times every second. And so even if it has to wait for a thousand cycles or even a million cycles, the input will still generally be processed very quickly. Sometimes, however, if my computer is lagging very badly, running very slowly for some reason, there's some error or it's run out of memory or something like that, and you've almost certainly experienced something like this yourselves, you'll see that the computer does not respond to input, or you press a key or try to click the mouse, and it takes ages and ages before you see a response.
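To make the priority comparison a bit more concrete, here is a minimal sketch in Python. It is only a toy model under stated assumptions: the names InterruptController, assert_line, next_interrupt, acknowledge and handle_keyboard are made up for illustration, priorities are just integers, and real hardware does this with circuitry rather than a dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class InterruptController:
    """Toy model of the hardware that arbitrates between interrupt lines."""
    pending: dict = field(default_factory=dict)  # line name -> priority

    def assert_line(self, name: str, priority: int) -> None:
        # A device raises its interrupt line; the request stays pending
        # until the handler has dealt with it.
        self.pending[name] = priority

    def next_interrupt(self, current_priority: int):
        # Pick the highest-priority pending line, but only if it outranks
        # whatever the CPU is doing right now; otherwise it keeps waiting.
        if not self.pending:
            return None
        name = max(self.pending, key=self.pending.get)
        if self.pending[name] <= current_priority:
            return None
        return name

    def acknowledge(self, name: str) -> None:
        # The handler turns the signal off so it is not re-triggered.
        del self.pending[name]


def handle_keyboard(controller: InterruptController,
                    key_register: str, buffer: list) -> None:
    # Software half of the handler: copy the key out of the (hypothetical)
    # device register into ordinary memory, then clear the interrupt.
    buffer.append(key_register)
    controller.acknowledge("keyboard")
```

In this sketch a keyboard interrupt of priority 3 is left pending while the processor is doing something of priority 5, and is only picked up once the current priority drops below 3.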
Starting point is 00:56:47 Now, that happens because essentially the processor is bogged down with doing something else, and the interrupt handler regards that as more important than processing whatever input you've just sent, and it takes so long in these unusual circumstances that you can actually see the delay there. But that only happens when there's something wrong with the system, and usually that can be rectified by shutting down whatever program is using up all your memory or bogging down your processor. In case I didn't explicitly mention it, the interrupt handler, that special program, runs in kernel mode. There's a given bit of hardware which is constantly monitoring all of the interrupt signals that are being received and deciding whether to act on them, and that just exists in the background, but the actual software part of the
Starting point is 00:57:31 interrupt handler is a bit of code that exists in memory somewhere that takes over from whatever the current running process is and runs in kernel mode to deal with the I.O. operation and then hands back control to the process. Now that we know how the processor manages input and output operations, it's time to talk a bit about how memory is managed and allocated between processes. There's still a big gap that I mentioned at the start of the section about the operating system, which is how my operating system is able to ensure that each program accesses the right addresses in memory. Because so far, the way we have explained things, when a program is loaded into memory and begins executing, all of the memory addresses that it contains and all of the places in memory that it's going to try and look for its data
Starting point is 00:58:20 are already hard-coded into the machine language program. So each of the binary addresses are already there. That's a problem because the location of the program's data isn't actually up to the program. It's up to the operating system. The operating system decides where in memory it's going to load all of the data that the program needs. So the program actually doesn't know where that's going to be when it starts executing. So it doesn't really make sense to have all those addresses hard-coded into the binary of the program itself. And yet that's exactly what the assembler does, or what you do manually,
Starting point is 00:58:47 if you're crazy enough to write in machine code directly. So how does this actually work? How do we map the addresses as they're contained in the program instructions in those binary executable files to the actual locations in the hardware where the data has been loaded by the operating system? The solution to this is to incorporate a memory management unit into the hardware. And again, there's a combination of hardware and software that you can use to implement this, just like our interrupt manager and just like our trap handler as well. There's a combination of hardware and software that's used in order to carry them out.
Starting point is 00:59:20 But we're not worried too much about the details of the implementation. The basic idea is that the memory management unit is a memory device, so it's mostly just a segment of memory, but distinct from main memory. And it holds a mapping between what are called physical memory addresses, which are the addresses of the actual locations in the RAM, the actual physical memory, and virtual addresses, which are the addresses contained in the machine language of the program. So the memory management unit maps one to the other.
Starting point is 00:59:52 So what happens is that the processor gets its address from the program that it's running. That's been hard-coded into the binary of the executable program. Previously what I've said is that this address is then just placed onto the memory address bus and sent off to the memory, leading to the activation of a particular register, which then places its data on the data bus and sends it back to the processor. But that's actually not what happens, not in real processors, in any system that runs an operating system. Instead, what happens is that the processor sends its address to the memory management unit for decoding,
Starting point is 01:00:21 and the memory management unit has essentially a lookup table where it sees, ah, this is the virtual address I'm getting from the program, here's the corresponding physical address, and it then sends that physical address on the address bus to read that location from the actual physical memory. So that's the essential idea of the purpose of the memory management unit. It maps from virtual addresses in the program to physical addresses in the actual memory of the RAM.
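The lookup table idea can be sketched in a few lines of Python. This is an illustration under assumptions rather than how any particular memory management unit is built: I'm assuming the mapping is done in fixed-size blocks of 4096 bytes (PAGE_SIZE), and the names translate and page_table are made up for the example.

```python
PAGE_SIZE = 4096  # assumed block size in bytes; real systems vary


def translate(page_table: dict, virtual_address: int) -> int:
    """Convert a virtual address into a physical one via a lookup table."""
    # Split the address into "which block" and "how far into the block".
    page, offset = divmod(virtual_address, PAGE_SIZE)
    # Look up where that block actually lives in physical memory.
    frame = page_table[page]
    # The position inside the block is unchanged by the translation.
    return frame * PAGE_SIZE + offset


# Virtual block 0 lives in physical block 7, virtual block 1 in block 2.
page_table = {0: 7, 1: 2}
physical = translate(page_table, 4100)  # block 1, offset 4 -> 2 * 4096 + 4
```

The program only ever sees the virtual address 4100. Where that ends up in RAM depends entirely on the table.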
Starting point is 01:00:49 The program doesn't need to know the actual physical address of its data. It only needs to know the address in its own virtual memory, and every process that the computer runs has its own virtual memory space. And the memory management unit keeps track of this by a series of process ID numbers.
Starting point is 01:01:05 So whenever the operating system runs a new process, it gives that process an ID number, which is stored somewhere in memory. And the memory management unit, when it's sent a given process ID number, loads from memory the mappings of the virtual to physical memory addresses so that it's got the relevant ones for the current process that's running. Now, using that mapping, it can take any virtual address
Starting point is 01:01:30 that the program, or rather the processor, sends it and convert that into a physical address, which is where the memory actually exists on the RAM. And as I mentioned, each process has its own virtual memory space. So another big advantage of this is that when I'm writing a program, I don't have to worry about the fact that the program only has a limited physical address space that it can access on memory, and maybe it's discontinuous, and maybe there's a bit in the middle that's taken up by a different program that's not running at the moment, and I can only access addresses before and after that. If I had to keep track of that while I was programming, it would be a nightmare.
Starting point is 01:02:03 But as it is, I don't have to worry about that. I just imagine that I've got all the memory in the world, and I can address it all linearly in a continuous fashion. When it actually comes to executing the program, the memory management unit keeps track of the mapping between those virtual addresses that are actually in the program and the physical addresses that actually exist in the hardware, and it's all done for you. The operating system keeps track of all of that. Whenever I want to switch over from running one process to running a different one, that's called a context switch. And to do that, essentially, all of the mappings between virtual and physical addresses that exist in the memory management unit are stored somewhere
Starting point is 01:02:40 in memory and replaced with a new set of mappings that are relevant to the new process. And again, the processes are kept track of using these process ID numbers. So every time the processor wants to flip from one process to another, there's a certain amount of delay there, because it doesn't just have to reload the registers and the program counter and so on, it also has to reload all of the mappings in the memory management unit, because each process has its own virtual memory mappings. But that's okay because this doesn't actually take that long, and we've got lots of clock ticks to use, because remember we're ticking over billions of times a second,
Starting point is 01:03:10 so we've got the clock ticks to spare to do this. A program is only permitted to access data in its allocated regions of memory. So this is what I said before. A program can access whatever it likes in its virtual memory, because its virtual memory space is essentially unlimited and it's unique to it, but it can only access physical memory addresses from the regions that it's permitted to access via the mapping given by the MMU, the memory management unit. If a program ever tries to access data outside its allocated regions of physical memory,
Starting point is 01:03:40 then it leads to an error, often, or at least in the past, I think it's been reworded, but in the past it would say something like this program has performed an illegal operation and will be shut down. But what it means by illegal operation is that it is trying to do something that it has not been allowed to do by the operating system. Remember, processes run in user mode, so they're not allowed to access I.O. devices directly, and they're not allowed to access any memory outside of the specific memory that's been allocated to them. If it tries to do any of those things that it's not allowed to do, it raises this sort of error and is usually shut down. The operating system will also, by the way, keep track of the actual usage of the physical memory and ensure that different processes are not trying to map a given region of their virtual memory to the same region in physical memory.
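Here is a rough Python sketch of how per-process mappings, the context switch and the illegal operation error fit together. It follows the simplified picture given in this episode, and everything named here (MMU, IllegalAccess, context_switch, frames_are_exclusive, the 4096-byte PAGE_SIZE) is a hypothetical stand-in rather than a real operating system interface.

```python
PAGE_SIZE = 4096  # assumed block size in bytes


class IllegalAccess(Exception):
    """Raised when a process touches an address it has no mapping for."""


class MMU:
    def __init__(self, tables_in_memory: dict):
        # process ID -> {virtual block: physical block}, kept by the OS
        self.tables_in_memory = tables_in_memory
        self.active_table = {}

    def context_switch(self, pid: int) -> None:
        # Swap in the mappings belonging to the process about to run.
        self.active_table = self.tables_in_memory[pid]

    def translate(self, virtual_address: int) -> int:
        page, offset = divmod(virtual_address, PAGE_SIZE)
        if page not in self.active_table:
            # Nothing has been allocated here for the current process.
            raise IllegalAccess(
                f"no mapping for virtual address {virtual_address:#x}")
        return self.active_table[page] * PAGE_SIZE + offset


def frames_are_exclusive(tables: dict) -> bool:
    # OS-side check: no physical block is given to two processes at once.
    frames = [f for table in tables.values() for f in table.values()]
    return len(frames) == len(set(frames))


tables = {1: {0: 5}, 2: {0: 9}}   # both processes use virtual block 0
mmu = MMU(tables)
mmu.context_switch(1)
addr_for_process_1 = mmu.translate(0x10)   # lands in physical block 5
mmu.context_switch(2)
addr_for_process_2 = mmu.translate(0x10)   # same virtual address, block 9
```

Both processes use the same virtual address, but because each has its own table they end up in different parts of physical memory, and an address with no entry in the active table raises the error.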
Starting point is 01:04:27 It doesn't matter if two processes have the same address in their virtual memory because those are separate from each other, as long as each region of physical memory is uniquely allocated to one particular process at a given time. If all of the physical memory is used up, the operating system can engage in something called paging, whereby it basically grabs a bunch of data that's saved on RAM and moves it over to the hard drive, the hard disk drive, where there's a lot more space, and uses that as sort of extra expansion memory. That's not ideal because the hard drive is much slower than RAM, but it's useful when you run out of main system memory,
Starting point is 01:05:02 and so want to page off a whole bunch of stuff onto the hard drive to free up some space on the regular RAM. And that's done in big segments called pages, which contain a certain number of bytes of data. So you can't just move off any arbitrary amount you want. It's done in specific units called pages. But the details of that I don't want to get into because it gets a bit technical and boring, and it's kind of irrelevant to my purposes.
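For anyone who does want a feel for the mechanics, here is a very small Python sketch of paging. It makes assumptions the episode doesn't: RAM holds a fixed number of pages, and when it's full the oldest page is the one moved to disk. That oldest-first rule is just one simple choice for illustration, and the class and method names are made up.

```python
from collections import OrderedDict


class PagedMemory:
    def __init__(self, ram_frames: int):
        self.ram_frames = ram_frames   # how many pages fit in RAM
        self.ram = OrderedDict()       # page id -> data, oldest first
        self.disk = {}                 # pages moved out to the hard drive

    def store(self, page_id, data) -> None:
        if page_id not in self.ram and len(self.ram) >= self.ram_frames:
            # RAM is full: move the oldest page out to disk as a whole unit.
            victim, victim_data = self.ram.popitem(last=False)
            self.disk[victim] = victim_data
        self.ram[page_id] = data
        self.disk.pop(page_id, None)   # the page now lives in RAM only

    def load(self, page_id):
        if page_id not in self.ram:
            # The page was moved out earlier: bring it back from the slow
            # disk, which may push another page out in turn.
            self.store(page_id, self.disk[page_id])
        return self.ram[page_id]
```

With room for two pages, storing a third pushes the first one out to disk, and reading that first page again brings it back at the cost of evicting another.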
Starting point is 01:05:25 I just wanted to mention that's how memory is managed. Okay, so the final thing that I need to talk about that the operating system does is process management. So as I said, a single CPU, so ignoring multi-cores, you just have a single core, a given processor can only run a single process at once. To give the illusion of multitasking, what the processor does is continually flip back and forth between the different processes that are running. And because modern processors run at such high clock speeds, it's able to do that very rapidly. So, for example, even if the processor flips
Starting point is 01:05:57 a thousand times in a second, at a billion clock ticks a second that still leaves around a million clock ticks each time one of those processes gets its turn. So there's plenty of clock ticks to go around, in other words. The job of the operating system is to keep track of all the currently running processes and what their memory mappings are and where they're stored, because each process has its own memory mapping, which needs to be loaded into the memory management unit every time that process is loaded into the processor and is executed. So the operating system needs to keep track of where all that is. And it also needs to keep track of which of them is currently running and when each of them was last run and what share of the system resources they have been allocated. The operating system should ensure that the processes
Starting point is 01:06:34 are allocated memory and system resources in a fair and equitable way, so that no one process can hog all of the resources for itself. The way that it does this is interesting, because you might wonder, how does the operating system get back control of the processor after it's handed over to a process? Can't that process just decide to keep going forever and never hand back control to the operating system? Well, the answer is a bit sneaky, because there are a couple of ways the operating system can get back control from a process. One way is whenever the process performs a system call, remember that's when it's asking the operating system to do something on its behalf. For example, an I.O. operation or asking for extra memory or something like that. Whenever the process
Starting point is 01:07:20 does that, the operating system takes over. And actually, often what it does is, the first thing it does is often not to carry out the call that the process asks it to do. It actually decides, do I want this process to keep running or do I want to run something else? So that's one way that the operating system can take over again whenever a system call is raised. Another way is whenever an interrupt is raised by I.O. Whenever the interrupt handler is called, the operating system can decide whether it deals with that now or whether it just saves the current process and loads a different one. Whenever a given running process is saved, by the way, obviously all of the values of the registers
Starting point is 01:07:56 including the program counter and the instruction register and everything, need to be saved somewhere in memory and organized so that the process can be resumed and everything loaded back to its previous state once the processor is ready to resume that particular set of instructions. A final way that the operating system can regain control of the processor from a running process is just to use what are called timed interrupts. So these aren't generated by I.O. or errors. They're just on a timer. So maybe every thousand clock ticks, there'll be a timed interrupt that's always executed. And when this occurs, the operating system just automatically retakes control of the system and then decides, do I want to hand back control to the process that was running, or am I going to allow a different process to run for a while? So this ensures that even if there's no I.O., and even if the process is not making any system calls, the operating system still has the ability to take control back for itself.
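The timed interrupt idea can be sketched as a simple simulation in Python. This is an illustration only: I'm assuming the operating system just takes the processes in turn (usually called round robin), which is one possible policy and not necessarily what any real system does, and the Process fields and the function name run_round_robin are made up.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Process:
    pid: int
    remaining: int            # clock ticks of work left (stand-in for real work)
    program_counter: int = 0  # saved state, picked up again on resume


def run_round_robin(processes, time_slice: int):
    """Return the order in which pids got the CPU, one entry per turn."""
    ready = deque(processes)
    schedule = []
    while ready:
        proc = ready.popleft()             # the OS picks who runs next
        ticks = min(time_slice, proc.remaining)
        proc.program_counter += ticks      # runs until the timer fires
        proc.remaining -= ticks
        schedule.append(proc.pid)
        if proc.remaining > 0:
            ready.append(proc)             # state kept; back of the queue
    return schedule


jobs = [Process(pid=1, remaining=5), Process(pid=2, remaining=2)]
order = run_round_robin(jobs, time_slice=2)   # [1, 2, 1, 1]
```

Neither process ever has to hand back control voluntarily. The timer takes it away every two ticks, and the saved program counter lets each one carry on from where it stopped.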
Starting point is 01:08:51 Because the operating system always needs to have underlying control of the system and memory management and what's running and what is using resources, and what's also accessing input and output devices. And recall also every time the OS is invoked, the processor is returned back to kernel mode so that the operating system has free rein over what it does with your system. Processes that just run in user mode don't have that control, and so are not able to take control of the system
Starting point is 01:09:14 in the way that the operating system can. By the way, I should also mention that when I talk about the processor being run in kernel mode and that giving it extra privileges, this actually means, or it can mean, under certain designs that particular instructions in the instruction set architecture, for example, I.O. instructions are only able to be performed when the processor is in kernel mode. And if the program tried to do those by itself, it would raise an error.
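As a rough illustration of the kernel mode check, here is a small Python sketch. The instruction names IO_READ and IO_WRITE, the CPU class and the PrivilegeError exception are all hypothetical, and in a real processor this check is built into the hardware rather than written as software.

```python
class PrivilegeError(Exception):
    """Raised when user-mode code attempts a kernel-only instruction."""


PRIVILEGED = {"IO_READ", "IO_WRITE"}  # hypothetical instruction names


class CPU:
    def __init__(self):
        self.kernel_mode = False  # the "kernel bit"; off for user programs

    def execute(self, instruction: str) -> str:
        # Privileged instructions are refused unless the kernel bit is set.
        if instruction in PRIVILEGED and not self.kernel_mode:
            raise PrivilegeError(f"{instruction} requires kernel mode")
        return f"executed {instruction}"

    def system_call(self, instruction: str) -> str:
        # Trap into the operating system: set the kernel bit, do the
        # privileged work on the program's behalf, then drop back.
        self.kernel_mode = True
        try:
            return self.execute(instruction)
        finally:
            self.kernel_mode = False
```

A user program calling execute("IO_READ") directly gets an error, while the same instruction goes through when requested via system_call, and the kernel bit is off again afterwards.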
Starting point is 01:09:40 So one way of designing this is actually to have specific instructions that are fixed and organized in the architecture of the processor with their own control bits and registers and all that stuff that we talked about in the last episode, particular instructions in the instruction set architecture, but that can only be executed if the kernel bit is enabled, and therefore if the extra permissions are allowed. Absent that, those instructions cannot be performed. So, that concludes what I wanted to say about the assembly language and the operating system. Hopefully you found this episode interesting.
Starting point is 01:10:09 If you enjoy the podcast, I would appreciate it if you would go on to iTunes or some other podcast aggregator that you may use and give the podcast a favorable review. That helps to spread the news and give the podcast a good reputation. Another way you can do that is to go onto Facebook, type in the Science of Everything podcast, and give our Facebook group a like. That again helps to spread knowledge about the show. And you can also send me an email. My email address is Fods12 at gmail.com.
Starting point is 01:10:35 That's FODS12 at gmail.com. I welcome any questions, suggestions or feedback, and just hearing from my listeners is always good. So thanks for listening and I'll talk to you next time.
