The Science of Everything Podcast - Episode 34: DNA Structure and Function Part 1

Starting point is 00:00:33 You're listening to The Science of Everything podcast, episode 34: DNA, structure and function. I'm your host, James Fodor. In this episode, I'm going to look at DNA and genes, and particularly talk about the structure of DNA, the double helix structure, how that's put together by the nucleotide base pairs, and how that relates to the genetic code. We'll also discuss the central dogma of molecular biology,

Starting point is 00:00:57 which is basically that DNA is transcribed to RNA, which is then used to make proteins. And in doing so, we'll look at the processes of DNA replication, DNA-to-RNA transcription, and also RNA-to-protein translation. Now, this episode has some prerequisites, if you will. It's recommended that you have listened to Episode 10 about the cell and Episode 18, Biochemistry Basics. I think those episodes themselves also have some prerequisites. But basically some background about the structure of the cell and some knowledge of biomolecules and polymer chains and stuff like that would be very helpful for this episode.

Starting point is 00:01:33 Okay, so there's a lot to go through in this episode, so let's get started. First of all, we'll talk about the structure of DNA. DNA is an acronym, which stands for deoxyribonucleic acid. The little monomer molecules that make up the DNA structure are called nucleotides, and they're a class of organic monomers. They make up both DNA and RNA, actually. Each nucleotide has a particular structure and then you sort of link them all together in a big long chain to form a polymer. Remember this from the biochemistry episode: many monomers go in to make a polymer. The DNA itself is a polymer, and the nucleotides are the monomers that make up the polymer. But each nucleotide, the monomer, has a particular structure

Starting point is 00:02:24 unit of itself. It is composed of a nitrogenous base unit, a five-carbon sugar structure, and one phosphate group. It might be a bit more helpful if you could see a diagram of this. Okay, so let's try and break that down so we can get an image of what this looks like. At a very highly stylized level, you can think of a nucleotide as being kind of like a Mickey Mouse head. That's like one big circle with two smaller circles, sort of to the top left and right, forming the ears. Hopefully you know what I'm talking about there. Now, the head, the middle, largest circle, in the case of the nucleotide in this analogy, is the sugar. So it's a monosaccharide, which means it's a simple sugar molecule. But basically, it's just a pentagon sort of shape.

Starting point is 00:03:11 I mean, it's not exactly like that, but that's a good simple first approximation. So it's a five-sided shape. And at each of the corners of the pentagon, there is a carbon atom. The carbon atoms around the pentagon are all bonded together with each other, and sort of sticking out from, well, most of the corners of the pentagon are other atoms that are bonded to it as well. Basically, a mix of some hydrogens and some oxygens in various relationships. We don't need to know exactly how all those fit together. Now, what I just said is not quite true, because one of the corners of the pentagon doesn't actually have a carbon atom, it has an oxygen atom instead, and one of the carbon atoms is actually, so there are five carbon atoms,

Starting point is 00:03:51 but one of those five is sort of bonded outside the pentagon to another one. So the structure is not quite that, but for a simple level of analysis, we can imagine a pentagon with five carbon atoms and some other hydrogens and oxygens sticking out the edges. That monosaccharide sugar molecule forms sort of the head or the base structure around which the nucleotide forms. Now, the sugar backbone, if you like, of the nucleotide is basically the same in DNA and RNA. As I said, DNA is actually deoxyribonucleic acid, which means its sugar is deoxyribose. That's the same as ribose, which is what makes up RNA, except it has one less oxygen, basically: an OH group on one of the carbons is replaced by just a hydrogen, or something

Starting point is 00:04:34 like that, but it's pretty much exactly the same thing in both the case of DNA and RNA, so we won't worry too much about that difference. Okay, so now we've got the head on our Mickey Mouse figure, which is our pentagon with five carbons plus some hydrogens and oxygens. What are the two ears? Well, you can imagine the leftmost ear is the phosphate group. What's a phosphate group? A phosphate group is just a phosphorus atom in the center, bonded to four oxygen atoms all around it. One of those oxygen atoms is in turn bonded to the carbon that, remember, I said was sort of sticking outside of the pentagon shape in the ribose or deoxyribose. One of them was outside that pentagon.

Starting point is 00:05:13 Well, one of the oxygens in the phosphate group is bonded to that carbon, and so they're connected in that way. But you don't necessarily need to know too much about that detail. Basically you've got a phosphate group, which is phosphorus surrounded by oxygens, and that's connected or bonded to the sugar, which is your five-carbon sort of pentagon shape. Okay, so that's the head and one ear. What's the other ear? Well, the other ear of this Mickey Mouse head analogy we're using is slightly more complicated. This is the nitrogenous base that I mentioned before. The reason it's slightly more complicated is because this differs depending

Starting point is 00:05:45 upon essentially what type of nucleotide it is. The other two things, the phosphate group and the sugar, ribose or deoxyribose, are basically the same for every nucleotide found in DNA or in RNA. But the nitrogenous base differs. There are actually five different types of nitrogenous bases that you find in nucleotides. They are called adenine, guanine, cytosine, uracil, and thymine. So we've got five types, but they are broken up into two groups called purines and pyrimidines. Basically, it depends upon the shape of the molecule. I'll talk more about that in a second. But the fact that there are two groups is very important, because in a DNA molecule, as you may know from seeing these representations all over the place, the DNA molecule is a double

Starting point is 00:06:35 strand. You have two chains of nucleotides, which are sort of next to each other and curl around each other. Those two chains of nucleotides are connected together via their nitrogenous bases. So how does that relate to the two classes of nitrogenous bases that I just talked about? Well, it matters because one purine must bond or connect to one pyrimidine. You can't have two pyrimidines or two purines, and in fact, if that happens, you can get problems with kinks in the structure and so on, which can lead to problems with DNA replication. So one purine must bond with one pyrimidine,

Starting point is 00:07:09 and that just relates to basically the hydrogen bonds. Refer back to the previous podcast on biochemistry, where we talked about hydrogen bonding. The hydrogen bonds between the nitrogenous bases have to line up in the right way in order for it to be stable. And if you don't have a pyrimidine and a purine, so one of each of those two groups, then it doesn't work so well. This is rather complicated, but don't worry, we're sort of building up the picture, and we'll go through these concepts again as we sort of unravel this mess that is DNA structure. Okay, so you remember I said that there are five different types of

Starting point is 00:07:43 these nitrogenous bases. Two purines, adenine and guanine, and three pyrimidines, cytosine, uracil, and thymine. Now, these are usually given single-letter abbreviations: A for adenine, G for guanine, C for cytosine, U for uracil, and T for thymine. So this is the source of those combinations of letters, you know, A-G-T-C, T-G-A, whatever, that you might have seen associated with genetics and so on. Those letters refer to the order of the nitrogenous bases in the genetic code, basically. And once again, we'll talk a bit more about how that works in a moment. Now, although there are five different types of these nitrogenous bases,

Starting point is 00:08:27 or just bases, as they're commonly called, only four of them are used in DNA. DNA uses adenine and guanine, two purines, and cytosine and thymine, two pyrimidines. DNA does not use uracil. Uracil is only found in RNA, and it replaces thymine. So it's kind of like in DNA you have A, G, C, T, and in RNA you have A, G, C, U, with U for uracil replacing the T for thymine.
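
To make the two alphabets concrete, here is a minimal Python sketch of the point just made, that DNA and RNA share A, G and C and differ only in T versus U. The names PURINES, PYRIMIDINES, DNA_BASES, RNA_BASES and classify_sequence are made up for this illustration and don't come from any library.

```python
# Single-letter base abbreviations as used in the episode.
PURINES = {"A", "G"}           # adenine, guanine
PYRIMIDINES = {"C", "T", "U"}  # cytosine, thymine, uracil

DNA_BASES = {"A", "G", "C", "T"}  # DNA does not use uracil
RNA_BASES = {"A", "G", "C", "U"}  # in RNA, uracil takes the place of thymine


def classify_sequence(seq: str) -> str:
    """Say whether a string of base letters could be DNA, RNA, either, or neither."""
    letters = set(seq.upper())
    could_be_dna = letters <= DNA_BASES
    could_be_rna = letters <= RNA_BASES
    if could_be_dna and could_be_rna:
        return "either"  # only A, G and C appear, so we can't tell
    if could_be_dna:
        return "DNA"
    if could_be_rna:
        return "RNA"
    return "neither"


print(classify_sequence("AGTC"))  # DNA
print(classify_sequence("AGUC"))  # RNA
print(classify_sequence("AGC"))   # either
```

A sequence containing only A, G and C is reported as "either", since those three bases appear in both molecules.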

Starting point is 00:08:56 I don't know that there's any particular reason for that. That's just how the biology turned out to work. Before we move on, just a quick word on the actual structure of these bases. Remember, there are the purines and the pyrimidines. The purines, adenine and guanine, are basically comprised of two carbon rings sort of connected to each other. One of the rings is actually a hexagonal ring. The other one is a pentagonal ring, and they're sort of meshed together so that they share one side.

Starting point is 00:09:24 So it kind of looks like the infinity symbol, with those two rings next to each other. Except they're not exactly carbon rings, because several of the carbons at the corners of the rings have been replaced by nitrogen atoms. Plus, there are also the usual hydrogens and some oxygens sticking out from the sides of them, just like they were sticking out from the sides of the five-carbon sugar that forms the center of the nucleotide. Those are the purines. Those are the ones that have the two rings. The pyrimidines just have a single ring, a single hexagonal carbon ring with some nitrogens,

Starting point is 00:09:59 replacing some of the carbons, or two of the carbons, actually, inside the ring, plus some nitrogens, hydrogens, and oxygens sticking out from around the ring. And the precise arrangement of these sticking-out nitrogens and hydrogens and oxygens, and exactly how many of the atoms there are, depends upon exactly which type of pyrimidine or purine you're looking at. So whether it's adenine or guanine or cytosine or thymine, etc. But that's the basic structure of these nitrogenous bases that you

Starting point is 00:10:25 have: in the case of purines, two rings next to each other; in the case of pyrimidines, just a single ring, made of carbons and nitrogens. Okay, so we've got our nucleotides. Once again, a nucleotide is a monomer, which is comprised of a five-carbon sugar, either ribose or deoxyribose, a phosphate group, which is bonded to that sugar group, and a nitrogenous base, which is also bonded to the sugar group. And there are five types of nitrogenous bases, four found in DNA, four found in RNA, with uracil being the one that's unique to RNA. Okay, so we've got a whole bunch of these nucleotides now. Let's look at the case of DNA, because that's mainly what we're focusing on here. So, sort of forget about uracil for the moment. We've got A, G, C, and T. And we can imagine we have

Starting point is 00:11:12 a whole bunch of nucleotides, roughly one quarter of which have each of the four different types of nitrogenous bases. That's roughly correct. I think the percentages are not quite even, but they're pretty close to even. So roughly 25% of the nucleotides in our genetic code carry each of the four different types of nitrogenous bases. Now, you can imagine you have a whole bunch of nucleotides with the different bases, and you start connecting them together in a chain. Now, how does that work exactly? Well, basically the links are called phosphodiester bonds. So phosphodiester bonds are the bonds that keep the nucleotides together. And you remember I said that with the five-carbon sugar that's located sort of in the head or the central position of the nucleotide,

Starting point is 00:11:57 do you remember I said that that has oxygens and hydrogens that sort of stick out from the sides, stick out of the main carbon atoms? Well, when a phosphodiester bond forms, what happens is one of those oxygen atoms that sticks out from the five-carbon sugar is replaced by one of the oxygen atoms in the phosphate group, because remember I said a phosphate group is a phosphorus with four oxygen atoms around it? So one of the oxygens in the phosphate group replaces one of the oxygens that was sticking out from the ribose, or deoxyribose, and therefore those two molecules, the phosphate group and the five-carbon sugar, are now linked because they've

Starting point is 00:12:38 got this, they share the same oxygen atom. And this is called a phosphodiester bond, phospho because it relates to the phosphate group. And this is a relatively strong bond. The thing about this bond is that you can keep iterating it as many times as you like. That is, you can replace the oxygen atom in one of the nucleotides, so that it is now bonded to a second nucleotide, but then you can do that again for the second nucleotide that you bonded on and bond it to a third nucleotide, and so on. So you can keep linking them, building up a big long chain. And there's no real limit to this. The largest DNA molecules in humans, I believe, are hundreds of millions of nucleotides long. So you can do this from anything up to

Starting point is 00:13:24 a few nucleotides bonded together to hundreds of millions. It's very flexible, and these phosphodiester bonds are quite strong, so they keep the chain all bonded together without it falling apart. Now, there's an important distinction that I need to make here. If you remember, before I talked about the bonds that connect the phosphate group to the five-carbon sugar, the center of the molecule. In that bond, one of the oxygens in the phosphate group replaces one of the oxygen atoms that sort of sticks out from the side of the five-carbon sugar, and in doing so, the phosphate and the sugar are bonded together. This is different from, but very similar to, the phosphodiester bond that we're talking about in this case. In the phosphodiester bond, it's still the case that an oxygen from the phosphate group replaces an oxygen that sticks out from the five-carbon sugar, except it's a different oxygen to the previous case.

Starting point is 00:14:18 So, it's very difficult to describe this without you seeing it as a diagram. It might be useful to Google phosphodiester bond or something like that and get a feel for what this looks like. But basically, imagine you have one nucleotide sitting on top with its five-carbon sugar there, and it's got an oxygen sticking out the bottom,

Starting point is 00:14:39 and then you've got another nucleotide sitting below the first nucleotide, and it's also got its five-carbon sugar, with an oxygen sticking sort of upwards, pointing towards the first five-carbon sugar. So these are two oxygens from different nucleotides that are sort of directed towards each other. And these two oxygens are both also bonded to a phosphorus atom, which is part of a phosphate group,

Starting point is 00:15:02 and it has two other oxygens also bonded to it. So remember, a phosphate group is a phosphorus atom surrounded by four oxygens. Two of these oxygens are just sort of, you know, sitting there doing their stuff, but two of them are also bonded to the five-carbon sugars, one to the five-carbon sugar that belongs to the top nucleotide, and one to the five-carbon sugar that belongs to the bottom nucleotide. Now, in one case, that's just your normal

Starting point is 00:15:23 phosphate to five-carbon sugar bond within a nucleotide. In the second case, what we actually have is a phosphodiester bond, which is similar in that it's replacing an oxygen atom, but different in that it bonds two nucleotides together, as opposed to just keeping one nucleotide bonded together within itself. And, you know, more specifically, we can look at which particular carbon atom is doing the bonding, you know, compare the 3 prime to the 5 prime carbon atom. Don't worry if you don't know what that means. It's more details about numbering the different carbon atoms, but I don't want to get into that degree of specificity in this podcast,

Starting point is 00:15:59 which is already getting complicated enough. Okay, so that's the distinction between the phosphodiester bond and the bond that just keeps the phosphate group connected to its own five-carbon sugar group. Okay, but that only explains one side of the chain, because remember I said that in the DNA structure you have two chains of nucleotides that are curled around each other. Basically, the double helix structure, which you've almost certainly seen represented in various forms, is kind of like a ladder. You know, on a ladder, you have sort of two sides to it, and then rungs, obviously, spaced periodically between the two sides,

Starting point is 00:16:35 with each rung being perpendicular to the sides of the ladder, and you hold on to the sides and you climb up the rungs. A double helix structure is basically like that, where each of the sides of the ladder represents a different chain of nucleotides. So you've got two chains of nucleotides, one on each side. And each of the rungs represents the base pairing that occurs between the nitrogenous bases on each of those nucleotides, and we'll look at that in a second. Except the difference between the double helix and a ladder

Starting point is 00:17:03 is that the double helix is kind of like a ladder that's been curled around on itself, so it spirals upwards or downwards, whichever way you want to look at it. Very hard to describe that. You basically just have to look at a picture of it, but I think you know what I'm talking about. So yeah, the double helix is kind of like a spiraling ladder, which spirals and curves around itself.

Starting point is 00:17:20 And up to this point, I've explained how the sides of the ladder are held together along their length. That's the phosphodiester bonds, involving the phosphate groups and the five-carbon sugars. But what about the rungs in the ladder? What about the connections between the two sides? How do the two long chains of nucleotides

Starting point is 00:17:41 stay connected to each other, rather than just sort of floating apart? If that happened, you wouldn't have a double helix anymore. You'd just have individual single helices. So how does that work? Well, the answer to that is base pairing. Now, if you remember me talking before about the pyrimidines and the purines, the two different types of nitrogenous bases, which make up the third element of the nucleotides, do you remember I said that you needed one pyrimidine and one purine, and they formed hydrogen bonds with each other? Well, that's how the base pairing works. Base pairing works like this: if you can imagine two nucleotides, each on opposite sides of the ladder,

Starting point is 00:18:17 and there's a rung in between those two nucleotides, well, that rung is comprised of the nitrogenous base from your left-hand-side nucleotide and the nitrogenous base from the right-hand-side nucleotide. And the two nitrogenous bases sort of stick into each other and are connected to each other via two to three hydrogen bonds that are formed between basically the oxygen, nitrogen, and hydrogen atoms that form part of the nitrogenous base molecules. And as I said, for that to be stable and work properly, you need one pyrimidine and one purine. Otherwise, basically, the two molecules don't fit properly.

Starting point is 00:18:54 And the whole double helix structure becomes disrupted. Actually, it's a bit more specific than that. A bonds only to T, and C bonds only to G. So if you remember, I'm using the letter abbreviations here for each of the different types of nitrogenous bases. So if you had an A on the left side, you have to have a T on the right side for it to bond to. If you had a C on the left, you have to have a G on the right for it to bond to, to form those hydrogen bonds. Otherwise, the structure doesn't work. So now we can build up the whole double helix structure.
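
The pairing rule just described, A with T and C with G, is simple enough to sketch in a few lines of Python. This is only an illustration: PAIRS and partner_strand are hypothetical names, and the sketch reads the two sides of the ladder rung by rung, ignoring the question of which direction each strand runs, which the episode doesn't get into.

```python
# Base-pairing rule for DNA: each purine pairs with one specific pyrimidine.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def partner_strand(strand: str) -> str:
    """Return the bases that must sit opposite each base of `strand`, rung by rung."""
    return "".join(PAIRS[base] for base in strand.upper())


left_side = "AGTC"
right_side = partner_strand(left_side)
print(left_side)   # AGTC
print(right_side)  # TCAG

# Applying the rule twice gets back to the starting strand.
print(partner_strand(right_side) == left_side)  # True
```

Because each side fully determines the other, knowing the bases along one chain is enough to write down the bases along the opposite chain.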

Starting point is 00:19:24 We've got our two sides of the ladder, each of which is bonded together through phosphodiester bonds. And we've got those two sides connected together by the rungs, with each rung being composed of two nitrogenous bases, either an A and a T or a C and a G. In DNA, the two bases are then connected to each other by the hydrogen bonds, and since we've got two sides to the ladder, that's our two chains of nucleotides, and those two sides or chains are connected to each other via the base pairs, we've got a DNA molecule. As to why it forms that double helix structure as opposed to being straight or just some other shape, I don't really know the details of that, but this is something that Watson and

Starting point is 00:20:08 Crick, who were among the first scientists to discover the double helix shape of DNA, studied, and subsequent research has looked at it as well. It basically just relates to the macromolecule as a whole, so the two strands of nucleotides, occupying its lowest possible energy state. And remember from previous podcasts that I've done, Thermodynamics, for example, when we've talked about entropy, basically that physical systems will always tend to move towards a lower state of energy or having a lower amount of free energy, which means a higher amount of entropy. That's basically what the macromolecule is doing when it adopts the double helix shape. It's just the lowest energy state that it can adopt,

Starting point is 00:20:49 and the reasons why that's the lowest energy state have to do with a very complicated, specific analysis of exactly how the bonds relate to each other and so on, which is far more detail than we need to get into in this episode. but just be aware there is a reason that it adopts that double helix shape. And the double helix shape is part of the reason why DNA is so good for storing genetic information because it helps it to be nice and stable and to maintain itself over time. Okay, so that is the structure of DNA. Let's do a quick recap.

Starting point is 00:21:19 DNA stands for deoxyribonucleic acid. DNA is a big long macromolecule, which is comprised of two interlocking chains of nucleotides, nucleotides being the monomers that make up the chains, which themselves are polymers. Now, each of these nucleotides is comprised of three subunits. A five-carbon sugar, which is basically like a pentagon with carbons at each of the corners, plus some hydrogens and oxygens sticking out the edges. A phosphate group, which is a phosphorus atom with four oxygens bonded to it.

Starting point is 00:21:50 And finally, a nitrogenous base, of which there are five different types. Adenine, which is abbreviated as A. Guanine, G. Cytosine, C. Uracil, U, which is only found in RNA. And thymine, T. The nucleotides link up to each other by a bond basically between the five-carbon sugar and the phosphate group of an adjacent nucleotide, and opposing chains of the nucleotide monomers connect to each other via the hydrogen bonds that form between opposing nitrogenous bases.

Starting point is 00:22:28 And for those hydrogen bonds to be stable, you need to have one purine bonded to one pyrimidine. Okay, so that's our structure of DNA. Why do we care about all this? The reason we care about it is because DNA stores genetic information. So we're now moving on to our second topic of this episode. DNA stores genetic information. What does that mean? Well, this relates to the one gene, one polypeptide, or one gene, one protein hypothesis,

Starting point is 00:22:56 which is not exactly true, but it's close enough to use in this episode. It basically means that one gene, being one more or less continuous stretch of nucleotides along the DNA, codes for the information that creates one protein. Now remember, a protein is just another type of macromolecule, which is comprised of different monomers, amino acids, rather than the nucleotides that make up the DNA. So basically the DNA stores the information that allows the body to use that DNA template to produce a particular protein, a particular structure of amino acids, which then has a particular function, does whatever it does, in the body.

Starting point is 00:23:37 There are thousands of proteins that are made in the human body and, of course, in other organisms too. Most of the useful stuff or interesting stuff that happens in the body is done by proteins. Just as a few examples, proteins make up the collagen and the microtubules and other structures that give our body strength and keep the cells stiff, and so on. Proteins form many of the enzymes that break down nutrients that we consume and therefore provide the energy that we need to continue functioning.

Starting point is 00:24:06 Proteins act as sort of gatekeepers that sit in the membranes of cells and determine which particles and other materials can enter and leave the cell according to various factors. For example, ion channels are one of these types of protein, the protein receptors on cells, and they form a crucial element of neurons, which are the cells that make up our nervous system. And so without these protein structures, our brains would not be able to function, because we wouldn't have the neurons sending electrical signals and so on. We'll cover that in a future episode. I'm just trying to illustrate that proteins do a lot of the crucial work of keeping an organism alive and, you know, doing the interesting stuff. So to build an organism, you need to be able to build

Starting point is 00:24:52 proteins, and you need to be able to build those specific proteins with the amino acids in the right order, because otherwise the protein won't be able to fold up in the right shape, and if it can't fold up in the right shape, it can't do its job properly. In order to be able to get the right proteins, you need to have the DNA which codes for that protein, which has the specific order of amino acids sort of specified in the genetic code, and then the body can read that, produce the right protein, and then the protein can go off and do its work. If you get the wrong protein, well, bad things happen. One example of that is sickle cell anemia, which is a disease particularly prominent in sub-Saharan Africa, southern Africa, whereby I think it's just

Starting point is 00:25:29 one amino acid in a protein that's related to red blood cells. One amino acid is wrong, which means the protein folds up in the wrong way, which means that it's much less efficient at carrying oxygen, which basically means you're anemic: your body has difficulties in transporting and absorbing the oxygen it needs to keep functioning. So that's caused by one wrong amino acid. And many changes that would do similar things to that aren't even diseases, because basically the organism, if it had that amino acid wrong, doesn't actually survive. It's inviable. So it's crucial that the genetic code be preserved

Starting point is 00:26:09 over time, and also has the right information about the ordering of the amino acids so we can build the right proteins. Anyway, so that's the motivation for this genetic code and what it all means, why we care about the structure of DNA. Because in order to understand how DNA contains this genetic information that we need, we need to understand the structure of DNA. So, as I said, back to the one gene, one protein, or one polypeptide hypothesis.

Starting point is 00:26:35 Polypeptide, by the way, is basically a fancy name for protein. It pretty much means the same thing. One gene, one section, one chunk of nucleotides, produces one protein. And if you want to produce a different protein, you need to store that information in a different section, a different stretch of nucleotides. And, you know, the size of this may vary from a few dozen to hundreds or thousands of nucleotides,

Starting point is 00:26:59 which may sound like a lot, but remember, we have a few billion nucleotides in the entire human genome. That's the entire set of our DNA. So there's quite a lot of space in there for this information. Okay, so where does that leave us? Well, it leaves us with the idea that the genetic information, the information that we're interested in figuring out how this works, is stored in the order of the nucleotides that comprise the DNA. So that's what the one gene, one polypeptide hypothesis tells us,

Starting point is 00:27:26 and it's more or less accurate, but it doesn't tell us how the information is stored in the order of nucleotides. We know it is the order of nucleotides, and the order is the crucial thing, because if you change the order of nucleotides, then that no longer codes for the right protein, but how is the information stored in the order of nucleotides?

Starting point is 00:27:43 And actually, this is quite recent stuff. Scientists have really only been able to unravel the secrets, if you will, of how DNA codes for proteins in the last four decades or so, maybe a little longer than that. But how does it work? Well, when scientists were trying to figure this out, they were basically presented with a problem, because remember, there are only four different types of nucleotides in DNA, corresponding to the four different types of nitrogenous bases. But if you only have four different types of nucleotides, how can you code for information about proteins and about specifically the order of amino acids

Starting point is 00:28:18 that make up those proteins, when there are about 20 different types of amino acids that are used in the human body to make up proteins. So we got 20 different amino acids, but only four nucleotides. I mean, you could say, okay, one nucleotide says if we have this nucleotide, it says we use this amino acid, if we have a different nucleotide, it says we use a different amino acid. But that doesn't work because there are way too many amino acids for nucleotides or for the different types of nitrogenous bases

Starting point is 00:28:40 on those nucleotides. So how do we square that? Well, you could say maybe it's combinations of two nucleotides that code for amino acids. So we need to have these two, say an A and a T or a G and a C next to each other, those two nucleotides in that order next to each other, in order to code for this one amino acid. So the order of two nucleotides codes for the amino acid. Turns out that doesn't work either, because that would only allow for a maximum of 16 amino acids to be coded for.
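
The counting argument here is just powers of four: with four bases, there are 4 possible one-letter words, 4 x 4 = 16 two-letter words, and 4 x 4 x 4 = 64 three-letter words. A short Python sketch, with made-up names (BASES, AMINO_ACIDS_NEEDED, code_words), lists them out rather than just multiplying, so you can see the count directly.

```python
from itertools import product

BASES = "AGCT"           # the four DNA bases
AMINO_ACIDS_NEEDED = 20  # roughly how many amino acids need a code word


def code_words(length: int) -> list[str]:
    """All ordered combinations of `length` bases, e.g. 'AT', 'TA', 'GC', ..."""
    return ["".join(combo) for combo in product(BASES, repeat=length)]


for length in (1, 2, 3):
    count = len(code_words(length))
    verdict = "enough" if count >= AMINO_ACIDS_NEEDED else "not enough"
    print(length, count, verdict)
# 1 4 not enough
# 2 16 not enough
# 3 64 enough
```

Note that order matters in this count, so "AT" and "TA" are treated as two different words, which matches the way the episode talks about nucleotides "in that order".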

Starting point is 00:29:09 And as I said, we have 20 amino acids, so we need a code of three nucleotides. That can code for a maximum of 64 amino acids, which is far more than we have. Although it turns out you need a few extra ones, and there's some redundancy in the system. So if you're going to have a code like this, where the order of a certain number of nucleotides corresponds to a particular amino acid, you're going to have to have three, or at least three, nucleotides in a row coding for the single amino acid, because otherwise there are just not enough different combinations of nucleotides for it to work. This sequence of three nucleotides, so this is three nucleotides in that particular order bonded next to each other in one of these chains on the DNA molecule

Starting point is 00:29:45 is called a codon. So, as I said, there are 64 possible combinations or possible different types of codons you can have, depending upon the four different types of nucleotides you have and the order that they're placed in, and 20 different amino acids that they code for. And so many amino acids are coded for by multiple codons, because there are more possible combinations of codons than we need in order to code for just 20 amino acids. So this is the answer to how the DNA, or specifically how the order of the nucleotides in the DNA, codes for the amino acids used to make proteins. It's the specific order of these codons, or these units of three nucleotides in a row. Each codon specifies a particular amino acid that goes into the protein,

Starting point is 00:30:27 and then the codon next to that specifies the next amino acid that goes into the protein, and so on for as many amino acids as you need to build the protein. So if you knew the code, and we now know the code, and if you knew the order of the nucleotides in the DNA molecule, then you could work out exactly what protein would be coded for by a particular stretch of DNA. And if you could figure that out, then you actually might be able to figure out what the purpose of that section of DNA is,

Starting point is 00:30:54 because if you know what the protein it codes for does, then you've worked out essentially what that gene is for. So, for example, if we figure out that a particular stretch of nucleotides codes for a protein that is used, let's say, in digesting a particular type of polysaccharide,

Starting point is 00:31:20 if we figure out that this gene codes for a protein that digests this polysaccharide, then we've worked out that the gene, that is that section of nucleotides, basically is responsible for digesting that particular polysaccharide or that particular food substance. So we've worked out what the function of that gene is. Now, in practice, it's a lot more complicated than that, and I'll start to explain some of those complexities a bit later on.
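
To make the "if you knew the code" idea concrete, here is a minimal sketch of reading a stretch of DNA three bases at a time. The table is the standard genetic code written in DNA letters (T rather than U), built from the compact 64-letter layout used in NCBI's translation tables. The function name `translate`, the one-letter amino acid abbreviations and the example sequence are assumptions chosen for illustration, and the sketch ignores the real-world complications mentioned above.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# One-letter amino acid codes for all 64 codons, in the order
# TTT, TTC, TTA, TTG, TCT, ... "*" marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(dna):
    """Read `dna` codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

# Hypothetical example sequence: Met, Ala, Trp, then a stop codon.
print(translate("ATGGCCTGGTAA"))

# Redundancy: count how many codons map to each amino acid.
codons_per_amino_acid = Counter(CODON_TABLE.values())
print(codons_per_amino_acid["L"], codons_per_amino_acid["M"])
```

The second print shows the redundancy in the system: some amino acids (leucine, `L`) have several codons, while others (methionine, `M`) have only one.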

Starting point is 00:31:46 But that's the basic idea. If you can figure out the order of the DNA, and if you know the code, and we have discovered the code, then you can work out hopefully what protein... Well, then you can work out what protein that gene codes for. Then hopefully, if you can work out what that protein does, you can work out what the gene itself is for. Now, this code that I've mentioned, basically,

Starting point is 00:32:05 that is, the mapping of codons to the amino acids that they code for, is very, very consistent, not just throughout animals or anything like that, but throughout really all of life, including bacteria and plants and fungi and everything. Some minor variations have been discovered, including in human mitochondrial genes that use a slightly different code. So remember the mitochondria, organelles found within eukaryotic cells, which are responsible for energy production, well, primarily responsible for energy production. They possess small amounts of their own DNA.
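
As an illustration of how small these variations are, the sketch below expresses the vertebrate mitochondrial code as the standard table with a handful of codons overridden. The four differences listed are the ones recorded in NCBI's translation table 2 (vertebrate mitochondrial); the variable names are made up for this sketch.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in DNA letters; "*" marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Vertebrate mitochondrial code: the same table with four codons reassigned.
MITO_OVERRIDES = {
    "TGA": "W",  # stop in the standard code, tryptophan in mitochondria
    "ATA": "M",  # isoleucine in the standard code, methionine in mitochondria
    "AGA": "*",  # arginine in the standard code, stop in mitochondria
    "AGG": "*",  # arginine in the standard code, stop in mitochondria
}
MITOCHONDRIAL = {**STANDARD, **MITO_OVERRIDES}

changed = sorted(c for c in STANDARD if STANDARD[c] != MITOCHONDRIAL[c])
print(f"{len(changed)} of {len(STANDARD)} codons differ: {changed}")
```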

Starting point is 00:32:41 Well, it turns out that some of this DNA has slightly different codons, or a slightly different code, than the normal DNA does. And a few other slight variations have been found in some species of bacteria, for example, and plants and a few other things. But for the most part,

Starting point is 00:32:56 the code is very strongly conserved across all of life. And this is actually a very strong piece of evidence in favour of common descent. Some people say in favour of evolution, that's not really accurate, because it doesn't say anything about evolution per se. It just says that it's very likely that all life on Earth has a common ancestor

Starting point is 00:33:14 because the genetic code of all life is so similar. And if you had different origins, there's no reason for supposing why... There's no reason why that would be the case. One other important point that I wanted to make, remember I said that in a DNA molecule, it's essentially like a ladder with two sides and then rungs connecting the sides.

Starting point is 00:33:31 The two sides correspond to the two separate chains of nucleotides. Each of those chains of nucleotides is going to have a different order of nitrogenous bases. I mean, the order will be related because, remember, I mean, if you have an A on one side, then you have to have a T on the other side. If you have a C on one side, you have to have a G on the other side. So if you know the order of one side, you can predict exactly what the order of the other side will be. However, the exact codons that one side corresponds to will be different to those that the other chain corresponds to, because the codon is determined by the exact type and order

Starting point is 00:34:07 of the nucleotides. In other words, what I'm saying is if you took all the information from one chain of nucleotides on one side of your DNA ladder, you'd get one stretch of codons and therefore one stretch of amino acids and therefore one protein. But if you took it from the opposite chain, then you'd get a different stretch of codons and a different stretch of amino acids and therefore a different protein. So the two strands of DNA sort of code for the same information, because, you know, as I said, if you know one, you can predict exactly what the other one will be, because there's a one-to-one correspondence in bonds. However, they won't make the same protein.
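
The pairing rule and its consequence can be sketched as follows. This is a simplification under stated assumptions: both strands are read left to right here, whereas in a real cell the two strands run in opposite directions, a detail left aside in this sketch. The function names and the example sequence are invented for illustration.

```python
# Base-pairing rule: A pairs with T, C pairs with G.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(strand):
    """Predict the opposite strand from the pairing rule."""
    return "".join(PAIR[base] for base in strand)

def codons(strand):
    """Split a strand into consecutive groups of three bases."""
    return [strand[i:i + 3] for i in range(0, len(strand) - 2, 3)]

one_side = "ATGGCCTGG"            # hypothetical example sequence
other_side = complement(one_side)

print(codons(one_side))
print(codons(other_side))
```

Knowing one strand fully determines the other (applying `complement` twice returns the original), yet the two lists of codons have nothing in common, which is why only one side gives the right protein.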

Starting point is 00:34:39 So you always need to have the right side of the DNA in order to get the right protein. If you copy the complementary strand, which is the other one that it bonds to, you'll get the wrong protein, and it'll be useless. So all of this information that I've been talking about, about the structure of DNA, about codons, and so on, is sort of summarized in what's called the central dogma of molecular biology. Now, the naming's a bit unfortunate. It sounds like some religious doctrine or something like that.

Starting point is 00:35:06 It's not a dogma. A better name for it might be the central hypothesis of molecular biology. Crick, who made up the term, said basically that he didn't know what the word dogma meant, and he thought it just sounded cool, basically. But think of it as a hypothesis, and it's a well-supported hypothesis, or at least mostly well-supported. But anyway, the central dogma says that DNA is used to make RNA, which is used to make proteins.

Starting point is 00:35:26 So, specifically, we take the codons in DNA, and these are transcribed into RNA, which then moves to a different spot in the cell, and that RNA is used by various cellular structures in order to actually produce the protein itself. So there's a sort of an intermediate step. You don't go directly from DNA to proteins. You go from DNA, make a sort of a copy of that in the form of RNA, and that in turn is used to make the protein itself. And this process is irreversible.
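
The two-step flow described here can be sketched as a pair of functions. This is a deliberately simplified illustration, not a model of the real machinery: it assumes the DNA strand given is the one whose sequence matches the RNA (so the copy just has U in place of T), and `RNA_CODONS` is a tiny hand-picked subset of the real codon table. All names and the example sequence are hypothetical.

```python
# A deliberately tiny subset of the RNA codon table, just enough for the example.
RNA_CODONS = {"AUG": "Met", "GCC": "Ala", "UGG": "Trp", "UAA": "STOP"}

def transcribe(dna):
    """Step 1 (simplified): the RNA copy has U wherever the DNA has T."""
    return dna.replace("T", "U")

def translate(rna):
    """Step 2: read the RNA codon by codon until a stop codon."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        residue = RNA_CODONS[rna[i:i + 3]]
        if residue == "STOP":
            break
        protein.append(residue)
    return protein

dna = "ATGGCCTGGTAA"
rna = transcribe(dna)
print(dna, "->", rna, "->", "-".join(translate(rna)))
```

Note that there is no function going from protein back to RNA or DNA, which mirrors the one-way direction the central dogma describes.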

Starting point is 00:35:54 You can't go from a protein to DNA. You always go from DNA to RNA to protein, and that's it. It's directional. There is such a thing as reverse transcription where you go from RNA to DNA, and this is what retroviruses do. That's the reverse of normal transcription, and we'll talk a bit more about that later. And in vitro, that is in the test tube, we've also observed direct translation of DNA to proteins, which contradicts what I just said, that DNA doesn't go straight to proteins. But I'm just mentioning these to illustrate that there are always exceptions to pretty much anything I say, especially in regards to things like molecular biology, where things are so complicated. But for the most part,

Starting point is 00:36:28 you start with DNA, you transcribe that to RNA, and from that you translate that into the amino acids of proteins. And sometimes you can go from RNA to DNA, but that's about it. So that is what the central dogma says. There's one important final concept that I want to discuss, and this is the concept of the reading frame. Remember, we've said that codons are how the information is structured, basically, in the DNA to code for RNA.

Starting point is 00:36:55 That is, distinct units of three nucleotides, forming a codon, correspond to a particular amino acid. But you might be thinking, well, how does the machinery, or the cellular machinery that mediates this whole process, how does it know where one codon starts and the next codon ends? Because to the cell, essentially,

Starting point is 00:37:13 it just looks like there's a whole string of nucleotides, and it doesn't know where one codon starts and one codon ends. I mean, if you could imagine you've got an A, T, a G or C, an A, a G or C, whatever. If you start at one point, if you start at an A and count three down from that, you'll get one codon, but if you start at the next, you know, if you start at the T just after the A and then count three down from that, you'll get a different codon. And not just for that

Starting point is 00:37:36 codon, but the entire nucleotide strand will now be sort of divided up differently. And so you'll have a completely different stretch of codons, and therefore you'll get a completely different protein from it. So this concept is characterized by the term reading frame. It's like how you read the order of nucleotides, how you break up the codons from each other. And, well, the answer to this is somewhat more complicated, and we'll look at it in more detail when we discuss replication and especially transcription, but the idea is there's no actual requirement to only have one gene in the same spot.

Starting point is 00:38:10 So some viruses, hepatitis B, for example, use overlapping genes in different reading frames. So that is, there are basically three different reading frames, depending on where you divide the codons up. But you could imagine, well, I mean, this is what happens in hepatitis B, the same sequence of, or the same stretch of nucleotides, contains more than one gene, depending on whether you start counting here, and that's one reading frame, or whether you start counting one nucleotide down, which is a different reading frame.
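
The three reading frames can be sketched by splitting the same stretch of nucleotides starting at three different positions. The function name `reading_frames` and the example sequence are made up for illustration; this says nothing about how a cell actually picks a frame.

```python
def reading_frames(strand):
    """Return the codons seen in each of the three possible reading frames."""
    frames = []
    for offset in range(3):
        # Start counting `offset` nucleotides in, then take groups of three.
        frame = [strand[i:i + 3] for i in range(offset, len(strand) - 2, 3)]
        frames.append(frame)
    return frames

for offset, frame in enumerate(reading_frames("ATGCATGCATGC")):
    print(f"frame {offset}: {frame}")
```

Shifting the starting point by one nucleotide changes every codon downstream, and shifting by three brings you back to the original division, which is why there are exactly three frames on a strand.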

Starting point is 00:38:38 I don't know if that's been observed in humans, I don't think it has, but it could well be the case. And this is another example of biology being very messy, because you can have multiple overlapping genes in the same spot. And so this is another sort of exception to the central dogma, or, excuse me, an exception to the one gene, one polypeptide hypothesis. This would be a case where essentially there's, well, you could say there's two genes because it depends upon what reading frame you're using, but they're in the same spot on the DNA strand,

Starting point is 00:39:09 and they code for different proteins. So that's an extra complication. And this is also important, the existence of a reading frame, it was very important for the discovery of the codon, because what they did was that, you know, they started with a stretch of DNA, and got the, you know, transcribed and translated that into a protein, so worked out what protein it produced, and then added one nucleotide, so they shifted the reading frame down by one, and then got nonsense,

Starting point is 00:39:34 or, well, they got another protein, but it was completely different to the one before. Then they added a second nucleotide, shifting the reading frame down again by a single unit, and they still got more nonsense. But then when they added a third nucleotide, they got back the original protein that they had produced before adding any additional nucleotides, except there was just one extra amino acid stuck in at the start there where they'd added those extra nucleotides in. So what that means is that the first two additions of nucleotides shifted the reading frame, but the third addition shifted the reading frame back to the initial reading frame, and so you got the initial

Starting point is 00:40:09 protein back. The only difference was there was now one extra codon in there, so you got one extra amino acid out. And that was basically how they proved that this codon concept and the one gene, one polypeptide hypothesis held. The way they worked out the genetic code, by the way, was basically that they just got segments of DNA that were all the same codons and figured out what amino acids these corresponded to. So, you know, you just get a whole section of U, U, U, U, U, U, or another codon. So they're all the same codon, and you figure out what amino acid that produces, and then you try G, G, G, G, G, G, and what does that produce. And you go on for all the different combinations, and that's how we've worked out

Starting point is 00:40:46 what the genetic code is. Obviously, the details of how that's done are somewhat complicated, but that's the basic idea. You just try all the combinations of codons and see what amino acid they produce and that tells us part of the code. Okay, so I expected this would happen. I'm already basically running short on time and I'm only about maybe halfway through my notes. This is a pretty complicated topic with a lot to it. So I'm going to split this podcast into two. So in this episode I've covered DNA structure and the genetic code, including the central dogma, reading frames and so on. In the next part to this episode, I'll talk about replication, transcription, and translation. So that's going through the central dogma that DNA makes RNA makes proteins in more

Starting point is 00:41:25 detail, looking at each of those steps, how you go from DNA to RNA to proteins. And so far I've said that, you know, the molecular machinery reads the DNA and uses that to translate a particular codon into an amino acid, which then forms the protein and that goes off and does the stuff. I haven't explained how that happens. And that's what we'll look at in the second episode, the processes of replication, transcription and translation at a biological, biomolecular scale. Okay, so it's a pretty dense episode. Hopefully you enjoyed it, or at least your brains didn't fall out too much. Once again, if you like this podcast, send me an email, Fons12 at gmail.com. And I also have a very important announcement. The Science of Everything podcast is now on Facebook.

Starting point is 00:42:07 Go to www.com slash the Science of Everything podcast. That's all one word, The Science of Everything podcast. And on that Facebook page, you can get updates about future episodes and what topics are going to be discussed, and I'll also post additional links to audio and potentially pictures and video content that will supplement the podcast material. Also, you can visit the podcast website at FODs12.podbean.com. On that website, I've just posted up a new page called Links, which you can take a look at to the right sidebar of the website, and on that page I'm posting up links to various other websites that have video lectures and audio lectures about science topics. There's similar sort of things that I talk about in the podcast. Also, in the future, I'll be

Starting point is 00:42:54 putting up more links to various websites that cover themes like astronomy websites that have pictures or information or chemistry and all those sorts of things. There are also a bunch of online textbooks with information about those topics that I'd like to post up in the future. So keep an eye on that and check those out. So thanks for listening and I'll talk to you next time.
