Dwarkesh Podcast - David Reich – Why the Bronze Age was an inflection point in human evolution

Episode Date: May 8, 2026

David Reich is back.

He and collaborator Ali Akbari just published a paper that overturns a long-standing consensus about human evolution: that natural selection has been dormant in our species since the agricultural revolution.

By scaling ancient DNA sequencing and developing a new statistical method, they found that selection has actually sped up.

Selection went especially bonkers during the Bronze Age (around 3,000 years ago). That’s when gene frequencies for everything from immune function to body fat to intelligence were most in flux.

Over the last 10,000 years, selection pushed the genetic predictor of cognitive performance up by roughly a full standard deviation, most of it between 4,000 and 2,000 years ago.

After we finished recording, David sketched out on a whiteboard his new heretical model about who the Neanderthals really were. Luckily, I took out my iPhone and managed to record it.

He thinks the standard story (that Neanderthals are some separate archaic lineage we interbred with a little) just doesn’t fit the evidence. Instead, he proposes that Neanderthals are essentially genetically-swamped modern humans.

A small population somewhere around the Caucasus invented Middle Stone Age technology roughly 300,000 years ago and expanded outward. The ones that moved into Europe interbred with local archaic humans, got genetically swamped, and became Neanderthals. The same expansion went into Africa, met much more diverged archaic Africans, and that mixture became us.

This means Neanderthals and modern humans share the same cultural ancestry; the only difference is which archaic humans they mixed with afterward.

David is a brilliant and rigorous scholar. It was a real delight to learn from him again.

Watch on YouTube; read the transcript.

Sponsors

* Cursor was super useful as I prepped for this episode. Whenever I had a question, I’d have Cursor kick off a few different models simultaneously and then compare their responses. I found that this led to better results than I could get out of any individual LLM. If you’ve only used Cursor for coding, you should try using it for research. Check it out at cursor.com/dwarkesh
* Jane Street uses an internal currency called “hive bucks” to allocate compute through a real-time auction – and anyone can change anyone else’s bids or even kill their jobs! Everyone just trusts each other to act in the firm’s best interest, which is what lets the system work in the first place. If this weird and high-trust culture sounds like your kind of thing, Jane Street’s hiring at janestreet.com/dwarkesh
* Crusoe’s ML infra team built fastokens, an open-source tokenizer that delivers a ~9x speedup over Hugging Face and up to 40% faster time-to-first token – on real production workloads! Crusoe achieved these results by parallelizing things and using some clever engineering to handle duplicates without cross-thread coordination. Learn more at crusoe.ai/dwarkesh

Timestamps

(00:00:00) – Ancient DNA suggests strong selection over last 10,000 years
(00:15:45) – Natural selection intensified during the Bronze Age
(00:35:02) – Why didn’t evolution max out intelligence?
(00:57:21) – Evolution is limited by time, not population size
(01:09:02) – Why no farming before the Ice Age?
(01:17:13) – The Neanderthal puzzle David can’t stop thinking about
(01:54:10) – The methodology behind this breakthrough

Get full access to Dwarkesh Podcast at www.dwarkesh.com/subscribe

Transcript
Starting point is 00:00:00 I am back with David Reich, who is a professor of ancient DNA at Harvard. How do you describe what it is that you study? I'm a geneticist, and I work on human history and how ancient people relate to each other and to people living today. Great. And so we did an interview, was it two years ago at this point, which ended up being one of the most popular interviews I've ever done. I think people just found it really compelling that there's so much about human history
Starting point is 00:00:29 we don't know and are just learning about now as a result of the kinds of techniques that your lab is using. And you have a new preprint that's very exciting, and I wanted to talk to you about it. So let's begin. Can you give me a little bit of context and what we're talking about today? Well, the dream was that when this field started, this ancient DNA field started more than 16 or 17 years ago, that we were going to learn a lot about biology, learn about how people's biology changed over time by getting DNA out of ancient human remains and tracking changes over time. And that dream has really not been realized since the beginning of this field. So while the field's been a big success with regard to learning about human history, it's resulted in surprising findings about human migrations,
Starting point is 00:01:18 people not being descended from the people who lived in the same place hundreds or thousands or tens of thousands of years before, and mixture being common in human history, sex-biased processes being common in human history, and things that were not expected from archaeology. And so the field's been a big success from that perspective. But what's not been successful is learning about biology and biological change. And one big reason for that has been that the sample sizes have been too small.
Starting point is 00:01:45 So when you have a single person's DNA, it provides a tremendous amount of information about history. And that's because when you look at one person's DNA, it's not a single person. It's many people. It's your two parents. It's your four grandparents. It's your eight great-grandparents and 16 great-great-grandparents and so on.
Starting point is 00:02:03 And going back in time, thousands, tens of thousands, even hundreds of thousands of ancestors going back in time contributed to people today. So when you look at the DNA of a single person's genome or a Neanderthal genome, you have effectively tens of thousands of ancestors all represented in your data. And you can position that individual exquisitely with respect to other people from whom you have data. But when you are interested in how a particular genetic variant that affects something like your skin pigmentation or affects your ability to digest cow's milk into adulthood or affects a behavioral trait, when you want to see how that changes over time, a single person gives you only one sample or maybe two samples, the one that is from their mother and the one that's from their father. And so to get a high-resolution picture of how the frequency changes over time, you need to have very big sample sizes of truly very large numbers of people, and we just didn't have that until the last few years. So what motivates this study that we're, I think, talking about today and the work that hopefully a number of other groups will be doing in the coming years
Starting point is 00:03:06 is the fact that we now finally have those numbers and we can do something with the data to see how frequency changes over time. Can I ask a question? I'll be asking a lot of naive questions through the next few hours, but why are frequency changes especially interesting? So what we're interested in is using the experiment of nature that's occurred in our history over the last tens of thousands of years to understand what's biologically significant in our DNA. And if there has been a change in environment that
Starting point is 00:03:39 a population has experienced, for example, people have shifted to agriculture or begun living close to domesticated animals or moved to a new environment, from a cold place to a warm place or a low place to a high place, then there's pressure on the population to adapt to these new stresses, these new needs. And the way you're going to detect that is you're going to see that the frequency of a genetic variant that, for example, might allow you to live at higher altitude, or that might sort of nudge you
Starting point is 00:04:09 to have a different behavioral pattern that might be advantageous in the new situation. That genetic variant might push systematically in some direction in a way that is enough that you can detect it. Now, it's very hard to detect slight shifts in frequency by a few percent or 10 percent unless you have a very, very big sample size. And so what we're looking for are those changes in frequency that are too extreme to be due to chance, and that will tell us that there have been pushes against the biology as a result
Starting point is 00:04:37 of the changes in environment that people have experienced. Interesting. Okay. So what did you guys find? So seven years ago, Ali Akbari, who at the time was a postdoctoral scientist in my laboratory, and a few years later became a permanent staff scientist in my laboratory, set out to use the data that we were producing to learn about biological change over time. And I think the reason he was interested in our laboratory rather than other places was that a focus of our laboratory has been generating truly large amounts of data from ancient humans. We've been really trying to industrialize the process, make it very inexpensive, make it high
Starting point is 00:05:14 quality, and generate large numbers of samples with lots of good data for this purpose. So there's been this large amount of data that we've generated, and it made it possible to conceive again of asking the question about whether there have been frequency changes over time. So the mainstream view in human evolution in the last several decades has been that natural selection has been pretty quiescent over the last several hundred thousand years of human history. And there are several lines of evidence that have been deployed to document this. One is that if you compare diverse populations from different continents around the world, for example, Europeans and East Asians, and you look at mutations that differ in frequency between these groups, all mutations differ a little bit in frequency, sometimes a lot. You can say, what are the most different mutations in terms of frequency between Europeans and East Asians? And there are almost no genetic changes that are 100% different in frequency between Europeans and East Asians. So Europeans and East Asians descend from a common ancestral population 40 or 50,000 years ago, that came out of Africa in the Middle East. This population had a set of gene frequencies, genetic frequencies, and these variants bopped around randomly,
Starting point is 00:06:26 a process known as genetic drift, or perhaps under selection in one direction or another. And the time that's passed since 40 or 50,000 years ago is sufficiently small on an evolutionary time scale that there's just not much genetic differentiation on average between these two groups, Europeans and East Asians. However, if there has been natural selection, for example, to help people in one place digest alcohol better, or, for example, digest milk better,
Starting point is 00:06:53 or do something else better, what you might expect is that there would be some mutation that would have rocketed up to very high frequency. And 40 or 50,000 years is a lot of time. It's maybe 1,500 or 2,000 generations. And so that might easily be enough time to see a 100% difference in frequency. And yet you don't see any more compared to what you'd expect by chance. So this made it seem that selection has just been quiescent, that maybe a few hundred thousand years ago, the ancestral human population got to some kind of optimum. And after that, there hasn't been much genetic change in one way or the other. And there's been small amounts of natural selection, or there's been selection to remove bad mutations that are constantly raining down on the genome, but not what we call directional
Starting point is 00:07:34 selection, which is newly arising mutations or mutations being pushed in a systematic direction to help the population get to a different adaptive set point that's more favorable for the conditions that population is living in. So we were able to partition how much of the changes in frequencies of all the mutations that we're seeing in the DNA, we're looking at about 10 million positions that vary, is due to directional selection, adaptation, versus other factors, especially genetic drift.
Starting point is 00:08:07 And 98% of it is other factors, especially genetic drift. So it's overwhelmingly migrations and population structure causing fluctuations in frequency. And as a result, it's super hard to actually detect the signals of adaptive natural selection, because they are a tiny fraction of the total frequency change. The vast majority of it is these migrations and mixtures. Nevertheless, there's so much natural selection, as our study, I think, has shown, that, in fact, it's been rampant in the genome. Can I ask a clarifying question here? So why are we discounting population
Starting point is 00:08:40 admixture or replacement as selection? Because if you think about it at a group level, if one population replaces another population, isn't that selection? I remember from the last episode, you were explaining how there's been huge changes in what kinds of people are in a specific area. One population came in and kind of replaced the previous one,
Starting point is 00:09:00 and then a new population came in and replaced the previous one. And to the extent that the genetics are relevant to why that population replaced the other one, why should that not count towards what we understand to be selection over the last 10,000 years? It could count and may count and probably should count in some respects. But it could also be that this population replacement is due to some cultural phenomenon, technology held by one of these groups, not others.
Starting point is 00:09:27 And maybe there are genetic mutations that are contributing to this. Who knows, it's possible. But what you're seeing is a whole-genome shift. And so what we're looking to see is whether there's one place in the DNA that is driving the change in a way that's different from the rest of the genome. And really, from a statistical point of view,
Starting point is 00:09:45 what happens at these times of migration is there's just huge fluctuations in frequencies, and these are extremely uninformative times for looking for and detecting natural selection. The best moments to detect natural selection are when migrations and population admixtures are not happening for a few hundred years.
Starting point is 00:10:03 And during these times, you can actually see the mutation slowly blowing in one direction as a result. Really, the way we think about the history of Europe and the Middle East, and the way we think about it for the purpose of this study, is as an archipelago of little populations in space and time, each of which is pretty isolated from the others. So a little population in Britain, isolated for a few hundred years, a little population in Hungary, isolated for a few hundred years, between big events of migration and mixture. And in each of those little experiments of nature, we can ask, does this mutation slightly increase in frequency? Does that same mutation slightly increase in frequency? And if all the arrows point in the same direction, we win.
Starting point is 00:10:41 And they're telling us that natural selection is occurring. So, for example, 4,500 years ago in Europe, almost all mutations go through huge frequency changes. And that's not because of natural selection. It's because of the steppe migration from the steppe north of the Black and Caspian Seas. 40%, 50%, 80% of the DNA becomes Yamnaya, from steppe pastoralists. And their frequencies of mutations were different, not because of selection necessarily, but just because they had evolved in different places for thousands and tens of thousands of years. And then if you look at the descendant populations, there's huge changes in frequency. And what you need to do is see, oh, is natural selection explaining a shift more than you would expect by chance?
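A rough sketch of the "all the arrows point in the same direction" idea, for readers who want to see it in code. This is not the method from the paper, which is only described at a high level here. It swaps in a plain binomial sign test, and the per-population frequency changes in `deltas` are made-up numbers for illustration.

```python
from math import comb


def sign_test_p_value(n_up: int, n_total: int) -> float:
    """Two-sided binomial sign test.

    Probability of a split at least this lopsided if, in each isolated
    population, the variant were equally likely to drift up or down.
    """
    k = max(n_up, n_total - n_up)
    tail = sum(comb(n_total, i) for i in range(k, n_total + 1)) / 2 ** n_total
    return min(1.0, 2 * tail)


# Hypothetical frequency changes of one variant, each measured inside one
# isolated population between migration events (illustrative values only).
deltas = [0.012, 0.008, 0.015, -0.003, 0.010, 0.006, 0.009, 0.011, 0.004, 0.007]

n_up = sum(d > 0 for d in deltas)
p_value = sign_test_p_value(n_up, len(deltas))
print(f"{n_up} of {len(deltas)} populations moved up; p = {p_value:.4f}")
```

The point of the design is that each population contributes only a direction, so a huge genome-wide shift caused by one migration event cannot dominate the result.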
Starting point is 00:11:21 Okay, in this next section, David explains the nitty-gritty of the methodology of this paper. It's honestly a bit technical, and I wanted you to get a sense of the results first. So I've moved that section to the end. If you want to understand the methodology, just stick around for the full episode. Okay, you found these locations that seem to be under selection. Another clarifying question. So you have, you say, 3,800 locations which you were 50% confident have been under selection in the last 10,000 years.
Starting point is 00:11:51 7,200, where we're at 50% confidence. Oh, sorry. So I think we're getting about 7,200 positions in the DNA that have 50% confidence of being real. Yeah. So only half of those are real. So 3,600, and you don't know which ones. So 3,600 of them are real.
Starting point is 00:12:07 Okay. And does that also mean that outside of those 7,200, you're confident the other locations in the genome are not under selection? No. Okay. So if you look at the 25% probability cutoff, there will be tens of thousands. And there will be many real ones there too. In fact, multiple analyses we do suggest that the genome is vibrating with natural selection.
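The arithmetic behind "7,200 positions, so 3,600 real" can be written out as a small sketch. It assumes, as the conversation does, that every candidate position has about a 50% chance of being a real signal; the function name and the flat list of probabilities are illustrative and not taken from the paper.

```python
def expected_real_signals(posterior_probs):
    """Expected number of true signals, given each candidate's probability
    of being real. The sum of the probabilities is that expectation."""
    return sum(posterior_probs)


# Assumption for illustration: all 7,200 candidates sit at exactly 50%.
candidates = [0.5] * 7200
expected = expected_real_signals(candidates)
print(f"Expected real signals: {expected:.0f} of {len(candidates)}")
```

The same sum works for a looser cutoff such as 25%: more candidates, a smaller real fraction among them, but still many real signals in absolute terms.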
Starting point is 00:12:31 And there's all sorts of weaker effects that are there that would be picked up in even larger studies than we've done. And, in fact, almost every position in the DNA is correlated to a position that is being dragged one way or the other by natural selection. Instead of being quiescent, natural selection is everywhere. Even though it's only 2% of the frequency change, it's tugging the positions in one direction or the other everywhere. So we analyzed these positions that we had identified, these hundreds of positions, the ones we were super confident about. And we looked to see whether they were randomly distributed in the DNA or whether they had patterns. And what we did is we looked at maybe 100 or so traits where there had been genome-wide association studies for all sorts of different traits, like ones associated with immunity or autoimmunity or behavior or metabolism and basically other things.
Starting point is 00:13:27 And for each of these, we could ask, do the genetic variations that are known to affect these traits from genome-wide association studies have an unusual number of selection signals? And what we found is there was a vast enrichment, of about four- or five-fold, for immune traits. That is, there was a super-concentration of selection signals in immune traits. We also saw a strong enrichment for metabolic traits, things that might have impacted obesity or fat traits or type 2 diabetes, and really almost no detectable enrichment, as far as we could tell, for behavioral traits or for psychiatric traits. And just to make sure I understand, this is not to say that behavioral traits or psychiatric traits or cognitive traits are not under selection. It's just that the individual sites where such traits are controlled are not especially likely to be among the locations that you've identified as under selection. Yeah, that's exactly right. So it might seem from the results of that analysis that,
Starting point is 00:14:33 in fact, immune traits are highly selected and that there's been no selection for behavior in the last 18,000 years in this part of the world. But that's a wrong conclusion, and we have evidence that it's a wrong conclusion. In fact, there's clear evidence of selection also on behavioral traits. And the reason we think we see much weaker signals for behavioral traits, and we have evidence that this is so, is that behavioral traits, we know from other studies, medical studies, are underpinned by much larger numbers of genes than immune traits, which are underpinned by relatively small numbers of genes of strong effect. Behavioral traits are shaped genetically by very large numbers of genes of weak effect, and we just don't have the statistical
Starting point is 00:15:17 power to detect these very weak signals there. So when we do an analysis where we look at our very strong signals of selection, that collection of very strong results is very effectively querying the immune traits, but is not very effectively querying the behavioral traits. It may still be the case, and I guess it is, that immune traits are the most selected category, but it is not at all the case, and in fact we can prove it's not the case, that behavioral traits are not selected. So we think, and really we've been able to show, that there are two reasons that reconcile the previous observations with our new observations.
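The "four- or five-fold enrichment" mentioned above can be read as a simple ratio, sketched below. All counts are invented for illustration, apart from the roughly 7,200 signals and 10 million positions mentioned in the conversation, and the paper's actual enrichment analysis is likely more involved than this.

```python
def fold_enrichment(hits_in_trait: int, trait_variants: int,
                    hits_total: int, variants_total: int) -> float:
    """Rate of selection signals among a trait's known variants, divided by
    the rate of selection signals across all tested positions."""
    rate_in_trait = hits_in_trait / trait_variants
    rate_overall = hits_total / variants_total
    return rate_in_trait / rate_overall


# Hypothetical counts: 50,000 variants tied to a trait category by
# genome-wide association studies, 180 of which carry a selection signal.
immune_fold = fold_enrichment(180, 50_000, 7_200, 10_000_000)
print(f"Fold enrichment: {immune_fold:.1f}")
```

A ratio near 1 for a category, as described for behavioral traits, only says the strongest signals are not concentrated there. It does not rule out many weak signals that fall below the detection threshold.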
Starting point is 00:15:58 Remember, the previous observation is that natural selection seems to have been quiescent over a time scale of hundreds of thousands or many tens of thousands of years. The reason is that you don't see variants that are 100% different in frequency between Europeans and East Asians. So now we're seeing hundreds of positions that are rocketing up in frequency with selection rates of 1% or more in a lot of cases. So 1% or more selection rates will mean that there'll be a rapid doubling over periods of dozens of generations. And so over 1,500, 2,000 generations, like you see separating Europeans and East Asians, shouldn't you see many genetic variants that are 100% different in frequency across populations? So we were able to show that
Starting point is 00:16:38 this is explained by at least two factors. So one of them is that we actually, in this part of the world, Europe and the Middle East, are in a period of accelerated natural selection. And one way to see this is to look at this enrichment pattern that we're observing, where immune traits are unusually associated with these selection signals. And we could compare the last 5,000 years of our time period, what's called the Bronze Age and further onward, to the previous 5,000 years. And what we see is that this intensification of selection around immune traits, similarly, the intensification around metabolic traits, has accelerated over this time period. So it's not like natural selection has been at the same rate over all places and times. In fact, it's increasing over the time period we're analyzing. And so plausibly, the whole time period is increased compared to previous periods.
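To make the "1% selection rate means doubling over dozens of generations" puzzle concrete, here is a minimal sketch. It assumes the simplest textbook model, in which the odds of carrying the variant are multiplied by (1 + s) every generation with no drift or migration. That is a simplification chosen for illustration, not the model used in the paper.

```python
import math


def next_freq(p: float, s: float) -> float:
    """One generation of selection: carriers leave (1 + s) times as many copies."""
    return p * (1 + s) / (1 + p * s)


def freq_after(p0: float, s: float, generations: int) -> float:
    """Allele frequency after a number of generations of constant selection."""
    p = p0
    for _ in range(generations):
        p = next_freq(p, s)
    return p


def generations_to_double_odds(s: float) -> float:
    """Generations needed for the odds p / (1 - p) to double."""
    return math.log(2) / math.log(1 + s)


s = 0.01  # a 1% selection rate
print(f"Generations to double: {generations_to_double_odds(s):.1f}")
print(f"Start at 1%, after 10 generations:   {freq_after(0.01, s, 10):.4f}")
print(f"Start at 1%, after 2000 generations: {freq_after(0.01, s, 2000):.4f}")
```

Under this toy model, a variant held at a constant 1% advantage for 2,000 generations ends up essentially fixed, which is why the absence of such fixed differences between Europeans and East Asians needs an explanation such as selection that varies over time.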
Starting point is 00:17:27 So we're in a period of intensified selection. That's not implausible because this is a population that went through a huge shock in terms of the way people live and the culture. So this is a population where almost everybody we're analyzing are farmers or food producers in one way or another. Farming was invented for the first time anywhere in the world in the Middle East, 11 or 12,000 years ago. The people who invented farming exploded into Europe after 8,500 years ago and spread across Europe and expanded rapidly. In the Bronze Age, there was an intensification of how people lived, with much higher population densities, people living more and more next to their animals and getting their diseases and exchanging their diseases with them and with each other.
Starting point is 00:18:08 And so this is a period of rapid, rapid change in terms of how people are living, resulting in different biological needs of this population. So it's not surprising, perhaps, that in the context of these dramatic changes, the biology of the population might not be in the ideally adapted position. That is, there might be what some people call an evolutionary mismatch, where you take genetic variation that's evolved in hunter-gatherers and put it into farmers or pastoralists,
Starting point is 00:18:42 and it's not exactly right. And so what you're seeing is the DNA of this population, which is descended from hunter-gatherers only 10,000 years ago, reacting to the shock of having been moved into an agricultural and Bronze Age, high-population-density, urban environment. And a hypothesis is that what we're seeing is the adaptation that occurs as a result of that. Interesting. Okay, so in the paper, you have many examples of this intensification of selection around the Bronze Age.
Starting point is 00:19:10 And so feel free to navigate it yourself, but it might be helpful to go through some of these examples. So one of the things we do in this work is we look carefully at many, many of these positions in the DNA. We actually have an internet browser that you could look at, called the AGES browser, that Ali and a colleague who's a co-author of our paper built, that allows you to query each of these 10 million positions and see the trajectories at each position and the evidence for selection.
Starting point is 00:19:39 And one of the things that we see is that, while for the most part the signals of natural selection we detect are consistent with being constant natural selection over time, in a handful of them we're able to see that there's been a reversal or a radical change in natural selection. And very often that occurs in the period between 5,000 and 2,000 years ago, which is the Bronze Age and the Iron Age, a period of rapid population growth and rapid movement to intensive use of many technologies that were not used that way before. So an example of this is the TYK2 genetic variant. That is a major risk factor for severe tuberculosis, which is the major infectious disease, the most important infectious disease killer in the world today.
Starting point is 00:20:26 And if you look at this major risk factor for tuberculosis, this variant rockets up in frequency from eight or six thousand years ago to maybe nine or ten percent in this part of the world. and then it rockets down in frequency in the last 3,000 years. In both cases, there's very clear evidence of natural selection in the first case to increase in frequency, and then in the next case to decrease in frequency. And a possible reason for this is maybe the spread of tuberculosis
Starting point is 00:20:53 maybe becomes endemic in the population two or three thousand years ago. That's potentially consistent with pathogen sequence data and other lines of evidence, and maybe this variant was protecting against something before then, but then tuberculosis became significant after that point, and it was so bad that it pushed in the opposite direction. That's speculative. Oh, interesting.
Starting point is 00:21:14 And the thing it was protecting against was probably another disease. Maybe. Prepping for this episode required a full lit review. I needed to understand why other methods had failed to find evidence of natural selection over the last 10,000 years. What exactly did Reich and Akbari do differently? Honestly, this was quite subtle, because the most important points were distributed across a bunch of different
Starting point is 00:21:36 papers. And it was frustrating to talk to other LLMs about it because they kept getting confused. One of them would fail to understand an important crux. And so I switched over to a different model and that one would get tripped up on the very next point. I ended up using Cursor to kick off a handful of models at the same time and compare the results after. I could have one model critique the response of another. This was super useful because while I'm not a geneticist, I do have enough taste to be able to say, hey, this answer makes sense. These ones don't. I also had Cursor turn this work into flashcards so I could retain what I learned. Cursor started as a programming tool, but I found it really great for this kind of research.
Starting point is 00:22:12 There's no other interface where I can get answers from a bunch of independent LLMs, all while reading the relevant paper on the same screen. Go to cursor.com/dwarkesh to try it out. One of the big takeaways for me from the paper was just that something weird happened in the Bronze Age, and that, as you said, across trait after trait, the selection intensifies during the Bronze Age. And this makes sense for some things. For example, why do we see lactase persistence,
Starting point is 00:22:44 where adults can process milk? Why is that intensified during this period? Oh, well, it makes sense. This is the time when we start using cattle, not just for the meat, but then also for milk and wool and other secondary products. So it makes sense that this is why lactase persistence would matter more. But then there's other things which seem like they should have been
Starting point is 00:23:05 relevant since the dawn of agriculture. I forget the exact name of the allele, but was it FADS1? Yeah. Which helps convert plant fatty acids into long-chain fatty acids that your body needs. And that's absolutely relevant when you move from a diet of meat as a hunter-gatherer to a diet of cereals. But that is also one I think you found was under especially high selection during the, you know, 5,000, 3,000 years ago. Yeah.
Starting point is 00:23:34 So what's going on? Why is the Bronze Age so special across all these different traits that you're observing? Right. So FADS1/2, this variant, it's sort of a vegetarian-slash-meat-eating adaptation. And already in work prior to this, actually, Iain Mathieson, who was a former colleague who worked with me in 2015, identified this as a very strongly selected variant. And it's actually been ancient. You see copies in archaic humans, too.
Starting point is 00:24:04 One of the findings of our paper is the ABO blood system. You know, you get your blood typed, it's A, B, and O. The B variant has increased up to 10% at the expense of A. But previous work has shown that A and B were both already present in the ancestor of humans and gibbons, you know, other apes. And so these mutations, some of them have been going back and forth and fluctuating over time in different time periods. But we're talking about changes in the Bronze Age. So this TYK2 variant for tuberculosis risk, the multiple sclerosis risk variant, these inflected and increased in frequency before the Bronze Age, and then two or three thousand years ago reversed at that period. And there's differences in Northern Europe where this process is super strong, very strong positive selection, very strong negative selection.
Starting point is 00:24:52 And then in Southern Europe, only a little bit and not even very strong negative selection. For hemochromatosis, which is pathogenic iron build-up that causes problems in Europe, that, too, has reversed around this period. In some of the complex traits that maybe we'll talk about later, these traits too have periods of intensification of natural selection. For example, depigmentation, which is that Europeans have depigmented, gotten lighter skin over the last 10,000 years. You can see it in our data. The period of strongest depigmentation is between about 4,000 and 2,000 years ago, and then after that it's much less. And so this seems to be a very impactful, eventful,
Starting point is 00:25:32 important period where a lot of the processes that we are seeing become very powerful. And it's surprising on first principles. You might think, before you walked into this genetic data, that the big change is going to be starting to grow plants and maybe farm animals. That happens in the Neolithic, you know, beginning 11 or 12,000 years ago and spreads into Europe after 8,500 years ago. But actually the intensification happens like 5,000 years ago, 4,000 years ago. And so it's really interesting. This observation of that being a key point, that being an inflection point, tells us something about when humans, at least in this part of the world, were wrenched into a way of living that was so different from how the hunter-gatherer ancestors lived, that
Starting point is 00:26:16 the organism had to adapt very strongly. And it may be that the degree of that wrenching process, moving into the Bronze Age, was qualitatively greater than the degree of the wrenching process that happened from the initial transition to growing plants. Which is surprising because our cartoon picture is that the big transition is farming, but the genetic data, the biological readout, is saying our genome is reacting much more strongly to these events that happened 5,000 years ago. So you did some work with Bhatia and many other colleagues in 2014. You were looking at 20,
Starting point is 00:26:52 30,000 African-American genomes today. And you were saying, look, there's some percentage, 80% West African DNA and then 20% European DNA. And can we look at their genomes today? And do we see that their allele frequencies are much different than what you'd just expect from this admixture? And you find, correct me if I'm wrong, but you found that they weren't. That is to say that over 200, 300 years of extremely intense environment change, you know,
Starting point is 00:27:22 going from, you know, chattel slavery and, yeah, completely new environment. There's no effect of natural selection. And so we see episodes like this where we don't see natural selection, but then the Bronze Age apparently must have had an even stronger effect where the change in environment is even stronger than what we see from Africans in Africa then being transmigrated to the New World and then living under slavery. That may be the case. It also may be the case that that period is just too short to see much effect.
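A rough sketch of why a few centuries may be too short, using the illustrative figure of a 2% advantage per generation that comes up in this part of the conversation. The simple haploid model (an allele's odds multiply by 1 + s each generation), the starting frequency of 10%, and the 28-year generation time are all assumptions made here for illustration, not values from the paper.

```python
def odds_multiplier(s, generations):
    """Factor by which an allele's odds p/(1-p) grow under a constant
    per-generation advantage s (simple haploid model, an assumption)."""
    return (1 + s) ** generations


def frequency_after(p0, s, generations):
    """Allele frequency after the given number of generations."""
    odds = p0 / (1 - p0) * odds_multiplier(s, generations)
    return odds / (1 + odds)


YEARS_PER_GENERATION = 28  # assumed generation time, not from the conversation

few = 5                                     # "a handful of generations"
many = round(3000 / YEARS_PER_GENERATION)   # generations in roughly 3,000 years

for label, g in (("handful of generations", few), ("about 3,000 years", many)):
    print(
        label,
        "| generations:", g,
        "| odds multiplier:", round(odds_multiplier(0.02, g), 2),
        "| frequency from 10%:", round(frequency_after(0.10, 0.02, g), 3),
    )
```

Over five generations the odds grow by only about 10%, which is hard to distinguish from chance. Over a hundred or so generations the same per-generation advantage compounds several-fold, which is the "compound interest" point.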
Starting point is 00:27:57 So what you're looking for in the Bhatia et al. paper, where we looked at about 30,000 African Americans and looked to see whether there is, instead of the average percentage of maybe around 80% West African ancestry, in some places in the DNA more than 80%, in some places in the DNA less than 80%, significantly, as you would expect, if there was natural selection from some genetic variant from Europeans or from Africans,
Starting point is 00:28:22 we didn't see any place in the DNA that was significantly different from what you'd expect by chance. And so one possible explanation for that is just that there's only a handful of generations, maybe five, over which the natural selection would operate. And so maybe if the selection was 2% a generation, you would still only see maybe a 10% compounded effect,
Starting point is 00:28:42 and there's just not enough time to detect it. But the Bronze Age is not 300 years, it's 3,000 years. It's the power of compound interest, and you have enough time to begin to see a strong effect. But this really, really, really does seem to be a very impactful time in terms of human history. And you can see it in our complex traits. So, for example, if you look at pigmentation, for example, which is the strongest signal of selection for a complex trait in our data set. So you look at genetic mutations that are known to affect pigmentation.
Starting point is 00:29:16 you add up their effect across all of the DNA, so there's dozens or hundreds of them, and you look to see in what time period the natural selection is strongest, and the time period is really 2,000 to 4,000 years ago. And for some of these other traits as well, you see, again, the time period over which the selection is strongest is 2,000 to 4,000 years ago. So, for example, if you look at genetic variants that affect measures of cognitive performance, for example, such as performance on intelligence tests in white British people today. So this is, of course, a very strange trait to measure in the past
Starting point is 00:29:57 because there were no intelligence tests and there was no school. But it is a predictor today, and you could look at how it's changed in the past. And we see very strong natural selection for this combination of genetic variants that predicts people's performance on IQ tests and also is highly correlated to the predictor that predicts the number of years of school or the household wealth of people, all crazy traits in the past because there was no wealth in the past, there was no school in the past. But if you look at the predictors today, there is a strong movement in a systematic direction,
Starting point is 00:30:28 a large effect about a standard deviation on the scale of modern variation. Then we can do this trick of looking to see whether there's periods of time when this natural selection has occurred more intensely or less intensely. What we do is we drag a 2,000-year window through our data and we repeat our whole analysis, not on 18,000 years, but just on a short 2,000-year window. And we can measure the strength of selection in each of these 2,000-year windows. And what you see when you look at intelligence is you see that this maxes out in the Bronze Age between 5,000, 4,000, 3,000, 2,000 years ago.
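The windowing idea can be sketched in a few lines. This is only an illustration of dragging a 2,000-year window through dated samples and re-estimating a trend in each window; it is not the method from the paper, which also corrects for changing ancestry. The function names, the 1,000-year step, and the toy data (a score that is flat, rises between 4,000 and 2,000 years ago, then is flat again) are all hypothetical.

```python
from statistics import mean


def slope(points):
    """Least-squares slope of value against time for (time, value) pairs."""
    xs = [t for t, _ in points]
    ys = [v for _, v in points]
    x_bar, y_bar = mean(xs), mean(ys)
    denom = sum((x - x_bar) ** 2 for x in xs)
    if denom == 0:
        return 0.0
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / denom


def windowed_slopes(samples, start, stop, width=2000, step=1000):
    """Slide a window of `width` years from `start` years ago toward `stop`.

    `samples` holds (years_before_present, score) pairs. Time is negated so
    that a positive slope means the score rises toward the present.
    Returns (older_edge, younger_edge, slope) for each window.
    """
    results = []
    older = start
    while older - width >= stop:
        younger = older - width
        in_window = [(-age, score) for age, score in samples
                     if younger <= age <= older]
        if len(in_window) >= 2:
            results.append((older, younger, slope(in_window)))
        older -= step
    return results


def toy_score(age):
    """Hypothetical trajectory: flat, rising from 4,000 to 2,000 BP, flat."""
    if age >= 4000:
        return 0.0
    if age >= 2000:
        return (4000 - age) / 2000
    return 1.0


toy_samples = [(age, toy_score(age)) for age in range(10000, -1, -500)]
windows = windowed_slopes(toy_samples, start=10000, stop=0)
strongest = max(windows, key=lambda w: w[2])
print("strongest window (years ago):", strongest[0], "to", strongest[1])
```

On this toy input the steepest window is the one spanning 4,000 to 2,000 years ago, and the most recent window has a slope of zero, mirroring the shape of the result described here.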
Starting point is 00:31:01 And the impact in the last 2,000 years is almost nothing. There's no evidence of natural selection at all. You might think your bias coming into this, my bias perhaps, if there's any signal of natural selection on this trait at all, might be that it would be unusually strong in the last 2,000 years. Maybe this is a time of industrialization. Maybe this is a time of greater need for this particular trait. But in fact, there's no evidence of natural selection at all in the last 2,000 years, but there's very strong evidence in between 2,000 and 4,000 years ago, where instead of a one standard deviation
Starting point is 00:31:32 strength of selection, it's a two standard deviation strength sort of averaged over this time period. And the standard deviation here is how much the polygenic score for the trait itself moves? How much the polygenic trait moves over a 10,000-year period within a population that is held constant in terms of its ancestry? Because what we're actually doing is we're looking in our data set at a kind of heterogeneous group of people. There's Southern Europeans and Northern Europeans and hunter-gatherers and
Starting point is 00:32:05 farmers, and at different times in the past, those groups are more or less represented. So the whole strength of the methodology Ali Akbari developed is it corrects for that changing ancestry over time. And as I mentioned before, really what's being asked here is we've divided up our whole data set into an archipelago of little populations in different places in space and time. And we're asking in each place in space and time, a little pocket of people in Britain from 4,000 years ago to 3,500 years ago, a little pocket of people in Hungary, a little pocket of people in Italy from 2,000 years ago to 1,500 years ago, in each of these places where the ancestry is relatively similar without being
Starting point is 00:32:47 too disrupted in that short period by migrations, we watch to see if the genetic changes blow in the same direction. And what we're doing here is we're measuring the strength of selection at each point in time after correcting for the big population changes that have occurred. Okay, so the effect here is huge then, because like if you're saying one standard deviation, a standard deviation above the median would be somebody in the 85th percentile. So you were saying that the effect of selection
Starting point is 00:33:17 has been so strong that compared to 10,000 years ago versus now, you know, the median has gone to the 85th percentile. And that's just like a huge effect over the last 10,000 years on something like intelligence or the thing that predicts household income or whatever. So these things like, especially given that this is only,
Starting point is 00:33:40 2% of the change in the allele frequencies, and then the 98% is coming from migration. So then it's sort of stupendous to think about, like, well, what is the impact of migration then? If this alone can explain, or is driving a standard deviation change in these kinds of qualities, at least among the kind of variation we see in the world.
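The percentile arithmetic behind a one-standard-deviation shift can be checked directly. This sketch assumes the genetic predictor is normally distributed in the present-day population, which is an assumption made here for illustration; the function name is hypothetical.

```python
from statistics import NormalDist


def percentile_at(shift_in_sd):
    """Percentile of the present-day distribution (assumed standard normal)
    at a point `shift_in_sd` standard deviations from its mean."""
    return 100 * NormalDist().cdf(shift_in_sd)


# One standard deviation above the mean.
print(round(percentile_at(1.0), 1))

# Three standard deviations below the mean.
print(round(percentile_at(-3.0), 2))
```

Under that assumption, one standard deviation above the mean sits at about the 84th percentile, close to the "85th percentile" figure quoted in conversation, and three standard deviations below the mean sits well under the 1st percentile.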
Starting point is 00:33:58 One thing you can see in the data is the migration impact is huge. So, for example, if you look at the trajectory for measures of cognitive performance like scores on intelligence tests in white British people today, but you look at the predictor of that in people in ancient times, the estimate for the hunter-gatherers of Europe
Starting point is 00:34:16 is like three standard deviations below the modern mean. So that's hugely different. And then you see a huge jump from them to the hunters, to the farmers who are like at the mean, at zero. And that's migration. So what you're seeing is those two groups had different set points for those traits. And then the steppe pastoralists have a lower set value of this.
Starting point is 00:34:39 And so you see huge fluctuations in the predictor of this trait over time. That doesn't prove selection. What that is just telling you is migration. But what our test is telling you is, in addition to those fluctuations due to migration, is there a consistent effect of natural selection blowing the trait in the same direction over all places and times? And that's what we're detecting. Yeah. So there's this person who has a theory, the collective intelligence hypothesis,
Starting point is 00:35:05 which is this idea that the selection for intelligence has actually been in the opposite direction, that as society has developed, there's been more specialization. If there's more specialization, each person only needs to understand a smaller and smaller part of the world. And therefore, actually, the ancients were much smarter than us. And we've sort of evolved out in intelligence. And your results seem to point in the opposite direction, that although there's not been a selection in the last 2,000 years as society's
Starting point is 00:35:39 gotten more complicated, at least when society began, there was more need for the kind of thing that predicts intelligence today. And the reason that's surprising is if you think about hunter-gatherers, yeah, reading your colleague Joseph Henrich's book, the amount of information
Starting point is 00:35:54 that they needed to hold onto and assess everything from how to process food to how to build shelters, fire, etc. compared to my world where I got to know how to set up mics and ask questions. It's just like, it seems like the demands of intelligence should have been like way higher in the ancestral environment. And so it's
Starting point is 00:36:15 very surprising that the beginnings of civilization increase the selection on intelligence. Right. So, you know, this is the power of data, right? Like, you know, I think Joe, if you asked him prior to this work, what the hunter-gatherer selection would be and where their set point for, you know, this particular trait would have been, you know, I think he probably wouldn't have made a very strong prediction, but he would have said, well, maybe you would have expected to have a high predicted value of this trait because these people were really having to do a lot of things and figure a lot of stuff out, maybe. And that maybe once you have more complex societies, there would be more of a collective brain, and maybe there'll be selection against this trait. And in fact, it's sort of
Starting point is 00:36:58 the opposite in some ways. So it's the power of data. It's not what you expect. And, you know, after looking at this data, it's actually the value of data to try to make sense of all these things. It's very interesting, like the genetic predictor of intelligence, there's lots of kind of things that are confusing about it, so it's actually worth talking about it, or the genetic predictor of years of schooling, which is highly correlated to it and is measured even better. So if you look at the genetic predictor of years of schooling, there's another amazing study from 2017 from a group in Iceland that looked at this measure over the last hundred years in Iceland. And it looked at older people, and it looked at younger people, people born more
Starting point is 00:37:37 recently in Iceland. And there's an estimated 0.1 standard deviation decrease in genetic predictor of intelligence in Iceland just within one century, which is an absolutely huge effect over a short period. And this is selection against years of schooling. If I said intelligence, I didn't mean to. It's selection against genetic predictors of numbers of years of school. And so one possible interpretation of this, sort of hand-wavy, is that actually what's being measured here is not selection for years of schooling or for actually real intelligence, but for another trait altogether that's correlated to both of them. So, for example, the predictor of numbers of years of schooling is very, very strongly correlated to the age at which women have their first kid. And if you control
Starting point is 00:38:23 for that for numbers of years of schooling, all of the signal of years of schooling goes away. So maybe what you're measuring is women's decision about when to have children. And if you have children earlier, you don't go to school as much. If you have children later, you go to school more. Maybe it's some kind of measurement of delaying gratification or putting things off or planning. The same trait is correlated to body mass index, to obesity or to walking pace. So is this really like intelligence as we think about it? Or is it something else that manifests itself differently in different times
Starting point is 00:38:59 in the past? Yeah. Okay, so obviously a trait like years of schooling was not itself a meaningful thing in the past. And the underlying things for it seem to have been under strong selection. Whatever in the genome predicts years of schooling seems to have been under strong selection. And how should we think about this? Like, what is the actual thing that's changing in the genome? Yeah.
Starting point is 00:39:23 Well, I think that there's two things going on that you need to think about. So one of them is that years of schooling is connected to so many other things genetically. So if you look at the genetic predictor of years of schooling, this trait has been measured in millions of people now, it's actually correlated to really, really surprising things. It's correlated to the age at which women have their first kid. It's correlated to people's obesity. It's correlated to people's walking pace. It's correlated to people's household wealth. It's correlated to a variety of other traits that seem quite different from it.
Starting point is 00:39:58 So if you think you're actually measuring years of genetic prediction of intelligence or actual studiousness or something like that, you should think again because there's many things that it's correlated to. There seems to be some kind of general trait that maybe you could think of as executive function or maybe propensity to defer gratification or something, I'm just waving my hands, that is under selection and it pushes all these traits
Starting point is 00:40:26 in the same direction one way or the other and in different times in the past it's advantageous or disadvantageous. But when we found this signal of years of schooling being increased the genetic propensity
Starting point is 00:40:41 to go to school for more years as it manifests itself in people in white British people today, when we found the signal we were sort of incredulous, like, how could this be? Maybe this is a problem. So we did a few tests to try to figure out whether this was real.
Starting point is 00:40:55 And one of the tests we did is we looked for a study where this measurement of the numbers of years of school was done not in Europeans, but was done in Chinese people in China. And we looked at variants that had the effect size of many variants as they affected the number of years of school in China. And we saw whether they had a relationship, a correlation to the trajectory of those same genetic variants in Europeans over the last 10,000 years. So these are two parts of the world where the populations have been essentially completely disconnected. And there's no way by chance that the trajectory in Europeans over the last 10,000 years will have anything to do with the number of years, the effect on the years of schooling in China today. But there's actually a huge statistical correlation, a five or six standard deviation correlation between the effect size of variants,
Starting point is 00:41:43 on number of years of school in China today, and the trajectory in Europe, just as strong, actually, as the effect size of variants in Europeans to the trajectory in Europeans. So we just could not see a way this could happen by chance. And once we saw that, we really felt quite convinced
Starting point is 00:42:01 that this was a real signal and that really somehow there has been natural selection to increase the genetic changes that today manifest themselves as predicting more years of schooling. Okay, just to make sure I understood you're saying you're looking at this ancient DNA in Europe
Starting point is 00:42:21 and you're saying, well, it seems to predict years of schooling for modern people in Europe, or at least a selection on those ancient DNA, that ancient DNA seems to predict more years of schooling in modern Europe. And then you also find, well, it also predicts how the same variants predict more years of schooling for Chinese people in China. Yeah.
Starting point is 00:42:44 And so this is not just some weird artifact from the way these GWAS were done in Europe. These parts of the genome seem to robustly predict the kind of thing that actually leads to more years of schooling, at least in people today. Correct. Jane Street is pretty secretive, but I did learn about one internal mechanism, which illustrates how high-trust and weird they are: researchers aren't given compute allocations. Instead, Jane Streeters use an internal currency called hive bucks to bid for compute in real-time auctions. Everybody can spend as many hive bucks as they want, but your hive buck bid is meant to represent the real dollar value of the experiment that you want to run. Now notably
Starting point is 00:43:24 during the auction, anybody can change anybody else's bid. And after the auction, people can even kill each other's jobs. People just trust each other to do this in a way that benefits the whole firm. As a result, Jane Street's allocations reflect a near real-time consensus on the highest priority uses of compute. As Axel, one of their ML engineers, put it: "I think Jane Street is like pretty bottom up in terms of we have lots of different researchers who are all training their own models, sequence models, all sorts of other weird and wonderful things." By the way, with their new compute deal, they've just added a $6 billion hive buck stimulus to their internal economy. Jane Street is hiring researchers, engineers, and interns.
Starting point is 00:43:58 Go to janestreet.com/dwarkesh to learn more. Okay, so stepping back, I want to, I think there's this question about what does this tell us about what actually changed in our environments over the last 18,000 years. And we talked a little about what happened after the Bronze Age. I want to understand, it's surprising to me, we're talking about this during the collective intelligence part of the conversation. But it's surprising to me that things like intelligence or lack of schizophrenia or so forth, things that just seem kind of robustly good, were not maxed out before the Bronze Age. And in fact, there was so much, the diversity among different populations was so big
Starting point is 00:44:46 that you have the European hunter-gatherers having three standard deviations less predicted value for what they would score on an intelligence test if it existed. But, you know, they were existing in the real world in a place where intelligence matters. And so how can it be that this was not a true? You just look at the human body or any animal. It's just like evolution is acting on it so strongly to make it functional
Starting point is 00:45:15 of the things it needs to do. And this one thing, which seems like so relevant, especially to what human hunter-gatherers needed to do, is not under, doesn't seem to have been under that strong selection in the Mesolithic or Paleolithic or those eras? I think that that's a great question. And like, as we talked about before, the human selection is very effective. It can move the mean value of traits within hundreds or thousands of years in one direction or the other if that's adaptive in a particular environment.
Starting point is 00:45:47 And so you might wonder, isn't intelligence good in all contexts and places in time? And I think that there's a number of ways to think about that. First of all, I think we are speaking from the point of view of a society which intensely values this particular trait, you know, ability to score well on IQ tests or things like them or to go to school for a long time or whatever it is. And I think this is unprecedented in human history that we live in a time like this. Like if you look at the, you know, Hebrew and Christian Bible, and you look at how much intelligence is valued, it's basically not at all. Wait, but that, when the Bible is being written, especially the Old Testament, is exactly when selection
Starting point is 00:46:27 for intelligence is the highest point. Yeah. It's apparently ever been. Yeah, exactly. But like, it's about strength or courage or religiosity or, right, those are the values, right? Or if you read Homer or the other texts of other religions, it's not intelligence, it's beauty, it's like other things. And so this value system, which has a hyper focus on, you know, smarts is not obviously a trait value that's been common in the past. You might think that in certain communities like, you know, some communities are not, there might be valuation of things that are more proximate to, you know, years of schooling.
Starting point is 00:47:05 But really broadly, it's not been a high value in the population. But obviously, the thing we're referring to is not, or the thing we care about is not direct performance on an IQ test, especially in the past. I think the thing I'm trying to understand better is this is intelligence more broadly. And maybe just that IQ test intelligence is not that correlated with here is a new world environment and go figure out how to process food there, make shelter and everything else. All the things which, you know, your colleagues like Joseph Henrich talked about, like, how modern people underestimate the difficulty of doing this kind of thing with a small band of people. Anyway, this is to say, like, maybe that's not IQ test intelligence, and that's why we don't see that strong a selection effect on this thing. But I just intuitively,
Starting point is 00:47:51 it seems like regardless of the value system, it just seems very valuable to have this trait maxed out. So I'm being very speculative. And let me give you two examples about, in my head, how I'm thinking about this, and not that I'm a particularly good authority on these things. But as I mentioned, a lot of these traits, which are quite disparate, are highly correlated to each other. Obesity, years of schooling, walking pace, performance on IQ tests, household wealth,
Starting point is 00:48:17 all these crazy traits all seem to be governed to a substantial extent by a shared combination of genetic variants. And let's just think about what this might mean. So in Iceland, in the last hundred years, there's been selection against this combination of variants. And one possible interpretation is it's basically selection for two ways of investing in your children, having many kids and not investing a lot in them, or having few kids and investing more in them. So if you invest in deferring having kids but becoming, you know, having more wealth,
Starting point is 00:48:50 having more resources and putting more into each kid, you're going to have a lower fertility and you're going to have fewer kids. And that's going to result in lower fertility, but those kids might survive more and do better in society. Alternatively, you can just have as many kids as you can and invest less in them. They might have individually less good outcomes, but in a time of plenty, which is potentially Iceland in the 20th century, it might make sense to have more kids and invest less in them. And so there's a toggle between having more kids and investing less in them and having more
Starting point is 00:49:20 kids and investing less in one's life and having fewer kids and investing more in excelling in various ways or something like this. And so you can imagine that actually at different times and in different places, in ecology, there's resource, there's different ways like mammals often invest a lot with a pregnancy and a small number of children, whereas fish will spawn huge numbers of offspring into the river, the great majority of whom will be eaten. But that is an effective way to produce offspring in certain conditions. So there'll be a toggle, depending on the environmental conditions, back and forth, between investing in large numbers of offspring, with fewer and less investment, or smaller numbers of offspring with more investment, and maybe we're just seeing that move back and forth over different places and times. Similarly, for schizophrenia and bipolar disease, how could this ever be advantageous? But maybe what we're seeing with these diseases is a kind of readout of some kind of spectrum of traits that actually in some context might be advantageous.
Starting point is 00:50:20 Maybe being anxious or being imaginative or being neurotic might be helpful in a shamanistic tradition, you know, in a religious tradition which values people who can have visions or values people who can be creative. And maybe these are subclinical versions of schizophrenia or bipolar disease that in certain times may be advantageous and in other times may be disadvantageous. Maybe you're just seeing selection from different types of creativity or other thinking that can be valuable in different contexts. I'm waving my hands here, but my sense is that these complex traits have not pushed in one direction because there's advantages, there are spectrums where there's advantages to both ends of the spectrum and there's multidimensional, you know, impacts
Starting point is 00:51:08 of these different traits. Julian Jaynes has this famous theory in The Origin of Consciousness in the Breakdown of the Bicameral Mind that, I'm butchering this, but fundamentally, the way I understand it is that up until Homer, basically everybody was schizophrenic in the sense that people genuinely thought that gods or whatever were real people that you're communicating with. And his claim is that ancient texts seem to show people behaving in this way. You're being asked to believe in visions. Yeah, exactly.
Starting point is 00:51:38 You know, and even today, I think, you know, there's valuation in some religious communities and, you know, communicating with God and having visions and having supernatural communions. And so I just don't know. Yeah. But I think it's super interesting to imagine, to ask the question. why certain traits are not always advantageous. For schizophrenia and bipolar disease, there is a sense in which most of the mutations are disadvantageous.
Starting point is 00:52:02 We can see that from the patterns of variation where the variants that are risk factors tend to be low frequency, and they tend to be small effects. So another trait you find under selection is the trend away from body fat since the agricultural revolution. Why is that? So this is what you see as a reduction in the combination of genetic mutations
Starting point is 00:52:22 that make you at risk for obesity, body mass index. And similarly, and very correlated to it, higher fat mass, higher waist to hip ratio, higher type 2 diabetes risk. And so there is clear selection by about a standard deviation on the scale of modern variation for these traits, reducing over the last 10,000 years in this part of the world. So what can be going on there? Why was there not selection for this combination of traits before? There's a longstanding idea known as the thrifty gene hypothesis. The idea is that once you have hunter-gatherer populations that move into a farming environment where there's plentiful food, there is no longer a need to the same extent
Starting point is 00:53:03 to be able to build up body fat to sort of survive in times of stress because there's more constant stores of food. And so as a result, there will be natural selection against body fat, which can be, once you move into an agricultural environment and into periods of food plenty. And so maybe what you're seeing is that this group of people in Europe and the Middle East over the last 10,000 years has moved into a period of relatively more stable food where building up stores of fat are not as advantageous, and there's been selection against this combination of traits. Europeans actually are relatively better protected genetically against type 2 diabetes than some
Starting point is 00:53:43 other populations around the world like African Americans and Native Americans that have perhaps not been as exposed to agriculture for as much time. So you may be seeing the effect of more exposure to more stable food accessibility. This is also another way in which the data goes against a common story. And the common story is that hunter-gatherers actually had much more stable diets because they were more varied. And so they weren't reliant on a single cereal or a single crop for their calories. And if, you know, if one game went away, they had other things that they could scout for, they could move locations more easily because they weren't tied down to the land. And so they were more food stable. But in fact, if there's been selection against
Starting point is 00:54:26 storage of body fat, that suggests that as unstable and as common as famines might have been, in agricultural societies, it's at least more stable than what the hunter-gatherers had. I think there's a timescale issue. You're absolutely right. So I think, as I understand, and I'm no anthropologist and no, but my understanding is that when there's a hunt in some of these in traditional societies or communities that hunt, people will often gorge themselves and eat a huge amount and build up a sort of temporary store of fat and then go multiple days without eating meat sometimes until the next hunt. And so there is this sort of boom-bust access to high value nutrition that is not true to the same extent in farming communities. On the flip side of this, these long, these famines are, I think, something that occurs more commonly in agricultural societies. But the time scale and the tempo of them is very different from the hunting tempo. So maybe there's a famine every three years.
Starting point is 00:55:26 And indeed, if you look at the bones of farmers, at least in some communities, there's more stress in them. Maybe due to a famine every three years or a famine every five years. But selection might not be acting on that three-year time period. Your fat store from, you know, the latest hunt is not going to carry you through to the famine three years later. And so survival of famines is a different thing than building up body fat for being able to survive two weeks later. A kind of random question I have is if you were mentioning, look, as compared to these other
Starting point is 00:56:00 things which matter much more for fitness and the ancestral environment, the immune system, especially after the Bronze Age, all these other things. have mattered more than intelligence. And so they've been under much more selective pressure than intelligence. Right. That makes you wonder whether there's much more room at the top for intelligence, as in if humans had been selected, especially for intelligence, they could have been much smarter.
Starting point is 00:56:21 And the reason that's relevant is we're currently building AI systems, which we're trying to make as smart as possible. And in fact, the only goal of the training process is intelligence. We don't have to worry about also at the same time making their immune systems powerful. We have lots of energy to spend on it, right? And at the same time, making sure they're not schizophrenic. I guess we kind of worry about that. But if intelligence has not been the dominant trait under selection for humans over the last 10,000 or 20,000 years,
Starting point is 00:56:48 does that mean that there's more room at the top for this trait? I think there's more room at the top for a lot of these traits. I think that you can move height very far in one direction, much more than it is today. You can move any of these traits to much greater extremes in other directions. There's probably very strong negatives to doing that. You're probably sacrificing other things, and I think that there's trade-offs probably. But I think it's highly likely that if natural selection pushed any of these traits more in one direction than it does, the mean would move.
Starting point is 00:57:21 So all of this evolution since out of Africa is acting on alleles that already existed in the pool of human variants from that first group, which we were talking about last time, on the order of 10,000 people that exploded out of Africa. And is it surprising that across all these different traits, from cognitive profiles to resistance to different kinds of diseases to height to whatever, that one pool of people contained so much latent variation that they could supply, you know, enough stretchiness to accommodate all these different traits that you're studying now? That's a rich question. And I think that the human population has, for complex traits, a tremendous amount of variation.
Starting point is 00:58:20 So within the human population, there's a huge amount of variation that affects height. There's a huge amount of variation that affects body mass index. If you take all these mutations and set them all to the high height variant, a person will be extremely tall, like as tall as a tall building. Of course, that will never happen. But if you take all these variants that affect schizophrenia risk, and you point them all in the same direction, there will be extreme risk or extreme protection for schizophrenia. So for complex traits, ones underpinned by many mutations, all the variation already exists to move the population to a different adaptive set point that's optimal in the environment which it's in. So if you push the population into a new environment, within hundreds or thousands of years, the population can rapidly move to a new adaptive set point.
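The "tall as a tall building" point can be made concrete with a toy additive model. The sketch below is only an illustration under loud assumptions that are not from the conversation: 10,000 independent loci, all with the same effect size, each "high" allele at 50% frequency, and no interactions between loci. The function name `additive_trait_range` is made up for this example.

```python
import math

def additive_trait_range(n_loci, allele_freq=0.5, effect=1.0):
    """Population SD and the gap from the mean to the all-'high' genotype.

    Toy assumptions: a purely additive trait, identical effect sizes,
    identical allele frequencies, and independent loci.
    """
    # Each diploid person carries 0, 1 or 2 'high' alleles at every locus.
    mean = 2 * n_loci * allele_freq * effect
    sd = effect * math.sqrt(2 * n_loci * allele_freq * (1 - allele_freq))
    maximum = 2 * n_loci * effect       # every allele set to 'high'
    return sd, (maximum - mean) / sd    # gap measured in population SDs

sd, gap_in_sds = additive_trait_range(n_loci=10_000)
print(f"all-high genotype sits about {gap_in_sds:.0f} SDs above the mean")
```

Under these made-up numbers the all-high genotype lands well over a hundred standard deviations above the population mean, which is the sense in which ordinary standing variation already contains extreme trait values that no real person carries.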
Starting point is 00:59:09 There are some unusual traits, like the ability to digest cow's milk or protection against sickle cell anemia, that require a single very important mutation that may not yet exist in the population. And then you have to wait for the mutation to occur in some people. And when the populations are relatively small, only 10,000 people, you might have to wait dozens or hundreds of generations for that mutation to arise. But when the populations are large, you're not mutation limited anymore. Every mutation that can occur does occur. There's 8 billion people in the world. There are maybe 30 new mutations every generation.
Starting point is 00:59:44 So that's like, what is it? It's like 240 billion new point mutations every generation. There's only 3 billion DNA bases in the genome. So every mutation that can occur does occur about 100 times every generation. And we're not mutation limited anymore. And so it's not just that the mutations can arise again. They do arise again. But when the population is only 10,000, you have to wait dozens or hundreds of generations
Starting point is 01:00:08 sometimes for the new mutation to occur. And so how likely is it that the thing that changes in the Bronze Age is just that the human population was big enough? So in 3,000 BC you get to, I think, a population of 50 million-ish people. The population is big enough, and the gene flow between different areas is high enough, such that things which don't have an overwhelming selection coefficient, which aren't overwhelmingly favored by evolution, are finally visible to selection. I think that's not likely to be true, but it's an extremely interesting thing to think about.
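The back-of-envelope count from a moment earlier (8 billion people, roughly 30 new mutations each, 3 billion bases) can be checked directly. This is just the stated arithmetic written out; the function names are invented for the example, and the inputs are the round figures quoted in the conversation rather than measured rates.

```python
def new_mutations_per_generation(population, per_person=30):
    """Total new point mutations entering the population each generation."""
    return population * per_person

def hits_per_base(population, per_person=30, genome_bases=3_000_000_000):
    """Average number of new mutations landing on any one base per generation."""
    return new_mutations_per_generation(population, per_person) / genome_bases

total = new_mutations_per_generation(8_000_000_000)
print(f"{total:,} new point mutations per generation")
print(f"about {hits_per_base(8_000_000_000):.0f} per base per generation")
```

With these inputs the total is 240 billion and the per-base figure comes out at 80, the same order of magnitude as the "about 100 times" quoted in conversation.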
Starting point is 01:00:39 So I think already when population sizes are on the order of a million or so, every mutation that can occur does occur within a few generations. And so that's well before the Bronze Age, if you take the population even of a place like Europe, but also of other places, or maybe it's at the dawn of the Bronze Age or the farming period. So the question you ask is maybe when the population is small, natural selection doesn't work effectively. So a common thing that people think about with natural selection, and that is true, is that in small populations, selection doesn't work effectively. And that's because mutations bop around in frequency from generation to generation a lot in a small population, just randomly. So if you have a population of size 1,000, mutations will bop around by a frequency of 1 over 1,000 every generation.
Starting point is 01:01:29 And if the selection coefficient is less than that, it will be drowned in the random bopping around of frequencies due to genetic drift. But for a population of 1,000, that is already a 0.1% selection coefficient, which is very weak. We're talking about 1% effects, and that's very strong. It will work very well even in a population of size 1,000 or 10,000. If you are talking about mutations of the type that will start rising only in large populations but not small populations, those are selection coefficients that are on the scale of 1 over 10,000 or 1 over 100,000, and those ones will take 10,000 or 100,000 generations to rise in frequency, which is hundreds of thousands or millions of years.
Starting point is 01:02:09 So that's not going to do anything over the timescale we're talking about. There's just a timescale issue. So we're talking about strong measurable selection coefficients on the order of half a percent or more in this study, and all of those are going to work in small populations or large populations. It's not going to be affected by the population size. Interesting, but you're saying more generally, once you hit a given threshold of population, the dominant factor is time span, not population size. Correct.
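The drift-versus-selection argument above can be sketched with a standard textbook formula. The code below uses Kimura's diffusion approximation for the fixation probability of a single new mutation; this is a swapped-in illustration, not the method used in the study being discussed. It assumes a constant-size diploid population, additive selection, and a generation time of 28 years, which is an assumption chosen only to convert generations to years.

```python
import math

def fixation_probability(pop_size, s):
    """Kimura's diffusion approximation for one new mutation.

    Assumes a diploid population of constant size, additive (genic)
    selection with coefficient s, and starting frequency 1 / (2 * pop_size).
    """
    if s == 0:
        return 1 / (2 * pop_size)  # pure drift
    return math.expm1(-2 * s) / math.expm1(-4 * pop_size * s)

def advantage_over_drift(pop_size, s):
    """How many times likelier fixation is than for a neutral mutation."""
    return fixation_probability(pop_size, s) * 2 * pop_size

def sweep_timescale_years(s, years_per_generation=28):
    """Rough 1/s generations for a frequency shift, converted to years."""
    return years_per_generation / s

for s in (0.01, 0.0001):
    print(s, round(advantage_over_drift(1_000, s), 1), round(sweep_timescale_years(s)))
```

In a population of 1,000, a 1% coefficient makes fixation roughly forty times likelier than drift alone, while a coefficient of 1 in 10,000 is barely distinguishable from neutral and would in any case need on the order of hundreds of thousands of years to matter.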
Starting point is 01:02:33 Okay, interesting. It's very interesting. And it's actually not widely understood. Yeah. Okay. So speaking of data contradicting what you might have otherwise assumed, one of the papers you sent me beforehand, Mallick 2016, found that there are no fixed differences between modern and archaic humans 50,000 years ago.
Starting point is 01:02:57 And of course, we know this is the period in which the so-called cognitive revolution happened and modernity started and people are making art or whatever. Does this suggest that nothing biological changed to make modern humans modern, and the thing that happened was some cultural change? How do you understand what this data tells us? Right. 50,000 years ago or so, or maybe 100,000 to 50,000 years ago, there's a quickening of the pace of change in culture. So you see the first extensive representational art and, like, bead necklaces and drawings on the wall and so on and so forth.
Starting point is 01:03:39 And also a rapidly increasing pace of innovation in the types of tools that people use. And so the thought might be that there was going to have been some kind of genetic switch, a kind of important genetic change that occurred in the population, and that swept to high frequency, and that everybody soon had, and that made it possible to do these things, maybe some genes that allowed people to have complex language, representational language, for example. And so one thing that we did in 2016 in this paper by Swapan Mallick and colleagues is we looked across the DNA for places that might be expected to look like this, where all people living today, or nearly all people living today,
Starting point is 01:04:23 share a common ancestor maybe 100,000 or 200,000 years ago. And we looked really hard, and right across all the DNA we could look at, we couldn't find anything more recent than 400,000 or 500,000 years ago. This is like a crazy result because it looks like there's no key selective sweeps that have occurred in this period that are ancestral to everyone living today. We talked before about no selective sweeps between Europeans and East Asians, but there don't even seem to be any selective sweeps shared between all humans in this really important period when a lot of evidence in the material culture record appears.
Starting point is 01:05:00 And so it could be that there's biological adaptation in this period, but it's polygenic. There's lots of mutations that all shift in the same direction to help the population to move to a new set point, but there's no key biological change that rises to high frequency in this time. And this group, 50,000 years ago, are they the ancestors of everybody out of Africa, or also some Africans? So this is 100 to 50,000 years ago,
Starting point is 01:05:27 and this is the population that's ancestral to West Africans, to most East Africans, to all non-Africans. And there's a couple of populations in Africa that have substantial ancestry that comes from more divergent groups. For example, Khoisan from southern Africa or Central African rainforest hunter-gatherers have substantial fractions of their ancestry from groups that diverged maybe 200,000 years ago from the other lineages. But all of these groups today are able to go to college, do everything everybody else does. And so there is like no evidence that there is any key mutation lacking in some groups that is present in the others.
Starting point is 01:06:07 So the differences we see between different groups of people, especially if this group of people, 50 to 100,000 years ago, had a very small population size. I think the last time we were discussing on the order of 10,000 people. Yeah. So basically, everybody in the world or almost everybody in the world, or the variance we see between different humans today, was latent in this group, which sort of seems... And I guess your point is that, well, if you just stack up different things across the genome, then stacking them up really has a big effect.
Starting point is 01:06:45 But it's interesting that, like, we have so many different groups in the world today, and all that diversity comes from very small population size. I think a lot of us in human genetics think that our population contains within it the clay that's needed to make almost any trait. And that, depending on environmental conditions or selection conditions, the mean value of these traits will move in different directions.
Starting point is 01:07:11 There's an empirical question, a real question about how much selection there's been in different human populations over time. One of the things this new work that we're involved in is doing is showing that at least in the last 18,000 years, 10,000 years, 5,000 years in this part of the world, there actually has been significant movement, at least for a handful of important traits. We looked at more than 500 traits, and about 100 of them, complex traits, showed significant movement in a systematic direction over this time period. So it really does seem that there is a response to the environments people are living in that has occurred over this period and is potentially stronger than in previous periods.
Starting point is 01:07:52 Crusoe has an amazing ML infra team that keeps finding clever ways to squeeze more performance out of their hardware. For example, tokenization has become a real bottleneck for agentic workloads. Agentic prompts are often extremely long. They tend to have high KV cache hit rates, which shrinks the GPU's prefill work. This means that the tokenization step, which is traditionally sequential, is a much larger fraction of time to first token. To solve this, Crusoe built fast tokens, an open-source Rust-based tokenizer, which parallelizes things in order to take advantage of all the cores on modern CPUs. Crusoe had to get creative here because the naive approach doesn't work.
Starting point is 01:08:26 For example, for pre-tokenization, you can't just split your text into chunks and run regex, because you'd end up with issues whenever a word straddled the split. Crusoe solved this by giving each thread an authority zone plus the ability to read one kilobyte past its own edges. This one-kilobyte buffer guarantees that you won't misprocess a token, and the authority zone guarantees that you won't end up with duplicates. No cross-thread coordination required. Crusoe combined this optimization with a handful of other smart tweaks in order to get up to 40% faster time to first token on real production workloads.
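The authority-zone idea described above can be sketched in a few lines. This is a minimal illustration, not Crusoe's implementation: the real tokenizer is described as Rust-based, whereas this uses Python threads, character offsets instead of bytes, and a toy splitting rule (runs of whitespace or non-whitespace). The names `PATTERN`, `_scan_zone`, and `parallel_pretokenize` are hypothetical. The sketch assumes no single piece is as long as the margin.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Toy pre-tokenization rule: maximal runs of non-space or of space characters.
PATTERN = re.compile(r"\S+|\s+")

def _scan_zone(text, start, end, margin):
    """Return the pieces that START inside the authority zone [start, end)."""
    lo = max(0, start - margin)        # read a little before the zone to resync
    hi = min(len(text), end + margin)  # read a little past it to finish a straddler
    # A piece that began before `start` belongs to the previous zone, so it is
    # skipped here; that is what prevents duplicates without any coordination.
    return [m.group() for m in PATTERN.finditer(text, lo, hi)
            if start <= m.start() < end]

def parallel_pretokenize(text, n_zones=4, margin=1024):
    """Split text into pieces zone by zone; assumes every piece is shorter than margin."""
    size = max(1, -(-len(text) // n_zones))  # ceiling division
    zones = [(s, min(s + size, len(text))) for s in range(0, len(text), size)]
    with ThreadPoolExecutor() as pool:
        parts = pool.map(lambda z: _scan_zone(text, z[0], z[1], margin), zones)
    return [piece for part in parts for piece in part]
```

The design point is that ownership is decided purely by where a piece starts, so each worker can decide on its own what to keep, and the overlap only has to be long enough to finish or skip one straddling piece.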
Starting point is 01:08:57 To learn more, go to crusoe.ai/dwarkesh. We were talking earlier of how there are no fixed differences between humans 50,000 years ago and humans today. So if there's no genetic basis for the kind of thing that allowed humans to have more symbolic representation, have farming, etc., I think I asked you this question last time we talked, but especially with this context, why no farming before the ice age? Genetically we're there. That is such an interesting question. Right. Genetically, we're there. The common ancestral population has all of the ingredients for farming 50,000 years ago.
Starting point is 01:09:35 And these people are distributed into different parts of the world. The Americas 15,000 years ago, or whatever it is. New Guinea 40,000 years ago, East Asia, Europe, you know, West Africa. No farming develops before, you know, 12,000 or 11,000
Starting point is 01:09:53 years ago. It only develops in the last 12,000 years, the period known as the Holocene, which is sort of the end of the ice age. And if you talk to climate scientists and archaeologists, you know, I keep asking people this question every time I meet someone who's an expert in this, like, how can this be that farming develops in all these places? Are we really living in such an unusual time? And people tell me, indeed, we're living in a very unusual time on a scale of two million years. That is, 12,000 years ago, we switched into this period of not just warmth, but climate stability. And actually this is true, and it's sort of hard to believe that we're living in such a special time. But if you look at, for example, data from the bottoms of ponds, where you can measure the fluctuations of temperatures using isotopic signatures, apparently we're in a period where it's just fluctuating a lot less year to year and 10 years to 10 years and 100 years to 100 years.
Starting point is 01:10:48 And it's just a period of relative stability that we are miraculously living in. And when this period of relative stability happens, somehow it follows that multiple groups independently turn to agriculture, all of whom, you know, have the same genetic complement that arises 50,000, 100,000, 200,000, 300,000 years ago. It's kind of a crazy observation that people just accept, but it's like unbelievable. Oh, so you increased the range there. So you said 100,000, 200,000, 300,000 years ago. And based on the genetic differences between modern people and people from even 300,000 years ago, you think basically they're modern 300,000 years ago?
Starting point is 01:11:32 I don't know. I'm thinking about this all the time right now. This is actually actively what I'm thinking about right now. And there's a big transformation in terms of the culture of humans 300,000 years ago, this invention of Levallois technology, the ability to make stone tools out of cores, the Middle Stone Age Revolution or the Middle Paleolithic Revolution, depending on whether you're talking about Africa or Eurasia. And this is a revolution, a new way of making stone tools that's shared by Neanderthals
Starting point is 01:12:02 and by modern humans, but is not shared in East or South Asia. And it's a big change, and it involves a cognitive change, presumably, in order to make this sort of technology. And then there's a further change to the Upper Paleolithic or Later Stone Age, maybe 100 to 50,000 years ago,
Starting point is 01:12:18 when there's this second transition to a new type of toolmaking, but not as revolutionary as the earlier one. So when the cognitive leap happens is unclear. The diversification of the lineages leading to people living today, like Khoisan southern Africans and rainforest hunter-gatherers, all occurs more on the timescale of 300,000 or 200,000 years.
Starting point is 01:12:41 And all of these people are capable of going to college and doing everything. And so, you know, it's not obvious that all the toolkit, the cognitive toolkit, the behavioral toolkit, the genetic abilities were not all in place two or 300,000 years ago, and that even Neanderthals had them, right? So it's not obvious that this was not the case. And so, like, I just don't know. You sort of distribute these people descended from this diversification that happens 200, 300,000 years ago to different parts of the world. And then, Bing, you know, after 12,000 years ago, you start having agriculture popping up in different places.
Starting point is 01:13:17 It's kind of an outstanding mystery of human history. And, you know, I find it unbelievable that we live in a time period that climatologically is so unique on a scale of 2 million years, but my colleagues tell me it's true. The climate thing seems surprising given there were so many different environments in which agriculture was independently developed. Now I understand that across environments, the variance could have gone down.
Starting point is 01:13:44 But it's just like, if it only had happened in one place at one time, I could have bought that explanation. But the fact that they're growing maize in the New World and they've got, you know, cereals in the Old World and so forth, and just in very different environments, makes it surprising. It's very, very surprising. I think we accept it, but it's just like a crazy observation that most normal people don't realize. You know, the thing that basically everybody accepts is that the common ancestral population
Starting point is 01:14:14 of almost everybody in the world, except for rainforest hunter-gatherers and Khoisan, is like around 70,000 years ago. And everybody accepts that these people all have in place the cognitive, behavioral, intellectual ingredients that are necessary for the farming revolution and building state societies. Because when the descendants of these people get distributed to West Africa, to East Africa, to the Americas, to Europe, to South Asia, to East Asia, to New Guinea, and so on,
Starting point is 01:14:39 their descendants all do this, like independently or semi-independently or completely independently or demonstrably, completely independently in all these different parts of the world. So the cognitive resources for doing this must have all been in place, but it's a very long fuse.
Starting point is 01:14:53 Like it delays for 40,000 years, for 60,000 years, in all these different places after the common ancestral population splits up, and then ignites into like agriculture and all these other things after that point. It's kind of a crazy claim. And then you could argue about whether the actual fuse is 300,000 years, you know, from when Neanderthals separate and from when different lineages of extant modern humans separate. And that's also plausible. So it's kind of a crazy sort of set of things that we're being asked to believe. Is it possible that agriculture existed, but you didn't have modern metallurgy or whatever it was that allowed populations to explode starting in 5,000 BC with the Bronze Age? Because population-wise, it doesn't seem like from 10,000 BC to 5,000 BC, the early Neolithic, much is happening.
Starting point is 01:15:43 Is it possible that they had farming, but they didn't have copper, they didn't have tin, which you needed to go to, I guess, the Middle East for, to develop the civilization that could make use of bronze at a large scale? And so they just disappeared from the historical record. I think we would see their archaeology. And, like, you know, the extraordinary developments in the Americas, which are entirely Stone Age. You would see them today even if they had completely vanished. Oh, yeah.
Starting point is 01:16:12 I mean, there's like, you know, we should go for a trip to Teotihuacan in Mexico. And it's like so impressive. Like, you know, when I went there, when I was 20, you know, it's just like, it's totally as impressive as ancient Egypt. You know, it's like huge. It's massive. It's without metal. And it's even more impressive because it's not only without metal, but it's without animals and without wheels, which is crazy. Like, the marvel is just like, hauled without wheels. Right. Like take any person who has like an Old World superiority and, like, take them to these places and they will not have it anymore. It's just extraordinary what's in these
Starting point is 01:16:49 places. And these are people who separated 20,000 years ago, at least, from the ancestors of East Asians and 40,000 years ago from the ancestors of West Eurasians and, you know, just had the same biological, you know, cultural shared toolkit from then, but there's just a fuse, a long fuse delay until all this stuff happens. It's kind of like an amazing thing and we don't question it. What are other questions that you're either investigating right now or want to investigate, these kinds of big-picture questions of human history? I think that, I mean, I'm perplexed.
Starting point is 01:17:30 I don't know if we talked about it before, but like I remain very, very confused about the relationships between archaic and modern humans. We have genome sequences now from archaic humans who lived in Europe and West Eurasia and Central Eurasia, the Neanderthals. We have archaic sequences from these enigmatic Denisovans, who we now have a skeleton for since we last talked. There's now a skull from a Denisovan that's been shown to be a Denisovan. And we have data from lots of modern humans.
Starting point is 01:17:59 And there's really big mysteries about the relationships amongst these groups. So genetically, the Denisovans and the Neanderthals are sisters. They descend from a common ancestral population five or six hundred thousand years ago. And that group separates a couple hundred thousand years before that, seven or eight hundred thousand years ago, from the ancestors of modern humans. And so genetically, the whole genome data says that Neanderthals and Denisovans are archaic humans from a common ancestral archaic population. But there are so many things shared between Neanderthals and modern humans that don't seem
Starting point is 01:18:34 to be shared with East Asians. They both share Middle Stone Age stone tools, Levallois technology, this cognitively unique way of making stone tools that wasn't used in East Asia. They both have the same mitochondrial DNA and Y chromosome sequence. So the Y chromosome sequence of Neanderthals, the mitochondrial DNA of Neanderthals, is actually modern human that came through interbreeding two or 300,000 years ago and then shot up to 100% frequency.
Starting point is 01:19:03 And then Neanderthals and modern humans are both the product of mixture events that happened between archaic and modern humans 300 or 200,000 years ago, demonstrably, through patterns of variation in ancient and modern DNA. And so it feels that there's something shared between Neanderthals and modern humans that's not shared with Denisovans, even though the vote of the whole genome says that Denisovans and Neanderthals are related. So one wonders whether there's something connecting Neanderthals and modern humans that's different from Denisovans, even though genome-wide, Denisovans and Neanderthals cluster. So I'm thinking about that all the time now.
Starting point is 01:19:39 And then connecting them would be interbreeding events or being in the same place at the same time that we missed. There's a known interbreeding event from the lineage leading to modern humans into Neanderthals, but it's supposed to be only 5%. So I'm interested in the possibility that that 5% is actually a sign of something much more impactful. That is, that somehow Neanderthals are in some sense deeply modern in some ways, and even though they get swamped by archaic genes, that somehow they actually have more of a modern impact than one would think, and that the Middle Stone Age and Middle Paleolithic Revolution that they share with modern humans is actually more fundamentally a part of who they are in some sense than we think. Interesting. So when was this interbreeding event? 300,000 to 200,000 years ago. And so the common ancestor between Neanderthals and most humans alive today is potentially more recent than the common ancestor between all humans alive today.
Starting point is 01:20:36 Oh, for sure. Yeah. Which is crazy. Yeah. Well, the divergence to all the archaic humans, including Denisovans, is within human variation. Okay. So, but... Wait, what? Yes. So the average time to the common ancestor of any two human genes is one or two million years ago. So, like, if you look at any bit of your DNA that you get from your mother and the same bit of your DNA on the same chromosome from your father, the copy of chromosome three you get from your mother and the copy of chromosome three you get from your father, the typical time they share a common ancestor is one or two million years ago.
Starting point is 01:21:08 That's before the split from Neanderthals and Denisovans. So there's many places in your DNA where you're more closely related to a Neanderthal on your mother's side than you are to your father's side. And I'm sure there's a simple explanation, but how? This is the same reason that if you have like a sister, you know, you're in some places in your DNA more closely related to her than you are to me because you share a parent, but in other places you're closer to me than you are to your sister because you happen not to share the same DNA from your parents.
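The claim that two copies of your own DNA can be less closely related to each other than one of them is to a Neanderthal can be illustrated with a very rough calculation. The sketch below assumes pairwise coalescence times are exponentially distributed (as in an idealized constant-size, randomly mating population), takes 1.5 million years as the mean, a midpoint of the "one or two million years" quoted above, and uses a 700,000-year split. It ignores later interbreeding entirely. The function names are invented for the example.

```python
import math

def frac_predating_split(mean_tmrca_years, split_years):
    """Share of loci where two human copies' common ancestor predates the split.

    Assumes exponentially distributed pairwise coalescence times.
    """
    return math.exp(-split_years / mean_tmrca_years)

def frac_closer_to_archaic(mean_tmrca_years, split_years):
    """Share of loci where one given human copy joins the archaic copy first.

    When all three lineages reach the shared ancestral population unmerged,
    each of the three possible pairs is equally likely to join first.
    """
    return frac_predating_split(mean_tmrca_years, split_years) / 3

deep = frac_predating_split(1_500_000, 700_000)
closer = frac_closer_to_archaic(1_500_000, 700_000)
print(f"{deep:.0%} of loci predate the split; {closer:.0%} pair with the archaic copy first")
```

Under these toy assumptions a majority of loci have human-to-human ancestors older than the split, and a sizeable minority group one human copy with the archaic copy before the other human copy. The exact percentages depend heavily on the assumed numbers and should not be read as estimates.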
Starting point is 01:21:36 It's just that the DNA that we get from our common ancestral population was already quite variable. I see. 500,000 years ago, 700,000 years ago, a million years ago, and some of us descend from some of those ancestors and others of us descend from others of those ancestors. And Neanderthals split from our lineage really close in time on the human evolutionary timescale,
Starting point is 01:21:59 such that in some places in our DNA we're more closely related to Neanderthals than to each other. What are the other big questions? I think that's the main thing that I'm thinking about a lot these days. I think that I really continue to be very obsessed with questions about the spread of human populations around the world
Starting point is 01:22:18 and trying to reconstruct that with ancient DNA. After the recording ended, David started spontaneously explaining a new theory he's working on about Neanderthal genetics on a whiteboard in the room, which I ended up capturing on my iPhone. Because it's a whiteboard, I think it might be helpful to switch over to a video platform like YouTube or Spotify,
Starting point is 01:22:35 but if you can't, it's totally okay to listen on audio. The thing I'm thinking about a lot recently is the possibility that maybe we're not thinking in the right way about the relationship between archaic and modern humans. So the standard model is one like this, where Denisovans, these archaic humans that were found from ancient DNA, and Neanderthals descend from a common ancestral population five or six hundred thousand years ago, and that these two separate earlier, maybe 700 to 800,000 years ago, from the ancestors of modern humans, people like us. So that's the big result of a lot of studies since 2010. But there's also evidence of interbreeding events
Starting point is 01:23:29 that happened maybe 200 to 300,000 years ago. and that actually resulted in modern humans contributing DNA to the ancestors of Neanderthals. So this is maybe 5% of the DNA of Neanderthals comes from this interbreeding event, and a lot of studies have shown this. And so I'm very interested in this because actually from the archaeological record, Neanderthals, and modern humans sort of look actually quite similar to each other, much more similar to each other than a lot of them do to Denisovans, these archaic humans in East Asia.
Starting point is 01:24:15 So for a lot of the history, people have thought that Neanderthals are our sister, but in 2010, the sequencing of the Denisovan genome made it very clear that on average, Denisovans are closer to Neanderthals than to modern humans. So this was like a very confusing result. And most people now think that Neanderthals and Denisovans descend from a common ancestral population, separated earlier from the ancestors of modern humans. So I'm interested in the possibility
Starting point is 01:24:44 that actually the right way to think about Neanderthals is as somehow culturally modern humans, even though genetically, they're mostly Denisovans. And the model I'm thinking about is motivated by this archaeological phenomenon known as the Middle Stone Age Revolution. So if this is Africa and this is, I don't know, Europe, we know that the new way of making stone tools, with these cores that were very carefully mined far away from the locations they were used, made out of high-quality stone like flint, starts being used three or 400,000 years ago, first in the Caucasus, places like Georgia today, or East Africa, and that this way of making stone tools, which is quite revolutionary, is known in Europe as the Middle Paleolithic and in Africa as the Middle Stone Age, and is associated with much more widespread use of fire and also moving stone around at much further distances than before.
Starting point is 01:25:48 I'm interested in the idea that this is something that's shared between modern humans and Neanderthals, that it is somehow some shared cultural feature that's absent in East Asia. And that might have a relationship in the genetic data and is somehow related to this 5% DNA. So the idea I'm interested in is the possibility that there is a population here that invents the Middle Stone Age and the Middle Paleolithic, sometimes called Levallois technology, and that people from this population expand into Europe, and they mix with the local archaic humans who are there. And that is what this 5% interbreeding event is. It happens 2 to 300,000 years ago, and it produces a group that, as it expands across this landscape in Europe, mostly picks up the local DNA and becomes
Starting point is 01:26:34 mostly archaic genetically, but retains its modern human culture, the way of making stone tools and some of its traditions. And so one of the things that's super interesting about this is that if you actually look at the genetics, the whole genome, the Neanderthals and Denisovans cluster. But if you look at the mitochondrial DNA, which people get from their moms and they get from their moms, Neanderthals and modern humans cluster. So if you look at the mitochondrial DNA, Denisovans and modern humans share an ancestor well more than 700 or 800,000 years ago, as you expect for the history. And if you look at the Y chromosome
Starting point is 01:27:07 that you got from your dad, Denisovans and modern humans share an ancestor more than 700,000 or 800,000 years ago, which is consistent with this history. But if you look at the Neanderthal mitochondrial DNA, it's only 300,000 to 450,000 years. If you look at the Y chromosome, it's only 300,000 to 450,000 years. So what the current genetic work is asking us to believe is that even though this is only 5% of the whole genome, it introduces mitochondrial DNA and Y chromosomes, and they jump up to 100% frequency. It's kind of a crazy claim because the probability of this occurring by chance is low, maybe 5% times 5%, so a very small number. And so it's sort of what we actually all believe, but it's a very surprising event. And somehow it's accreted all the findings in the whole
Starting point is 01:27:52 literature so that we make ourselves believe this, but it seems sort of unlikely on first principles that somehow only 5% will introduce both the Y chromosome and mitochondrial DNA. And it really looks like this. So there's this amazing data from this site in Spain that's 300,000 to 400,000 years old, a site called Sima de los Huesos. And they have a nuclear genome that looks Neanderthal-like, most of the genome, but their mitochondrial DNA and Y chromosome are Denisovan-like. So it really looks like there was a population related to modern humans
Starting point is 01:28:22 that pushed into this Sima de los Huesos-like population and displaced its mitochondrial DNA and Y chromosome, but kept the rest of its genome. So it really looks like something like this happened. So the idea that I'm sort of playing with, and probably it's wrong, who knows, is that there's a landscape, this is maybe Europe,
Starting point is 01:28:41 and you can break it up into 100 or so demes, little areas. And modern humans get introduced at the bottom right corner in the Middle East or something, and they spread into Europe. And as this population spreads, there's a wavefront of expansion and they're interacting with the local archaic humans. And even if there's a small amount of interbreeding, the theory from lots of studies, simulations and lots of studies of all these different species like mammals and birds and so on,
Starting point is 01:29:10 shows that when there's even a little amount of interbreeding as there's an invasion or a movement of expansion of one group into the territory occupied by the other, there's massive integration of local genes. These pioneers at the wavefront will sometimes interbreed with the local population. There are so many locals around that the pioneers' DNA will get swamped by the local groups. So by the time they make it to the other side,
Starting point is 01:29:34 they're largely local. And so maybe what we're seeing is that this is what's happened. You have like a modern human population that's matrilineal, for example, where transmission of making stone tools this way is happening from the mother to the kid, and that's why they're retaining their mitochondrial DNA, but by the time they get to the other end of Europe,
Starting point is 01:29:56 they're mostly archaic, they're mostly local archaic. So you end up with a 95% population replacement. So this would explain why the mitochondrial DNA is shared between Neanderthals and modern humans, and it would also explain why the mixture proportion is only 5%. But the really interesting thing is that there's other evidence from studies of modern humans that modern humans, too, are admixed,
Starting point is 01:30:20 and that the right way to think about this is that modern humans are a mixture of two groups that split maybe like 1.5 million years ago, and that they come together 200,000 to 300,000 years ago, with maybe 20% ancestry from this archaic African group and 80% ancestry from this early modern lineage, and that the same group then mixes with Neanderthals
Starting point is 01:31:01 and it's 5% modern here and 95% local here. And so you actually have this key population that makes the Middle Stone Age, or Levallois technology, this one that appears here, and it expands in all directions, into Europe here, into Africa here, 200,000 to 300,000 years ago, bringing this technology, bringing these new ideas, bringing perhaps some genetic adaptations.
Starting point is 01:31:29 It expands into archaic humans in Europe. It mixes with the local population. It gets 95% replaced, but still retains its cultural features and maybe some genetic features. And it expands in Africa, too. And here it's not 95% replaced. It's only 20% replaced. And probably the reason that happens is that this group is much, much more diverged. It's much more archaic. It's 1.5 million years diverged rather than 700,000 to 800,000 years diverged. And as a result, there are many more incompatibilities genetically, and there are many more barriers to gene flow. But there's still a lot, maybe 20%.
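The deme picture described a few moments earlier can be turned into a rough back-of-the-envelope calculation. The sketch below is a simplification written for this transcript, not the model from any paper: it assumes the expanding group crosses a fixed number of demes and takes in the same fraction of local ancestry in each one, so the share of its original nuclear genome shrinks geometrically. The function names and the choice of 100 demes are illustrative assumptions.

```python
def surviving_fraction(per_deme_admixture: float, n_demes: int) -> float:
    """Share of the expanding group's original nuclear ancestry left after
    crossing n_demes, if each deme replaces the same fraction with local DNA."""
    return (1.0 - per_deme_admixture) ** n_demes


def per_deme_admixture_needed(final_fraction: float, n_demes: int) -> float:
    """Per-deme local admixture that leaves final_fraction of the original
    ancestry after n_demes (the inverse of surviving_fraction)."""
    return 1.0 - final_fraction ** (1.0 / n_demes)


N_DEMES = 100  # assumed number of demes, following the "100 or so" in the talk

# Europe scenario: about 5% of the expanding group's ancestry remains.
europe_rate = per_deme_admixture_needed(0.05, N_DEMES)
# Africa scenario: about 80% of the expanding group's ancestry remains.
africa_rate = per_deme_admixture_needed(0.80, N_DEMES)

print(f"Europe: {europe_rate:.2%} local admixture per deme")
print(f"Africa: {africa_rate:.2%} local admixture per deme")
```

Under these assumptions, roughly 3% local admixture per deme is enough to produce 95% replacement, while roughly 0.2% per deme gives the 20% figure. A marker passed down only one parental line would not be diluted this way if outsiders are only ever incorporated from the other sex, which is the point of the matrilineal or patrilineal scenario.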
Starting point is 01:32:04 And we have evidence that this is a big mixture that happens. And so what you're actually seeing is a modern human expansion, both into Europe and into Africa. In one place it forms Neanderthals, in the other place it forms the ancestors of everybody living today. But all of these groups are descended from this key sort of revolutionary event that happens here. So we often talk about the revolutionary events 50,000 to 100,000 years ago, the more symbolic behavior and so on and so forth, that sort of first appear in Africa and the Middle East and spread beyond. But there's also this earlier event, and this event is sort of contemporaneous with the
Starting point is 01:32:40 breakup of all the different groups also in Africa today, you know, the Khoisan Southern Africans and the Central African rainforest hunter-gatherers. So one wonders whether this is an equally important formative event. And it also, if that's true, makes you think of Neanderthals as actually somehow our cousins, that they actually share our Y chromosome, they share our mitochondrial DNA, they share formation in this 200,000- or 300,000-year-old event, they share the toolkit. So even though the genome is telling us that Neanderthals are cousins of Denisovans, the actual correct way to think about them may be, in an important sense, as somehow relations or close cousins of modern humans.
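The "maybe 5% times 5%" estimate from earlier in this sketch can be written out explicitly. It rests on a standard population-genetics result: under neutral drift, the chance that a lineage eventually takes over a population equals its starting frequency. Treating the mitochondrial DNA and the Y chromosome as independent is an assumption made here for illustration, and the function names are hypothetical.

```python
def neutral_fixation_probability(initial_frequency: float) -> float:
    """Under neutral drift, a lineage's chance of eventually reaching 100%
    frequency equals its starting frequency."""
    if not 0.0 <= initial_frequency <= 1.0:
        raise ValueError("initial_frequency must be between 0 and 1")
    return initial_frequency


def joint_fixation_probability(initial_frequency: float,
                               n_independent_loci: int = 2) -> float:
    """Chance that several independently inherited loci (here: mitochondrial
    DNA and the Y chromosome) all fix from the same starting frequency."""
    return neutral_fixation_probability(initial_frequency) ** n_independent_loci


# 5% of ancestry from the incoming group, two uniparental markers.
p_both = joint_fixation_probability(0.05)
print(f"Chance both markers fix by drift alone: {p_both:.4f}")
```

That works out to a quarter of one percent, which is why the standard account needs something extra, such as natural selection, to make both replacements plausible.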
Starting point is 01:33:25 I have so many questions. Do you have 15 more minutes? Yeah. Okay. First of all, what is going on with this group of archaic Africans 1.5 million years ago? Where in Africa are they? And what happens to the portion of them that don't form modern humans? Do they survive? So the genetic data suggest this. This is an analysis not of any ancient DNA, but only an analysis of modern DNA from different people, mostly in Africa, but also non-Africans.
Starting point is 01:33:57 And multiple studies, there's at least three, maybe four or five studies that I know about, have looked at the patterns of variation in people today and say the data in modern people today, including in Africans, is not consistent with a homogeneous population. It looks like a population that split well more than a million years ago into multiple, at least two, but maybe many groups, and then came together with an important coming together a few hundred thousand years ago. The papers have different models that they fit, but they all have this feature that more than a million years ago there's a split-up, and then on the order of a few hundred thousand years ago there's a coming together and a remixture event forming the
Starting point is 01:34:32 ancestors of anatomically modern humans. And then this includes the Khoisan and whatever other groups. All of these groups have this. Maybe it's in slightly different proportions. So you ask, where are these people living? Who knows? Right? Like, you know, in this scenario, the 80% is coming from the Caucasus or Northeast Africa, where this Middle Stone Age forms. It's from this population that forms the Middle Stone Age. And they mix with local groups. And who knows where they are? Southern Africa, Western Africa, Central Africa, Eastern Africa. We don't have any ancient DNA. But, like, you know, this is a very rich environment. People have been living there for, like, seven million years at least. And, like, there would have been different groups of people everywhere.
Starting point is 01:35:10 Probably it's not just two groups. It's probably more groups. I think the important theme here is there's evidence of substructure that's well more than a million years old, and this place would have been a landscape full of archaic humans that would have been differently related to these expanding people and would have admixed with them when they came through. Okay, so the Neanderthals, first time around 300,000 years ago, our ancestors share culture with them. They share the Middle Stone Age technology, but they don't replace the population.
Starting point is 01:35:40 The technology spreads through culture, basically. Oh, it spreads through genes, too. If you look at Yamnaya in India, there's almost no Yamnaya ancestry in India. Huh? I mean, it's just diluted, diluted, diluted, diluted down. As Yamnaya expanded into Central Asia, you know, like it expands into Europe, it makes the Corded Ware, there's a 25% dilution, it expands back across Central Asia, it goes through the Hindu Kush, you know, it gets into northern South Asia, it mixes more with local people. By the time, you know, today, the most Yamnaya ancestry you see in India is 20% or 10%; you know, most people have less than 10% or 5%.
Starting point is 01:36:18 I see. It's, you know, there's just been a lot of mixture on the way, but it is the tracer dye, right? Like, it tracks Indo-European languages, and important aspects of Indo-European culture are coming through Yamnaya. So if you know where to look, that tracer dye is only 10%. It's only 5%. It's only 2% in some groups, but it's the languages people speak,
Starting point is 01:36:36 and it's important shared cultural elements that connect them to people on the other side of the Indo-European-speaking world. So this 5%, you shouldn't sneeze at it, right? Like, that's tracing something important in this model. And then I understand that if things are transmitted more through women that... Actually, sorry, let me back up. I don't understand why the mitochondrial DNA and the Y chromosome would be especially privileged as the spreading is happening.
Starting point is 01:37:04 Can you explain that? So the reason I'm talking about these matrilineal or patrilineal expansions is I'm really troubled and have been troubled for like many years, actually 15 years, but like especially in the last three or four years, by the fact that the mitochondrial DNA and Y chromosome cluster Neanderthals and modern humans, but the rest of the genome clusters Neanderthals and Denisovans. This is like a crazy result that is not seen in any other species where you see this pattern. So I'm very interested in patterns that would explain this. If you invoke and assume that there was like a matrilineal or a patrilineal expansion,
Starting point is 01:37:38 it could be either, where modern humans, when they expanded across the landscape of Europe, retained their identity along one of the lines. Like, if it's matrilineal, when they incorporate a male from the local community, he's brought into the community and the kids are raised based on the culture of the mothers or something. Or if it's a patrilineal expansion, they incorporate a female from the community. She's incorporated, and the kids are sort of raised with the culture of the fathers. So if that happens, it guarantees one of these two parts of the genome to look like it does, because it's a modern human expansion. If it's patrilineal, it will retain the Y chromosome. If it's matrilineal, it will retain the mitochondrial DNA. So it will solve one of your
Starting point is 01:38:20 two problems. But not both. It won't solve the other one, so you need to solve the other one. So the other one, you can solve either by natural selection or by social selection. So by the way, patrilineality and matrilineality are the rule, not the exception, in human communities. Usually communities have continuity along the male or the female line. And usually it's patrilineality. Sometimes it's matrilineality. So you can also have phenomena like social selection. So it could be that once you have kids whose father, for example, is from the outside community, that those, the males... Usually, in most communities, females all reproduce. That's typical today. Like, usually women have kids if they can,
Starting point is 01:39:11 but men in traditional societies are actually very variable in their reproductive success. A large fraction of men never have kids. And then there's a relatively smaller number, there's a subset of men who have many kids with many women. And so there's competition among men for kids. So in this context where males are competing for access to females, female mate choice begins to be an important process. And you have a phenomenon where it could be the case that if your dad is an archaic male, then you're not going to be as successful in the competition
Starting point is 01:39:46 for local females as if your dad is a non-archaic male. So some simple social phenomenon like that could explain the data. We actually see this in human society. So for example, if I remember right, like in Central African rainforest hunter-gatherers, there's different treatment of boys and girls, depending on whether their dad or mom is from one group or the other. I guess I don't understand how the maternal, like, okay, you know, the group spreads and it gets to the next front.
Starting point is 01:40:15 Yeah. And they have kids, and some of those kids are, okay, from the group, from the humans that have just entered, the kids will have the maternal DNA, the mitochondrial DNA from the humans. Yeah. But from the existing people, they will have the mitochondrial DNA of the archaic humans. Yeah. And why are the people with the archaic mitochondrial DNA not surviving? So it's a question.
Starting point is 01:40:44 So there's multiple possible explanations, but it's much easier to explain that than both mitochondrial DNA and the Y chromosome. One possibility is that the mitochondrial DNA was less biologically fit. Another possibility is that there's social discrimination against people based on whether their parents are archaic or not. Which is, I think, not at all surprising in a human context. What, okay, so the Neanderthals. It's the weakest link in this argument. This argument is probably wrong, but I'm just telling you what I'm thinking about. Okay, the Neanderthals, so 300,000 years ago, we, our lineage interacts with them, but mostly
Starting point is 01:41:20 their lineage survives and there's cultural diffusion, et cetera, and genetic diffusion. And then is it 70,000 years ago? Then we interact again. Yes. and they don't survive. The genetic ancestry doesn't survive. The genetic ancestry doesn't survive. So presumably there was also other contact in between 300,000 years ago and 70,000 years ago.
Starting point is 01:41:41 Probably, yeah. But these are the ones we're detecting currently. Is it just sort of like there's not really an answer, or is it just contingent, as to why one time there's this kind of diffusion where most of the archaic genome survives and the other time it's total replacement? I think that this is not at all surprising given the context. So, like, if you think about this model, this is 700,000 or 800,000 years ago, this is 300,000 years ago, right? So this is like 400,000 years separated.
Starting point is 01:42:09 You talked about the Batia paper with me earlier. That's two populations 70,000 years separated. There's no biological incompatibilities between West Africans and Europeans. There's no natural selection against biological incompatibilities. But we know that when Neanderthals and modern humans met and mixed, there were biological incompatibilities. That was at 700,000 years of separation. And so as populations become more separated, there begins to be biological incompatibility rapidly developing, probably as the square of the separation, because you need pairs
Starting point is 01:42:40 of interacting genes and therefore it's the square of the separation. So here it would have been maybe only 400,000 years separated between this lineage and this lineage. Here, it's like 1.2 million years. It's a lot. So these are at the edge of not being able to produce children. So these are quite different humans. These are actually three times closer than these. And like if you look at mixtures of humans today, there are mixtures in Southern Africa today of people who are half this distance.
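The "square of the separation" argument can be made concrete with a small calculation. The sketch below simply takes the stated rule at face value: if incompatibilities arise from pairs of interacting genes, their number should scale with the square of the time two lineages have been apart. The separation times are the round numbers quoted in the conversation; the function and constant names are illustrative assumptions.

```python
def relative_incompatibility(separation_years: float,
                             reference_years: float) -> float:
    """Expected incompatibilities relative to a reference pair of lineages,
    assuming the count grows with the square of the separation time."""
    return (separation_years / reference_years) ** 2


# Round separation times quoted in the conversation (years).
EURASIAN_ARCHAIC = 400_000    # expanding group vs. Eurasian archaics
AFRICAN_ARCHAIC = 1_200_000   # expanding group vs. African archaics
SOUTHERN_AFRICA = 200_000     # groups mixing in Southern Africa today

african_vs_eurasian = relative_incompatibility(AFRICAN_ARCHAIC, EURASIAN_ARCHAIC)
southern_vs_eurasian = relative_incompatibility(SOUTHERN_AFRICA, EURASIAN_ARCHAIC)

print(f"African archaic mixture: {african_vs_eurasian:.2f}x the Eurasian case")
print(f"Southern Africa today:   {southern_vs_eurasian:.2f}x the Eurasian case")
```

On this rule, three times the separation means about nine times the incompatibilities, and half the separation means about a quarter, which fits the contrast being drawn between the African and Eurasian mixtures.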
Starting point is 01:43:08 Right? Like if you look at Khoisan and Bantu people mixing in Southern Africa, like the Xhosa, which is the population of, for example, Nelson Mandela, these are groups that are separated by almost 200,000 years, which is half of this, and totally compatible. And so what you're seeing is this is a group that's actually completely permeable genetically
Starting point is 01:43:28 or nearly completely permeable. This one almost certainly has substantial biological incompatibilities, because 200,000 or 300,000 years later, we see the interbreeding between Neanderthals and modern humans or between Denisovans and modern humans. There's clear evidence of incompatibility at that point. But this would be even bigger. So what you would expect to see is that as this group spread,
Starting point is 01:43:49 they would be moving into a territory full of archaic humans, and there would be some interbreeding, but the kids would be not very fit, they would die off, there would be a lot of infertility, And so the barriers to gene flow and to interbreeding would be greater. So to me, it's not at all surprising that as this group moves into Eurasia, here's Eurasian archaics, the ancestors of Denisovans. And these are only 400,000 years diverged from these people over here.
Starting point is 01:44:18 And here's African archaics. And these are like 1.2 million years diverged. So, you know, they just don't interbreed as much. And so you don't get as much. But the key thing is it's at the same time. It's the same time. So like it really feels like the signature of an explosion of people from one place, interacting with people here, interacting with people here. It's the same sort of cultural revolution or technological revolution, impacting this place, impacting this place, and creating populations that are kind of both impacted
Starting point is 01:44:51 by this cultural revolution, which we know is the case because they shared the same toolkit. And so, you know, some people argue that Levallois technology is independently invented, but, you know, it's very similar, and this would be a way that it would have the same origin. Interesting. And sort of, so there's a cultural shared thread, this shared toolkit. There's a mitochondrial DNA and Y chromosome thread, and then there is a timing sort of shared thread, which is they both form by mixture.
Starting point is 01:45:20 Because otherwise you'd have to believe that Neanderthals independently developed Stone Age technology. Yes, which is not inconceivable, but it's a little bit like believing that farming independently developed in multiple parts of the world. Right. But it did. It did? Yeah.
Starting point is 01:45:34 So as I said, this is probably wrong. I'm trying to tell you that, like, we don't really know the world we live in. And, like, you know, like, this is not obviously wrong. In fact, to me, this is much more plausible than the model we currently, like, sort of write down. Like, you know, it's probably wrong. But, like, it's just much more plausible. It explains many more things.
Starting point is 01:45:54 it's no more complicated. Interesting. Do you want to recapitulate what you were saying about the analogy to Ptolemy and the epicycles? It was quite interesting. Yeah. I mean, you know, like, I think that, you know, the model that we've put together collectively about the relationships between archaic and modern humans has sort of accreted over time. There was this, you know, idea that modern humans are distinct and that Neanderthals and
Starting point is 01:46:20 Denisovans are like sisters of each other. And then over time, we detected additional mixture events, like this modern human into Neanderthal, and then these other ones I didn't even talk about, like a super divergent lineage flowing into Denisovans and like all this other stuff. And we still say, oh, the whole genome says Neanderthals and Denisovans are sisters, so that's the truth.
Starting point is 01:46:42 And we've like patched it all together and gotten it all to work. And oh, you look at the mitochondrial DNA and the Y chromosome and they have this odd pattern and it's improbable, but we can get that to work if we invoke natural selection, you know, things like this. So you patch it all together, and it reminds one a little of what happened in the ancient world, where there was this idea that the sun revolves around the earth.
Starting point is 01:47:08 But it doesn't quite explain the movements of the planets properly. And so in order to get the movements of the planets to work right, you know, the Ptolemaic astronomers would have made up these epicycles, these special extra rotations and movements to make everything work about right. And it was such a convoluted model. And then when Copernicus and colleagues suggested instead that actually what's happening is everything's revolving around the sun, that simplified things and made everything so much simpler.
Starting point is 01:47:42 So the situation that was happening is that as astronomical information accumulated, it kept being contradictory to the standard model, but it could be made to work by proposing another complication and another complication and another complication. But, you know, this is not like as, like, fantastic as, you know, proposing that everything revolves around the sun rather than the earth. But it is much simpler. And actually, it explains many, many things. What is counterintuitive or unexpected or hard to accept about this alternative model? Like, what is the hesitation that people have for adopting this as the... I don't know. I mean, nobody's thinking about this model
Starting point is 01:48:22 right now. So, I don't know. It's just, I think that, I don't know, it seems like obviously a very natural model to me. There's an analogy. So Aristarchus, the ancient Greek, had the heliocentric theory because he had done a bunch of observations, or had deduced how far the Earth is from the Sun and had noticed other things. But it was not adopted because his fellow Athenians were like, look, if we believe that the earth revolves around the sun, for it to be the case that we don't see relative movement of the stars to the earth, the only possible explanation is that the stars are so far away that it is just incomprehensible and implausible. And so the heliocentric theory was dismissed.
Starting point is 01:49:12 And so what I want to ask is, what is the equivalent of like, oh, for this to work, the stars would have to be so far away that it's inconceivable, where like actually the stars are far away? And maybe we should adopt the implausible implication that this theory gives us. That's a great question. I think that we have to assume that there's a linkage between the cultural transformations in Africa and Eurasia at this time. And that's sort of not something that the community has really put together with the genetic data. So I think that there's this thread in the genetics about substructure in Africans. And then there's this whole world of ancient DNA, and they've never been put together.
Starting point is 01:49:53 So, you know, nobody's put together the now extensive work on modern human substructure with the now extensive work based on ancient DNA of archaic human relationships to modern humans. And if you put them together, you realize they line up in terms of their time of substructuring. So I think that, I don't know if that's improbable. It seems actually parsimonious to me, but, yeah. Yeah. And it also seems significant that different groups of humans at this
Starting point is 01:50:20 time were capable of adopting Stone Age technology. Once one group had figured it out, the genetic difference between different human lineages was not so big that you could not show people how to do it. Who knows? I mean, it could be that actually this was genetically driven, right? Like, you know, we talked before about the time to the common ancestor of human genes. You know, there's nothing at, you know, 100,000 years or 150,000 years. But there's a lot at 400,000 or 500,000 years.
Starting point is 01:50:49 So if that's what happens, and you have a mutation that occurs in the Caucasus or, you know, somewhere in the Middle East or Northeast Africa, and there's key genetic mutations that make people able to do this, and then this population expands. You know, when it moves into Europe, it's swamped by local genes, but there could be retention of those genes through selection as it expands. So maybe what you're actually seeing is that there are genetic developments. Most of the discussion on this point has been focused on the 50,000- to 100,000-year event. This is like behaviorally modern human behavior. But a lot of archaeologists think this is an equally, if not more, profoundly significant event in many ways. And why is that not the event that we should be talking about? And then, you were talking about how there's no fixed differences between modern humans and the humans 50,000 years ago. Do we know if there's any fixed differences between the people 50,000 years
Starting point is 01:51:45 ago and the people 300,000 years ago? I think there are. then obviously these interbreeding. I think that this is what we're talking about, which is like if you look at the genetic variation going back 300,000 or 400,000 years, then there do begin to be places where all modern humans
Starting point is 01:52:01 share common ancestry 300,000 or 400,000 years ago. And that's another way of saying there begin to be fixed differences at that time depth. So that is where you start seeing evidence for possible fixed differences. What's basically happening if everybody shares a common ancestor 400,000, 500,000 years ago is there's a single
Starting point is 01:52:17 ancestor at that time, and if you compared it to another population, like, you know, these guys, they would descend from a different lineage. So any mutation that occurred ancestral to that single ancestor would be a fixed difference. So this is the time at which you can begin to see fixed differences. But anatomically modern, cognitively modern humans exist by the beginning of the Middle Stone Age and before we're breeding with this ancient group of Africans or breeding with... Anatomically modern humans occur exactly here. It's the same moment. This is when they occur.
Starting point is 01:52:51 People who have skeletal features like ours, and Neanderthals appear exactly then. This is when it all happens. So, like, this is when we, there is this disconnect between anatomically modern humans in the skeletal record and between, you know, behaviorally modern humans, which is 50 to 100,000 years ago. But anatomically modern humans appear at this time. And actually, recognizable Neanderthals appear roughly around this time, too. Interesting.
Starting point is 01:53:14 Interesting. But we don't know what exactly happens, if anything, between 200,000 years ago and 50,000 years ago. That goes from just anatomical modernity to behavior. My understanding is no. There begins to be, you know, they're busy making Levallois stone tools like Neanderthals for 200,000 years. And not more impressive than Neanderthals in any way. Interesting. In any obvious way, as I understand. And then there begins to be in the archaeological record a quickening of sort of, you know, behavioral sort of traits, you know, which could be not genetic at all or it could be genetic. Like, you know, there's lots of arguments about this. But, you know, people are obsessed with,
Starting point is 01:53:54 you know, like we were obsessed with intelligence earlier in our conversation, but people are obsessed with art and, you know, these things that seem important to us. But like, who knows what's important. Yeah. And, yeah. Interesting. Cool. Thanks for the digression. The work that I've been involved in has consistently shown that I was wrong in my biases coming into the work. And I've really been almost traumatized by this. Again and again, I've come into a project with some kind of guess about what the data was showing. And then the data doesn't show that.
Starting point is 01:54:27 So for example, when I got involved in the Neanderthal genome project and helping to analyze data, looking at how archaic Neanderthals were related to modern humans, I was part of a group of scientists who had established that non-Africans were a simple subset of African variation and that there was no evidence at all of Neanderthal interbreeding into the ancestors of modern humans or other archaic interbreeding. Different analyses that I and many other people had done made it look like non-African variation was just a subset, a small sample of that in Africa, and that could have fully explained the data. And so when I was involved in analyzing the Neanderthal DNA sequences, what happened was I found this very strong
Starting point is 01:55:13 evidence of Neanderthals being more closely related to non-Africans than to Africans. And it was very surprising, and I thought it must be a mistake. I was quite incredulous. I thought it was unlikely to be true because other evidence that had been found before seemed to point in the other direction. And so I spent several years trying to make these results go away, as did my colleagues, and we just couldn't make the results go away. They just kept getting stronger.
Starting point is 01:55:40 And this experience working on natural selection was the same. So what we were convinced of was that natural selection had been pretty quiescent in our species over the last several hundred thousand years. Therefore, if we look at patterns of variation in non-African people today or in any people today, we should see not a lot of selection going on. And indeed, the first ancient DNA studies, beginning in 2015 with this paper that we were involved in with Iain Mathieson and colleagues, seemed to show relatively small numbers of genetic positions associated with natural selection. So in 2015, we analyzed data from about 200 Europeans and Middle Easterners to try to understand frequency changes over time.
Starting point is 01:56:29 And we compared those ancient people who were the sources of modern Europeans to people in Europe today. And we looked at frequency differences that were too extreme to be due to chance. And we were very excited to find 12 positions that we were convinced were highly different in frequency between Europeans today and what we would expect based on the history that we and others had identified as the history relating modern to ancient Europeans. And so some of these were known and some of these were not known, and this was very exciting. And we hoped that as the numbers of samples would increase and we would get higher resolution to be able to appreciate differences in frequencies over time, we hoped that this would
Starting point is 01:57:08 make it possible to detect far more. And what was quite disappointing over the subsequent decade is that that didn't happen. So, for example, the largest study of that type in 2024 by a group in Copenhagen analyzed the data, much better data than we had in 2015, and found only 21 positions that were highly different in frequency across time. And while that was exciting, it was almost twice as many as we had found in 2015. In a lot of ways, it was disappointing because the sample size and data quality had gone up so much, and yet this is all that was found. And so what that suggested is that we might be hitting an asymptote, and we might not be able to get beyond where we currently were, and that this approach to learning about biology, which was very promising in theory, might actually not produce a high yield. But maybe, in fact, natural selection was quiescent. And in fact, the reason we're seeing so few changes is that actually there's not been a lot of adaptive directional selection.
Starting point is 01:58:04 So that was the situation we found ourselves in until just a few years ago when we carried out this study in our research group led by Ali Akbari. So what we did is we deployed a few innovations to try to improve our power to detect natural selection. One of them is we just pumped a lot of data into the system. And so we increased the amount of data by about 14-fold. And the main thing that we do in this study is we report new data from about 10,000 individuals. So this is like a very big increase in the amount of data in the literature, and the total data set size of ancient individuals distributed over the last 18,000 years is about 16,000 people. So this is a large data set.
Starting point is 01:58:51 It's much larger than was previously possible. And when you have more data, you can estimate frequency changes with much more subtlety. And the data comes from only one part of the world, which is Europe and the Middle East. It's not a more important part of the world than other places, but it's the place where maybe 70 or 80 percent of the data in the ancient DNA literature so far comes from due to historical reasons. And it provides us with a natural laboratory where we can see what happens to the genome in one place over time as environments change. It's really interesting to imagine doing this type of analysis in other parts of the world,
Starting point is 01:59:24 and the comparative analyses are super important and interesting, but this study right now is about this one place in the world where we have particularly fantastic data. The other thing we did is we developed an entirely new methodology that hadn't been used in this area before. And the methodology is based on a technique that had been developed for finding risk factors for disease in medical studies. And a simple way to explain it is we ask how to predict the genetic type a person has based on their pattern of relatedness to other people. So we'll have a data set of about 16,000 ancient people and 22,000 people, if we include the ancient and modern people. And then we look at how closely related each of these 22,000 people is to each other,
Starting point is 02:00:08 and we predict the genetic type at each position in the DNA, at 10 million positions, based on the pattern of relatedness to all of the other 22,000 people. And then we ask if natural selection blowing the frequency of the mutation in the same direction in all geographic places and at all times predicts the data a little bit better than just knowing the relatedness to all the other samples in the database. So we're simply asking the alternative hypothesis is that selection has been blowing in the same direction at all times. And we simply ask if that explains the data better. And that's a dumb assumption, because, of course, the truth is that natural selection is going to have changed in frequency over time.
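The test described here can be sketched in code. What follows is a deliberately simplified illustration, not the method from the study: it swaps the relatedness-based model for an ordinary least-squares regression in which a few hypothetical "structure" covariates stand in for genome-wide relatedness, and it asks whether adding a single constant time trend explains one variant's genotypes better. The function name `selection_z`, the covariates, and all simulated numbers are assumptions made for the example.

```python
import numpy as np

def selection_z(genotype, time_fwd, structure):
    """Z-score for a constant-direction time trend at one variant.

    genotype  : (n,) allele dosages (0, 1 or 2) for n individuals
    time_fwd  : (n,) sample time, increasing toward the present
    structure : (n, k) covariates standing in for genome-wide relatedness
                (hypothetical proxy; the study uses a relatedness matrix)
    """
    n = len(genotype)
    # Standardize time so the trend coefficient is on a comparable scale.
    t = (time_fwd - time_fwd.mean()) / time_fwd.std()
    X = np.column_stack([np.ones(n), structure, t])
    beta, *_ = np.linalg.lstsq(X, genotype, rcond=None)
    resid = genotype - X @ beta
    sigma2 = resid @ resid / (n - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    # Last coefficient is the time trend; divide by its standard error.
    return beta[-1] / np.sqrt(cov[-1, -1])

# Simulated data (all values are made up for illustration).
rng = np.random.default_rng(0)
n = 4000
time_fwd = rng.uniform(0.0, 1.0, n)
structure = rng.normal(size=(n, 2))

# Neutral variant: frequency depends only on population structure.
p_neutral = np.clip(0.3 + 0.05 * structure[:, 0], 0.01, 0.99)
# Selected variant: same structure plus a steady rise over time.
p_selected = np.clip(p_neutral + 0.2 * time_fwd, 0.01, 0.99)

z_neutral = selection_z(rng.binomial(2, p_neutral), time_fwd, structure)
z_selected = selection_z(rng.binomial(2, p_selected), time_fwd, structure)
print(f"neutral z = {z_neutral:.2f}, selected z = {z_selected:.2f}")
```

In this toy setup the selected variant gets a large positive score while the neutral one stays near zero, which is the sense in which a constant push "predicts the data a little bit better" than structure alone.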
Starting point is 02:00:47 But we're just asking the simplest of questions whether assuming a constant rate of selection explains the data more than not doing so. And just to summarize to make sure I've understood, you're trying to make a model that predicts allele frequency changes over time. Right. And you have two different parts. Right. One part is this genetic relatedness matrix, which captures how similar different genomes are to each other and that should capture
Starting point is 02:01:12 the impact of different bottlenecks and of drift and of population admixtures and all those things which affect the entire genome. Correct. And then you have the separate thing which is like, okay, if we look at specific locations, can we just say that, oh, this location has been selected at whatever coefficient over time?
Starting point is 02:01:33 And if we add some coefficient, does it become easier to predict the allele frequency changes than you would have just seen from this other artifact, which is just looking at like, oh, if you look at the whole genome, are these guys in the same, you know, have they gone through the same bottlenecks? Have they gone through the same drift, et cetera? That's precisely right. Okay. Okay. So what have you learned? So when we analyzed the data this way, we looked at 10 million positions in the DNA in these 22,000 people. 16,000 of them were ancient, and we looked to see if there was more change in this consistent direction over time than you would expect by chance. And when we analyze the data, we found many,
Starting point is 02:02:15 many hundreds of places in the DNA that were changing too much over time and in too consistent a way to be explained by chance. Now, there's a bit of a statistical problem in figuring out how many there are because they're so densely packed that they're close to each other and they're interfering with each other. But when you try to tease them out and say, let's count only one in each place in the DNA and blank out the others, we find at least about 479 positions that are all independently pushing in the same way. We are 99% confident that those positions are real. By another criterion, of more than 50% confidence that they're real, we think that about 3,800 positions are all pushing in the same direction. So this is like a
Starting point is 02:02:56 crazy number of results, given that in our work previously, and in other people's work, there were at most a couple of dozen discoveries coming from a single scan. So when we got this result, we were very surprised. We thought it must be wrong, and we spent the next couple of years trying to make the results go away, but they just kept getting stronger. And so what we were trying to do is to look for some kind of independent type of evidence to tell us whether these positions were real. And we stumbled on something really powerful for this purpose that had not been used in this way before. And it relied on the fact that we had very large numbers of discoveries, like many hundreds of discoveries or even thousands. And so what we did is we took a completely independent data set, which was the corpus of genome-wide association studies.
Starting point is 02:03:41 So these are studies that people have carried out in hundreds of thousands of people looking for whether particular genetic mutations are more common in people with high blood pressure than in people with low blood pressure or something like this. So we took the UK Biobank, which is about 500,000 people from Great Britain who have been measured for hundreds and hundreds of traits. The whole genomes of all these people have been sequenced. And for each of these traits, we could look at whether each of these 10 million positions is connected to this trait in a convincing way. So of the 10 million positions, about 15%, about 1.5 million positions in the DNA, are predictive of at least one of these several hundred traits. So then we could ask a question, is our natural selection signal, our statistic, is it related to whether a mutation causes high blood pressure or some other trait? So we slid our statistic for natural selection upward, you know, to a value of one, a value of two, a value of three, a value of four, a value of five. And as we did that, the enrichment for genetic mutations that affect traits got higher and higher. So whereas it was only 15% when we didn't use our selection statistic, when we required the selection statistic to be above about five, there was about a five-fold enrichment for mutations that caused traits. What is this selection statistic?
Starting point is 02:05:01 This is the statistic we use to measure whether a mutation is changing over time significantly in a non-zero way. So it can be approximately thought of as a normally distributed statistic, a Gaussian statistic, which is the number of standard deviations the statistical value is away from zero, where zero is no natural selection. It's not exactly that, but it's close to that. And so if this statistic is above five, we see about a five-fold enrichment in mutations that affect a trait. And so instead of 15% of the mutations that are at random affecting the trait, it's like 60 or 70 percent that are affecting the trait when we slide our statistic upward. And this is providing completely independent evidence that these sites are real. And as you slide above five, there's no more enrichment. So our interpretation of these results
Starting point is 02:05:59 that we were able to validate and show that these interpretations made sense using computer simulations of our process, our interpretation of this result is that once you slide the statistic above five, essentially all the signals of natural selection are real. Okay, and so just to make sure I understood, you're saying, look, in order to figure out
Starting point is 02:06:19 what allele has been under selection, your model assigns a statistic saying, oh, in order to explain why this has a specific frequency, we're going to give it a selection statistic. And independently, you know, we run these studies on modern populations where we say, if you look at height
Starting point is 02:06:36 or eye color, intelligence, whatever trait, what are the parts of the genome that are correlated with that trait? And the higher the statistic you give it in your study, in order to explain the allele frequency changes over time as a result of selection, the more probable it is that that region in the genome
Starting point is 02:06:54 is associated with traits that have like some functional thing that we can measure. That's exactly right. And this is like a brilliant idea that Ali had. And it really abandons the traditional approach of assigning statistical significance to mutations that cause a trait because we're just using an external piece of information, the correlation to traits measured in a completely different way, to read off the probability mutations are real. So we can ask how much enrichment for real signal is there, given a particular selection statistic.
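The enrichment curve, and the way it is read off as a probability in this part of the conversation, can be sketched as follows. This is a minimal illustration under stated assumptions: the function names `trait_rate` and `prob_real` are hypothetical, the threshold is applied to the absolute value of the statistic, and the 15% baseline and 75% plateau are example numbers (75% is simply five times 15%), not figures from the paper.

```python
import numpy as np

def trait_rate(z, is_trait_variant, threshold):
    """Fraction of variants with |z| >= threshold that are trait-associated."""
    keep = np.abs(z) >= threshold
    return float(is_trait_variant[keep].mean()) if keep.any() else float("nan")

def prob_real(rate, baseline, plateau):
    """How far the observed rate sits between the baseline and the plateau.

    Read as the probability that a variant at this threshold is a real
    selection signal, following the interpretation given in the interview.
    """
    return float(np.clip((rate - baseline) / (plateau - baseline), 0.0, 1.0))

# Toy data: six variants with made-up statistics and trait labels.
z = np.array([0.5, 1.2, 5.5, 6.0, -5.2, 2.0])
is_trait_variant = np.array([False, False, True, True, False, True])

rate_all = trait_rate(z, is_trait_variant, 0.0)   # no filtering
rate_top = trait_rate(z, is_trait_variant, 5.0)   # only strong signals

# Example calibration with assumed baseline 15% and plateau 75%.
halfway = prob_real(0.45, baseline=0.15, plateau=0.75)
print(rate_all, rate_top, halfway)
```

With these assumed numbers, a trait rate of 45% sits halfway between baseline and plateau, so it would be read as roughly a 50% chance that a variant at that threshold is really under selection.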
Starting point is 02:07:31 And if it's halfway enriched to the plateau, the correct interpretation, which we're able to show, is that 50% of the mutations are really selected. If it's three quarters of the way toward the plateau, there's a three-quarters probability that the mutation is real. If it's 99% of the way to the plateau, there's a 99% probability that it's real. So that gives us a calibrated estimate of the probability that a particular position is really under natural selection. A major concern here is that actually what we're seeing is not that these mutations are really under selection, but rather that both association to a disease and our selection signal are due to some third thing that's causing both of them, which is a type of selection, which is not what we're after, not selection to adapt to new environments,
Starting point is 02:08:21 but what's called background selection, selection against newly arising bad mutations that are removed from the population that tend to be concentrated in genes. Genes are also the parts of the genome that tend to be associated to traits. And so this common process is causing both the enrichment for trait signals
Starting point is 02:08:39 and is also causing the enrichment for selection signals that we're observing. That's the concern. We were super concerned about this. So what we did is we repeated this enrichment analysis in slices of the DNA that all were affected to the same extent by background selection, by this rain of slightly bad mutations, and we get exactly the same pattern.
Starting point is 02:09:01 We also repeated this experiment using just mutations of the same frequencies because there's different statistical power to detect these signals at different frequencies, and we see the same pattern where above a value of the selection statistic of around five, we get this plateau. So the thing that changed and allowed you to increase the amount of sequences you're generating by orders of magnitude, is it just the statistical method you're using to identify which part is human, or what exactly changed in 2014 and since then? So there's been a whole series of improvements.
Starting point is 02:09:34 I think that the big ones have been the huge drop in sequencing cost, which made it possible to generate ancient DNA in the first place. So the drop in cost has been a million fold since the late 2000s, and another maybe one to two orders of magnitude from 2010 to today. So that's one big change. Another change has been in-solution enrichment. So it's been this way of taking a sample that has very small percentages of human DNA, but then suddenly creating a process that will mean that the great majority of the sequences
Starting point is 02:10:06 that one's analyzing will be useful for analyses. And so the approach that we used was we took the DNA samples that we had, most of which were very low percentages of human DNA, less than 10%, often less than 1%, which is such a low proportion that it's prohibitively expensive to just brute-force sequence them, given the technology that we had available at the time. And so we took these samples and washed them over an artificially synthesized set of short DNA fragments that targeted positions of the DNA that we were interested in analyzing. So this is more than a million positions that are highly variable in people,
Starting point is 02:10:44 and we picked many of these to be biologically interesting. We had a whole set of known biological targets that affected traits in genome-wide association studies, which is the way that people look to see if there's particular genetic variants in modern people that have particular impacts on phenotypes and traits. And so what we did is we had this artificially synthesized set of DNA fragments that we washed our ancient sample over, and it bound the parts of the DNA that we targeted, and the resulting sequence that we generated was very enriched for the parts of the genome that were informative about history. And even though only 10% or 1% of the DNA was human, it ended up that a very large fraction was from the parts of the genome that we were interested in.
Starting point is 02:11:30 And it became economically efficient to do it. And sorry, what was the other 99% of the DNA? It's mostly microbial. So it's from bacteria and fungi that colonize a person's body after they die. Depending on how they die, there'll be more or less of these bacteria and fungi. And so when you typically sequence DNA from a person, it'll just be full of microbial sequence. Sometimes the microbial sequence is very interesting, so it might be pathogens that a person died of. So there's, for example, amazing work about different plagues, malaria and black death and hepatitis B and so on,
Starting point is 02:12:10 that have been obtained from the sequences of these pathogens in people's teeth and other parts of their body when they died. But we're focusing here on the human DNA. And so what we did is this changed the amount of data that was possible to produce from tens per year to hundreds per year. And then we further roboticized and industrialized the process so that there were many hundreds or even thousands per year. And so just in our laboratory, we've been generating genome-scale data from more than
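The stratified check described here can be sketched as well. This is only an illustration under assumptions: `stratum` is a hypothetical per-variant label that bins variants by how strongly background selection affects their region, and the enrichment is recomputed separately inside each bin so that differences between bins cannot drive the result. None of the names or numbers come from the study.

```python
import numpy as np

def enrichment_by_stratum(z, is_trait_variant, stratum, threshold=5.0):
    """Fold enrichment of trait variants above `threshold`, per stratum.

    Within each stratum, compares the trait rate among variants with
    |z| >= threshold to the trait rate among all variants in that stratum.
    """
    folds = {}
    for s in np.unique(stratum):
        in_s = stratum == s
        hits = in_s & (np.abs(z) >= threshold)
        base = is_trait_variant[in_s].mean()
        if hits.any() and base > 0:
            folds[s.item()] = float(is_trait_variant[hits].mean() / base)
    return folds

# Toy data: two strata of four variants each (values are made up).
z = np.array([6.0, 6.0, 1.0, 1.0, 6.0, 1.0, 1.0, 1.0])
is_trait_variant = np.array([True, True, False, False, True, True, False, False])
stratum = np.array([0, 0, 0, 0, 1, 1, 1, 1])

folds = enrichment_by_stratum(z, is_trait_variant, stratum)
print(folds)
```

If a shared confounder such as background selection were responsible for the signal, the enrichment would be expected to weaken or vanish once variants are compared only against others in the same bin; seeing a similar fold enrichment in every bin is the pattern reported in the conversation.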
Starting point is 02:12:40 5,000 individuals per year. I know this is true also of several other laboratories in the world now. And this huge jump in data, this sort of semi-exponential or even super-exponential jump in some cases, has made it possible to ask and answer questions. So while there were only on the order of 10 genome sequences
Starting point is 02:12:59 from humans in 2010, this year it has passed 20,000 reported sequences. So there's several orders of magnitude increase. And the questions we were able to ask in 2014 are just not the same as the ones we can ask today. Yeah, awesome. Excellent. David, thanks for your time. Thank you. Thank you, Dwarkesh.
