Lex Fridman Podcast - #133 – Manolis Kellis: Biology of Disease
Episode Date: October 26, 2020
Manolis Kellis is a computational biologist at MIT. Please support this podcast by checking out our sponsors:
- SEMrush: https://www.semrush.com/partner/lex/ to get a free month of Guru
- Pessimists Archive: https://pessimists.co/
- Eight Sleep: https://www.eightsleep.com/lex and use code LEX to get $200 off
- BetterHelp: https://betterhelp.com/lex to get 10% off
EPISODE LINKS:
Manolis Website: http://web.mit.edu/manoli/
Manolis Twitter: https://twitter.com/manoliskellis
Manolis YouTube: https://www.youtube.com/channel/UCkKlJ5LHrE3C7fgbnPA5DGA
Manolis Wikipedia: https://en.wikipedia.org/wiki/Manolis_Kellis
PODCAST INFO:
Podcast website: https://lexfridman.com/podcast
Apple Podcasts: https://apple.co/2lwqZIr
Spotify: https://spoti.fi/2nEwCF8
RSS: https://lexfridman.com/feed/podcast/
YouTube Full Episodes: https://youtube.com/lexfridman
YouTube Clips: https://youtube.com/lexclips
SUPPORT & CONNECT:
- Check out the sponsors above, it's the best way to support this podcast
- Support on Patreon: https://www.patreon.com/lexfridman
- Twitter: https://twitter.com/lexfridman
- Instagram: https://www.instagram.com/lexfridman
- LinkedIn: https://www.linkedin.com/in/lexfridman
- Facebook: https://www.facebook.com/LexFridmanPage
- Medium: https://medium.com/@lexfridman
OUTLINE:
Here's the timestamps for the episode. On some podcast players you should be able to click the timestamp to jump to that time.
00:00 - Introduction
08:05 - Molecular basis for human disease
32:04 - Deadliest diseases
37:47 - Genetic component of diseases
46:38 - Genetic understanding of disease
1:02:25 - Unified theory of human disease
1:08:26 - Genome circuitry
1:33:29 - CRISPR
1:45:06 - Mitochondria
1:53:10 - Future of biology research
2:22:46 - The genetic circuitry of disease
Transcript
The following is a conversation with Manolis Kellis, his third time on the podcast.
He is a professor at MIT and head of the MIT Computational Biology Group.
This time, we went deep on the science, biology and genetics.
So this is a bit of an experiment.
Manolis went back and forth between the basics of biology and the latest state of the art
in the research.
He's a master at this, so I just sit back and enjoy the ride.
This conversation happened at 7 a.m., so it's yet another podcast episode after an all-nighter
for me.
And once again, since the universe has a sense of humor, this one was a tough one for my brain to keep up with, but
I did my best and I never shy away from a good challenge.
Quick mention of each sponsor, followed by some thoughts related to the episode.
First is SEMrush, the most advanced SEO optimization tool I've ever come across. I don't like
looking at numbers, but someone probably should. It helps
you make good decisions. Second is Pessimists Archive. They're back. One of my favorite
history podcasts on why people resist new things from recorded music to umbrellas,
to cars, chess, coffee, and the elevator. Third is Eight Sleep, a mattress that cools itself,
measures heart rate variability, has an app,
and has given me yet another reason
to look forward to sleep,
including the all-important power nap.
And finally, BetterHelp.
Online therapy, when you want to face your demons
with a licensed professional,
not just by doing the David Goggins-like
physical challenges, like I seem to do on occasion.
Please check out the sponsors in the description to get a discount and to support this podcast.
As a side note, let me say that biology, in the brain and in the various systems of the body,
fills me with awe every time I think about how such a chaotic mess, coming from its humble origins in the ocean, was able to achieve such incredibly complex and robust mechanisms of life that survived despite all the forces of nature that want to destroy it. It is so unlike the systems we humans have engineered that it makes me feel that in order to create artificial general intelligence
and artificial consciousness, we may have to completely rethink how we engineer computational systems.
If you enjoy this thing, subscribe on YouTube, review it with five stars on Apple Podcasts,
follow on Spotify, support on Patreon, or connect with me on Twitter @lexfridman.
As usual, I'll do a few minutes
of ads now and no ads in the middle. I try to make these interesting, but I give you timestamps,
so if you skip, please still check out the sponsors by clicking the links in the description.
It's the best way to support this podcast. This show is sponsored by SEMrush,
which, if you look around, seems to be one of, if not the most respected digital
marketing tool out there.
It does a lot of stuff, including SEO optimization of keywords, backlinks, content creation,
social media posts, and so on.
They have over 45 tools and are trusted by over 6 million marketers worldwide.
I don't like numbers, but that's because I'm an idiot with that stuff.
And in general, I speak from the heart and data be damned, but somebody needs to pay attention
to numbers because otherwise you can't make optimal decisions.
I believe heart comes first, data second, but both are necessary.
I started using them just for fun to explore non-numeric things, like what kind of titles
or words connect with people.
As a writer and a somewhat crappy part-time speaker, that information helps me, in moderation,
of course.
The amount of data that they put at your fingertips is just amazing.
So if you want to optimize your online presence, check them out at semrush.com/partner/lex to get a free
month of Guru-level membership. This episode is also sponsored by an amazing
podcast called Pessimists Archive. They were one of the first sponsors of this
podcast ever and now they're back. I think it should be one of the top podcasts in the world, frankly.
It's a history show about why people resist new things.
Each episode looks at a moment in history when something new was introduced, something that
today we think of as commonplace.
Like recorded music, umbrellas, bicycles, cars, chess, coffee, and the elevator.
And the show explores why it freaked everyone out.
The fascinating thing about this show is that stuff that happened a long time ago, especially
in terms of our fears of new things, repeats itself in the modern day, and so has many
lessons for us to think about in terms of human psychology and the role of technology
in our society.
Anyway, subscribe and listen to Pessimists Archive anywhere and everywhere.
The website is pessimists.co.
I highly recommend this podcast. You won't regret it.
And yes, dear listener, the Dan Carlin conversation is coming soon. Probably before the election,
but please be patient with me. Okay, this show is also sponsored by Eight Sleep and its Pod Pro
mattress that you can check out at eightsleep.com/lex to get $200 off. It controls temperature
with an app, it is packed with sensors, and can cool down to as low as 55 degrees
on each side of the bed separately.
And it totally has been a game changer for me.
I don't particularly like fancy material possessions
as you may or may not know.
And in general, I live a minimalist life,
but sleep is important.
So if you're a little less minimalist (or insane) than me,
then I recommend you invest in quality, temperature-controlled sleep.
A cool bed surface with a warm blanket after a long day of focus work is heaven. Same applies
for the perfect 30 minute power nap. They can track a bunch of metrics like heart rate variability, but cooling alone
is honestly worth the money. Anyway, go to eightsleep.com/lex to get $200 off.
This show is also sponsored by BetterHelp, spelled H-E-L-P, help. Every time I say that, it
reminds me of the movie Cast Away, which is awesome.
Check it out at betterhelp.com/lex.
They figure out what you need and match it with a licensed professional therapist in
under 48 hours.
I chat with a person on there and enjoy it.
Of course, I also regularly talk to David Goggins these days, who is definitely not a licensed
professional therapist, but he does help me meet his and my demons
and become comfortable to exist in their presence. Everyone is different, but for me I think
suffering is essential for creation, but you can suffer beautifully in a way that doesn't destroy
you. Therapy can help in whatever form that therapy takes. BetterHelp is an option worth trying.
They're easy, private, affordable, and available worldwide.
You can communicate by text anytime and schedule weekly audio and video sessions.
Check it out at betterhelp.com/lex.
And now here's my conversation with Manolis Kellis.
So your group at MIT is trying to understand the molecular basis of human disease. What are some of the biggest challenges in your view?
Don't get me started.
I mean, understanding human disease is the most complex challenge in modern science.
So because human disease is as complex as the human genome, it is as complex as the human
brain.
And it is in many ways even more complex because the more we understand disease complexity,
the more we start understanding genome complexity and epigenome complexity and brain circuitry complexity
and immune system complexity and cancer complexity and so on and so forth.
So traditionally, human disease was following basic biology.
You would basically understand basic biology in model organisms like, you know, mouse and fly and yeast. You would understand sort of mammalian biology, and animal biology,
and eukaryotic biology in sort of progressive layers of complexity, getting closer to human
phylogenetically. And you would do perturbation experiments in those species to see if I knock out a
gene, what happens. And based on the knocking out of these genes, you would basically then have a way to drive
human biology, because you would sort of understand the functions of these genes, and
then if you find that a human gene locus, something that you've mapped from human genetics
to that gene, is related to a particular human disease, you say:
now I know the function of the gene from the model organisms,
so I can go and understand the function of that gene in human.
But this is all changing. This has dramatically changed. So that was the old way of doing basic biology.
You would start with the animal models, the eukaryotic models, the mammalian models, and then you would go to human.
Human genetics has been so transformed in the last decade or two, that human genetics
is now actually driving the basic biology.
There is more genetic mutation information in the human genome than there will ever be
in any other species.
What do you mean by mutation information?
So perturbations is how you understand systems.
So an engineer builds systems, and then they
know how they work from the inside out.
A scientist studies systems through perturbations.
You basically say, if I poke that balloon, what's going to happen?
And I'm going to film it in super high resolution,
understand, I don't know, aerodynamics or fluid dynamics
if it's filled with water, et cetera.
So you can then make experimentation by perturbation, and then the scientific process is sort of
building models that best fit the data, designing new experiments that best test your models and
challenge your models and so on and so forth.
It's the same thing with science. Basically, if you're trying to understand biological
science, you basically want to do perturbations that then drive the models.
So how do these perturbations allow you to understand disease?
So if you know that a gene is related to disease, you don't want to just know that it's related
to the disease.
You want to know what is the disease mechanism because you want to go and intervene.
So the way that I like to describe it is that traditionally,
epidemiology, which is basically the study of disease, you know, sort of the observational study of disease, has been about correlating one thing with another thing. So if you have a lot of
people with liver disease who are also alcoholics, you might say, well, maybe the alcoholism is driving the liver disease,
or maybe those who have liver disease self-medicate with alcohol, so that the connection could be
either way. With genetic epidemiology, it's about correlating changes in genome with phenotypic
differences, and then you know the direction of causality.
So, if you know that a particular gene is related to the disease, you can basically say,
okay, perturbing that gene in mouse causes the mice to have X phenotype. So, perturbing
that gene in human causes the humans to have the disease, so I can now figure out what are the detailed molecular phenotypes in the human that are related to that
organismal phenotype in the disease.
So it's all about understanding disease mechanism, understanding what are the pathways, what are
the tissues, what are the processes that are associated with the disease, so that we know
how to intervene.
You can then prescribe particular medications that also alter these processes. You can prescribe lifestyle changes that also affect these processes
and so on and so forth.
That's such a beautiful puzzle to try to solve. Like what kind of perturbations eventually have this ripple effect that leads to a disease
across the population. And then you study that in animals, in mice,
first, and then see how that might possibly connect to humans. How hard is that puzzle
of trying to figure out how little perturbations might lead, in a stable way, to a disease?
In animals, we make the puzzle simpler because we perturb one gene at a time.
That's the beauty, that's the power of animal models.
You can basically decouple the perturbations.
You only do one perturbation
and you only do strong perturbations at a time.
In human, the puzzle is incredibly complex.
Because, I mean, obviously you don't do human experimentation,
you wait for natural
selection and natural genetic variation to basically do its own experiments, which it
has been doing for hundreds and thousands of years in the human population and for hundreds
of thousands of years across the history leading to the human population.
So you basically take this natural genetic variation that we all carry within us, every one of us carries six million perturbations.
So I've done six million experiments on you, six million experiments on me,
six million experiments on every one of seven billion people on the planet.
What's the six million correspond to?
6 million unique genetic variants
that are segregating in the human population.
Every one of us carries millions of polymorphic sites.
Poly, many; morph, forms.
Polymorphic means many forms, variants.
That basically means that every one of us
has single nucleotide alterations that we
have inherited from mom and from dad that basically can be thought of as tiny little perturbations.
Most of them don't do anything, but some of them lead to all of the phenotypic differences that we
see between us. The reason why two twins are identical is because these variants completely
determine the way that I'm going to look
at exactly 93 years of age.
How happy are you with this kind of data set? Is the human population of Earth large enough?
Is it too big, too small?
Yeah, so "is it large enough" is a power analysis question. And in every one of our grants, we do a power analysis
based on what is the effect size that I would like to detect
and what is the natural variation in the two forms.
So every time you do a perturbation,
you're asking a change in form A into form B.
Form A has some natural
phenotypic variation around it,
and form B has some natural phenotypic
variation around it. If those variances are large and the differences between the mean of A and the
mean of B are small, then you have very little power. The further the means go apart, that's the
effect size, the more power you have, and the smaller the standard deviation, the more power you
have. So basically, when you're asking, is that sufficiently large?
Certainly not for everything, but we already have enough power
for many of the stronger effects in the more tight distributions.
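The power argument above can be made concrete with a small calculation. The following is a minimal sketch, not the group's actual grant analysis: it assumes a two-sided z-test comparing the means of form A and form B, a shared standard deviation, and equal group sizes. The function name `two_sample_power` and all of its parameters are hypothetical, chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_power(mean_a, mean_b, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided z-test for a difference in means.

    Assumes both forms share the same standard deviation `sd` and that
    each group has `n_per_group` individuals (illustrative assumptions).
    """
    standard_normal = NormalDist()
    # Critical value for a two-sided test at significance level alpha.
    z_crit = standard_normal.inv_cdf(1 - alpha / 2)
    # Standard error of the difference between the two group means.
    standard_error = sd * sqrt(2 / n_per_group)
    # Effect size expressed in units of standard error.
    shift = abs(mean_b - mean_a) / standard_error
    # Probability that the test statistic lands in either rejection tail.
    return standard_normal.cdf(shift - z_crit) + standard_normal.cdf(-shift - z_crit)


if __name__ == "__main__":
    # Larger gap between the means, or a smaller standard deviation,
    # both raise the power, as described in the conversation.
    for gap, sd in [(0.2, 1.0), (0.5, 1.0), (0.2, 0.5)]:
        power = two_sample_power(0.0, gap, sd, n_per_group=100)
        print(f"gap={gap} sd={sd} power={power:.3f}")
```

With no difference between the means, the function returns the false positive rate `alpha`, which is a quick sanity check on the formula.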
So that's the hopeful message that there exists parts of the genome
that have a strong effect that has a small variance.
That's exactly right.
Unfortunately, those perturbations are the basis of disease in many cases.
So it's not a, you know, hopeful message.
Sometimes it's a terrible message.
It's basically, well, some people are sick.
But if we can figure out what are these contributors to sickness, we can then help make them better and help many other people get better who don't carry that exact mutation, but who carry mutations on the same pathways.
And that's what we like to call the allelic series of a gene.
You basically have many perturbations of the same gene in different people, each with a different
frequency in the human population and each with a different effect on the individual
who carries them.
So you said in the past there would be these small experiments on
perturbations in animal models. What does this puzzle-solving process look like
today?
So we basically have something like 7 billion people
on the planet, and every one of them
carries something like 6 million mutations.
You basically have an enormous matrix of genotype
by phenotype by systematically measuring
the phenotype of these individuals.
And the traditional way of measuring this phenotype
has been to look at one trait at a time.
You would gather families and you would sort of paint
the pedigrees of a strong-effect, what we like to call
Mendelian, mutation.
So a mutation that gets transmitted in a dominant or a
recessive but strong-effect form,
where basically one locus plays a very big role in that disease.
And you can then look at carriers versus non-carriers in one family,
carriers versus non-carriers in another family, and do that for hundreds,
sometimes thousands of families, and then trace these inheritance patterns,
and then figure out what is the gene that plays that role.
Is this the matrix that you've shown in talks or lectures?
So that matrix is the input to the stuff that I show in talks.
So basically that matrix has traditionally been strong-effect genes.
What the matrix looks like now is instead of pedigrees, instead of families, you basically
have thousands and sometimes hundreds of thousands of unrelated individuals, each with all of
their genetic variants, and each with their phenotype, for example, height or lipids, or
whether they're sick or not for a particular trait.
That has been the modern view, instead of going to families,
going to unrelated individuals with one phenotype at a time.
And what we're doing now, as we're
maturing in all of these sciences,
is that we're doing this in the context
of large medical systems or enormous cohorts
that are very well phenotyped across hundreds of phenotypes,
sometimes with a complete electronic health record.
So you can now start relating,
not just one gene segregating in one family,
not just thousands of variants,
segregating with one phenotype,
but now you can do millions of variants
versus hundreds of phenotypes.
And as a computer scientist,
I mean, deconvolving that matrix, partitioning it into the layers of biology that
are associated with every one of these elements is a dream come true.
It's like the world's greatest puzzle. And you can now solve that puzzle by
throwing in more and more knowledge about the function of different genomic regions
and how these functions are changed across tissues and in the context of disease.
And that's what my group and many other groups are doing.
We're trying to systematically relate this genetic variation with molecular variation
at the expression level of the genes, at the epigenomic level of the
gene regulatory circuitry, and at the cellular level of what are the functions that are happening
in those cells, at the single cell level, using single cell profiling, and then relate all
that vast amount of knowledge computationally with the thousands of traits that each of
these thousands of variants are perturbing.
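The genotype-by-phenotype matrix described above can be pictured with a toy example. This is only an illustrative sketch, not the method used by any group mentioned here: the six individuals, three variants, two traits, and every number below are made up, and plain Pearson correlation stands in for the much richer statistical models used in practice. The names `pearson` and `association_matrix` are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return covariance / sqrt(var_x * var_y)


def association_matrix(genotypes, phenotypes):
    """Return a variants-by-traits table of correlations.

    genotypes:  rows are individuals, columns are variants (0, 1, or 2 copies).
    phenotypes: rows are the same individuals, columns are measured traits.
    """
    variant_columns = list(zip(*genotypes))
    trait_columns = list(zip(*phenotypes))
    return [[pearson(v, t) for t in trait_columns] for v in variant_columns]


# Made-up toy data: 6 unrelated individuals, 3 variants, 2 traits.
toy_genotypes = [
    [0, 1, 2],
    [1, 0, 2],
    [2, 1, 0],
    [0, 2, 1],
    [1, 1, 1],
    [2, 0, 0],
]
toy_phenotypes = [  # columns: height in cm, a lipid measurement
    [165, 100],
    [170, 110],
    [176, 95],
    [166, 130],
    [171, 112],
    [175, 92],
]

if __name__ == "__main__":
    for variant_index, row in enumerate(association_matrix(toy_genotypes, toy_phenotypes)):
        print(variant_index, [round(r, 2) for r in row])
```

Scaling the same idea to millions of variants and hundreds of phenotypes is what turns a single-trait study into the large matrix discussed in the conversation.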
I mean, this is something we talked about, I think last time.
So there's these effects at different levels that happen.
You said at a single cell level, you're trying to see things that happen due to certain perturbations.
And then it's not just like a puzzle of perturbation and disease.
It's perturbation then effect at the cellular level,
then at an organ level.
But like, how do you disassemble this into,
like what your group is working on?
You're basically taking a bunch of the hard problems
in the space.
How do you break apart a difficult disease
and break it apart into problems that you,
into puzzles that you can now start solving?
So there's a struggle here.
Computer scientists love hard puzzles.
And they're like, oh, I wanna build a method
that just de-convolves the whole thing computationally.
And that's very tempting and it's very appealing.
But biologists just like to
decouple that complexity experimentally, to just like peel off layers of complexity experimentally.
And that's what many of these modern tools that, you know, my group and others have both
developed and used, the fact that we can now figure out tricks for peeling off these layers
of complexity by testing one cell type at a time,
or by testing one cell at a time.
And you could basically say, what is the effect of this genetic variant associated with Alzheimer's on human brain?
Human brain sounds like, oh, it's an organ, of course, just go one organ at a time, but human brain has, of course, dozens of different brain regions.
And within each of these brain regions, dozens of different cell types.
And every single type of neuron, every single type of glial cell,
between astrocytes, oligodendrocytes, microglia,
between all of the neural cells and the vascular cells and the immune cells that are
co-inhabiting the brain between the different types of excitatory and inhibitory neurons that are sort of interacting with each other between different layers of neurons in the cortical layers.
Every single one of these has a different type of function to play in cognition, in interaction with the environment, in maintenance of the brain,
in energetic needs, in feeding the brain with blood, with oxygen, in clearing out the debris
that are resulting from the super high energy production of cognition in humans. So all of these things are basically potentially
deconvolvable computationally, but experimentally,
you can just do single cell profiling of dozens of regions
of the brain across hundreds of individuals,
across millions of cells.
And then now you have pieces of the puzzle
that you can then put back together to understand that complexity.
I mean, first of all, the cells in the human brain are the most...
maybe I'm romanticizing it, but cognition seems to be very complicated.
So separating into the function, breaking Alzheimer's down to the cellular level seems very challenging.
Is that basically you're trying to find a way that some perturbation in genome results
in some obvious major dysfunction in the cell. You're trying to find something like that.
Exactly. So what does human genetics do? Human genetics basically looks at the whole path from
genetic variation all the way to disease. So human genetics has
basically taken thousands of Alzheimer's cases and thousands of controls matched for age, for sex, for environmental backgrounds and so on and so forth.
And then looked at that map where you're asking what are the individual genetic perturbations
and how are they related, all the way to Alzheimer's disease.
And that has actually been quite successful.
So we now have, you know, more than 27 different loci,
these are genomic regions that are associated with Alzheimer's
at this end-to-end level.
But the moment you sort of break up that very long path
into smaller levels, you can basically say from genetics, what are the
epigenomic alterations at the level of gene regulatory elements, where that genetic
variant perturbs the control region nearby. That effect is much larger.
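The end-to-end case versus control comparison described above is often summarized, for a single variant, as an odds ratio. The sketch below is a simplified illustration under stated assumptions: it compares allele counts between cases and controls and uses the standard large-sample confidence interval on the log odds ratio. The function name `allelic_odds_ratio` and the counts in the usage example are hypothetical and are not taken from any Alzheimer's study.

```python
from math import exp, log, sqrt


def allelic_odds_ratio(case_alt, case_ref, control_alt, control_ref):
    """Odds ratio and approximate 95% interval from a 2x2 table of allele counts.

    case_alt / case_ref:       alternate and reference allele counts in cases.
    control_alt / control_ref: the same counts in controls.
    All four counts must be positive for this approximation.
    """
    odds_ratio = (case_alt * control_ref) / (case_ref * control_alt)
    # Standard error of the log odds ratio (large-sample approximation).
    se_log_or = sqrt(1 / case_alt + 1 / case_ref + 1 / control_alt + 1 / control_ref)
    lower = exp(log(odds_ratio) - 1.96 * se_log_or)
    upper = exp(log(odds_ratio) + 1.96 * se_log_or)
    return odds_ratio, (lower, upper)


if __name__ == "__main__":
    # Made-up counts: 2,000 cases and 2,000 controls, two alleles each.
    ratio, (lower, upper) = allelic_odds_ratio(1200, 2800, 1000, 3000)
    print(f"odds ratio {ratio:.2f}, 95% interval {lower:.2f} to {upper:.2f}")
```

An interval that stays above 1 suggests the alternate allele is more common in cases than in controls; real studies additionally correct for testing millions of variants and for the matching factors mentioned above.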
You mean much larger in terms of this down the line impact?
Or it's much larger in terms of the measurable effect. This A versus B variance
is actually so much more cleanly defined when you go to the shorter branches, because for one
genetic variant to affect Alzheimer's, that's a very long path. That basically means that in the
context of the six million variants that every one of us carries, that one single nucleotide
has a detectable effect all the way to the end.
I mean, it's just mind-boggling that that's even possible.
But indeed, there are such effects.
So the hope is, or, scientifically speaking, the most effective place to detect
the alteration that results in disease is earlier on in the pipeline.
So it's really possible.
It's a trade-off.
If you go very early on in the pipeline, now each of these epigenomic alterations, for
example, this enhancer control region is active, maybe 50% less, which is a dramatic
effect.
Now you can ask, well, how much does changing just one regulatory region in the genome in one cell type change disease?
Well, that path is now long.
So if you instead look at expression, the path between genetic variation and the expression of one gene
goes through many enhancer regions, and therefore it's a subtler effect at the gene level,
but then now you're closer because one gene is acting on, you know,
in the context of only 20,000 other genes, as opposed to one enhancer acting in the context of
two million other enhancers. So you basically now have genetic; epigenomic, the circuitry;
transcriptomic, the gene expression level; and then cellular, where you can basically say, I can measure various properties of those cells.
What is the calcium influx rate when I have this genetic variation?
What is the synaptic density?
What is the electric impulse conductivity and so on and so forth?
So you can measure things along this path to disease, and you can also measure endophenotypes.
You can basically measure, you know, your brain activity. You can do imaging in the brain.
You can basically measure, I don't know, the heart rate, the pulse, the lipids, the
amount of blood secreted, and so on and so forth. And then through all of that, you can basically
get at the path to causality, the path to disease.
And is there something beyond cellular?
So you mentioned lifestyle interventions or changes, being able to prescribe
changes in lifestyle. What about organs? What about the function of the body as a whole?
Yeah, absolutely.
So basically when you go to your doctor,
they always measure your pulse,
they always measure your height,
they always measure your weight,
your BMI, basically,
these are just very basic variables.
But with digital devices nowadays,
you can start measuring hundreds of variables for every individual.
You can basically also phenotype cognitively
through tests, Alzheimer's patients.
There are cognitive tests that you typically do
for cognitive decline, these mini-mental examinations
that you have specific questions to.
You can think of sort of enlarging the set of cognitive tests.
So in the mouse, for example, you do experiments
for how do they get out of mazes, how do they find food,
whether they recall a fear, whether they shake
in a new environment and so on and so forth.
In the human, you can have much, much richer phenotypes,
where you can basically say, not just imaging
at the organ level
and all kinds of other activities at the organ level, but you can also do, at the organism
level, behavioral tests: how did they do on empathy, how did they do on
memory, how did they do on long-term memory versus short-term memory, and so on and
so forth.
I love how you're calling that phenotype.
I guess it is.
It is.
But like your behavior patterns that might change over a period of
a life, it's your ability to remember things, your ability to be, yeah,
empathetic or emotionally, your intelligence, perhaps even intelligence has
hundreds of variables.
You can be your math intelligence, your literary intelligence, your puzzle solving intelligence,
your logic, it could be like hundreds of things.
And all of that, we were able to measure that better and better.
And all of that could be connected to the entire pipeline.
We used to think of each of these as a single variable, like intelligence.
I mean, that's ridiculous.
It's basically dozens of different genes that are controlling every single variable. You can basically think of,
imagine us in a video game where every one of us has measures of strength, stamina, energy
left, and so on and so forth. But you could click on each of those like five bars that are just
the main bars and each of those will just give you then hundreds of bars. And you can basically say, okay, great, for my machine learning task,
I want someone who, I'm a human, who has these particular forms of intelligence.
I require now these 20 different things.
And then you can combine those things and then relate them to, of course,
performance in particular task.
But you can also relate them to genetic variation
that might be affecting different parts of the brain, for example, your frontal cortex versus
your temporal cortex versus your visual cortex and so on and so forth. So genetic variation that
affects expression of genes in different parts of your brain can basically affect your music
ability, your auditory ability, your smell, your, you know, just dozens of different phenotypes can be broken down into, you know,
hundreds of cognitive variables and then relate each of those to thousands of
genes that are associated with them.
So as somebody who loves RPGs, role-playing games,
there are too few variables that we can control.
So I'm excited.
If we're in fact living in a simulation, this is a video game.
I'm excited by the quality of the video game.
The game designer did a hell of a good job.
So I'm more impressed.
Oh, I don't know.
The sunset last night was a little unrealistic.
Yeah.
Yeah.
Yeah.
The graphics.
Exactly.
Come on, Nvidia.
To zoom back out, we've been talking about the genetic origins of diseases, but I think
it's fascinating to talk about what are the most important diseases to understand, and
especially as it connects to the things that you're working on.
So it's very difficult to think about important diseases to understand, because there are many metrics of importance.
One is lifestyle impact.
I mean, if you look at COVID, the impact on lifestyle has been enormous.
So understanding COVID is important because it has impacted the well-being in terms of
ability to have a job, ability to have an apartment, ability to go to work, ability to have
a mental circle of support.
And all of that for millions of Americans, like huge, huge impact.
So that's one aspect of importance.
So basically mental disorders.
Alzheimer's has a huge importance in the well-being of Americans.
Whether or not it kills someone for many, many years, it has a huge impact.
So the first measure of importance is just well-being.
Like impact on the quality of life.
Impact on the quality of life, absolutely.
The second metric, which is much easier to quantify, is deaths.
What is the number one killer?
The number one killer is actually heart disease.
It is actually killing 650,000 Americans per year.
Number two is cancer with 600,000 Americans.
Number three, far, far down the list, is accidents.
Every single accident combined.
So basically, you know, you read the news about accidents, like, you know, there was a huge car crash all over the news.
But the number of deaths is number three by far: 167,000.
Lower respiratory disease, so that's asthma and not being able to breathe and so on and so forth,
160,000. Alzheimer's, number five, with 120,000. And then stroke, brain aneurysms, and so on and so forth,
that's 147,000. Diabetes and metabolic disorders, etc., that's 85,000.
The flu is 60,000, suicide 50,000, and then overdose, etc., goes further down the list.
So of course, COVID has crept up to be the number three killer this year, with more than 100,000 Americans and counting.
But if you think about what are the most important diseases, you have to understand
both the quality of life and the sheer number of deaths and just numbers of years lost,
if you wish.
And each of these diseases you can think of as, and also including terrorist attacks
and school shootings, for example, things which lead to fatalities, you can look at as problems that could
be solved and some problems are harder to solve than others. I mean that's part of the equation.
So maybe if you look at these diseases if you look at heart disease or cancer or Alzheimer's
or just schizophrenia and obesity, not necessarily things that kill you, but affect the quality
of life, which problems are solvable, which aren't, which are harder to solve, which aren't.
I love your question because you put it in the context of a global effort rather than just a local effort. So basically if you look at the global
aspect, exercise and nutrition are two interventions that we can as a society do a much better job at.
So if you think about sort of the availability of
cheap food, it's extremely high in calories, it's extremely detrimental for you,
like a lot of processed food, etc. So if we change that equation and as a
society we made availability of healthy food much, much easier and charged for a
burger at McDonald's the price that it costs the health system,
then people would actually start buying more healthy foods.
So basically that's sort of a societal intervention, if you wish.
In the same way, increasing empathy, increasing education,
increasing the social framework and support
would basically lead to fewer suicides, it would
lead to fewer murders, it would lead to fewer deaths overall.
So that's something that we as a society can do.
You can also think about external factors versus internal factors.
So the external factors are basically communicable diseases like COVID, like the flu, etc.
And the internal factors are basically things like,
you know, cancer and Alzheimer's where basically your
genetics will eventually, you know, drive you there.
And then of course, with all of these factors,
every single disease has both a genetic component
and environmental component.
So heart disease, huge genetic contribution,
Alzheimer's, it's like 60% plus genetic.
So I think it's like 79% heritability.
So that basically means that genetics alone
explains 79% of Alzheimer's incidence.
And yes, there's a 21% environmental component where you could basically
enrich your cognitive environment, enrich your social interactions, read more books, learn
a foreign language, go running, you know, sort of have a more fulfilling life. All of that will
actually decrease Alzheimer's, but there's a limit to how much that can impact
because of the huge genetic footprints.
So this is fascinating.
So each one of these problems have a genetic component
and an environment component.
And so when there's a genetic component,
what can we do about some of these diseases?
What have you worked on?
What can you say that's in terms of problems
that are soluble here or understandable.
So my group works on the genetic component, but I would argue that understanding the genetic
component can have a huge impact even on the environmental component. Why is that? Because
genetics gives us access to mechanism. And if we can alter the mechanism, if we can impact
the mechanism, we can perhaps counteract some of the environmental components.
So understanding the biological mechanisms leading to disease is extremely important in being
able to intervene. But when you can intervene, the analogy that I like to give is for example,
for obesity. Think of it as a giant bathtub of fat.
There's basically fat coming in from your diet and there's fat coming out from your exercise.
That's an in-out equation and that's the equation that everybody's focusing on.
But your metabolism impacts that, you know, bathtub. Basically, your metabolism controls the rate at which you're burning energy.
It controls the rate at which you're storing energy.
And it also teaches you about the various valves that control the input and the output equation.
So if we can learn from the genetics, the valves, we can then manipulate those valves.
And even if the environment is feeding you a lot of fat and getting a little fat out,
you can just poke another hole at the bathtub and just get a lot of the fat out.
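The bathtub picture can be sketched as a toy simulation. This is only an illustration of the analogy, not a physiological model: the function name, the units, and every number below are assumptions made up for the example. Fat flows in at a fixed rate, flows out in proportion to the current store, and a hypothetical treatment adds one more outflow valve.

```python
def simulate_fat_store(days, intake, burn_rate, extra_outflow=0.0, start=20.0):
    """Return the size of a hypothetical fat store (arbitrary units) after `days`."""
    store = start
    for _ in range(days):
        inflow = intake                                # fat coming in from diet
        outflow = (burn_rate + extra_outflow) * store  # valves set by metabolism
        store = max(store + inflow - outflow, 0.0)     # the store cannot go negative
    return store


# Same diet and same baseline metabolism in both runs (made-up numbers).
baseline = simulate_fat_store(days=365, intake=1.0, burn_rate=0.04)
# "Poking another hole": a hypothetical intervention adds a small extra outflow.
treated = simulate_fat_store(days=365, intake=1.0, burn_rate=0.04, extra_outflow=0.01)

print(f"baseline store: {baseline:.1f}, with extra valve: {treated:.1f}")
```

In this toy model the store settles near intake divided by the total outflow rate, so opening one more valve lowers the level even though the diet is unchanged.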
Yeah, that's fascinating.
Yeah, so we're not just passive observers of our genetics.
The more we understand, the more we can come up with actual treatments.
And I think that's an important aspect to realize when people are thinking about
strong effect versus weak effect variants. So some variants have strong effects. We talked about
these Mendelian disorders, where a single gene has a sufficiently large effect, penetrance,
specificity and so on and so forth, that basically you can trace it in families
with cases and not cases, cases, not cases and so on and so forth.
But even the, you know, but so these are the genes that everybody says, oh, that's the
genes we should go after because that's a strong effect gene.
I like to think about it slightly differently. These are the genes where genetic impacts that have a strong effect were tolerated.
Because every single time we have a genetic association with disease, it depends on two things.
Number one, the obvious one, whether there is genetic variation, standing and circulating
and segregating in the human population, that impacts that gene.
Some genes are so darn important that if you mess with them even a tiny little amount,
that person is dead.
So those genes don't have variation. You're not going to find the genetic association if you don't have variation.
That doesn't mean that the gene has no role.
It simply means that the gene tolerates no mutations.
So that's actually a strong signal when there's no variation.
That's so fascinating.
Exactly.
Genes that have very little variation are hugely important.
You can actually rank the importance of genes based on how little variation they have.
And those genes that have very little variation,
but no association with disease,
that's a very good metric to say,
oh, that's probably a developmental gene
because we're not good at measuring those phenotypes.
So it's genes that you can tell evolution
has excluded mutations from,
but yet we can't see them associated with
anything that we can measure nowadays. It's probably early embryonic lethal.
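A minimal sketch of that ranking idea, under stated assumptions: each gene gets a ratio of observed to expected variants, where a low ratio means evolution has tolerated little variation. The `GeneStats` fields, the gene names, the counts, and the 0.2 cutoff are all hypothetical and chosen only to make the example run; they are not taken from the conversation or from any real dataset.

```python
from dataclasses import dataclass


@dataclass
class GeneStats:
    name: str
    observed_variants: int      # damaging variants actually seen in the population
    expected_variants: float    # count expected if the gene were unconstrained
    disease_associations: int   # known associations with measurable phenotypes


def constraint_ratio(gene: GeneStats) -> float:
    """Observed/expected ratio: lower means less tolerated variation."""
    return gene.observed_variants / gene.expected_variants


def rank_by_constraint(genes):
    """Most constrained (least variation) first."""
    return sorted(genes, key=constraint_ratio)


def candidate_developmental(genes, max_ratio=0.2):
    """Highly constrained genes with no measurable disease association."""
    return [g.name for g in rank_by_constraint(genes)
            if constraint_ratio(g) <= max_ratio and g.disease_associations == 0]


# Made-up genes and counts, for illustration only.
genes = [
    GeneStats("GENE_A", 2, 50.0, 0),
    GeneStats("GENE_B", 45, 50.0, 3),
    GeneStats("GENE_C", 5, 40.0, 4),
    GeneStats("GENE_D", 30, 35.0, 0),
]

ranking = [g.name for g in rank_by_constraint(genes)]
candidates = candidate_developmental(genes)
print(ranking, candidates)
```

Here the gene with almost no variation and no known association is the one flagged as a possible early developmental gene, which mirrors the reasoning above.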
What are all the words you just said? Early embryonic what?
Lethal, meaning that you die. Okay. There's a bunch of
stuff that is required for a stable functional organism across the board for our entire species, I guess.
If you look at sperm, it expresses thousands of proteins. Does sperm actually need thousands
of proteins? No. But it's probably just testing them. So my speculation is that misfolding of these proteins is an early test for failure.
So that out of the millions of sperm that are possible, you select the subset that are
just not grossly misfolding thousands of proteins.
So, it's kind of an assert that this is folding correctly.
Correct.
Yeah, just because if this little thing about the folding of a protein isn't correct,
that probably means somewhere down the line there's a bigger issue.
That's exactly right.
So fail fast.
So basically if you look at the mammalian investment in a newborn, that investment is enormous
in terms of resources.
So mammals have basically evolved mechanisms for fail fast.
Where basically in those early months of development.
I mean, it's horrendous, of course, at the personal level
when you lose your future child.
But in some ways,
there's so little hope for that child to develop and sort of make it through
the remaining months that sort of fail fast is probably a good evolutionary principle for
mammals.
And of course, humans have a lot of medical resources that you can sort of give those
children a chance.
And you know, we have so much more success
in sort of giving folks who have
these strong carrier mutations a chance.
But if they're not even making it
through the first three months, we're not gonna see them.
So that's why when we say,
what are the most important genes to focus on?
The ones that have a strong effect mutation
or the ones that have a weak effect mutation? Well, you know, the jury might be out, because the ones that have a strong effect mutation
are basically, you know, not mattering as much. The ones that only have weak effect mutations
by understanding through genetics that they have a weak effect mutation and understanding that
they have a causal role on the disease,
we can then say, okay, great, evolution has only tolerated a 2% change in that gene.
Pharmaceutically, I can go in and induce a 70% change in that gene, and maybe I will poke another
hole at the bathtub that was not easy to control in many of the other strong effect genetic variants.
So, okay, so this is a beautiful map of across the population of things that you're saying
strong and weak effects, so stuff with a lot of mutations and stuff with little mutations
with no mutations.
Any of this map, it lays out the puzzle.
Yeah, so when I say strong effect I mean at the level of individual mutations, so basically genes where
so you have to think first of the effect of the gene on the disease. Remember how I was sort of
painting that map earlier from genetics all the way to phenotype.
That gene can have a strong effect on the disease, but the genetic variant might have a weak
effect on the gene.
So, basically, when you ask what is the effect of that genetic variant on the disease, it
could be that that genetic variant impacts the gene by a lot, and then the gene impacts the disease by a little,
or it could be that the genetic variant impacts the gene
by a little and then the gene impacts the disease by a lot.
So what we care about is genes that impact the disease a lot, but genetics gives us
a full equation.
And what I would argue is if we couple the genetics with expression variation to basically ask what
genes change by a lot, and you know which genes correlate with disease by a lot, even if
the genetic variants change them by a little, then those are the best places to intervene.
Those are the best places where therapeutically, if I have even a modest effect, I will have
a strong effect on the disease.
Whereas those genetic variants that have a huge effect on the disease, I might not be able
to change that gene by this much without affecting all kinds of other things.
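One very simplified way to read this "full equation" is as a chain of two effects multiplied together: how much the variant changes the gene, times how much the gene changes the disease. The multiplicative form is an assumption made for illustration, not something stated in the conversation, and all numbers are made up except the 2% and 70% changes mentioned earlier.

```python
def effect_on_disease(change_in_gene: float, gene_effect_on_disease: float) -> float:
    """Toy linear chain: variant (or drug) -> gene -> disease."""
    return change_in_gene * gene_effect_on_disease


# A variant that changes its gene a lot, in a gene that matters little.
strong_variant_weak_gene = effect_on_disease(0.70, 0.05)

# A variant that changes its gene by only 2%, in a gene that matters a lot.
weak_variant_strong_gene = effect_on_disease(0.02, 0.90)

# A hypothetical drug inducing a 70% change in that same high-impact gene.
drug_on_strong_gene = effect_on_disease(0.70, 0.90)

print(strong_variant_weak_gene, weak_variant_strong_gene, drug_on_strong_gene)
```

Under this toy model the naturally occurring variant in the high-impact gene looks weak at the disease level, yet the same gene is where a modest pharmaceutical change would move the disease the most.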
Interesting.
So, yeah, okay.
So, that's what we're looking at.
What have we been able to find in terms of which disease could be helped?
Again, don't get me started.
This is, we have found so much.
Our understanding of disease has changed so dramatically with genetics.
I mean, places that we had no idea would be involved.
So one of the worst things
about my genome is that I have a genetic predisposition to age-related macular degeneration, AMD.
So it's a form of blindness that causes you to lose the central part of your vision,
progressively as you grow older. My increased risk is fairly small. I have an 8% chance.
You only have a 6% chance. You, I'm an average. Yeah. By the way, when you say,
my, you mean literally yours. You know this about you. I know this about me. Yeah.
Which is kind of, I mean, philosophically speaking is a pretty powerful thing to live with.
Maybe that's, so we agreed to talk again, by the way, for the listeners
to where we're going to try to focus on science today and a little bit of philosophy next
time, but it's interesting to think about the more you're able to know about yourself
from the genetic information in terms of the diseases, how that changes your own view
of life. Yeah.
So there's a lot of impact there.
And there's something called genetic exceptionalism,
which basically thinks of genetics
as something very, very different than everything else
as a type of determinism.
And let's talk about that next time.
So basically, that's a good preview basically, let's go back to AMD.
So basically with AMD, we had no idea what causes AMD.
It was a mystery until the genetics were worked out.
And now the fact that I know that I have a predisposition allows me to sort of make
some life choices,
number one, but number two, the genes that lead to the predisposition give us insights
as to how does it actually work.
And that's a place where genetics gave us something totally unexpected.
So there's a complement pathway, which is an immune function pathway, that was in, you know, most of the
loci associated with AMD. And that basically told us that, wow, there's an immune basis
to this eye disorder that people had just not expected before. If you look at complement, it was recently also implicated in schizophrenia.
And there's a type of microglia that is involved in synaptic pruning.
So synapses are the connections between neurons.
And in this whole use-it-or-lose-it view of mental cognition and other capabilities,
you basically have microglia, which are immune cells, that are
sort of constantly traversing your brain, and then pruning neuronal connections, pruning synaptic
connections that are not utilized. So in schizophrenia, there's thought to be a change in the pruning,
that basically if you don't prune your synapses the right way,
you will actually have an increased risk of schizophrenia. This is something that was
completely unexpected for schizophrenia. Of course, we knew it has to do with neurons, but
the role of the complement complex, which is also implicated in AMD, which is now also
implicated in schizophrenia, was a huge surprise.
What's the complement complex? So it's basically a set of genes, the complement genes that are basically having various immune
roles.
And as I was saying earlier, our immune system has been co-opted for many different roles
across the body.
So they actually play many diverse roles.
And somehow the immune system is connected to the synaptic pruning process.
Exactly.
So immune cells were co-opted to prune synapses.
How did you figure this out? How does one go about
figuring out this intricate connection, like pipeline of connection, though? Yeah, let me give you another
example. So Alzheimer's disease, the first place that you would expect it to act is obviously the
brain.
So we had basically this Roadmap Epigenomics Consortium view of the human epigenome, the
largest map of the human epigenome that has ever been built across 127 different tissues
and samples with dozens of epigenomic marks measured in, you know, hundreds of donors.
So what we've basically learned through that is that you basically can map what are the
active gene regulatory elements for every one of the tissues in the body.
And then we connected these gene regulatory active maps of basically what regions of the
human genome are turning on in every one of the different tissues.
We then can go back and say, where are all of the genetic loci that are associated with disease?
This is something that my group, I think, was the first to do back in 2010 in this Ernst Nature Biotech paper. But basically, we were for the first time able to show that specific chromatin states,
specific epigenomic states, in that case, enhancers, were in fact enriched in disease-associated
variants.
We pushed that further in the Ernst Nature paper a year later, and then in this Roadmap
Epigenomics paper, you know, a few years after that, but basically
that matrix that you mentioned earlier was in fact the first time that we could see what
genetic traits have genetic variants that are enriched in what tissues in the body.
And a lot of that map made complete sense.
If you look at diverse immune traits, like allergies
and type 1 diabetes and so on and so forth,
you basically could see that they were enriching,
that the genetic variants associated with those traits were enriched
in enhancers in these gene regulatory elements,
active in T cells and B cells and hematopoietic stem cells and so on and so forth.
So that basically gave us confirmation in many ways that those immune traits were
indeed enriching in immune cells.
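The kind of enrichment described here can be sketched with a standard hypergeometric test: given a pool of genomic regions, some of which are enhancers active in one tissue, ask whether trait-associated variants land in those enhancers more often than chance would predict. This is a generic statistical sketch, not the method of the papers mentioned; the function names and all counts are assumptions invented for the example.

```python
from math import comb


def fold_enrichment(overlap, n_trait, n_enhancer, n_total):
    """Fraction of trait regions in enhancers, relative to the background fraction."""
    return (overlap / n_trait) / (n_enhancer / n_total)


def hypergeom_pvalue(overlap, n_enhancer, n_trait, n_total):
    """P(X >= overlap) when n_trait regions are drawn at random from n_total,
    of which n_enhancer are tissue-active enhancers."""
    upper = min(n_enhancer, n_trait)
    tail = sum(comb(n_enhancer, i) * comb(n_total - n_enhancer, n_trait - i)
               for i in range(overlap, upper + 1))
    return tail / comb(n_total, n_trait)


# Made-up counts: 1000 candidate regions, 100 are T-cell enhancers,
# 50 carry variants for an immune trait, and 20 of those 50 overlap.
fold = fold_enrichment(overlap=20, n_trait=50, n_enhancer=100, n_total=1000)
p_value = hypergeom_pvalue(overlap=20, n_enhancer=100, n_trait=50, n_total=1000)

print(f"fold enrichment: {fold:.1f}, p-value: {p_value:.2e}")
```

Repeating this for every trait and every tissue would fill in a trait-by-tissue matrix of enrichments like the one discussed above.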
If you look at type 2 diabetes, you basically saw an enrichment in only one type of sample
and it was pancreatic islets. And we know that type 2 diabetes, you know, sort of stems from the dysregulation of insulin
in the beta cells of pancreatic islets.
And that sort of was, you know, spot on super precise.
If you looked at blood pressure, where would you expect blood pressure to occur?
You know, I don't know, maybe in your metabolism, in ways that you process coffee
or something like that, maybe in your brain, the way that you stress out and increase your
blood pressure, et cetera.
What we found is that blood pressure localized specifically in the left ventricle of the
heart.
So the enhancers of the left ventricle in the heart contain a lot of genetic variants
associated with blood pressure.
If you look at height, we found an enrichment specifically
in embryonic stem cell enhancers. So the genetic variants predisposing you to be taller or shorter
are in fact acting in developmental stem cells, makes complete sense. If you looked at inflammatory
bowel disease, you basically found inflammatory, which is immune, and also bowel disease, which is digestive.
And indeed, we saw a double enrichment, both in the immune cells and in the digestive
cells.
So that basically told us that, aha, this is acting in both components.
There's an immune component to inflammatory bowel disease, and there's a digestive component.
And the big surprise was for Alzheimer's.
We had seven different brain samples. We found zero enrichment in the brain
samples for genetic variants associated with Alzheimer's. And this is mind-boggling. Our brains
were literally hurting. What is going on? And what is going on is that the brain samples are
primarily neurons, oligodendrocytes, and astrocytes,
in terms of the cell types that make them up.
So that basically indicated that genetic variants
associated with Alzheimer's were probably not acting
in oligodendrocytes, astrocytes, or neurons.
So what could they be acting in?
Well, the fourth major cell type is actually microglia.
Microglia are resident immune cells in your brain. Oh, nice.
They're immune. Oh, wow. And they are CD14 plus, which is this sort of cell surface marker
of those cells. So they're CD14 plus cells just like macrophages that are circulating in your blood.
The microglia are resident monocytes
that are basically sitting in your brain.
They're tissue-specific monocytes.
And every one of your tissues, like your fat, for example,
has a lot of macrophages that are resident.
And the M1 versus M2 macrophage ratio
has a huge role to play in obesity.
And so basically, again, these immune cells are everywhere, but basically what we found
through this completely unbiased view of what are the tissues that likely underlie different
disorders, we found that Alzheimer's was humongously enriched in microglia, but not at all in the
other cell types.
So what are we supposed to make of that? If you look at the tissues involved, is that simply
useful for indication of propensity for disease or does it give us somehow a pathway of treatment?
It's very much the second.
If you look at the way to therapeutics, you have to start somewhere.
What are you going to do?
You're going to basically make assays that manipulate those genes and those pathways in those
cell types.
So, before we know the tissue of action, we don't even know where to start. We basically are at a loss, but if you know the tissue of action, and even better if you know the pathway of action,
then you can basically screen your small molecules.
Not for the gene, you can screen them directly for the pathway.
In that cell type, you can basically develop a high throughput multiplexed robotic system
for testing the impact of your favorite molecules
that you know are safe, efficacious, and hit that particular gene
and so on and so forth.
You can basically screen those molecules against
either a set of genes that act in that pathway
or on the pathway directly by having a cellular assay.
And then you can basically go into mice and do experiments and basically figure out ways
to manipulate these processes that allow you to then go back to humans and do a clinical
trial that basically says, okay, I was able indeed to reverse these processes in mice,
can I do the same thing in humans?
So that the knowledge of the tissues gives you the pathway to treatment, but that's not
the only part.
There are many additional steps to figuring out the mechanism of disease.
And so that's really promising.
Maybe take a small step back.
You've mentioned all these puzzles that were figured out with the Nature paper. I mean, you mentioned a ton of diseases, from obesity to Alzheimer's,
even schizophrenia, I think you mentioned.
What is the actual methodology of figuring this out?
So indeed, I mentioned a lot of diseases, and my lab works on a lot of different disorders. And the reason for that is that if you
look at biology, it used to be, you know, zoology departments and botany departments
and, you know, virology departments and so on and so forth. And MIT was one of the first
schools to basically create a biology department like, oh, we're going to study all of life suddenly.
Why was that even the case?
Because the advent of DNA and the genome and the central dogma of DNA makes RNA
makes protein in many ways unified biology.
You could suddenly study the process of transcription in viruses or in bacteria and have a huge impact on yeast and fly and maybe even mammals
because of this realization of these common underlying processes.
And in the same way that DNA unified biology, genetics is unifying disease studies.
So you used to have, you know, I don't know, a cardiovascular disease department and, you know, a neurological disease department
and a neurodegeneration department and, you know, basically immune and cancer and so
on and so forth. And all of these were studied in different labs, you know, because it made sense, because
we basically, the first step was understanding how the tissue functions and we kind of
knew the tissues involved in cardiovascular disease and so on and so forth.
But what's happening with human genetics is that all of that, all of these walls and
edifices that we had built are crumbling. And the reason for that is that genetics
is in many ways revealing unexpected connections.
So suddenly, we now have to bring the immunologists
to work on Alzheimer's.
They were never in the room.
They were in another building altogether.
The same way for schizophrenia, we now
have to sort of worry about all these interconnected
aspects.
For metabolic disorders, we're finding contributions from brain.
So suddenly we have to call the neurologist from the other building and so on and so forth.
So in my view, it makes no sense anymore to basically say, oh, I'm a geneticist studying immune disorders.
I mean, that's ridiculous because, I mean, yeah, of course, in many ways, you still need
to sort of focus. But what we're doing is that we're basically saying, we'll go wherever
the genetics takes us. And by building these massive resources, by working on our latest
maps now, 833 tissues,
sort of the next generation of the Epigenomics Roadmap,
which we now call EpiMap, is 833 different tissues.
Using those, we've basically found
enrichments in 540 different disorders.
Those enrichments are not like,
oh, great, you guys work on that and we'll work on this.
They're intertwined amazingly.
So of course, there's a lot of modularity, but there's these enhancers that are sort of
broadly active and these disorders that are broadly active.
So basically, some enhancers are active in all tissues and some disorders are enriching
in all tissues.
So basically, there's these multifactorial and these other class which I like to call
polyfactorial diseases, which are basically lighting up everywhere.
And in many ways, it's sort of cutting across these walls that were previously built across
these departments.
And the polyfactorial ones were probably the previous structure of departments wasn't equipped
to deal with those. I mean, again, maybe it's a romanticized question, but you know, there's in physics,
there's a theory of everything.
Do you think it's possible to move towards an almost theory of everything of disease
from a genetic perspective?
So if this unification continues, is it possible that, like, do you think in those terms,
like trying to arrive at a fundamental understanding of how disease emerges, period?
That unification is not just foreseeable, it's inevitable.
I see it as inevitable.
We have to go there.
You cannot be a specialist anymore, if you're a genomicist. You have to be a specialist
in every single disorder. And the reason for that is that the fundamental understanding of the
circuitry of the human genome that you need to solve schizophrenia. That fundamental circuitry
is hugely important to solve Alzheimer's, and that same circuitry
is hugely important to solve metabolic disorders.
And that same exact circuitry is hugely important for solving immune disorders and cancer and
every single disease.
So all of them have the same sub-task.
And I teach dynamic programming in my class,
dynamic programming is all about sort of not redoing the work. It's reusing the work that you do once. So basically for us to say,
oh, great, you know, you guys in the immune building go
solve the fundamental circuitry of everything. And then you
guys in the schizophrenia building go solve the fundamental
circuitry of everything separately.
It's crazy.
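The reuse principle can be sketched with memoization, the top-down form of dynamic programming: a shared subproblem is solved once and every later request reuses the stored answer. In this hedged illustration, `solve_circuitry` stands in for the expensive shared work and the disease-to-cell-type mapping is a made-up example, not a claim about the underlying biology.

```python
from functools import lru_cache

calls = {"count": 0}  # how many times the expensive work really ran


@lru_cache(maxsize=None)
def solve_circuitry(cell_type: str) -> str:
    """Placeholder for the expensive, shared subproblem."""
    calls["count"] += 1
    return f"circuitry({cell_type})"


# Hypothetical mapping from disorders to the cell types they depend on.
DISEASE_CELL_TYPES = {
    "Alzheimer's": ["microglia"],
    "schizophrenia": ["microglia", "neuron"],
    "type 1 diabetes": ["T cell", "B cell"],
    "allergies": ["T cell", "B cell"],
}


def study(disease: str):
    """Each disease group asks for the circuitry it needs; solved pieces are reused."""
    return [solve_circuitry(cell) for cell in DISEASE_CELL_TYPES[disease]]


results = {disease: study(disease) for disease in DISEASE_CELL_TYPES}
requests = sum(len(cells) for cells in DISEASE_CELL_TYPES.values())

print(f"{requests} requests served by {calls['count']} actual computations")
```

Seven requests across four disorders trigger only four computations, one per distinct cell type, which is the argument for solving the circuitry once and sharing it.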
So what we need to do is come together
and have a circuitry group,
the circuitry building that tries to solve the circuitry of everything.
And then the immune folks will apply this knowledge
to all of the disorders that are associated with immune dysfunction.
And the schizophrenia folks will basically interact with both the immune folks and with the neuronal
folks.
And all of them will be interacting with the circuitry folks and so on and so forth.
So that's sort of the current structure of my group, if you wish.
So basically what we're doing is focusing on the fundamental circuitry.
But at the same time, we're the users of our own tools by collaborating
with many other labs in every one of these disorders that we mentioned. We basically have
a heart focus on cardiovascular disease, coronary artery disease, heart failure, and so on
and so forth. We have an immune focus on several immune disorders. We have a cancer focus on metastatic melanoma and immunotherapy response.
We have a psychiatric disease focus on schizophrenia, autism, PTSD, and other psychiatric disorders.
We have an Alzheimer's and neurodegeneration focus on Huntington's disease, ALS, and AD-related disorders
like frontotemporal dementia and Lewy body dementia,
and of course a huge focus on Alzheimer's.
We have a metabolic focus on the role of exercise and diet
and sort of how they're impacting metabolic organs
across the body and across many different tissues.
And all of them are interfacing with the circuitry.
And the reason for that is another computer science principle
of eat your own dog food.
If everybody ate their own dog food,
dog food would taste a lot better.
The reason why Microsoft Excel and Word and PowerPoint
was so important and so successful
is because the employees that were working on them
were using them for their day-to-day tasks.
You can't just simply build a circuitry and say,
here it is guys, take the circuitry, we're done,
without being the users of that circuitry because you then go back and
Because we span the whole spectrum from profiling the epigenome, using comparative genomics, finding the important nucleotides in the genome,
building the basic functional map of what are the genes in the human genome,
what are the gene regulatory elements of the human genome.
I mean over the years we've written a series of papers on how do you find human genes in the first place using
comparative genomics. How do you find the motifs that are the building blocks of gene regulation
using comparative genomics? How do you then find how these motifs come together and act
in specific tissues using epigenomics? How do you link regulators to enhancers and enhancers to their target genes using epigenomics
and regulatory genomics?
So through the years, we've basically built all this infrastructure for understanding what
I like to say every single nucleotide of the human genome and how it acts in every one
of the major cell types and tissues of the human
body.
This is no small task.
This is an enormous task that takes the entire field, and that's something that my
group has taken on, along with many other groups.
And we have also, and that's sort of the thing that sets my group perhaps apart, we have also worked
with specialists in every one of these disorders to basically
further our understanding all the way down to disease. And in some cases collaborating
with Pharma to go all the way down to therapeutics because of our deep, deep understanding of
that basic circuitry. And how it allows us to now improve the circuitry, not just treat
it as a black box,
but basically go and say, okay,
we need a better cell type specific wiring
that we now have at a tissue-specific level.
So we're focusing on that
because we're understanding the needs from the disease front.
So you have a sense of the entire pipeline.
I mean, one, maybe you can indulge me
with one nice question to ask would be, how do you, from the scientific perspective,
go from knowing nothing about the disease to going,
you said, to go into the entire pipeline
and actually have a drug or a treatment that
cures that disease.
So that's an enormously long path
and an enormously great challenge.
And what I'm trying to argue is that
it progresses in stages of understanding
rather than one gene at a time.
The traditional view of biology was you have one postdoc
working on this gene
and another postdoc working on that gene.
And they'll just figure out everything about that gene. And that's their job. What we've realized is how
polygenic the diseases are. So we can't have one postdoc per gene anymore. We now
have to have these cross-cutting needs. And I'm going to describe the path to
circuitry along those needs.
And every single one of these paths, we are now doing in parallel across thousands of genes.
So the first step is you have a genetic association.
And we talked a little bit about sort of the Mendelian path and the polygenic path to that
association.
So the Mendelian path was looking through
families to basically find gene regions and ultimately genes that are underlying particular
disorders. The polygenic path is basically looking at unrelated individuals in this giant
matrix of genotype by phenotype and then finding hits where a particular variant impacts disease all the
way to the end.
And then we now have a connection, not between a gene and a disease, but between a genetic
region and a disease.
And that distinction is not understood by most people, so I'm going to explain it a little
bit more.
Why do we not have a connection between a gene and a disease,
but we have a connection between a genetic region and a disease?
The reason for that is that 93% of genetic variants
that are associated with disease don't impact the protein at all.
So if you look at the human genome, there's 20,000 genes, there's 3.2 billion nucleotides.
Only 1.5% of the genome codes for proteins.
The other 98.5% does not code for proteins.
If you now look at where are the disease variants located?
93% of them fall in that outside-the-genes portion.
Of course, genes are enriched,
but they're only enriched by a factor of three.
That means that still 93% of genetic variants
fall outside the proteins.
Why is that difficult?
Why is that a problem? The problem is
that when a variant falls outside the gene, you don't know what gene is
impacted by that variant. You can't just say, oh, it's near this gene. Let's just
connect that variant to the gene. And the reason for that is that the genome
circuitry is very often long range. So, you basically have that genetic variant
that could sit in the intron of one gene.
And an intron is sort of the place between the exons
that code for proteins.
So genes are split up into exons and introns,
and every exon codes for a particular subset of amino acids,
and together they're spliced together
and then make the final protein.
So that genetic variant might be sitting in an intron of a gene.
It's transcribed with a gene, it's processed and then excised, but it might not impact
this gene at all.
It might actually impact another gene that's a million nucleotides away.
So it's just riding along even though it has nothing to do with this nearby neighborhood.
That's exactly right.
Let me give you an example. The strongest
genetic association with obesity was discovered in this FTO gene, fat and obesity associated gene.
So this FTO gene was studied ad nauseam. People did tons of experiments on it. They figured out that FTO is in fact an RNA methylation transferase.
It basically impacts something that we call the epitranscriptome. Just like the genome
can be modified, the transcriptome, the transcripts of the genes, can be modified. And we basically said,
oh, great, that means that epitranscriptomics is hugely involved
in obesity because that gene FTO is clearly where the genetic
locus is at.
My group studied FTO in collaboration with a wonderful team
led by Melina Claussnitzer.
And what we found is that this FTO locus, even though it is associated with obesity,
does not implicate the FTO gene.
The genetic variant sits in the first intron of the FTO gene, but it controls two genes,
IRX3 and IRX5, that are sitting 1.2 million nucleotides away, several genes away.
Oh boy.
What am I supposed to feel about that?
Because that's, like, super complicated then.
So the way that I was introduced at a conference a few years ago was, and here's Manolis
Kellis, who wrote the most depressing paper of 2015.
And the reason for that is that the entire
pharmaceutical industry was so comfortable that there was a single gene in that
locus because in some loci you basically have three dozen genes that are all
sitting in the same region of association and you're like, gosh, which ones of
those is it? But even that question of which ones of those is it is making the
assumption that it is one of those as opposed to some random gene just far, far away, which is what our paper
showed.
So basically what our paper showed is that you can't ignore the circuitry.
You have to first figure out the circuitry, all of those long-range interactions, how
every genetic variant impacts the expression of every gene in every tissue imaginable across
hundreds of individuals.
And then you now have one of the building blocks, not even all of the building blocks,
for then going and understanding disease.
So, okay. So embrace the wholeness of the circuitry.
Correct. But, so, back to the question of starting from knowing nothing about the
disease and going to the treatment. So what are the next steps? So you basically have to first figure out the
tissue, and I'll describe how you figure out the tissue. You figure out the tissue by taking all of these non-coding
variants that are sitting outside proteins and then figuring out what the epigenomic enrichments are. And the reason for that, thankfully,
is that there is convergence, that the same processes
are impacted in different ways by different loci.
And that's a saving grace for our field,
the fact that if I look at hundreds
of genetic variants associated with Alzheimer's,
they localize in a small
number of processes.
Can you clarify why that's hopeful?
So they show up in the same exact way in the specific set of processes?
Yeah, so basically, there's a small number of biological processes that underlie, or
at least that play the biggest role in every disorder.
So in Alzheimer's, you basically have, you know, maybe 10 different types of processes.
One of them is lipid metabolism.
One of them is immune cell function.
One of them is neuronal energetics.
So these are just a small number of processes, but you have multiple lesions, multiple genetic
perturbations that are associated with those processes.
So, if you look at schizophrenia, it's excitatory neuron function.
It's inhibitory neuron function.
It's synaptic pruning.
It's calcium signaling and so on and so forth.
So, when you look at disease genetics, you have one hit here and one hit there and one hit there
and one hit there, completely different parts of the genome, but it turns out all of those hits are calcium signaling proteins.
Oh, cool.
You're like, aha, that means that calcium signaling is important.
So those people who are focusing on one locus at a time cannot possibly see that picture.
You have to become a genomicist.
You have to look at the omics, the holistic picture, to understand these enrichments.
But you mentioned the convergence thing.
Whatever the thing associated with the disease shows up.
So let me explain convergence.
Convergence is such a beautiful concept.
So you basically have these four genes that are converging on calcium signaling. So that basically means that they
are acting each in their own way, but together in the same process. But now in every one of these
loci, you have many enhancers controlling each of those genes. That's another type of convergence, where dysregulation of seven different enhancers
might all converge on dysregulation of that one gene, which then converges on calcium
signaling.
And in each one of those enhancers, you might have multiple genetic variants distributed
across many different people.
Everyone has their own different mutation, but all of these
mutations are impacting that enhancer, and all of these enhancers are impacting that gene,
and all of these genes are impacting this pathway, and all of these pathways are acting
in the same tissue, and all of these tissues are converging together on the same biological process
of schizophrenia. And you're saying the saving grace is that that convergence seems to happen for a lot of these diseases.
For all of them, basically that, for every single disease that we've looked at, we have found an epigenomic enrichment.
How do you do that? You basically have all of the genetic variants associated with the disorder, and then you're asking for all of the enhancers active in a particular tissue. For 540 disorders, we've basically found that indeed there is an enrichment.
That basically means that there is commonality.
And from the commonality, we can just get insights.
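The enrichment question described here (do disease-associated variants overlap the enhancers of one tissue more often than chance predicts?) can be sketched as a hypergeometric test. This is a minimal illustration, not the group's actual pipeline: the function names and all counts below are made up for the example.

```python
from math import comb


def enrichment_pvalue(total_variants, variants_in_enhancers,
                      disease_variants, disease_in_enhancers):
    """Hypergeometric tail probability P(X >= disease_in_enhancers).

    Models drawing `disease_variants` at random from `total_variants`,
    of which `variants_in_enhancers` overlap the tissue's enhancers.
    """
    upper = min(disease_variants, variants_in_enhancers)
    denom = comb(total_variants, disease_variants)
    tail = sum(
        comb(variants_in_enhancers, k)
        * comb(total_variants - variants_in_enhancers, disease_variants - k)
        for k in range(disease_in_enhancers, upper + 1)
    )
    return tail / denom


def fold_enrichment(total_variants, variants_in_enhancers,
                    disease_variants, disease_in_enhancers):
    """Observed overlap fraction divided by the background overlap fraction."""
    observed = disease_in_enhancers / disease_variants
    expected = variants_in_enhancers / total_variants
    return observed / expected


# Hypothetical counts: 10,000 tested variants, 5% of them in the enhancers of
# one tissue, and 20 of 100 disease-associated variants falling in those enhancers.
counts = (10_000, 500, 100, 20)
print("fold enrichment:", fold_enrichment(*counts))
print("p-value:", enrichment_pvalue(*counts))
```

Repeating the same test for each tissue's enhancer set and ranking by p-value is one simple way to nominate a tissue of action.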
So to explain in the mathematical terms, we're basically building an empirical prior.
We're using a Bayesian approach to basically say,
great, all of these variants are equally likely
in a particular locus to be important.
So in a genetic locus, you basically
have a dozen variants that are co-inherited,
because the way that inheritance works in the human genome
is through all of these recombination events during meiosis. You basically have,
you know, you inherit, maybe, chromosome three, for example. In your body,
it's inherited from four different parts. One part comes from your dad, another part comes
from your mom, another part comes from your dad, another part comes from your mom. So basically,
the way that it, sorry, from your mom's mom.
So you basically have one copy that comes from your dad
and one copy that comes from your mom.
But that copy that you got from your mom
is a mixture of her maternal and her paternal chromosome.
And the copy that you got from your dad
is a mixture of his maternal and his paternal chromosome.
So these breakpoints that happen
when chromosomes are lining up
are basically ensuring, through these crossover events, they're ensuring that every
child cell during the process of meiosis, where you basically have one spermatozoid that basically
couples with one ovule to basically create one egg, to basically create the zygote.
You basically have half of your genome that comes from dad and half of your genome that comes from mom, but in order to line them up, you basically have these crossover events.
These crossover events are basically leading to co-inheritance of that entire block coming from your maternal grandmother and
that entire block coming from your maternal grandfather. Over many generations
these crossover events don't happen randomly. There's a protein called PRDM9
that basically guides the double-stranded breaks and then leads to these
crossovers, and that protein has a particular preference
for only a small number of hotspots of recombination, which then leads to a small number of breaks
between these co-inheritance patterns. So even though there are 6 million variants,
there are 6 million loci, you know, this variation is inherited in blocks, and
every one of these blocks has like two dozen genetic variants that are all associated.
So in the case of FTO, it wasn't just one variant, it was 89 common variants that were all
humongously associated with obesity.
Which ones of those is the important one?
Well, if you look at only one locus, you have no idea.
But if you look at many loci, you basically say, aha,
all of them are enriching in the same epigenomic map.
In that particular case, it was mesenchymal stem cells.
So these are the progenitor cells that give rise
to your brown fat and your white fat.
Progenitor is like the early-on developmental stem.
So you start from one zygote, and that's a totipotent cell type.
It can do anything.
You then differentiate, you know, that cell divides, divides, divides.
And then every cell division is leading to specialization, where you now have a mesodermal lineage,
an ectodermal lineage, an endodermal lineage that basically leads to different parts of your body.
The ectoderm will basically give rise to your skin. Ecto means outside,
the derm is skin, so ectoderm, but it also gives rise to your neurons and your whole brain.
So that's a lot of ectoderm.
Mesoderm gives rise to your internal organs, including the vasculature and your muscle
and stuff like that.
So you basically have this progressive differentiation, and then if you look further, further down
that lineage, you basically have one lineage that will give rise to both your muscle and your bone, but
also your fat.
If you go further down the lineage of your fat, you basically have your white fat cells.
These are the cells that store energy.
So, when you eat a lot, but you don't exercise too much, there's an excess set of calories,
excess energy.
What you do with those, you basically create,
you spend a lot of that energy to create these high energy molecules, lipids, which you
can then burn when you need them on a rainy day.
So that leads to obesity if you don't exercise and if you overeat because your body is like,
oh, great, I have all these calories, I'm going to store them.
Oh, more calories,
I'm going to store them too. Oh, more calories. And, you know, 42% of European chromosomes
have a predisposition to storing fat, which was selected probably in the, you know, food scarcity periods.
Like, basically, as we were exiting Africa before and during the
ice ages, there was probably a selection to those individuals who made it north to
basically be able to store energy, a lot more energy. So you basically now have
this lineage that is deciding whether you want to store energy in your white fat or burn energy in your
beige fat. It turns out that your fat is, you know, we have such a bad view of fat. Fat
is your best friend. Fat can both store all these excess lipids that would be otherwise
circulating through your, you know, body and causing damage, but it can also burn calories directly.
If you have too much of energy,
you can just choose to just burn some of that as heat.
So basically when you're cold,
you're burning energy to basically warm your body up
and you're burning all these lipids
and you're burning all these calories.
So what we basically found is that across the board,
genetic variants associated with obesity
across many of these regions were all enriched repeatedly in mesenchymal stem cell enhancers.
So that gave us a hint as to which of these genetic variants was likely driving this whole
association. And we ended up with this one genetic variant called rs1421085.
And that genetic variant out of the 89 was the one that we predicted to be causal for the disease.
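The "empirical prior" idea can be sketched as follows: within one co-inherited block the association signal alone cannot separate the variants, so each variant's prior is reweighted by how enriched its annotation is genome-wide, and the weights are normalized into posterior probabilities. This is a toy illustration under strong assumptions: the association evidence is treated as identical for all 89 variants, the annotation labels and the weight of 20 are invented, and `posterior_over_variants` is a hypothetical helper, not the published method.

```python
def posterior_over_variants(annotations, prior_weight):
    """Turn annotation-based prior weights into posterior probabilities.

    annotations:  variant id -> annotation label
    prior_weight: annotation label -> relative prior weight
                  (for example, a genome-wide fold enrichment)
    Assumes equal association evidence for every variant in the block,
    so the posterior is just the normalized prior.
    """
    weights = {v: prior_weight.get(label, 1.0) for v, label in annotations.items()}
    total = sum(weights.values())
    return {v: w / total for v, w in weights.items()}


# Toy locus: 89 co-inherited variants, one of which overlaps an enhancer
# active in the enriched tissue (labels and weights are illustrative only).
annotations = {f"variant_{i}": "no_enhancer" for i in range(88)}
annotations["rs1421085"] = "msc_enhancer"
prior_weight = {"msc_enhancer": 20.0, "no_enhancer": 1.0}

posterior = posterior_over_variants(annotations, prior_weight)
best = max(posterior, key=posterior.get)
print(best, round(posterior[best], 3))
```

With a flat prior every variant would get 1/89; the annotation-based prior is what lets one candidate stand out.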
So going back to those steps, first step is figure out the relevant tissue based on the global
enrichment. Second step is figure out the causal
variant among many variants in this linkage disequilibrium in this co-inherited block between
these recombination hotspots, these boundaries of these inherited blocks. That's the second
step. The third step is once you know that causal variant, try to figure out what is the motif
that is disrupted by that causal variant.
Basically, how does it act?
Variants don't just disrupt elements, they disrupt the binding of specific regulators.
So basically, the third step there was how do you find the motif that is responsible,
like the gene regulatory word, the building block of gene regulation, that is responsible for that dysregulatory event.
And the fourth step is finding out what regulator normally binds that motif
and is now no longer able to bind.
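One common way to ask whether a variant disrupts a motif is to score the reference and alternate sequences against a position weight matrix and compare the best scores. The sketch below assumes a made-up five-position AT-rich motif and made-up sequences; it only illustrates the scoring idea and is not the actual motif or variant context from the study.

```python
from math import log2


def motif_score(window, pwm, background=0.25):
    """Log-odds score of one window against a position weight matrix."""
    if len(window) != len(pwm):
        raise ValueError("window and motif must have the same length")
    return sum(log2(column[base] / background) for base, column in zip(window, pwm))


def best_score(sequence, pwm):
    """Best log-odds score over all windows of the sequence (one strand only)."""
    width = len(pwm)
    return max(
        motif_score(sequence[i:i + width], pwm)
        for i in range(len(sequence) - width + 1)
    )


def motif_disruption(ref_seq, alt_seq, pwm):
    """Positive values mean the alternate allele weakens the best motif match."""
    return best_score(ref_seq, pwm) - best_score(alt_seq, pwm)


def column(preferred):
    """Hypothetical PWM column: 0.85 for the preferred base, 0.05 for the rest."""
    return {base: (0.85 if base == preferred else 0.05) for base in "ACGT"}


# Hypothetical AT-rich motif "AATAT" and a single T-to-C change inside it.
pwm = [column(base) for base in "AATAT"]
ref_seq = "GCAATATGC"
alt_seq = "GCAACATGC"
print("disruption (bits):", motif_disruption(ref_seq, alt_seq, pwm))
```

A large positive disruption score is a hint, not proof, that a regulator which normally binds the motif can no longer bind the alternate allele.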
And then once you have the regulator, can you then try to figure out how to
what, after it developed, how to fix it?
That's exactly right.
You now know how to intervene.
You have basically a regulator.
You have a gene that you can then perturb.
And you say, well, maybe that regulator
has a global role in obesity.
I can perturb the regulator.
Just to clarify, when we say perturb,
like on the scale of a human life,
can a human being be helped?
Of course. Of course.
Yeah.
So understanding is the first step.
Exactly.
No, but perturb basically means you now develop therapeutics, pharmaceutical therapeutics
against that.
Or you develop other types of intervention that affect the expression of that gene.
What do pharmaceutical therapeutics look like when your understanding is on a genetic level.
Yeah, sorry if it's a dumb question. No, no, no, it's a brilliant question, but I want to save it for a
little bit later when we start talking about therapeutics. Perfect. We talked about the first four steps.
There's two more. So basically the first step is figure out, I mean, the zeroth step, the starting
point is the genetics. The first step after that is figure out the tissue of action.
The second step is figuring out the nucleotide
that is responsible or set of nucleotides.
The third step is figure out the motif
and the upstream regulator, number four.
Number five and six is what are the targets?
So number five is great.
Now I know the regulator, I know the motif,
I know the tissue, and I know the variant.
What does it actually do? So you have to now trace it to the biological process
and the genes that mediate that biological process. So knowing all of this can now allow you to find
the target genes. How? By basically doing perturbation experiments, or by looking at the folding of
the epigenome, or by looking at the genetic impact of that genetic variant on the expression of
genes. And we use all three. So let me go through them. Basically, one of them is physical
links. This is the folding of the genome onto itself. How do you even figure out the folding?
It's a little bit of a tangent, but it's a super awesome technology.
Think of the genome as again this massive packaging that we talked about
of taking two meters worth of DNA and putting it in something that's a million times smaller than
two meters worth of DNA that's a single cell.
You basically have this massive packaging and this packaging basically leads to the
chromosome being wrapped around in sort of tight, tight ways in ways however that are functionally
capable of being reopened and reclosed. So I can then go in and figure out that folding
by sort of chopping up the spaghetti soup,
putting glue and ligating the segments that were chopped up
but nearby each other, and then sequencing
through these ligation events to figure out
that these regions of this chromosome,
that region of the chromosome, were near each other,
that means they were interacting, even though they were far away on the genome
itself. So that chopping up, sequencing, and re-gluing is basically giving you folds of
the genome that we call.
So that's our cue, backtrack. How does cutting it help you figure out which ones were close in the original folding.
So you have a bowl of noodles.
Go on.
And in that bowl of noodles, some noodles are near each other.
So throw in a bunch of glue. You basically freeze the noodles in place.
Throw in a cutter that chops up the noodles into little pieces.
Now throw in some ligation enzyme that lets those pieces that were free religate
near each other. In some cases, they religate what you had just cut, but that's very rare.
Most of the time, they will religate in whatever was proximal.
You now have glued the red noodle that was crossing the blue noodle to each other.
You then reverse the glue, the glue goes away, and you just sequence the heck out of it.
Most of the time you'll find a red segment with, you know, a red segment, but you can
specifically select for ligation events
that have happened that were not from the same segment
by sort of marking them in particular way.
And then selecting those, and then you sequence
and you look for red with blue matches of sort of things
that were glued that were not immediately
proximal to each other.
And that reveals the linking of the blue noodle
and the red noodle.
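The noodle procedure boils down to counting which pairs of distant genomic segments end up ligated together. A minimal sketch of that counting step, assuming each sequenced ligation product has already been reduced to a pair of positions on one chromosome (the positions, bin size, and function name are illustrative):

```python
from collections import Counter


def contact_counts(read_pairs, bin_size):
    """Count ligation events between distinct genomic bins.

    read_pairs: iterable of (position_a, position_b) on the same chromosome.
    Pairs whose two ends fall in the same bin (the "red with red" case)
    are discarded, since they say nothing about long-range folding.
    """
    counts = Counter()
    for pos_a, pos_b in read_pairs:
        bin_a, bin_b = pos_a // bin_size, pos_b // bin_size
        if bin_a == bin_b:
            continue
        counts[tuple(sorted((bin_a, bin_b)))] += 1
    return counts


# Toy data: positions in nucleotides, bins of 100,000 nucleotides.
pairs = [
    (50_000, 60_000),        # same bin, ignored
    (120_000, 1_350_000),    # bin 1 with bin 13
    (1_310_000, 150_000),    # bin 13 with bin 1, order does not matter
    (199_999, 1_399_999),    # bin 1 with bin 13
    (120_000, 450_000),      # bin 1 with bin 4
]
contacts = contact_counts(pairs, bin_size=100_000)
for (bin_a, bin_b), n in contacts.most_common():
    print(f"bins {bin_a} and {bin_b}: {n} ligation events")
```

Bin pairs with many more ligation events than their linear distance would suggest are candidates for physical contact in the folded genome.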
You're with me so far?
Yeah.
Good. So we've done this.
That's physical.
That's physical.
That's the step one of the physical.
And what the physical revealed is topologically associated domains, basically big blocks of the genome
that are topologically connected together.
That's the physical.
The second one is the genetic links.
It basically says, across individuals that have different genetic variants, how are their
genes expressed differently?
Remember before I was saying that the path between genetics and diseases is enormous, but
we can break it up to look at the path between genetics and gene expression.
So instead of using Alzheimer's as a phenotype,
I can now use expression of IRX3 as the phenotype,
expression of gene A.
And I can look at all of the humans who contain a G
at that location and all the humans
that contain a T at that location.
And basically say, wow, turns out
that the expression of the gene is higher
for the T humans than for the G humans at that location. So that basically gives me a genetic link between a genetic
variant, a locus, a region, and the expression of nearby genes. Good on the genetic link?
I think so. Awesome. So the third link is the activity link. What's an activity link? It basically says if I look across 833 different epigenomes, whenever this enhancer is active,
this gene is active.
That gives me an activity link between this region of the DNA and that gene.
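Both the genetic link and the activity link can be sketched with very simple statistics: a difference in mean expression between allele groups, and a correlation between enhancer activity and gene expression across samples. The example below uses invented numbers and simplifies each person to a single allele; real analyses use diploid genotypes, covariates, and significance testing.

```python
from math import sqrt
from statistics import mean


def genetic_link(alleles, expression, allele_a="T", allele_b="G"):
    """Mean expression of allele_a carriers minus mean expression of allele_b carriers."""
    group_a = [e for a, e in zip(alleles, expression) if a == allele_a]
    group_b = [e for a, e in zip(alleles, expression) if a == allele_b]
    return mean(group_a) - mean(group_b)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def activity_link(enhancer_activity, gene_expression):
    """Correlation between enhancer activity and gene expression across samples."""
    return pearson(enhancer_activity, gene_expression)


# Genetic link: six hypothetical individuals, one allele each (a simplification).
alleles = ["T", "T", "T", "G", "G", "G"]
expression = [8.0, 9.0, 10.0, 4.0, 5.0, 6.0]
print("T minus G expression:", genetic_link(alleles, expression))

# Activity link: enhancer on/off and gene expression across six hypothetical epigenomes.
enhancer_on = [0, 1, 0, 1, 1, 0]
gene_expr = [1.0, 5.0, 1.5, 6.0, 5.5, 0.5]
print("activity correlation:", activity_link(enhancer_on, gene_expr))
```

Both measures are correlational, which is why the perturbation experiments that come next matter.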
And then the fourth one is perturbations where I can go in and blow up that region and see what are the genes that change in expression,
or I can go in and overactivate that region and see what genes change in expression.
So I guess that's similar to activity?
Yeah, so that's basically similar to activity.
I agree, but it's causal rather than correlational.
Again, I'm a little weird like that.
No, no, you're 100% on.
It's exactly the same.
But the perturbation, where I go in and intervene. Yes, I basically take a
bunch of cells. So you know CRISPR, right? CRISPR is this
genome guidance and cutting mechanism. It's what George Church
likes to call genome vandalism.
You can basically take a guide RNA that you put into the CRISPR
system, and the CRISPR system will basically use these guide RNAs, scan the genome, find wherever
there's a match, and then cut the genome.
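The "scan the genome, find wherever there's a match" step can be sketched as a plain string search on both strands. This is a deliberately simplified illustration: the guide is written as its DNA target sequence, only exact matches are reported, and real requirements such as an adjacent PAM site (NGG for the commonly used Cas9) and tolerance for mismatches are left out. The guide and genome strings are made up.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA string over the alphabet A, C, G, T."""
    return seq.translate(COMPLEMENT)[::-1]


def find_guide_matches(genome, guide):
    """Return sorted (position, strand) for every exact match of the guide.

    Positions are 0-based offsets on the forward strand; "-" hits are where
    the reverse complement of the guide appears in the forward sequence.
    """
    hits = []
    for strand, target in (("+", guide), ("-", reverse_complement(guide))):
        start = genome.find(target)
        while start != -1:
            hits.append((start, strand))
            start = genome.find(target, start + 1)
    return sorted(hits)


# Hypothetical 20-nucleotide guide and a toy "genome" containing it on both strands.
guide = "ACGTTGCAAGCTTAGGCTAA"
genome = "CCC" + guide + "TGGAAAA" + reverse_complement(guide) + "TT"
for position, strand in find_guide_matches(genome, guide):
    print(f"match at {position} on {strand} strand")
```

The point made later in the conversation carries over: retargeting only requires changing the `guide` string, not engineering a new protein.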
So you know, I digress, but it's a bacterial immune defense system.
So basically bacteria are constantly attacked by viruses,
but sometimes they win against the viruses,
and they chop up these viruses and remember,
as a trophy inside their genome, they have this locus,
this CRISPR locus, that basically stands
for clustered, repeats, interspersed, et cetera.
So basically it's an interspersed repeat structure where basically you have a set of repetitive regions and
then interspersed were these variable segments that were basically
matching viruses. So when this was first discovered, it was basically
hypothesized that this is probably a bacterial immune system that remembers the
trophies of the viruses that it managed to kill.
And then the bacteria pass on, you know, they sort of do lateral transfer of DNA and they
pass on these memories so that the next bacterium says, oh, you killed that guy, when that
guy shows up again, I will recognize him.
And the CRISPR system was basically evolved as a bacterial, adaptive immune response
to sense foreigners that should not belong
and to just go and cut their genome.
So it's an RNA-guided RNA cutting enzyme
or an RNA-guided DNA cutting enzyme.
So there's different systems,
some of them cut DNA, some of them cut RNA,
but all of them remember this sort of viral attack.
So what we have done now as a field is through the work of
Jennifer Doudna, Emmanuelle Charpentier, Feng Zhang, and many others
is co-opted that system of bacterial immune defense
as a way to cut genomes.
You basically have this guiding system that allows you to use an RNA guide
to bring enzymes to cut DNA at a particular locus.
That's so fascinating. So this is already a natural mechanism, a natural tool for cutting
that was useful, in this particular context. We're like, what can you use that thing to actually,
it's a nice tool that's already in the body.
Yeah.
Yeah.
It's not in our body, it's in the bacteria body.
It was discovered by the yogurt industry.
They were trying to make better yogurts,
and they were trying to make their bacteria
in their yogurt cultures more resilient to viruses.
And they were studying bacteria, and they found
that, wow, this CRISPR system is awesome, it allows you to defend against that. And then
it was co-opted in mammalian systems that don't use anything like that as a targeting
way to basically bring these DNA cutting enzymes to any locus in the genome. Why would
you want to cut DNA to do anything? The reason is that
our DNA has a DNA repair mechanism where if a region of the genome gets randomly cut,
you will basically scan the genome for anything that matches and sort of use it by homology.
So the reason why we're diploid is because we now have a spare copy. As soon as my mom's copy is
deactivated, I can use my dad's copy. And somewhere else, if my dad's copy is deactivated, I can use my mom's
copy to repair it. So this is called homology-based repair. So all you have to do is the cutting,
you don't have to do the fixing. That's exactly right. You don't have to do the fixing
because it's already built in. That's exactly right. But the fixing can be co-opted by throwing in a bunch of homologous
segments that instead of having your dad's version have whatever other version you'd like to use.
So you then control the fixing by throwing in a bunch of other stuff. Exactly right. And that's how you do genome editing. So that's what CRISPR is.
That's what CRISPR is. When in popular culture people use the term, I've
never... wow, that's brilliant. So CRISPR, as an explanation: genome vandalism followed
by a bunch of band aids that have the sequence that you'd like. And you can control the choices
of band aids. Correct. And of course, there's new generations of CRISPR. There's something
that's called prime editing that was sort of very, very much in the press recently, that basically,
instead of sort of making a double-stranded break, which again is genome vandalism, you basically make a
single-stranded break. You basically just nick one of the two strands,
enabling you to sort of peel off without sort of completely breaking it up, and then repair it locally using a guide that is coupled to your
initial RNA that took you to that location.
Dumb question, but is
CRISPR as awesome and cool as it sounds?
I mean technically speaking in terms of like as a tool for
manipulating our genetics in the positive meaning of the word manipulating, or is there downsides, drawbacks in this whole
context of therapeutics that we're talking about or understanding and stuff.
So when I teach my students about CRISPR, I show them articles with the headline,
Genome Editing Tool revolutionizes biology.
And then I show them the date of these articles and they're 2004,
like five years before CRISPR was invented.
And the reason is that they're not talking about CRISPR.
They're talking about zinc finger enzymes.
They're another way to bring these cutters to the genome.
It's a very difficult way of sort of designing the right set of zinc finger proteins,
the right set of amino acids that will now target a particular long stretch of DNA.
Because for every location that you want to target, you need to design a particular regulator,
a particular protein that will match that region. Well, there's another technology called TALENs, which are basically, you know, just a different
way of using proteins to sort of, you know, guide these cutters to a particular location
of the genome.
These require a massive team of engineers, of biological engineers, to basically design
a set of amino acids that will target a particular
sequence of your genome. The reason why CRISPR is amazingly awesomely revolutionary is because
instead of having this team of engineers design a new set of proteins for every locus that you
want to target, you just type it in your computer and you just synthesize an RNA guide. The beauty
of CRISPR is not the cutting, it's not the fixing.
All of that was there before.
It's the guiding.
And the only thing that changes is that it makes the guiding easier by sort of, you know, just
typing in the RNA sequence, which then allows the system to sort of scan the DNA to find
that.
So the coding, the engineering of the cutter is easier on the, in terms of
SV, that's kind of similar to the story of deep learning versus old school machine learning.
Yeah. Some of the challenging parts are automated. Okay, so, but CRISPR is just one cutting
technology. Exactly. And then there's, well, that's part of the challenges and exciting
opportunities of the field. It's to design different cutting technologies.
So now, you know, this was a big parenthesis on CRISPR.
But now, you know, when we were talking about perturbations, you basically now have the ability to not just look at correlation between enhancers and genes, but actually go in either destroy that enhancer and see if the gene
changes in expression, or you can use the CRISPR targeting system to bring in not vandalism
and cutting, but you can couple the CRISPR system with, and the CRISPR system is called
usually CRISPR-Cas9 because Cas9 is the protein that will then come and cut.
But there's a version of that protein called
dead Cas9, where the cutting part is deactivated.
So you basically use dCas9, dead Cas9,
to bring in an activator, or to bring in a repressor.
So you can now ask, is this enhancer changing that gene?
By taking this modified CRISPR,
which is already modified from the bacteria to be used in humans, that you can now modify the Cas9 to be dead Cas9,
and you can now further modify to bring in a regulator, and you can basically turn on or turn off that enhancer,
and then see what is the impact on that gene.
So these are the four ways of linking the locus
to the target gene.
And that's step number five.
Okay?
Step number five is find the target gene.
And step number six is what the heck does that gene do?
You basically now go and manipulate that gene
to basically see what are the processes that change.
And you can basically ask, well,
in this particular case, in the FTO locus,
we found mesenchymal stem cells that are the progenitors
of white fat and brown fat or beige fat.
We found the rs1421085 nucleotide variant
as the causal variant.
We found this large enhancer, this master regulator. I like to call it OB1, for
obesity one, like the strongest enhancer associated with it. And OB1 was kind of chubby as the
actor, I don't know if you remember him. So you basically are using this Jedi mind trick to
basically find out the location of the genome that is responsible, the enhancer that harbors it, the motif,
the upstream regulator, which is ARID5B, for AT-rich interacting domain 5B,
that's a protein that sort of comes and binds normally.
That protein is normally a repressor.
It represses the super enhancer, this massive 12,000 nucleotide master regulatory control region, and it
turns off IRX3, which is a gene that's 600,000 nucleotides away, and IRX5, which is 1.2 million
nucleotides away.
So those are the effects of turning them off.
That's exactly the next question.
So step six is what do these genes actually do?
So we then ask what does IRX3 and IRX5 do?
The first thing we did is look
across individuals for individuals that had higher expression of IRX3 or lower
expression of IRX3. And then we looked at the expression of all of the other genes
in the genome. And we looked for simply correlation. And we found that IRX3 and IRX5
were both correlated positively with lipid metabolism and negatively with mitochondrial biogenesis.
You're like, what the heck does that mean?
It doesn't sound related to obesity.
Not at all, superficially.
But lipid metabolism should, because lipids are these high-energy molecules that basically
store fat. So IRX3 and IRX5 are
positively correlated with lipid metabolism. So that basically means that
when they turn on, they turn on lipid metabolism. And they're
negatively correlated with mitochondrial biogenesis. What do mitochondria do in this whole process? Again,
small parenthesis, what are mitochondria? Mitochondria are little organelles.
They are rows. They only are found in eukaryotes.
Euk means good, carrier means nucleus. So truly like a true nucleus. So eukaryotes have a nucleus.
Prokaryotes are before the nucleus. They don't have a nucleus. So eukaryotes have a nucleus. Prokaryotes are before the nucleus.
They don't have a nucleus. So eukaryotes have a nucleus, compartmentalization.
Eukaryotes have also organelles. Some eukaryotes have chloroplasts. These are the plants.
They photosynthesize. Some other eukaryotes, like us, have another type of organelle called mitochondria.
These arose from an ancient species that we engulfed. This is an endosymbiosis event.
Symbiosis: bio means life, sym means together. So symbiotes are things that live together.
Endosymbiosis: endo means inside, so endosymbiosis means you live together holding the other one inside you.
So the pre-eukaryotes engulfed an organism that was very good at energy production
and that organism eventually shed most of its genome to now
have only 13 genes in the mitochondrial genome.
And those 13 genes are all involved in energy production, the electron transport chain.
So basically, electrons are these massive super energy rich molecules. We basically have these organelles
that produce energy. And when your muscle exercises, you basically multiply your mitochondria,
you basically use more and more mitochondria, and that's how you get beefed up.
Basically, the muscles learn to generate more energy. So basically every single time your muscles will, you know,
overnight regenerate and sort of become stronger
and amplify their mitochondria and so on.
So what do mitochondria do?
The mitochondria use energy to sort of do any kind of task.
When you're thinking, you're using energy.
This energy comes from mitochondria.
Your neurons have mitochondria all over the place.
Basically these mitochondria can multiply as organelles and they can be spread along the body of
your muscle. Some of your muscle cells actually have multiple nuclei, they're polynucleated,
but they also have multiple mitochondria to basically deal with the fact that your muscle is enormous.
It can sort of span these super, super long lengths, and you need energy throughout the
length of your muscle.
So that's why you have mitochondria throughout the length,
and you also need transcription through the length,
so you have multiple nuclei as well.
So these two processes, lipids, store energy,
what do mitochondria do?
So there's a process known as thermogenesis.
Thermo, heat; genesis, generation.
Thermogenesis is the generation of heat.
Remember that bathtub with the in and out?
That's the equation that everybody's focused on.
So how much energy do you consume?
How much energy do you burn?
But in every thermodynamic system, there's three parts to the equation.
There's energy in, energy out, and energy lost. Any machine has
loss of energy. How do you lose energy? You emanate heat. So heat is energy loss,
which is where the thermogenesis comes in. Thermogenesis is actually a regulatory process that
modulates the third component of the thermodynamic equation. You can basically control thermogenesis
explicitly. You can turn on and turn off thermogenesis. And that's where the mitochondria come in,
exactly. So IRX3 and IRX5 turn out to be the master regulators of a process of thermogenesis versus lipogenesis, the generation of fat.
So IRX3 and IRX5, in most people, burn heat, burn calories as heat.
So when you eat too much, just burn it, burn it off in your fat cells.
So with that bathtub, that's basically a sort of dissipation knob that most people are able to turn on. I am unable
to turn that on because I am a homozygous carrier for the mutation that changes a T into a
C at the rs1421085 locus, a SNP. I have the risk allele twice, from my
mom and from my dad. So I'm unable to thermogenize.
I'm unable to turn on thermogenesis through IRX3 and IRX5,
because the regulator that normally binds here,
ARID5B, can no longer bind, because it's an AT-rich
interacting domain.
And as soon as it changes the T into a C,
it can no longer bind, because it's no longer AT-rich.
But doesn't that mean that you're able to use the energy more efficiently?
You're not generating heat or is that so?
That means I can eat less and get around just fine.
Yes.
So that's a feature actually.
It's a feature in a food scarce environment.
Yeah.
But if we're all starving, I'm doing great.
If we all have access to massive amounts of food, I'm obese basically.
That's taken us through the entire process of then understanding why mitochondria and lipids are both,
even though distant, somehow involved. Different sides of the same coin.
And you basically choose to store energy or you can choose to burn energy.
And that all of that is involved in the puzzle of obesity.
And that's what's fascinating, right?
Here we are in 2007,
discovering the strongest genetic association with obesity
and knowing nothing about how it works for almost 10 years.
For 10 years, everybody focused on this FTO gene.
And they were like, oh, it must have to do something
with RNA modification.
And it's like, no, it has nothing to do with the function of FTO.
It has everything to do with all of these other processes.
And suddenly, the moment you solve that puzzle,
which is a multi-year effort, by the way,
and a tremendous effort by Melina and many, many others.
So this tremendous effort basically
led us to recognize this circuitry.
You went from having some 89 common variants associated in that region
of the DNA sitting on top of this gene to knowing the whole circuitry. When you know the circuitry,
you can now go crazy. You can now start intervening at every level. You can start intervening at the
ARID5B level. You can start intervening with CRISPR-Cas9 at the single-SNP level.
You can start intervening at IRX3 and IRX5 directly there. You can start intervening at the thermogenesis level because you know the pathway.
You can start intervening at the differentiation level where the decision to make either white fat or beige fat, the energy burning beige fat,
is made developmentally in the first three days
of differentiation of your adipocytes.
So as they're differentiating, you basically
can choose to make fat burning machines or fat storing
machines.
And so that's how you populate your fat.
You basically can now go in pharmaceutically and do all
of that.
And in our paper, we actually did all of that.
We went in and manipulated every single aspect.
At the nucleotide level, we used CRISPR-Cas9 genome editing
to basically take primary adipocytes
from risk and non-risk individuals
and show that by editing that one nucleotide
out of 3.2 billion nucleotides in the human genome,
you could then flip between an obese phenotype and a lean phenotype like a switch.
You can basically take my cells that are non-thermogenizing and just flip them to thermogenizing cells
by changing one nucleotide.
It's mind-boggling.
It's so inspiring that this puzzle could be solved in this way and it feels within reach
to then be able to crack the problem of some of these diseases.
So, 2007, you mentioned. What are the technologies, the tools that came along that made this possible?
Like, what are you excited about? Maybe if we just look at the buffet of things that you've kind of mentioned.
What's involved, what should we be excited about, what are you excited about?
I love that question because there's so much ahead of us, there's so, so much.
So basically, solving that one locus required massive amounts of knowledge that we have been building across the years,
through the epigenome, through the comparative genomics to find out the causal variant and the control or regulatory motif through the conserved circuitry. It required
knowing this regulatory genomic wiring. It required Hi-C, of the sort of topologically associated
domains, to basically find this long-range interaction. It required eQTLs, this sort of genetic
perturbation of these intermediate gene phenotypes.
It required all of the arsenal of tools that I've been describing, put together for
one locus.
And this was a massive team effort, huge investment in time, energy, money, effort, intellectual,
everything.
You're referring to, I'm sorry, this one paper.
Yeah, this one paper.
This one single paper.
This one single locus. I like to say that this is a paper about one nucleotide in the
human genome, about one bit of information, C versus T in the human genome.
That's one bit of information and we have 3.2 billion nucleotides to go through.
So how do you do that systematically? I am so excited about the next phase of research
because the technologies that my group and many other groups have developed allows us to now do
this systematically, not just one locus at a time, but thousands of loci at a time.
So let me describe some of these technologies. The first one is automation and robotics.
So basically, you know, we talked about how you can take all of these molecules and see
which of these molecules are targeting each of these genes and what do they do.
So you can basically now screen through millions of molecules, through thousands and thousands
and thousands of plates, each of which has thousands and thousands and thousands of molecules,
every single time testing,
you know, all of these genes and
asking which of these molecules perturb these genes. So that's technology number one, automation and robotics.
Technology number two is parallel readouts.
So instead of perturbing one locus
and then asking, if I use CRISPR-Cas9 on this enhancer, to basically use dCas9 to turn on or turn
off the enhancer, or if I use CRISPR-Cas9 on the SNP, to basically change that one SNP at a time,
then what happens? But we have 120,000 disease-associated
SNPs that we want to test. We don't want to spend 120,000 years doing it.
So what do we do? We've basically developed this technology for massively parallel
reporter assays, MPRA. So in collaboration with Tarjei Mikkelsen, Eric Lander. I mean, Jay Shendure's group has done
a lot of that.
So there's a lot of groups that basically have developed technologies for testing 10,000
genetic variants at a time.
How do you do that?
We talked about microarray technology.
The ability to synthesize these huge microarrays
that allow you to do all kinds of things
like measure gene expression by hybridization,
or measure the genotype of a person
by looking at hybridization with one version,
with a T, versus the other version,
with a C, and then figuring out that I am a risk carrier
for obesity based on this hybridization,
differential hybridization in my genome that says, oh, you seem to only have this allele or you seem to have that allele.
Microarrays can also be used to systematically synthesize small fragments of DNA.
So you can basically synthesize these 150 nucleotide long fragments across 450,000 spots
at a time. You can now take the result of that synthesis,
which basically works through all of these sort of layers
of adding one nucleotide at a time.
You can basically just type it into your computer
and order it.
And you can basically order
10,000 or 100,000 of these small DNA segments at a time.
And that's where awesome molecular biology comes in.
You can basically take all these segments,
have a common start and end barcode, sort of like,
you know, just like pieces of a puzzle,
you can make the same end piece and the same start piece
for all of them.
And you can now use plasmids, which are these extra chromosomal, small DNA circular segments
that are basically inhabiting all our genomes.
We basically have plasmids floating around, bacteria use plasmids for transferring DNA and
that's where they put a lot of antibiotic resistance genes.
So they can easily transfer them from one bacterium to the other. So one bacterium evolves
a gene to be resistant to a particular antibiotic. It basically says to all its friends, hey,
here's that sort of DNA piece. We can now co-opt these plasmids into human cells.
We can basically make a human cell culture and add plasmids to that human cell culture
that contain the things that you want to test.
You now have this library of 450,000 elements.
You can insert them each into the common plasmid and then test them in millions of cells in
parallel.
And the common plasmid,
that's all the same before you add it?
Exactly, the rest of the plasmid is the same. So it's called an
episomal reporter assay.
Episome means not inside the genome. It's sort of outside the chromosomes.
So it's an episomal assay that allows you to have a variable region where you basically test
10,000 different enhancers, and you have
a common region which basically has the same reporter gene. You now can do some very cool
molecular biology. You can basically take the 450,000 elements that you've generated,
and you have a piece of the puzzle here, a piece of the puzzle here, which is identical,
so they're compatible with that plasmid. You can chop them up in the middle to separate a barcode reporter from the enhancer,
and in the middle put the same gene again using the same piece of the puzzle.
You now can have a barcode readout of what is the impact of 10,000 different versions of an enhancer on gene expression.
So we're not doing one experiment, we're doing 10,000 experiments.
And those 10,000 can be 5,000 different loci, each of them in two versions, risk
or non-risk. I can now test tens of thousands.
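The barcode readout can be pictured with a small Python sketch. It assumes a common way of scoring such assays, namely the log ratio of RNA barcode counts to plasmid DNA barcode counts per element; the barcode names, the counts, and the pseudocount are illustrative assumptions, not details given in the conversation.

```python
from collections import Counter
from math import log2

def activity(rna_counts, dna_counts, pseudocount=1):
    """log2 ratio of RNA to DNA (plasmid) barcode counts for each tested element."""
    return {
        element: log2((rna_counts[element] + pseudocount) / (dna_counts[element] + pseudocount))
        for element in dna_counts
    }

# Hypothetical barcode reads; each name stands for one locus in one allele version.
dna_reads = ["locusA_risk"] * 100 + ["locusA_nonrisk"] * 100
rna_reads = ["locusA_risk"] * 50 + ["locusA_nonrisk"] * 400

act = activity(Counter(rna_reads), Counter(dna_reads))

# A negative value means the risk version drives less reporter expression.
allelic_effect = act["locusA_risk"] - act["locusA_nonrisk"]
print(f"risk minus non-risk activity (log2): {allelic_effect:.2f}")
```

Repeating the same subtraction for every locus in the library is what turns one pooled experiment into thousands of risk versus non-risk comparisons.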
It's a little hypothesis.
Exactly.
And then you can test 10,000 hypotheses at once.
How hard is it to generate those 10,000?
Trivial.
Trivial, but it's biology?
No, no, generating the 10,000 is trivial because it's
by technology. You basically have these arrays that add one nucleotide at a time
at every spot.
Oh, so it's printing, and so you're able to control it.
Yeah.
Super costly.
Is it $10,000?
So this isn't millions.
$10,000 for 10,000 experiments?
Sounds about right, you know.
I mean, so that's super, that's exciting because you don't have to do one thing at a time.
You can now use that technology, these massively parallel reporter assays, to test 10,000
locations at a time.
We've made multiple modifications to that technology.
One was Sharpr-MPRA, which stands for basically
getting a higher resolution view by tiling these elements. So you can see where along the region of control they are
acting. And we made another modification called HiDRA, for high, you know, definition
regulatory annotation or something like that, which basically allows you to test 7 million of these at a time by
sort of cutting them directly from the DNA. So instead of synthesizing, which basically
has the limit of 450,000 that you can synthesize at a time, we basically said, hey, if we want
to test all accessible regions of the genome, let's just do an experiment that cuts accessible
regions. Let's take those accessible regions, put them all with the same end joins
of the puzzles, and then now use those to create a much, much, much larger array of things that
you can test, and then tiling all of these regions, you can then pinpoint what are the driver
nucleotides, what are the elements, how are they acting across 7 million experiments at a time.
So basically, this is all the same family of technology
where you're basically using these parallel readouts
of the barcodes.
And then, to do this, we used a technology called STARR-seq,
for self-transcribing reporter assays,
a technology developed by Alex Stark,
my former postdoc, who is now a PI over in Vienna.
So we basically coupled the STARR-seq, the self-transcribing reporters, where the enhancer
can be part of the gene itself.
So instead of having a separate barcode, that enhancer basically acts to turn on the
gene and is transcribed as part of the gene.
So there are not the two separate parts.
Exactly.
So you can just read them directly.
So there's constant improvements in this whole process.
By the way, is generating all these options
basically brute force?
How much human intuition is there?
Oh gosh, of course it's human intuition
and human creativity and incorporating
all of the input data sets.
Because again, the genome is enormous.
3.2 billion, you don't want to test that.
Instead, you basically use all of these tools
that I've talked about already.
You generate your top favorite 10,000 hypotheses,
and then you go and test all 10,000.
And then from what comes out,
you can then go to the next step.
So that's technology number two.
So technology number one is robotics, automation,
where you have thousands of wells,
and you constantly test them.
The second technology is, instead of having wells,
you have these massively parallel readouts
in sort of these pooled assays.
The third technology is coupling
CRISPR perturbations with these single-cell RNA readouts.
So let me make another parenthesis here to describe now single-cell RNA sequencing.
Okay, so what does single cell RNA sequencing mean?
So RNA sequencing has been traditionally used,
oh well, traditionally the last 20 years,
ever since the advent of next generation sequencing.
So basically before RNA expression profiling
was based on these microarrays.
The next technology after that was based on sequencing.
So you chop up your RNA and you just sequence
small molecules, just like you would sequence a genome,
you basically reverse transcribe these small RNAs into DNA
and you sequence that DNA in order to get the
number of sequencing reads corresponding to the expression level of every gene in the genome.
You now have RNA sequencing. How do you go to single cell RNA sequencing? That technology also
went through stages of evolution. The first was microfluidics. You basically had these, or even chambers.
You basically had these ways of isolating individual cells, putting them into a well for every
one of these cells. So you have 384 well plates, and you now do 384 parallel reactions to measure
the expression of 384 cells. That sounds amazing, and it was amazing. But we want to do a million cells. How do you go from
these wells to a million cells? You can't. So what the next technology was after that is instead
of using a well for every reaction, you now use a lipid droplet for every reaction. So you use micro droplets as reaction chambers
to basically amplify RNA.
So here's the idea.
You basically have microfluidics
where you basically have every single cell coming down one tube
in your microfluidics.
And you have little bubbles getting created in the other way
with specific primers that mark every cell with its own barcode. You basically
couple the two and you end up with little bubbles that have a cell and tons of
markers for that cell. You now mark up all of the RNA for that one cell with the
same exact barcode, and you then lyse all of the droplets and you sequence the
heck out of that, and you have, for every RNA molecule, a unique identifier that tells you what cell
it was in.
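Computationally, the droplet trick reduces to grouping reads by their cell barcode. The following Python sketch shows that grouping under simplifying assumptions: each read is already reduced to a (cell barcode, gene) pair, and the barcodes and reads are invented for illustration.

```python
from collections import defaultdict

# Hypothetical sequenced reads after lysing the droplets:
# (cell barcode, gene that the RNA molecule maps to).
reads = [
    ("AAAC", "IRX3"), ("AAAC", "IRX3"), ("AAAC", "GAPDH"),
    ("GGTT", "GAPDH"), ("GGTT", "IRX5"),
]

def count_by_cell(reads):
    """Build a cell -> gene -> read-count table from barcoded reads."""
    table = defaultdict(lambda: defaultdict(int))
    for barcode, gene in reads:
        table[barcode][gene] += 1
    return {cell: dict(genes) for cell, genes in table.items()}

counts = count_by_cell(reads)
print(counts)
```

Even though all droplets are pooled before sequencing, the barcode on each read is enough to rebuild a separate expression profile for every cell.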
That is such good engineering, microfluidics, and using some kind of primer to put a label
on the thing.
I mean, you're making it sound easy.
It's beautiful, right?
It's gorgeous, yeah.
So there's the next generation.
That's the second generation.
Next generation is, forget the microfluidics altogether.
Just use big bottles.
How can you possibly do that with big bottles?
So here's the idea.
You dissociate all of your cells or all of your nuclei
from complex cells like brain cells that are very long
and sticky, so you can't do that.
So if you have blood cells or if you have neuronal nuclei, or brain nuclei,
you can basically dissociate, let's say, a million cells.
You now want to add a unique barcode
in each one of a million cells, using only big bottles.
How can you possibly do that? Sounds crazy, but here's the idea.
You use a hundred of these bottles.
You randomly shuffle all your million cells and you throw them into the 100 bottles, randomly, completely
randomly. You add one barcode out of a hundred to every one of the cells. You then
take them all out, you shuffle them again, and you throw them again into the same 100
bottles, but now in a different
randomization, and you add a second barcode. So every cell now has two barcodes. You take
them out again, you shuffle them, and you throw them back in. A third barcode is added
randomly from the same hundred barcodes. You've now labeled every cell,
probabilistically,
based on the unique path that it took
of which of 100 bottles did it go for the first time,
which of 100 bottles the second time
and which of 100 bottles the third time.
100 times 100 times 100 is a million unique barcodes
for every single one of these cells,
without ever using microfluidics.
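The three rounds of shuffling can be simulated in Python. This sketch assumes each cell lands in a uniformly random bottle in every round, and it uses 10,000 cells only to keep the run fast; the function names and the seed are illustrative choices.

```python
import random
from collections import Counter

def split_pool(n_cells, n_bottles=100, rounds=3, seed=0):
    """Give each cell one random bottle barcode per round; return its combined barcode."""
    rng = random.Random(seed)
    return [tuple(rng.randrange(n_bottles) for _ in range(rounds)) for _ in range(n_cells)]

def fraction_unique(labels):
    """Fraction of cells whose combined barcode is shared with no other cell."""
    counts = Counter(labels)
    return sum(1 for label in labels if counts[label] == 1) / len(labels)

n_bottles, rounds = 100, 3
print("possible barcodes:", n_bottles ** rounds)  # 100 * 100 * 100

labels = split_pool(n_cells=10_000, n_bottles=n_bottles, rounds=rounds)
print("fraction uniquely labeled:", fraction_unique(labels))
```

Because the labeling is probabilistic, two cells can follow the same path by chance. Under this simple uniform model, collisions stay rare when the number of cells is well below the number of barcode combinations, and adding a fourth round multiplies the combinations by another factor of 100.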
Very clever.
That's beautiful, right?
From a computer science perspective,
That's very clever.
Yeah.
So you now have the single-cell
sequencing technology.
Yes.
You can use the wells,
you can use the bubbles,
or you can use the bottles.
And you know, sort of,
you have ways.
The bubbles still sound pretty damn good.
The bubbles are awesome.
And that's basically the main technology that we're using.
Okay.
So the bubbles is the main technology.
So there are kits now that companies just sell to
basically carry out single-cell RNA sequencing. You know, you can
basically, for two thousand dollars, get ten thousand
cells from one sample. And for every one of those cells you basically have the
transcription of thousands of genes.
And of course, the data for any one cell is noisy, but being computer scientists, we
can aggregate the data from all of the cells together across thousands of individuals
together to basically make very robust inferences.
Okay.
So the third technology, basically single cell RNA sequencing that allows you to now start asking not just what
is the brain expression level difference of that genetic variant, but what is the expression
difference of that one genetic variant across every single subtype of brain cell?
How is the variance changing?
You know, with a brain sample, you can just ask about the mean. What is the average expression?
If I instead have 3,000 cells that are neurons, I can ask not just what is the neuronal expression,
I can say for layer 5 excitatory neurons of which I have, I don't know, 300 cells, what
is the variance that this genetic variant has?
So suddenly, it's amazingly more powerful.
I can basically start asking about this middle layer of gene expression
at unprecedented levels.
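The difference between a bulk average and a per-cell-type view can be sketched in Python. The cell types, genotypes, and expression values below are invented for illustration; the only idea taken from the conversation is grouping cells by type and genotype and then looking at both mean and variance.

```python
from collections import defaultdict
from statistics import mean, pvariance

# Hypothetical single-cell records:
# (cell type, genotype at one variant, expression of one gene).
cells = [
    ("excitatory_L5", "TT", 5.0), ("excitatory_L5", "TT", 5.2),
    ("excitatory_L5", "CC", 2.0), ("excitatory_L5", "CC", 4.0),
    ("astrocyte", "TT", 1.0), ("astrocyte", "TT", 1.2),
    ("astrocyte", "CC", 1.1), ("astrocyte", "CC", 0.9),
]

def summarize(cells):
    """Mean and variance of expression for every (cell type, genotype) group."""
    groups = defaultdict(list)
    for cell_type, genotype, value in cells:
        groups[(cell_type, genotype)].append(value)
    return {key: (mean(vals), pvariance(vals)) for key, vals in groups.items()}

summary = summarize(cells)
for (cell_type, genotype), (m, v) in sorted(summary.items()):
    print(f"{cell_type} {genotype}: mean={m:.2f} variance={v:.2f}")
```

In this toy data the variant changes expression in one neuronal subtype and leaves astrocytes essentially unchanged, a pattern that a single average over the whole sample would blur.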
And when you look at the average,
it washes out some potentially important signal
that corresponds to ultimately the disease.
Completely.
Yeah.
So that, I can do that at the RNA level,
but I can also do that at the DNA level for the
epigenome.
So remember how before I was telling you about all these technologies that we're using
to probe the epigenome? One of them is DNA accessibility.
So what we're doing in my lab is that from the same dissociation of say a brain sample
where you now have all these tens of thousands of cells floating around, you basically take
half of them to do RNA
profiling, and the other half to do epigenome profiling, both at a single-cell level.
So that allows you to now figure out what are the millions of DNA enhancers that are accessible
in every one of tens of thousands of cells.
And computationally we can now take the RNA and the DNA readouts and group them together to basically figure out how is every
enhancer related to every gene.
And remember this enhancer-gene linking that we were doing across
833 samples?
833 is awesome, don't get me wrong, but 10 million is way more awesome.
So we can now look at correlated activity across
2.3 million enhancers and 20,000 genes in each of millions of cells to basically start
piecing together the regulatory circuitry of every single type of neuron, every single
type of astrocyte, oligodendrocyte, microglial cell, inside the brains of 1,500 individuals that we sample across multiple
different brain regions, across both DNA and RNA.
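Linking enhancers to genes by correlated activity can be sketched as follows in Python. The enhancer and gene names, the values, and the 0.8 cutoff are all assumptions made for illustration; real pipelines use far larger matrices and more careful statistics.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

# Hypothetical measurements over the same six groups of cells.
accessibility = {"enhancer_1": [0.1, 0.2, 0.8, 0.9, 0.1, 0.9],
                 "enhancer_2": [0.5, 0.4, 0.5, 0.6, 0.5, 0.4]}
expression = {"gene_A": [1.0, 1.5, 7.0, 8.0, 1.2, 7.5],
              "gene_B": [3.0, 3.1, 2.9, 3.0, 3.2, 3.0]}

def link(accessibility, expression, threshold=0.8):
    """Return (enhancer, gene, r) for pairs whose correlation passes the threshold."""
    links = []
    for enh, acc in accessibility.items():
        for gene, expr in expression.items():
            r = pearson(acc, expr)
            if abs(r) >= threshold:
                links.append((enh, gene, round(r, 2)))
    return links

links = link(accessibility, expression)
print(links)
```

An enhancer whose accessibility rises and falls with a gene's expression across many cells becomes a candidate regulator of that gene.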
So that's the data set that my team generated last year alone.
So in one year, we've basically generated 10 million cells from human brain across
a dozen different disorders: schizophrenia, Alzheimer's, frontotemporal
dementia, Lewy body dementia, ALS, you know, Huntington's disease, post-traumatic
stress disorder, autism, bipolar disorder, healthy aging, et cetera.
So it's possible that even just within that data set lie a lot of keys to
understanding these diseases and then being able to, like, directly lead to treatment.
Correct. Correct. So basically we are now...
Motivating.
Yeah, so our computational team is in heaven right now and we're looking for people.
I mean, if you have...
This is a very
interesting kind of side question. How much of this is biology? How much of
this is computation? So you have the computational biology group, but
should you be comfortable with biology to be able
to solve some of these problems? If you put on one of the several
hats you wear, fundamentally, are you thinking like a computer scientist here?
You have to. This is the only way. As I said, we are the descendants of the first digital computer.
We're trying to understand the digital computer. We're trying to understand the circuitry, the logic
of this digital core computer and
all of these analog layers surrounding it. So, you know, the case that I've been making
is that you cannot think one gene at a time. The traditional biology is dead. There's
no way. You cannot solve disease with traditional biology. You need it as a component. Once you
figure out IRX3 and IRX5, you
can then say, hey, have you guys worked on those genes with your single gene approach?
We'd love to know everything you know. And if you haven't, we now know how important
these genes are. Let's now launch a single gene program to dissect them and understand
them. But you cannot use that as a way to dissect disease. You have to think genomically. You
have to think from the global perspective
and you have to build these circuits systematically.
So we need numbers of computer scientists
who are interested and willing to dive into these data,
you know, fully, fully in.
And sort of extract meaning.
We need computer science people
who can understand sort of machine learning and
inference and sort of, you know, decouple these matrices, come up with super smart ways of
sort of dissecting them.
But we also need computer scientists who understand biology, who are able to design
the next generation of experiments.
Because many of these experiments, no one in their right mind would design them without
thinking of the analytical approach that you would use to deconvolve the data afterwards.
Because it's massive amounts of ridiculously noisy data.
And if you don't have the computational pipeline in your head before you even design the experiment, you would never design the experiment that way.
That's brilliant. So in designing the experiment, you have to see the entirety of the computational
pipeline.
That drives the design.
That even drives the necessity for that design.
Basically, if you didn't have a computer scientist way of thinking, you would never design
these hugely combinatorial, massively parallel experiments.
So that's why you need interdisciplinary teams, you need teams, and
I want to clarify that what do we mean by computational biology group? The focus is not
on computational, the focus is on the biology. So we are a biology group, what type of biology?
Computational biology. That's the type of biology that uses the whole genome. That's the type
of biology that designs experiments, genomic experiments,
that can only be interpreted in the context of the whole genome.
Right, so it's philosophically looking at biology as a computer.
Correct.
Correct.
So, which is in the context of the history of biology is a big transformation.
Yeah, you can think of the name as what do we do, only computation, that's not true,
but how do we study it?
Only computationally, that is true.
So all of this single-cell sequencing can now be coupled with the technology that we
talked about earlier for perturbation.
So here's a crazy thing.
Instead of using these wells and these robotic systems for doing one drug at a time,
or for perturbing one gene at a time in thousands of wells,
you can now do this using a pool of cells and single cell RNA sequencing.
How? You basically can take these perturbations using CRISPR,
and instead of using a single guide RNA, you can use a library of guide RNAs generated exactly the same way using this array technology.
So you synthesize a thousand different guide RNAs.
You now take each of these guide RNAs and you insert them in a pool of cells
where every cell gets one perturbation, and you use CRISPR, with
either CRISPR-Cas9 to edit the genome with these thousand perturbations, or with CRISPR activation,
or with CRISPR repression, and you now can have a single-cell readout where every single cell
has received one of these modifications and and you can now, in massively parallel ways,
couple the perturbation and the readout in a single experiment.
How are you tracking which perturbation each cell received?
So there's ways of doing that,
but basically one way is to make that perturbation
an expressable vector so that part of your RNA reading is actually
that perturbation itself.
So you can basically put it in an expressable part, so you can self-try it.
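Coupling perturbation and readout in one pooled experiment can be sketched in Python. This assumes the guide identity has already been read out of each cell's own RNA; the guide names, the expression values, and the use of a simple difference of means are illustrative assumptions.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical cells: the guide RNA detected in each cell's RNA reads,
# plus the expression of one gene of interest in that cell.
cells = [
    {"guide": "guide_IRX3", "expr": 0.5},
    {"guide": "guide_IRX3", "expr": 0.7},
    {"guide": "control", "expr": 2.0},
    {"guide": "control", "expr": 2.2},
    {"guide": "guide_other", "expr": 2.1},
]

def effect_per_guide(cells, control="control"):
    """Mean expression per guide minus the mean expression in control cells."""
    by_guide = defaultdict(list)
    for cell in cells:
        by_guide[cell["guide"]].append(cell["expr"])
    baseline = mean(by_guide[control])
    return {g: mean(v) - baseline for g, v in by_guide.items() if g != control}

effects = effect_per_guide(cells)
print(effects)
```

Because every cell carries its own perturbation label, a single pool yields one comparison against control per guide, without separate wells.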
So the point that I want to get across is that the sky is the limit.
You basically have these tools, these building blocks of molecular biology.
You have these massive data sets of computational biology, you
have this huge ability to use machine learning and statistical methods and linear algebra
to reduce the dimensionality of all these massive data sets, and then you end up with a series
of actionable targets that you can then couple with pharma and just go after systematically.
So the ability to sort of bring genetics to the epigenomics, to the transcriptomics,
to the cellular readouts using these sort of high throughput perturbation technologies that I'm talking about.
And ultimately to the organism through the electronic health record endophenotypes,
and ultimately the disease, battery of assays,
at the cognitive level, at the physiological level,
and every other level,
there is no better or more exciting field,
in my view, to be a computer scientist,
or to be a scientist, period.
Basically, this confluence of technologies, of computation, of data, of insights,
and of tools for manipulation is unprecedented in human history.
And I think this is what's shaping the next century, to really be a transformative century
for our species and for our planet.
So you think the 21st century will be remembered for the big leaps in understanding and
alleviation of biology?
If you look at the path between discovery and therapeutics, it's been on the order of
50 years, it's been shortened to 40, 30, 20 and now it's on the order of 10 years.
But the huge number of technologies that are going on right now for discovery
will result undoubtedly in the most dramatic manipulation of human biology
that we've ever seen in the history of humanity in the next few years.
Do you think we may be able to cure some of the diseases that we started this conversation with?
Absolutely, absolutely.
It's only a matter of time.
Basically, the complexity is enormous,
and I don't want to underestimate the complexity,
but the number of insights is unprecedented,
and the ability to manipulate is unprecedented,
and the ability to deliver these small molecules
and other non-traditional medicine perturbations.
There's a lot of sort of new generation of perturbations that you can use at the DNA
level, at the RNA level, at the microRNA level, at the epigenomic level.
There's a battery of new generations of perturbations.
If you couple that with cell type identifiers that can basically
sense when you are in the right cell based on the specific combination and then turn on that
intervention for that cell, you can now think of combinatorial interventions where you can basically
sort of feed a synthetic biology construct to someone that will basically do different things
in different cells. So basically for cancer, this is one of the therapeutics
that our collaborator Ron Weiss is using
to basically start sort of engineering these circuits
that will use microRNA sensors of the environment
to sort of know if you're in a tumor cell
or if you're in a immune cell or if you're in a stromal cell
and so forth, and basically turn on particular interventions there.
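The sensing idea described here can be sketched as simple logic: read a few microRNA levels, decide which cell type they indicate, and pick an intervention for that type. The marker names, threshold, and interventions below are made-up placeholders, and real synthetic circuits are implemented in molecules rather than code, so treat this only as an analogy.

```python
def classify_cell(mirna_levels, signatures, threshold=0.5):
    """Return the first cell type whose microRNA signature matches, else None."""
    for cell_type, sig in signatures.items():
        # Markers that must be present at or above the threshold.
        high_ok = all(mirna_levels.get(m, 0.0) >= threshold for m in sig["high"])
        # Markers that must be below the threshold.
        low_ok = all(mirna_levels.get(m, 0.0) < threshold for m in sig["low"])
        if high_ok and low_ok:
            return cell_type
    return None


def choose_intervention(mirna_levels, signatures, interventions):
    """Map sensed microRNA levels to a cell-type-specific action."""
    cell_type = classify_cell(mirna_levels, signatures)
    return interventions.get(cell_type, "do nothing")


# Hypothetical signatures: which markers are high or low in each cell type.
SIGNATURES = {
    "tumor": {"high": ["miR-A"], "low": ["miR-B"]},
    "immune": {"high": ["miR-B"], "low": ["miR-A"]},
}
# Hypothetical actions per cell type.
INTERVENTIONS = {
    "tumor": "express payload",
    "immune": "express activating signal",
}

action = choose_intervention({"miR-A": 0.9, "miR-B": 0.1}, SIGNATURES, INTERVENTIONS)
```

A cell matching no signature gets no intervention, which is the property that makes the construct safe to deliver broadly.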
You can sort of create constructs that are tuned
to only the cells, or only
the heart cells, or only the, you know, brain cells, and then have these new generations of
therapeutics coupled with this immense amount of knowledge on the sort of which targets
to choose and what biological processes to measure and how to intervene. My view is that disease is going to be
fundamentally altered and alleviated as we go forward. Next time we talk, we'll talk about the
philosophical implications that need effect of life, but let's stick to biology for just a
little longer. We did pretty good today. We stuck to the science. What are you excited about in terms of the future of this field,
the technologies in your own group, in your mind,
you're leading the world at MIT and the science
and the engineering of this work?
So what are you excited about here?
I could not be more excited.
We are one of many, many teams who are working on this.
In my team, the most exciting parts are, you know, manifold.
So basically, we've now assembled this battery of technologies.
We've assembled these massive, massive data sets.
And now we're really sort of in the stage of our team's path
of generating disease insights.
So we are simultaneously working on a paper on schizophrenia right now that is basically
using the single-cell profiling technologies, using these editing and manipulation technologies
to basically show how the master regulators underlying changes in the brain that are
sort of found in schizophrenia
are in fact affecting excitatory neurons and inhibitory neurons in pathways that are
active both in synaptic pruning but also in early development. We've basically found
this set of four regulators that are connecting these two processes that were previously separate
in schizophrenia, and sort of having a more unified view across those two sides.
The second one is in the area of metabolism. We basically now have a beautiful collaboration
with the Goodyear lab that's basically looking at multi-tissue perturbations
in six or seven different tissues across the body, in the context of exercise,
and in the context of nutritional interventions,
using both mouse and human,
where we can basically see
what are the cell-to-cell communications
that are changing across them,
and what we're finding is this immense role of both immune cells,
as well as adipocyte stem cells, in sort of reshaping
that circuitry of all of these different tissues, and that's sort of pointing to a new path for
therapeutic intervention there.
In Alzheimer's, it's this huge focus on microglia, and now we're discovering different classes
of microglial cells that are basically either synaptic or immune.
And these are playing vastly different roles in Alzheimer's versus in schizophrenia.
And what we're finding is this immense complexity as you go further and further down
of how in fact there's 10 different types of microglia,
each with their own sort of expression programs.
We used to think of them as, oh, yeah, they're microglia, but in fact, now we're realizing
just even in that sort of least abundant of cell types, there's this incredible diversity
there.
The differences between brain regions is another sort of major, major insight.
Again, one would think that, oh, astrocytes are astrocytes, no matter where they are. But no, there's incredible region-specific differences in the expression patterns of all
of the major brain cell types across different brain regions.
So basically, there's the neocortical regions that are the recent innovation that makes us
so different from all other species.
There's the reptilian brain regions that are much more, sort of much more, you know, very extremely
distinct. There's the cerebellum. There's, um, each of those basically is associated in
a different way with disease. And what we're doing now is looking into pseudo temporal
models for how disease progresses across different regions of the brain. If you look at Alzheimer's,
it basically starts in this small region called the entorhinal cortex
and then it spreads through the brain
and through the hippocampus
and ultimately affecting the neocortex.
And with every brain region that it hits,
it basically has a different impact on the cognitive
and memory aspects, orientation,
short-term memory, long-term memory, et cetera,
which is dramatically affecting the cognitive path
that the individuals go through.
So what we're doing now is creating these computational
models for ordering the cells and the regions
and the individuals according to their ability
to predict Alzheimer's disease,
so we can have a cell level predictor of pathology that allows us to now create a temporal time course
that tells us when every gene turns on along this pathology progression,
and then trace that across regions and pathological measures that are region-specific,
but also cognitive measures and so on and so forth.
So that allows us to now sort of for the first time look at can we actually do early intervention for Alzheimer's?
Where we know that the disease starts manifesting for 10 years before you actually have your first cognitive loss?
Can we start seeing that path to build new diagnostics, new prognostics, new biomarkers for this early intervention
in Alzheimer's?
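The ordering idea can be illustrated with a small sketch: sort cells by a predicted pathology score (a stand-in for pseudotime), then record the first point along that ordering at which each gene crosses an expression threshold. The field names, gene names, scores, and threshold are hypothetical, and real pseudotemporal models are far more involved than a sort.

```python
def order_cells(cells):
    """Order cells from least to most predicted pathology (a pseudotime proxy)."""
    return sorted(cells, key=lambda c: c["pathology_score"])


def gene_onsets(cells, threshold=1.0):
    """For each gene, return the pathology score at which it first turns on."""
    onsets = {}
    for cell in order_cells(cells):
        for gene, level in cell["expression"].items():
            # Record only the earliest crossing along the ordering.
            if gene not in onsets and level >= threshold:
                onsets[gene] = cell["pathology_score"]
    return onsets


# Hypothetical cells with a predicted pathology score and two measured genes.
cells = [
    {"id": "c3", "pathology_score": 0.8, "expression": {"GENE_X": 2.0, "GENE_Y": 1.5}},
    {"id": "c1", "pathology_score": 0.1, "expression": {"GENE_X": 0.0, "GENE_Y": 0.2}},
    {"id": "c2", "pathology_score": 0.4, "expression": {"GENE_X": 1.2, "GENE_Y": 0.3}},
]

ordered_ids = [c["id"] for c in order_cells(cells)]
onsets = gene_onsets(cells)
```

Genes with earlier onsets along this ordering are the kind of candidates one would examine as early biomarkers.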
The other aspect that we're looking at is mosaicism.
We talked about the common variants and the rare variants.
But in addition to those rare variants,
as your initial cell that forms the zygote,
divides and divides and divides, with every cell division, there are additional mutations
that are happening.
So what you end up with is your brain being a mosaic
of multiple different types of genetic underpinnings.
Some cells contain a mutation that other cells don't have.
So every human has the common variants
that all of us carry to some degree, the rare variants that your immediate tree of the human species carries,
and then there are the somatic variants, which have been previously inaccessible to study in human
postmortem samples. But right now, with the advent of single-cell RNA sequencing,
and in this particular case, we're using the well-based sequencing, which is much more expensive,
but gives you a lot richer information about, you know, the transcripts. So we're using now that
richer information to infer mutations that have happened in each of
the thousands of genes that are active in these cells, and then understand how the genome
relates to the function, this genotype-phenotype relationship, that we usually build in GWAS,
in genome-wide association studies, between
genetic variation and disease,
we're now building that at the cell level,
where for every cell we can relate
the unique specific genome of that cell,
with the expression patterns of that cell,
and the predicted function using these predictive models
that I mentioned before on
dysregulation for cognition,
for pathology in Alzheimer's at the cell level.
And what we're finding is that the genes that are altered and the genetic regions that
are altered in common variants versus rare variants versus somatic variants are actually
very different from each other. The somatic variants are pointing to neuronal energetics
and oligodendrocyte functions that are not visible in the genetic
regions that you find for the common variants, probably because they have too strong of an effect
that evolution is just not tolerating them on the common side of the allele frequency spectrum.
So the somatic one, that's the variation that happens after the
zygote, after you're an individual. I mean, this is a dumb question, but there's mutation and variation,
I guess that happens there.
And you're saying that through this,
if we focus in on individual cells,
we're able to detect the story that's interesting there.
And that might be a very unique kind of important variability
that arises for, you said neuronal or something that was energetic.
Energetics.
Energetics.
So the metabolism of humans is dramatically altered from that of nearby species.
You know, we talked about that last time that basically we are able to consume meat that
is incredibly energy rich and that allows us to sort of have functions that are, you know,
meeting this humongous brain that we have. Basically, on one hand, every one of our brain
cells is much more energy efficient than our neighbors, than our relatives. Number two,
we have way more of these cells. And number three, we have, you know, this new diet that
allows us to now feed all these needs. That basically creates
a massive amount of damage, oxidative damage from this huge super-powered factory of ideas
and thoughts that we carry in our skull. That factory has energetic needs and there's
a lot of biological processes underlying that, that we are finding are altered in the context of Alzheimer's disease.
That's fascinating, so you have to consider all of these systems
if you want to understand even something like diseases that you would
maybe traditionally associate with just the particular cells of the brain.
Yeah. The immune system.
The metabolic system.
And these are all the things that make us uniquely human. So our immune system is dramatically
different from that of our neighbors. Our societies are so much more clustered. The history
of infections that have plagued the human population is dramatically different from every
other species. The way that our society, our population, has exploded,
has basically put unique pressures on our immune system.
And our immune system has both coped with density
and also been shaped by, as I mentioned, the vast amount
of death that has happened in the Black Plague
and other selective events in human history,
famines, ice ages, and so on.
So that's number one on the sort of immune
side. On the metabolic side, again, we are able to sort of run marathons. I don't know if you
remember the human versus horse experiment where the horse actually tires out faster than the human
and the human actually wins. So on the metabolic side, we're dramatically different. On the immune
side, we're dramatically different. On the brain side, again, you know, no need to sort of, you know, it's a no-brainer how our brain is, like,
enormously more capable. And then, you know, on the side of cancer, so basically the cancers that
humans are having, the exposures, the environmental exposures, is again dramatically different. And the
lifespan, the expansion of human lifespan,
is unseen in any other species in recent evolutionary history.
And that now leads to a lot of new disorders
that are starting to manifest late in life.
So Alzheimer's is one example where basically these vast
energetic needs over a lifetime
of thinking can basically lead to all of this debris and eventually saturate the system
and lead to Alzheimer's in the late life.
But there's such a dramatic set of frontiers when it comes to aging research that, you know, will...
So what I often like to say is that if you want to re...
to engineer a car to go from 70 miles an hour to 120 miles an hour, that's fine.
You can basically, you know, fix a few components.
If you want it to now go at 400 miles an hour, you have to completely redesign the entire car.
Because the system is just not evolved to go that far. Basically our human body
has only evolved to live to 120; maybe we can get to 150 with minor changes. But as we start
pushing these frontiers for not just living but well living, the eu zen that we talked about last
time, so to basically push eu zen into the 80s and 90s and
100s and you know much further than that, we will face new challenges that have you know never been
faced before in terms of cancer and the number of divisions, in terms of Alzheimer's and brain-related
disorders, in terms of metabolic disorders, in terms of regeneration. There's just so many different frontiers ahead of us.
So I am thrilled about where we're heading.
So basically I see this confluence in my lab and many other labs of AI, of the next frontier
of AI for drug design.
So basically these sort of graph neural networks on specific chemical designs that allow you to create new generations
of therapeutics.
These molecular biology tricks for intervening in the system at every level.
These personalized medicine, prediction, diagnosis, and prognosis using the electronic health records and using these
polygenic risk scores weighted by the burden the number of mutations that are
accumulating across common rare and somatic variants. The burden converging
across all of these different molecular pathways. The delivery of specific
drugs and specific interventions into specific cell
types.
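For the polygenic risk scores mentioned above, a minimal sketch of the standard form is a weighted sum: each variant's effect size multiplied by how many copies of the risk allele a person carries. The variant IDs and effect sizes here are invented for illustration, and the burden-weighted scheme across common, rare, and somatic variants described in the conversation would be more elaborate than this.

```python
def polygenic_risk_score(dosages, effect_sizes):
    """Sum of effect size times risk-allele count over all scored variants.

    dosages: variant ID -> number of risk alleles carried (0, 1, or 2).
    effect_sizes: variant ID -> per-allele weight (for example, a log odds ratio).
    Variants missing from dosages are treated as zero copies.
    """
    return sum(weight * dosages.get(variant, 0)
               for variant, weight in effect_sizes.items())


# Hypothetical weights and one person's genotype.
effect_sizes = {"variant_A": 0.2, "variant_B": -0.1, "variant_C": 0.05}
person = {"variant_A": 2, "variant_B": 1}

score = polygenic_risk_score(person, effect_sizes)
```

The score is only meaningful relative to a reference population, so in practice it is compared against the distribution of scores in a matched cohort.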
And again, you've talked with Bob Langer about this.
There's many giants in that field.
And then the last concept is not intervening at a single gene level.
I want you to sort of conceptualize the concept of an on-target side effect.
What is an on-target side effect?
An off-target side effect is when you design a molecule
to target one gene, and instead it targets another gene,
and you have side effects because of that.
An on-target side effect is when your molecule does exactly
what you were expecting, but that gene is pleiotropic.
Pleio means many, tropos means ways, many ways.
It acts in many ways.
It's a multifunctional gene. So you find
that this gene plays a role in this, but as we talked about the wiring of genes to phenotypes is
extremely dense and extremely complex. So the next stage of intervention will be intervening
not at the gene level, but at the network level. Intervening at the set of pathways and the set of genes with multi-input perturbations
to the system, multi-input modulations, pharmaceutical or other interventional.
And that basically allows you to now work at the sort of full level of understanding, not just
in your brain, but across your body, not just in one gene, but across the set of pathways
and so on and so forth for every one of these disorders.
So I think that we're finally at a level of systems medicine, of basically instead of sort
of medicine being at a single gene level, medicine being at a systems level, where you can
be personalized based on a specific set of genetic markers and genetic perturbations that
you are either born with or that you have developed during your lifetime.
Your unique set of exposures, your unique set of biomarkers, and your unique set of current set extremely precisely in the specific pathways and
the specific combinations of genes that should be modulated to sort of bring you from the
disease state to the physiologically normal state, or even to a physiologically improved state
through this combination of interventions.
So that's in my view the field where basically computer science comes together with artificial
intelligence, statistics, all of these other tools, molecular biology technologies and biotechnology
and pharmaceutical technologies that are sort of revolutionary in the way of intervention.
And of course, this massive amount of molecular biology and data gathering and generation
and perturbation in massively parallel ways.
So there's no better time, there's no better place to be sort of, you know, looking at this whole confluence of ideas.
And I'm just so thrilled to be a small part of this amazing, enormous ecosystem.
It's exciting to imagine what the humans of 100, 200 years from now, what their life experience is like. Because these ideas seem to have
potential to transform the quality of life. That when they look back at us, they probably
wonder how we put up with all the suffering in the world. Manolis, it's a huge honor. Thank
you for spending this early Sunday morning with me. I deeply appreciate it. See you next time.
Sounds like a plan. Thank you, Lex.
Thanks for listening to this conversation with Manolis Kellis, and thank you to our sponsors.
SEMrush, which is an SEO optimization tool, Pessimists Archive, which is one of my favorite history podcasts,
Eight Sleep, which is a self-cooling mattress with smart sensors and an app, and finally BetterHelp, which is an online therapy service. Please
check out the sponsors in the description to get a discount and to support this podcast.
If you enjoyed this thing, subscribe on YouTube, review it with 5 stars on Apple Podcasts,
follow on Spotify, support it on Patreon, or connect with me on Twitter at Lex Fridman. And now let me
leave you with some words from Haruki Murakami. Human beings are ultimately
nothing but carriers, passageways for genes. They ride us into the ground like
race horses from generation to generation.
Genes don't think about what constitutes good or evil.
They don't care whether we're happy or unhappy.
We're just means to an end for them.
The only thing they think about is what is most efficient for them.
Thank you for listening and hope to see you next time.