Lex Fridman Podcast - #90 – Dmitry Korkin: Computational Biology of Coronavirus
Episode Date: April 23, 2020
Dmitry Korkin is a professor of bioinformatics and computational biology at Worcester Polytechnic Institute, where he specializes in bioinformatics of complex disease, computational genomics, systems biology, and biomedical data analytics. I came across Dmitry's work when in February his group used the viral genome of COVID-19 to reconstruct the 3D structure of its major viral proteins and their interactions with human proteins, in effect creating a structural genomics map of the coronavirus and making this data open and available to researchers everywhere. We talked about the biology of COVID-19, SARS, and viruses in general, and how computational methods can help us understand their structure and function in order to develop antiviral drugs and vaccines.
Support this podcast by signing up with these sponsors:
- Cash App - use code "LexPodcast" and download:
- Cash App (App Store): https://apple.co/2sPrUHe
- Cash App (Google Play): https://bit.ly/2MlvP5w
EPISODE LINKS:
Dmitry's Website: http://korkinlab.org/
Dmitry's Twitter: https://twitter.com/dmkorkin
Dmitry's Paper that we discuss: https://bit.ly/3eKghEM
This conversation is part of the Artificial Intelligence podcast. If you would like to get more information about this podcast go to https://lexfridman.com/ai or connect with @lexfridman on Twitter, LinkedIn, Facebook, Medium, or YouTube where you can watch the video versions of these conversations. If you enjoy the podcast, please rate it 5 stars on Apple Podcasts, follow on Spotify, or support it on Patreon.
Here's the outline of the episode. On some podcast players you should be able to click the timestamp to jump to that time.
OUTLINE:
00:00 - Introduction
02:33 - Viruses are terrifying and fascinating
06:02 - How hard is it to engineer a virus?
10:48 - What makes a virus contagious?
29:52 - Figuring out the function of a protein
53:27 - Functional regions of viral proteins
1:19:09 - Biology of a coronavirus treatment
1:34:46 - Is a virus alive?
1:37:05 - Epidemiological modeling
1:55:27 - Russia
2:02:31 - Science bobbleheads
2:06:31 - Meaning of life
Transcript
The following is a conversation with Dmitry Korkin.
He's a professor of bioinformatics and computational biology at WPI, Worcester Polytechnic Institute,
where he specializes in bioinformatics of complex diseases, computational genomics,
systems biology, and biomedical data analytics.
I came across Dmitry's work when, in February, his group used the viral genome of COVID-19 to reconstruct the 3D structure of its major viral proteins and their interactions with human proteins,
in effect creating a structural genomics map of the coronavirus and making this data open and available to researchers everywhere. We talked about the biology of COVID-19, SARS, and viruses in general, and how computational methods
can help us understand their structure and function in order to develop antiviral drugs and vaccines.
This conversation was recorded recently in the time of the coronavirus pandemic.
For everyone feeling the medical, psychological, and financial burden of the crisis,
I'm sending love your way. Stay strong. We're in this together. We'll beat this thing.
This is the Artificial Intelligence Podcast. If you enjoy it, subscribe on YouTube, review it with five stars on Apple Podcasts,
support it on Patreon, or simply connect with me on Twitter at Lex Fridman, spelled F-R-I-D-M-A-N.
This show is presented by Cash App, the number one
finance app in the App Store. When you get it, use code LexPodcast. Cash App lets you
send money to friends, buy Bitcoin, and invest in the stock market with as little as $1.
Since Cash App allows you to buy Bitcoin, let me mention that cryptocurrency in the context
of the history of money is fascinating. I recommend The Ascent of Money as a great book on this history. Debits and credits on ledgers started around
30,000 years ago. The US dollar was created over 200 years ago. And Bitcoin, the first decentralized
cryptocurrency, was released just over 10 years ago. So given that history, cryptocurrency is still very much in its early days of development,
but is still aiming to and just might redefine the nature of money.
So again, if you get Cash App from the App Store or Google Play and use the code LexPodcast,
you get $10, and Cash App will also donate $10 to FIRST, an organization that is helping
to advance robotics and STEM
education for young people around the world.
And now, here's my conversation with Dmitry Korkin. Do you find viruses terrifying or fascinating?
When I think about viruses, I think about them,
I mean, I imagine them as those villains that do their work so perfectly well
that it is impossible not to be fascinated with them.
So what do you imagine when you think about a virus? Do you imagine the individual,
sort of these hundred-nanometer particle things, or do you imagine the whole pandemic, like, at the society level?
When you talk about the efficiency with which they do their work, do you think of viruses as
the millions that occupy a human body or a living organism,
at the society level, spreading as a pandemic? Or do you think of the individual little guy?
Yeah, I think this is a unique concept
that allows you to move from micro scale to the macro scale.
So the virus itself, I mean, it's not a living organism.
It's a machine, to me, it's a machine,
but it is perfected in the sense that it essentially has
a limited number of functions it needs to do,
the necessary functions.
And essentially has enough information
just to do those functions, as well as the ability to modify itself.
So it's a machine, it's an intelligent machine.
So yeah, maybe on that point, you're in danger of reducing the power of this thing by calling
it a machine, right?
But you now mention that it's also possibly intelligent.
It seems that there are these elements of brilliance
that a virus has, of intelligence,
of maximizing so many things about its behavior
in order to ensure its survival and its success.
So do you see it as intelligent?
So, I understand it differently from how I think about intelligence, or the intelligence of artificial intelligence mechanisms.
I think the intelligence of a virus is in its simplicity.
The ability to do so much with so little material and information.
But I also think it's interesting, and it keeps me wondering, whether or not it's also an example of basic swarm intelligence, where essentially the viruses act as a
whole, and they're extremely efficient in that.
So what do you attribute the incredible simplicity and the efficiency to, is it the evolutionary
process?
So maybe another way to ask that, if you look at the next hundred years, are you more
worried about the natural pandemics or the engineered pandemics?
So how hard is it to build a virus?
Yes, it's a very, very interesting question, because obviously there are a lot of conversations
about whether we are capable of engineering an even worse virus.
I personally expect, and am mostly concerned with, the naturally recurring viruses, simply because we keep seeing that.
We keep seeing new strains of influenza emerging, some of them becoming pandemic; we keep seeing new strains of coronaviruses emerging. This is a natural process, and I think this is why it's so powerful.
There are papers about scientists trying to study the capacity of modern biotechnology to alter viruses. But I hope that it won't be our main concern in the near future.
What do you mean by hope?
Well, if you look back and look at the history of the most dangerous viruses, right?
So the first thing that comes into mind is smallpox.
So right now there is perhaps a handful of places where the strains of this virus are stored.
So this is essentially the effort of the whole society to limit the access to those viruses.
You mean in a lab in a controlled environment in order to study?
And smallpox, it should be stated, is one of the viruses for which a vaccine has been developed. Yes, and until the 70s it was perhaps the most dangerous thing that was out there.
Is it a very different virus than influenza and coronaviruses? It is different in several aspects. Biologically, it's a so-called
double-stranded DNA virus, but it is also different in that it is much more
contagious. So the R0 for... What's R0?
R0 is essentially the average number of other people that a person infected by the virus spreads it to.
So it's the average number of people that he or she can spread it to.
And there is still some discussion about the estimates for the current virus.
The estimates vary between 1.5 and 3. In the case of smallpox, it was 5 to 7.
And we're talking about the exponential growth.
So that's a very big difference.
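To make the exponential-growth point concrete, here is a minimal sketch. It assumes a deliberately simplified model that is not from the conversation: every infected person infects exactly R0 others, with no immunity, interventions, or overlap between generations. The function name `infections_in_generation` is hypothetical; the R0 values are the estimates quoted above.

```python
def infections_in_generation(r0: float, generation: int, initial_cases: int = 1) -> float:
    """New infections in a given generation under unchecked growth.

    Simplifying assumption: each case infects exactly `r0` others,
    so generation n contains initial_cases * r0**n new cases.
    """
    return initial_cases * r0 ** generation


if __name__ == "__main__":
    # R0 estimates as quoted in the conversation.
    estimates = [
        ("current virus (low estimate)", 1.5),
        ("current virus (high estimate)", 3.0),
        ("smallpox (low estimate)", 5.0),
        ("smallpox (high estimate)", 7.0),
    ]
    for label, r0 in estimates:
        cases = infections_in_generation(r0, generation=10)
        print(f"{label:32s} R0={r0}: about {cases:,.0f} new cases in generation 10")
```

Because R0 is the base of the exponent, even a modest difference in R0 compounds into orders of magnitude after a handful of transmission generations.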
It's not the most contagious one. Measles, for example, is, I think, 15 and up.
But smallpox is definitely
more contagious than the seasonal flu,
than the current coronavirus, or SARS for that matter.
What makes the virus more contagious? I'm sure there's a lot of variables that come into play,
but is it that whole discussion of aerosol
and the size of droplets, if it's airborne,
or is there some other stuff that's more biology centered?
I mean, there are a lot of components:
there are biological components, and there
are also social components. The ability of
the virus to survive on surfaces.
The ability of the virus to replicate fast once it's in the cell,
once it's inside the host.
And interestingly enough, something that I think we didn't pay that much attention to is the incubation period, where
the host is asymptomatic.
And now it turns out that another thing that one really needs to take into account is
the percentage of the asymptomatic population, because those people still shed this virus and still
are contagious.
The Iceland study, which I think is probably the most impressive size-wise,
shows 50% asymptomatic for this virus. I also recently learned that for the swine flu, the number of people who got infected
was in the billions. There was some crazy number. It was like 20% of the population, 30% of the
population, something crazy like that. So the lucky thing there is the fatality rate is low.
But the fact that a virus can just take over an entire population so quickly, it's terrifying.
I think, I mean, this is, you know, that's perhaps my favorite example of a butterfly effect.
Because it's really, I mean, it's even tinier than a butterfly, and look
at it, you know, if you think about it, right? So it used to be in those
bat species. And perhaps because of a couple of small changes in the viral genome, it first became capable of
jumping from bats to humans, and then it became capable of jumping from human to
human. So, I mean, it's not even the size of a virus; it's the size of several atoms, or a few atoms.
And all of a sudden this change has such a major impact.
So is that a mutation like on a single virus?
So if we talk about the flap of a butterfly wing,
what's the first flap?
Well, I think this is the mutations that made this virus
capable of jumping from bat species to humans.
Of course, the scientists are still trying to find,
I mean, they're even trying to find the,
who was the first infected, right?
The patient zero.
The first human, the first human infected, right?
I mean, the fact that there are coronaviruses, different strains of coronaviruses, in various
bat species, I mean, we know that.
So virologists observe them, they study them, they look at their
genomic sequences, and they are trying, of course, to understand what makes these viruses
jump from bats to humans. Similar to that, in influenza, I think a few years ago, there was this interesting story
where several groups of scientists
studying influenza virus essentially
made experiments to show that this virus
can jump from one species to another,
you know, by changing, I think, just a couple of residues.
And, and, and of course, it was very controversial.
I think there was a moratorium on this study for a while,
but then the study was released. It was published.
So that, why was there a moratorium?
Because it shows through engineering it,
through modifying it, you can make it jump.
Yes.
Yes.
I personally think it is important to study this.
I mean, we should be informed.
We should try to understand as much as possible in order to prevent it.
But so then the engineering aspect there is, can't you then just start searching, because there are so many strains of viruses out there?
Can't you just search for the ones in bats that are the deadliest, from the virologist perspective,
and then just try to engineer, try to see how to,
but see, that's a, there's a nice aspect to it.
The really nice thing about engineering viruses is that
it has the same problem as nuclear weapons:
it's hard for it not to lead to mutual self-destruction. So you can't control
a virus, so it can't be used as a weapon, right? Yeah, that's why in the beginning I said
I am hopeful, because there are definitely regulations that need to be introduced. And I mean, as the scientific society,
we are in charge of making the right actions,
making the right decisions.
But I think we will benefit tremendously
by understanding the mechanisms by which the virus can jump, by which the virus
can become more dangerous to humans, because all these answers would eventually lead to designing better vaccines, hopefully universal vaccines,
right? And that would be a triumph of science. So what's the universal vaccine? So is that something
that... Well, how universal is universal? So what's the dream, I guess, because you kind of mentioned the dream of this. I would be extremely happy if we designed a vaccine that is able, I mean, I'll
give you an example, right?
So, so every year we do a seasonal flu shot.
The reason we do it is because, you know, we are in the arms race, you know, our vaccines
are in the arms race with constantly changing virus.
Now, if the next influenza pandemic occurs, most likely this vaccine would not save us,
right? Although it's the same virus, it might be a different strain. So if we're able to essentially design a vaccine against influenza virus, no matter what the strain is, no
matter which species it jumped from, that would be, I think, a huge, huge
advancement.
You mentioned the smallpox until the 70s might have been something that you would be worried
the most about. What about these days? Well, we're sitting here in the middle of a COVID-19 pandemic, but these days, nevertheless,
what is your biggest worry virus-wise? What are you keeping your eye on?
It looks like, based on the past several years of new viruses emerging,
I think we're still dealing with different types of influenza.
I mean, the H7N9 avian flu that emerged, I think, a couple of years ago in China, I think the mortality rate was incredible.
I mean, it was, I think, above 30%.
So this is this is huge.
I mean, luckily for us, this strain was not pandemic. So it was jumping from birds to human,
but I don't think it was actually transmittable between the humans.
And you know, this is actually a very interesting question,
which scientists tried to understand.
So the balance, the delicate balance between the virus being very contagious,
right, so efficient in spreading, and the virus being very pathogenic,
causing harm and death to the host. So it looks like the more pathogenic the virus is, the less contagious it is. Is that a property of biology, or what is it?
I don't have an answer to that, and I think this is still an open question.
But if you look at the coronaviruses, for example, if you look at the deadlier relative, MERS,
MERS was never a pandemic virus.
But, again, the mortality rate from MERS is far above, I think, 20 or 30%.
So whatever is making this all happen doesn't want us dead because it's balancing out nicely.
I mean, how do you explain that we're not dead yet?
Like, because there's so many viruses and they're so good at what they do.
Why do they keep us alive?
I mean, we also have, you know, a lot of protection, right?
So, we do have the immune system.
And so, I mean, we do have, you know, ways to fight against those viruses.
And I think with the, now we're much better equipped, right?
So with the discoveries of vaccines, and there are vaccines
against the viruses that maybe 200 years ago would wipe us out completely. But because of
these vaccines, we are actually capable of eradicating them pretty much fully,
as is the case with smallpox. So if we could, can we go to the basics a little bit
of the biology of the virus? How does the virus infect the body?
So I think there are some key steps that the virus needs to perform.
And of course, the first one, the viral particle needs to get attached to the host cell.
In the case of coronavirus, there is a lot of evidence that it actually interacts in the
same way as the SARS coronavirus.
So it gets attached to the ACE2 human receptor.
And so there is, I mean, as we speak, there is a growing number of papers suggesting it. Moreover, most recent results suggest that
this virus attaches more efficiently to this human receptor than SARS.
Just to sort of back up, so there is a family of viruses, the coronaviruses, and SARS, whatever the heck, I forgot,
respiratory, whatever that stands for. So SARS actually stands for the disease that you get: severe acute respiratory syndrome.
So SARS is the first strain, and there's MERS.
MERS, and the family... And there is, yes.
But scientists actually know more than three strains.
I mean, so there is the MHV strain, which is considered
to be a canonical disease model in mice.
And so there is a lot of work done on this virus because it's...
But it hasn't jumped to humans yet. No, no. It's interesting. Yes, fascinating. So,
you mentioned ACE2. So, when you say attach, proteins are involved on both sides? Yes, so we have this infamous spike protein on the surface of the virion particle,
and it does look like a spike. That's essentially because of this protein,
we call the coronavirus a coronavirus. That's what makes the corona on top of the surface.
So this protein, it actually
doesn't act alone. It actually
makes three copies,
and it makes a so-called trimer.
So this trimer is essentially a
single functional unit that starts interacting with the ACE2 receptor.
So this is, again, another protein that now sits on the surface of a human cell, a host cell, I would say. And that's essentially the way the virus anchors itself to the host cell.
Because then it needs to actually, it needs to get inside, you know, it fuses its
membrane with the host membrane. It releases the key components, it releases its RNA, and then essentially hijacks
the machinery of the cell because none of the viruses that we know of have ribosome,
the machinery that allows us to print out proteins.
So in order to print out proteins that are necessary for functioning of this virus, it
actually needs to hijack the host ribosomes.
So virus is an RNA wrapped in a bunch of proteins, one of which is this functional mechanism
of a spike protein that does the attachment.
So, yeah, so if you look at this virus, there are several basic components,
right? So we start with the spike protein. This is not the only surface protein, the protein that
lives on the surface of the viral particle. There is also the membrane protein, which is perhaps
the protein with the highest number of copies. So it essentially forms the envelope of the particle and helps to maintain a certain curvature.
Then there is the envelope protein, and there is ongoing research
into what exactly this protein does.
So these are sort of the three major surface proteins
that make the viral envelope.
And when we go inside, then we have
another structural protein
called nucleoprotein.
And the purpose of this protein is to protect the viral RNA.
It actually binds to the viral RNA, creates a capsid.
And so the rest of the viral information
is inside of this RNA.
And if you compare the number of genes, or proteins that
are made from these genes, it's significantly higher than that of
influenza virus. For example, influenza virus has, I think,
around eight or nine
proteins, whereas this one has at least 29. Wow. That has to do with the length of the RNA
strand. I mean, so it affects the length of the RNA strand, right? So because you
essentially need to have the minimum amount of information to encode those genes.
How many proteins did you say?
29.
29 proteins.
Yes.
So this is something definitely interesting because, believe it or not, we've been studying
coronaviruses for over two decades, and we've yet to uncover all functionalities of their proteins.
Could we maybe take a small tangent
and can you, can you say how one would try to figure out
what a function of a particular protein is?
So you've mentioned people are still trying to figure out
what the function of the envelope
protein might be or what's the process.
So this is where the research that computational scientists do might be of help because in
the past several decades we actually have collected a pretty decent amount of knowledge about different
proteins in different viruses. So what we can actually try to do, and this could be our
first lead to a possible function, is to say we have this genome of the
coronavirus, of the novel coronavirus, and we identify the potential proteins.
Then in order to infer the function, what we can do is see whether those
proteins are similar to the ones that we already know.
In such a way, we can, for example, clearly identify some critical components, such as RNA polymerase
or different types of proteases; these are the proteins that essentially clip the protein
sequences. So this works in many cases. However, in some cases, you
have truly novel proteins. And then this is a much more difficult task.
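The similarity-based approach described here can be sketched in a few lines. This is only an illustration under loud assumptions: the sequences in `KNOWN_PROTEINS` are made-up toy strings rather than real viral proteins, the k-mer Jaccard score stands in for the proper sequence-alignment tools real pipelines use, and the names `kmers`, `similarity`, `infer_function`, and the 0.3 threshold are all hypothetical.

```python
from __future__ import annotations


def kmers(seq: str, k: int = 3) -> set[str]:
    """All overlapping substrings of length k in an amino-acid sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a: str, b: str, k: int = 3) -> float:
    """Jaccard similarity of the two k-mer sets (a crude stand-in for alignment)."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def infer_function(query: str, known: dict[str, str], threshold: float = 0.3):
    """Return (best matching annotation, score), or (None, score) if nothing is similar enough."""
    best_name, best_score = None, 0.0
    for name, seq in known.items():
        score = similarity(query, seq)
        if score > best_score:
            best_name, best_score = name, score
    if best_score >= threshold:
        return best_name, best_score
    return None, best_score  # a "truly novel" protein: no lead from known sequences


# Toy, invented sequences used only to exercise the code.
KNOWN_PROTEINS = {
    "RNA polymerase (toy)": "MKTAYIAKQRQISFVKSHFSRQ",
    "protease (toy)": "GSHMLEDPVAGWTTRKLLQ",
}

if __name__ == "__main__":
    print(infer_function("MKTAYIAKQRQISFVKAHFSRQ", KNOWN_PROTEINS))  # one substitution
    print(infer_function("WWWWWWWWWW", KNOWN_PROTEINS))              # nothing similar
```

The second query illustrates the hard case from the conversation: when no known protein is similar, this kind of lookup gives no functional lead at all, and any prediction would still need experimental validation.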
Now, as a small pause, when you say similar, like, what if some parts are different and some parts are similar?
Like how do you disentangle that?
You know, it's a big question.
Of course, what bioinformatics does is predictions.
So those predictions, they have to be validated by experiments.
Functional or structural predictions? Both. I mean, we do structural predictions, we do functional predictions, we do interaction
predictions. So this is interesting. So you just generate a lot of predictions, like reasonable
predictions, based on structure, function, interaction, like you said. And then here you go.
That's the power of bioinformatics is data-grounded, good predictions
of what should happen. So, you know, in a way, I see it, we're helping experimental
scientists to streamline the discovery process. And the experimental scientists, is that what
a virologist is? So, yeah, virologist is one of the experimental sciences that focus on viruses.
They often work with other experimental scientists, for example, the molecular imaging scientists.
So the viruses often can be viewed and reconstructed through
electron microscopy techniques.
But these are specialists that are not
necessarily virologists. They work with small particles,
whether it's viruses or it's an organelle
of a human cell, whether it's a complex
molecular machinery. So the techniques that are used are very similar in their essence.
So, yeah, typically, and we see it now, the research that is emerging and that is needed often involves collaborations between virologists, biochemists, people from pharmaceutical sciences, and computational sciences, so we have
to work together.
So from my perspective, just step back.
Sometimes I look at this stuff, just how much we understand about RNA and DNA, how much we understand about protein,
like your work, the amount of proteins that you're exploring, is it surprising to you that
we were able, we descendants of apes were able to figure all of this out?
Like, you're a computer scientist. For me, from the computer science perspective, I know how to write a Python program, things are clear, but biology is a giant mess.
It feels like that to me from an outsider's perspective. How surprising, how amazing is it that we were able to figure this stuff out?
You know if you look at the, you know, how computational science and computer science
was evolving, right, I think it was just a matter of time that we would approach biology. So,
So we started from applications to much more fundamental systems, physics, you know, or small chemical compounds. Right. So now we are approaching the more complex biological systems.
And I think it's a natural evolution of, you know, of the computer science of mathematics.
So sure, that's the computer science side.
I just meant even at higher levels.
So that to me is surprising that computer science
can offer help in this messy world.
But it just means it's incredible
that the biologists and the chemists
can figure all this out.
Or does that just sound ridiculous to you,
that of course they would?
It just seems like a very complicated set of problems.
Like the variety of the kinds of things that could be produced in the body,
just like you said, 29 proteins.
I mean, just getting the hang of it so quickly, it just seems impossible to
me. I agree. I mean, I have to say, we are at the very, very beginning of this journey.
I mean, we've yet to comprehend, not even try to understand and figure out all the details,
but we've yet to comprehend the complexity of the cell.
We know that neuroscience is not even at the beginning of understanding the human mind.
So where's biology in terms of understanding the function,
deeply understanding the function of viruses and cells.
So sometimes it's easy to say when you talk about function,
what you really refer to is perhaps not a deep understanding,
but more of an understanding sufficient to be able to mess
with it using an antiviral, like mess with it chemically
to prevent some of its function. Or do you
understand the function deeply? Well, I think we are much further in terms of understanding
complex genetic disorders, such as cancer, where you have layers of complexity. And we,
you know, as in my laboratory, we're trying to contribute to that research, but we're also,
you know, we're overwhelmed with how many different layers of complexity, different layers of
mechanisms that can be hijacked by cancer simultaneously. And so, I think biology in the past 20 years, again from the perspective of an outsider, because
I'm not a biologist, has advanced tremendously.
And one thing where computational scientists and data scientists are now becoming very, very helpful comes from the
fact that we are now able to generate a lot of information about the cell, whether it's next-generation sequencing or transcriptomics, whether it's live imaging information,
whether it is complex interactions between proteins or between proteins and small molecules,
such as drugs, we are becoming very efficient in generating this information.
And now the next step is to become
equally efficient in processing this information
and extracting the key knowledge from that.
That could then be validated with experiment.
Yeah, back.
So maybe then going all the way back,
we're talking, you said, the first step
is seeing if you can match the new proteins you found in the virus against something we've
seen before to figure out its function. And then you also mentioned that there could be
cases where it's a totally new protein. Is there something bioinformatics can offer when it's a totally new protein?
This is where many of the methods, and you're probably aware of the case of machine learning, many of these methods rely on previous knowledge, right? So things where we
try to do it from scratch are incredibly difficult,
something that we call ab initio.
And this is, I mean, it's not just the function.
I mean, we've yet to have a robust method
to predict the structures of these proteins ab initio,
by not using any templates of other related proteins.
So, protein is a chain of amino acids, residues, yeah.
And then, somehow, magically, maybe you can tell me,
they seem to fold in incredibly weird and complicated 3D shapes.
Yes. So that's where actually the idea of protein folding, or not the idea, but the
problem of figuring out how they fold into those weird shapes comes in.
So that's another side of computational work. So can you
describe what protein folding from the computational side is, and maybe your thoughts on the Folding@home
effort that a lot of people know, where you can use your machine to do protein folding?
So yeah, protein folding is one of those one-million-dollar prize challenges.
Right? So the reason for that is we've yet to understand precisely how the protein gets folded.
So efficiently, to the point that in many cases where you try to unfold it,
due to the high temperature, it actually folds back
into its original state.
So we know a lot about the mechanisms, right?
But putting those mechanisms together,
and making sense of them is a computationally very expensive task.
In general, how do proteins fold?
Can they fold in an arbitrarily large number of ways?
Or do they usually fold in a very small number?
No, it's typically, I mean, we tend to think that there
is one sort of canonical fold for a protein.
Although there are many cases where the proteins,
upon destabilization, can be folded
into a different conformation.
And this is especially true when you look at proteins
that include more than one structural unit.
So those structural units we call them protein domains.
Essentially protein domain is a single unit that typically is evolutionary preserved, that
typically carries out a single function, and typically has a very distinct fold, a
3D structural organization. But it turns out that if you look at humans, an average protein in a human cell would have
about two or three such subunits.
And then there is how they fold into the sort of, you know,
next-level fold, right? So within the subunits there's folding, and then they
fold into the larger 3D structure, right? And for all of that, there's some understanding of the
basic mechanisms, but not enough to put it together to be able to fold it. Well, still, I mean, we're still struggling. I mean, we're getting pretty good at folding relatively
small proteins up to 100 residues.
I mean, but we're still far away from folding larger proteins.
And some of them are notoriously difficult,
for example, transmembrane proteins.
Proteins that sit in the membranes of the cell, they're incredibly important, but they
are incredibly difficult to solve.
And so basically, there's a lot of degrees of freedom, how it folds, and so it's a combinatorial
problem, or it just explodes.
There's so many dimensions.
Well, it is a combinatorial problem, but it doesn't mean that we cannot approach it
with something other than the brute-force approach. And so machine learning approaches have emerged that try to tackle it.
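A back-of-the-envelope sketch shows why brute force is hopeless here. All of the numbers are illustrative assumptions, not figures from the conversation: each residue is assumed to take one of 3 discrete conformations independently, and a hypothetical machine is assumed to check 10^12 conformations per second. The function names are likewise made up for this example.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def count_conformations(n_residues: int, states_per_residue: int = 3) -> int:
    """Size of the search space if every residue independently picks one of a few states."""
    return states_per_residue ** n_residues


def brute_force_years(n_residues: int,
                      states_per_residue: int = 3,
                      checks_per_second: float = 1e12) -> float:
    """Years needed to enumerate every conformation at an assumed checking rate."""
    total = count_conformations(n_residues, states_per_residue)
    return total / checks_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    for n in (10, 50, 100):
        print(f"{n:4d} residues: {count_conformations(n):.3e} conformations, "
              f"{brute_force_years(n):.3e} years to enumerate")
```

Even for a 100-residue protein, the size described above as relatively small, exhaustive enumeration under these assumptions takes vastly longer than the age of the universe, which is why approaches that learn from known structures are attractive.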
So Folding@home, I don't know how familiar you are with it, but does that use machine learning,
or is it more brute force?
No, so Folding@home, it was originally, I remember, it was a long time ago, I was a
postdoc and we learned about this game, because it was originally designed as a game.
And I took a look at it, and I introduced it to my son, and, you know, kids are actually
getting very good at folding the proteins. And it came to me not as a
surprise, but actually as a sort of manifestation of our capacity to solve this kind of problem,
when a paper was published in one of these top journals with the co-authors being the
actual players of this game.
So what happened
was that they managed to get better structures
than the scientists themselves.
So that, you know, that was very,
I mean, it was kind of profound, that problems that are so challenging for
computational science may be not that challenging for a human brain.
Well, that's a really good, that's a hopeful message always when there's the proof of existence, the existence proof that it's possible.
That's really interesting.
But what are the best ways to do protein folding now?
So if you look at what DeepMind does with AlphaFold,
that's a learning approach.
What's your sense, given your background
in machine learning? Is this a learnable problem? Is this still brute force? Are we in
the Garry Kasparov and Deep Blue days, or are we in the AlphaGo playing the game of Go days of folding?
Well, I think we are advancing in this direction. I mean, if you look,
there is a sort of Olympic Games for protein folders called CASP. And essentially,
you know, it's a competition where different teams are
given exactly the same protein sequences and they try to predict their structures.
And of course there are different sorts of subtasks, but in the recent competition, AlphaFold was among the top performing teams, if not the top performing team.
So there is definitely a benefit from the data
that have been generated in the past several decades,
the structural data.
And certainly, we are now at the capacity
to summarize this data, to generalize this data, and to use those principles in order
to predict protein structures.
One of the really cool things here, and maybe you can comment on it,
is that there seem to be these open data sets of proteins.
How did that come about?
The Protein Data Bank?
The Protein Data Bank.
Is this a recent thing for just the coronavirus?
Or has it been around for many, many years?
I believe the first protein data bank was designed on flashcards.
So, yes, this is a great example of a community effort, of everyone
contributing, because every time you solve a protein or a protein complex, this is where
you submit it. And, you know, the scientists get access to it,
scientists get to test it,
and we, bioinformaticians,
use this information to, you know, make predictions.
So there's no culture of hoarding discoveries here.
So you've released a bunch of proteins
that were matching, and we'll talk about the details a little bit.
But it's kind of amazing how open the culture here is.
It is.
And I think this pandemic actually demonstrated the ability of scientific community to solve
this challenge collaboratively.
And, I think, if anything, it actually moved us to a
brand new level of collaboration, of the efficiency with which people establish new collaborations,
with which scientists offer their help to each other.
And publishers also, it's very interesting. We're now trying to figure it out.
There are a few journals that are trying to do a very
accelerated review cycle, but there are also so many preprints, just posting a paper and it going out. I think it's fundamentally changing
the way we think about papers.
Yes. I mean, the way we think about knowledge, let's say. Yes,
I completely agree. I think now the knowledge is becoming sort of the core value, not the paper
or the journal where this knowledge is published. And I think, again, we are living in
times
where it
becomes really crystallized that the most important value is in the knowledge.
So maybe you can comment on what you think the future of that knowledge sharing looks like. So you have this paper that, I hope, we get a
chance to talk about a little bit, but it has a really nice abstract and
introduction, it has all the usual parts. I mean, it probably took a long time
to put together. So is that going to remain? You could have
communicated a lot of the fundamental ideas here in a much shorter form
that's less traditionally acceptable in the journal context.
So, well, you know, the first version that we
posted was not even on bioRxiv, because bioRxiv back then was essentially
overwhelmed with the number of submissions. So our submission, I think it took five
or six days just for it to be screened and put online. So, you know, essentially we put the first
preprint on our website and, you know, it started getting
accessed right away. And, you know, this original
preprint was in a much rougher shape than this paper. But we honestly tried to be as compact as possible
in introducing the information that is necessary
to explain our results.
So maybe you can dive right in if it's okay.
Sure.
So there's a paper called Structural Genomics of SARS-CoV-2.
How do you even pronounce SARS-CoV-2?
COVID-2.
Yeah.
By the way, COVID is such a terrible name, but it's stuck.
Yes.
SARS-CoV-2 Indicates Evolutionary Conserved Functional Regions of Viral Proteins.
So this is looking at all kinds of proteins that are part of
this novel coronavirus and how they match up against the previous other kinds of coronavirus.
I mean, there's a lot of beautiful figures. I was wondering if you could, I mean, there's so
many questions I could ask you, but maybe, how do you get started doing this paper?
So how do you start to figure out the 3D structure of a novel virus?
Yes.
So there is actually a little story behind it.
And the story actually dates back to September of 2019. You probably remember that back then we had another dangerous
virus, the Triple E virus, Eastern equine encephalitis virus.
Can you maybe linger on it?
I have to admit, I was sadly completely unaware.
So that was actually a virus outbreak
that happened in New England only.
The danger in this virus was that it actually
targeted your brain.
So there were deaths from this virus.
The main vector was mosquitoes. And obviously
fall is, you know, the time when you have a lot of them in New England. And, you
know, people realized this is actually a very dangerous thing. So it had an impact on the local economy.
The schools were closed past six o'clock.
No activities outside for the kids,
because the kids were suffering quite tremendously
when infected with this virus.
How do I not know about this?
Was it impactful?
It was in the news.
I mean, it did not have a high impact in Boston, necessarily, but it did in the Metro
West area, and it actually spread around, I think, all the way to New Hampshire and Connecticut.
And you mentioned affecting the brain. That's one other comment we should make.
So you mentioned ACE2 for the coronavirus.
So these viruses kind of attach to something in the body.
So it essentially attaches to these proteins in those cells in the body where those proteins
are expressed, where they actually have them in abundance.
So sometimes that could be in the lungs, that could be in the brain, that could be in the
stomach.
So I think right now, from what I read, it's the epithelial cells, so essentially
the cells that are covering the surface.
So inside the nasal surfaces, the throat, the lung cells, and I believe the liver and a couple of other organs where they are
actually expressed in abundance.
That's for the ACE2 receptors.
So back to the story.
Yes.
So now, the impact of this virus is significant.
However, it's a pretty local problem, to the point that this is something that we would call
a neglected disease, because it's not big enough to make the drug design companies design a new
antiviral or a new vaccine.
It's not big enough to generate a lot of grants
from the national funding agencies.
So does it mean we cannot do anything about it? And so what I did is, I taught
a bioinformatics class at Worcester Polytechnic Institute, and we are very much a project-based
learning institution. So I thought that that would be a perfect project for an ongoing case study. So I essentially designed a study
where we tried to use bioinformatics to understand as much as possible about this virus.
And a very substantial portion of this study was to understand
the structures of the proteins, to understand how they interact
with each other and with the host proteins,
try to understand the evolution of this virus.
So, obviously, a very important question is
where it will evolve further and how it happened here.
So, we did all these projects, and now I'm trying to put them
into a paper where all these undergraduate
students will be co-authors. But essentially the projects were finished right about mid-December.
And a couple of weeks later, I heard about this mysterious new virus that was reported
in Wuhan province.
And immediately I thought that, well, we just did that.
Can't we do the same thing with this virus?
And so we started waiting for the genome to be released because that's essentially the
first piece of information that is critical.
Once you have the genome sequence, you can start doing a lot using bioinformatics.
When you say genome sequence, that's referring to the sequence of letters that make up the
RNA, or whatever?
So the sequence that makes up the entire information encoded in the virus, right? So that
includes all 29 genes.
What are genes? What's the encoding of information?
So a gene is essentially a basic functional unit that we can consider. So each gene in the virus would correspond to a
protein. So a gene by itself doesn't perform its function. It needs to be converted
or translated into a protein that will become the actual functional unit.
Like you said, the printer.
So we need the printer for that.
We need the printer.
Okay, so the first step is to figure out the genome,
the sequence of things that will then be used for printing the protein.
So, okay.
So then the next step. So once we have this, we use the existing information
about SARS, because the SARS genomics has been done
in abundance, so we have different strains of SARS
and actually other related coronaviruses,
MERS, the bat coronaviruses. And we started by identifying the potential
genes, because right now it's just a sequence, right? So it's a sequence that is roughly,
it's less than 30,000 nucleotides long.
And it's a raw sequence.
It's a raw sequence. No other information really.
And we now need to define the boundaries of the genes that would then be used to identify
the protein and protein structures.
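To make the idea of "defining gene boundaries in a raw sequence" concrete, here is a minimal sketch of a naive open reading frame (ORF) scan. This is a swapped-in, simpler technique, not the approach described in the conversation, which relied on existing SARS gene annotations. The function name, the `min_codons` threshold, and the toy sequence are all hypothetical, and real coronavirus genomes have features (polyproteins, ribosomal frameshifting) that such a scan does not handle.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(genome: str, min_codons: int = 30):
    """Return (start, end) index pairs (0-based, end-exclusive) of ATG...stop
    stretches on the forward strand, scanning all three reading frames."""
    genome = genome.upper().replace("U", "T")  # accept RNA or DNA letters
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(genome) - 2, 3):
            codon = genome[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                # Count codons from ATG up to (not including) the stop codon.
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


if __name__ == "__main__":
    toy = "CC" + "ATG" + "GCT" * 5 + "TAA" + "GG"  # made-up sequence
    print(find_orfs(toy, min_codons=3))
```

A template-guided approach like the one described would instead compare the new genome against annotated SARS genes, which is why the speaker calls the step fairly straightforward.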
How hard is that problem?
It's not, I mean, it's pretty straightforward.
So, you know, because we use the existing information
about SARS proteins and SARS genes.
So once again, we are relying on the, yes.
So, and then once we get there,
this is where the first more traditional
bioinformatics step begins.
We're trying to use these protein sequences and get
the 3D information about those proteins.
So, this is where we are relying heavily on
the structure information, specifically from the protein
data bank that we are talking about.
And here you're looking for similar proteins.
Yes.
So, the concept that we are operating when we do this kind
of modeling, it's called homology or template-based modeling.
So essentially using the concept that if you have two sequences that are similar in terms of the letters,
the structures of these sequences are expected to be similar as well.
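The premise of template-based modeling stated above, that similar sequences imply similar structures, starts with measuring how similar two sequences are. The sketch below is one simple way to do that: a textbook Needleman-Wunsch global alignment followed by a percent-identity count. The scoring values, function names, and example peptides are assumptions for illustration; real modeling pipelines use substitution matrices and far more sophisticated tools.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Needleman-Wunsch global alignment; returns the two gapped strings."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a: str, b: str) -> float:
    """Identical aligned positions as a percentage of the alignment length."""
    al_a, al_b = global_align(a, b)
    if not al_a:
        return 0.0
    same = sum(x == y for x, y in zip(al_a, al_b) if x != "-")
    return 100.0 * same / len(al_a)


if __name__ == "__main__":
    print(percent_identity("MKTAYIAK", "MKTAYLAK"))  # made-up peptides
```

A high identity to a protein with a solved structure is what makes that structure a plausible template.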
And this is at the micro at the very local scale and at the scale of
the whole protein.
The whole protein. Right. So, you know, of course, the devil is in the details
and this is why we actually need pretty sophisticated modeling tools to do so.
Once we get the structures of the individual proteins,
we try to see whether or not these proteins act alone,
or they have to form protein complexes in order to perform their function.
And again, so this is sort of the next level of the modeling because now you need to understand
how proteins interact, and it could be the case that the protein
interacts with itself and makes sort of a multimeric
complex, the same protein just repeated multiple times. And we have quite a few such proteins
in SARS-CoV-2. Specifically, the spike protein needs three copies to function. The envelope protein needs five copies to function.
And there are some other multimeric complexes.
That's what you mean by interacting with itself,
needing multiple copies.
So, how do you make a good guess whether something's going to interact?
Well, again, so there are two approaches.
One is look at the previously solved complexes.
Now we're looking at not the individual structures, but the structures of the whole complex.
Complexes are multiple proteins.
Yes.
So it's a bunch of proteins essentially glued together.
And when you say glue, that's the interaction.
That's the interaction.
So the different forces, different sort of physical forces behind this.
Sorry to keep asking dumb questions, but the glue, is the interaction fundamentally
structural or is it functional?
Like, in the way you're thinking about it.
That's actually a very good way to ask this question, because it turns out that the interaction
is structural, but in the way it forms the structure, it actually also carries out the
function. So interaction is often needed
to carry out a very specific function of a protein.
But in terms of figuring it out,
you're really starting at the structure before you figure out the function. So there's
a beautiful Figure 2 in the paper of all the different proteins
that you were able to figure out that make up the novel coronavirus. What are
we looking at? So this is the step you mentioned: when you try to guess at the
possible proteins, what you get is these blue and cyan blobs.
Yes, so those are the individual proteins for which we have at least some information
from the previous studies. So there is advantage and disadvantage of using previous studies.
Well, the disadvantage is that, you know, we may not necessarily have the coverage of all 29 proteins.
However, the biggest advantage is that the accuracy with which we can model these proteins is very high, much higher compared to
ab initio methods that do not use any template information.
But nevertheless, this figure is also
interesting and beautiful, I love these pictures so much. It has the pink parts, the parts that are different. So you're
highlighting the difference you find on the 2D sequence, and then you try to infer
what it would look like in 3D.
So the difference actually is on the 1D sequence.
1D, that's right.
And so this is one of the first questions that we try to answer: well, if you
take this new virus and you take the closest relatives, which are SARS and a couple of
bat coronavirus strains,
the closest relatives that we are aware of, what are the differences between these viruses
and these close relatives?
And typically, when you look at the sequence,
those differences could be quite far away from each other.
But what the 3D structure does to those differences is that
they very often tend to cluster together.
And all of a sudden, differences that may look completely unrelated
actually relate to each other. And sometimes they are there because
they affect the functional site.
Right. So they are there because this is the functional site that
is highly mutated.
So that's a computational approach to figuring something out.
And when it comes together like that, that's kind of a nice clean indication that this could actually be indicative of what's happening.
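The observation that differences far apart in the sequence can sit next to each other in 3D can be checked with a few lines of code. The following is a minimal sketch under stated assumptions: the two sequences are already aligned and of equal length, one representative coordinate per residue is available, and the 8 Å distance cutoff and 10-residue separation are arbitrary illustrative thresholds. The toy hairpin coordinates and all names are made up.

```python
import math
from itertools import combinations


def differing_positions(seq_a: str, seq_b: str):
    """Indices where two pre-aligned, equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to the same length")
    return [i for i, (x, y) in enumerate(zip(seq_a, seq_b)) if x != y]


def spatially_close_pairs(positions, coords, max_dist=8.0, min_seq_sep=10):
    """Pairs of positions that are distant in sequence but close in 3D space."""
    pairs = []
    for i, j in combinations(sorted(positions), 2):
        dist = math.dist(coords[i], coords[j])
        if j - i >= min_seq_sep and dist <= max_dist:
            pairs.append((i, j, round(dist, 2)))
    return pairs


if __name__ == "__main__":
    # Toy hairpin: residues 0-9 run along x, residues 10-19 run back at y = 5.
    coords = [(3.8 * i, 0.0, 0.0) for i in range(10)]
    coords += [(3.8 * (19 - i), 5.0, 0.0) for i in range(10, 20)]
    old = "A" * 20
    new = "AAGAAAAAAAAAAAAAAGAA"  # differs at positions 2 and 17
    print(spatially_close_pairs(differing_positions(old, new), coords))
```

In the toy example the two substitutions are 15 residues apart in sequence but only 5 Å apart in space, which is the kind of clustering being described.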
Yes, I mean, so we need this information.
And, you know, the 3D structure gives us just a very intuitive way to look at this information and then
start asking questions such as: is this a functional part of the protein?
Does this part of the protein interact with some other proteins?
Or maybe with some other ligands, small molecules?
So we will try now to functionally inform this
3D structure.
So, you have a bunch of these mutated parts. I don't know, how many
are there in the novel coronavirus compared to SARS? Are we talking about hundreds,
thousands of these pink regions?
No, no, much less than that. And it's very interesting that, you know,
the first thing that you start seeing, right, you look at patterns, right? And the
first pattern that becomes obvious is that some of the proteins in the new coronavirus are pretty much intact.
So they are pretty much exactly the same as in SARS and the bat coronavirus, whereas some others
are heavily mutated. So it looks like the evolution is not occurring uniformly across the entire viral genome,
but actually targets very specific proteins.
What do you do with that from the Sherlock Holmes perspective?
Well, you know, one of the most interesting findings we had was the fact that
the binding sites on the viral surfaces that get targeted by the known small molecules were pretty
much not affected at all.
And so that means that the same small drugs or small drug-like compounds can be efficient for the new coronavirus.
So this all actually maps to the drug compounds too.
Like so you're actually mapping out what old stuff is going to work on this thing. And then
possibilities for new stuff to work by mapping out the things that have mutated.
Yes.
So we essentially know which parts behave differently
and which parts are likely to behave similar.
And again, of course, all our predictions need to be validated by experiments.
But hopefully that sort of helps us to delineate
the regions of this virus that can be promising in terms
of the drug discovery.
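The comparison described above, whether known binding sites are left intact even when the rest of a protein has changed, can be sketched as a per-position identity check. This assumes two pre-aligned sequences of equal length and a list of binding-site indices; the sequences, indices, and function name below are invented for illustration and are not data from the paper.

```python
def identity(seq_a: str, seq_b: str, positions=None) -> float:
    """Fraction of identical residues, over all positions or a chosen subset."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to the same length")
    positions = list(range(len(seq_a)) if positions is None else positions)
    if not positions:
        raise ValueError("no positions to compare")
    same = sum(seq_a[p] == seq_b[p] for p in positions)
    return same / len(positions)


if __name__ == "__main__":
    reference = "MKVLAGHWTRDE"     # hypothetical reference protein fragment
    novel = "MRVLAGHWTKNE"         # hypothetical counterpart in the new virus
    binding_site = [3, 4, 5, 6, 7]  # hypothetical ligand-contacting positions
    print("overall identity:", identity(reference, novel))
    print("binding-site identity:", identity(reference, novel, binding_site))
```

A binding site that stays fully conserved while the overall identity drops is the pattern that suggests an existing compound may still bind, subject to the experimental validation the speaker mentions.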
You kind of mentioned this already,
but maybe you can elaborate.
So how different from the structural and functional
perspective does the new coronavirus
appear to be relative to SARS?
We are now trying to understand the overall structural characteristics of this virus, because
that's our next step, trying to model a single viral particle of this virus.
So that means you have
the individual proteins and, like you said, you have to figure out what their interaction is.
So is that what this graph is, kind of an interactome?
So the interactome is essentially our prediction of the potential interactions, some of which we already
deciphered from the structural knowledge, and some of which are essentially deciphered
from the knowledge of the existing interactions that people previously obtained for SARS,
for MERS, or other related viruses.
So is there kind of interactomes?
Am I pronouncing that correctly?
Yeah, interactomes.
Yeah.
Are those already converged towards SARS for...
So I think there are a couple of papers that now investigate the sort of
large scale set of interactions between the new SARS and its host.
And so I think that's an ongoing study, I think.
And the success of that, the result would be an interactome.
Yes.
And so when you say you're now trying to figure out the entire particle, the entire thing,
right?
So, if you look, you know, so structure, right?
So what this viral particle looks like, right?
So as I said, you know, the surface of it is an envelope, which is essentially a so-called
lipid bilayer with proteins integrated into the surface. So an average particle is
around 80 nanometers, right? So this particle can have about 50 to 100 spike proteins. At least we
suspect so and, you know, based on the micrograph images, it's very comparable to MHV virus in mice
and SARS virus.
Micrographs are actual pictures of the actual virus? Okay, so these are models.
These are the actual images, right? Sorry for the tangents,
but what are these things? So when you look on the internet, the models and the pictures,
and the models you have here,
are just gorgeous and beautiful.
When you actually take pictures of them
with a micrograph, like what?
What are we looking at?
Well, they typically are not perfect.
So most of the images that you see now
are of a sphere with those spikes around it.
Yes, you do see the spikes.
And now, you know,
our collaborator from Texas A&M University, Benjamin Neuman, in a recent paper about SARS, actually proposed, and there is actually some evidence
behind it, that the particle is not a sphere, but is actually an elongated, ellipsoid-like
particle.
So that's what we are trying to incorporate into our model.
And if you look at the actual micrographs,
you see that those particles are not symmetric.
And of course, it could be due to the treatment of the material,
it could be due to some noise in the imaging.
So there's a lot of uncertainty.
So, okay, structurally figuring out the entire particle.
By the way, sorry for the tangents, but why the term particle?
Or is it just...
It's a single, so we call it the virion. So a virion particle
is essentially a single virus.
A single virus, but it just feels like,
Because particle to me from the physics perspective feels like this the most basic unit
Because there seems to be so much going on inside the virus.
Yeah. It doesn't feel like a particle to me.
Yeah, well, yeah, I think, you know, virion is a good way to call it.
So, okay, so trying to figure out the entirety of the system.
Yes. So the virion has 50 to 100 spikes,
trimeric spikes.
It has roughly 200 to 400 membrane protein dimers.
And those are arranged in a very nice lattice, so you can actually see, it's like
a carpet.
On the surface, again?
Exactly, on the surface. And
occasionally you also see this envelope protein
inside. And that one we don't know what it does exactly, the one that forms the
pentamer, this very nice pentameric ring. And so, you know, this is what we're trying to do.
You know, we're trying, first of all, to understand
what it looks like and how far it is from those images that were generated. But I mean, there is also a potential for nanoparticle design that will mimic this
virion particle.
Is the process of nanoparticle design meaning artificially
designing something that looks similar?
Yes, so the one that can potentially compete with the actual virion particles
and therefore reduce the effect of the infection.
So is this the idea of what is a vaccine?
So vaccine, yeah, so there are two ways of essentially
treating, and in the case of a vaccine, preventing the infection.
So a vaccine is a way to train our immune system.
So our immune system becomes aware of this new danger,
and therefore is capable of generating the antibodies that will essentially
bind to the spike proteins, because that's the main target for vaccine design, and
block its functioning. If you have the spike with the antibody on top, it can no longer interact with the ACE2 receptor.
So the process of designing a vaccine, then, you have to understand enough about the structure of the virus itself
to be able to create an artificial particle?
Well, I mean, the nanoparticle is a very exciting and new area of research.
There are already established ways to make vaccines,
and there are several different ones. So there is one where essentially the virus gets passed through a cell culture multiple times.
So it becomes essentially adjusted to the specific embryonic cell.
And as a result it becomes less, you know, compatible with the host human cells.
So that's sort of the idea of the live vaccine, where the particles are there,
but they are not so efficient, you know, so they cannot replicate as rapidly as before.
They can be introduced to the immune system, the immune system will respond, and the person
who gets this vaccine won't get sick or will have mild symptoms.
Then there are different ways to introduce the non-functional parts of
this virus, or the virus where some of the information is stripped down, for example, the virus with no genetic material.
So with no RNA genome?
Exactly. So it cannot replicate, it cannot essentially perform
most of its functions. That's a bad thing. What is the biggest hurdle to design one of these,
to arrive at one of these? Is it the work that you're doing in the fundamental understanding
of this new virus, or is it in the, from my perspective,
well, complicated world of experimental validation
and sort of showing that this, like going to the whole process
of showing this is actually going to work with FDA approval,
all that kind of stuff?
I think it's both.
I mean, you know, our understanding
of the molecular mechanisms will allow us to, you know, to design, to have more efficient designs of
the vaccines. However, once you design the vaccine, it needs to be tested.
But when you look
at the 18 months and the different projections, it seems like an exception,
historically speaking. Maybe you can correct me, but even 18 months seems like a very
accelerated timeline.
It is.
I mean, I remember reading in a book about some previous vaccines that it could take up to 10 years to design and properly test
a vaccine before its mass production.
So yeah, everything is accelerated these days.
I mean, for better, for worse, but we definitely need that.
Well, especially with coronavirus, the scientific community is really stepping up
and working together, and the collaborative aspect
is really interesting.
You mentioned a vaccine is one,
and then there's antiviral drugs.
So antiviral drugs. Vaccines are typically needed
to prevent the infection.
But once you have an infection,
what we try to do is to stop it.
So we try to stop the virus from functioning.
And so the antiviral drugs are designed to block some critical function of the proteins
from the virus. So there are a number of interesting candidates and, you know, if you ask me, I think remdesivir is perhaps the most promising. It has been shown to be an efficient and effective antiviral for SARS.
Originally, it was an antiviral drug developed for completely different viruses, I think for Ebola and Marburg.
At a high level, do you know how it works?
So it tries to mimic one of the nucleotides in RNA, and essentially that stops the replication.
So, I guess that's what antiviral drugs do,
mess with some aspect of this process.
So essentially we try to stop certain functions
of the virus.
There are some other ones that are designed
to inhibit the protease, the thing that clips protein sequences. There is one that
was originally designed for malaria, which is a bacterial, you know, bacterial disease. So, this
is so cool. So, but that's exactly where your work steps in, is you're figuring out the functional,
then the structure of these different.
So like providing candidates for where drugs can plug in.
Exactly.
Well, yes, because, you know,
one thing that we don't know is whether or not,
so let's say we have a perfect drug candidate
that is efficient against SARS and against
MERS. Now, is it going to be efficient against the new SARS-CoV-2? We don't know that,
and there are multiple aspects that can affect this efficiency. So, for
instance, if the binding site, so the part of the protein where this ligand gets attached,
if this site is mutated, then the ligand may not
be able to attach to this part any longer.
And our work and the work of other bioinformatics groups
are essentially trying to understand whether or not that will be the case. And it looks like for the ligands that we looked
at, the ligand binding sites are pretty much intact.
Which is really promising. If we can just zoom out for a second,
are you optimistic?
So there are three possible ends to the coronavirus pandemic.
So one is that drugs or vaccines
get figured out very quickly, probably drugs first.
The other is that the pandemic runs its course, for this wave at least.
And then the third is, you know, things go much worse
in some dark, very bad direction.
Let's focus on the first two.
Do you see the antiviral drugs, the work you're doing,
being relevant for us right now
in stopping the pandemic?
Or do you hope that the pandemic will run its course?
So the social distancing,
things like wearing masks, all those discussions that we're having will be the method
with which we fight coronavirus in the short term? Or do you think that it'll have to be antiviral drugs?
I think antivirals would be, I would view that as
at least the short-term solution. I see more and more cases in the news of those new drug candidates being administered in hospitals.
And I mean, this is right now the best that we have.
But do we need it
in order to reopen the economy?
We definitely need it.
I cannot speculate on how that will affect reopening of the economy, because we are deep into the pandemic.
And it's not just the states, it is also the possibility of the second wave, as you mentioned.
And this is why we need to be super careful.
We need to follow all the precautions that the doctors tell us to do.
Are you worried about the mutation and the virus?
So it's of course a real possibility.
Now, to what extent this virus can mutate, it's an open question. I mean, we know that it is able to mutate,
to jump from one species to another, and to become
transmissible between humans.
Right, so, will it, you know, so let's imagine that we have the new antiviral.
Will this virus become eventually resistant to this antiviral?
We don't know.
I mean, this is what needs to be studied.
It's such a beautiful and terrifying process that a virus, some viruses,
may be able to mutate to respond to the, to mutate around the thing we've put before
it. Can you explain that process? Like, how does that happen? Is that just the way of evolution?
I would say so, yes. I mean, it's the evolutionary mechanisms. There is nothing imprinted into this virus that makes it, you know, it's just the way it evolves
and actually it's the way it co-evolves with its host.
It's just amazing.
Especially the evolutionary mechanisms, especially amazing, given how simple the virus is.
It's incredible that it's, I mean, it's beautiful.
It's beautiful because it's one of the cleanest
examples of evolution working.
Well, I think I mean, one of the sort of,
the reason for its simplicity is because
it does not require all the necessary functions to be stored.
So it actually can hijack the majority of the necessary functions from the host cell.
So the ability to do so in my view reduces the complexity of this machine drastically.
Although, if you look at the most recent discoveries,
scientists discovered viruses that are as large as bacteria,
so these mimiviruses and mamaviruses, and those discoveries actually made scientists reconsider
the origins of the virus,
and what the evolutionary mechanisms are that
lead to the appearance of viruses.
By the way, you did mention the viruses are...
I think you mentioned that they're not living.
Yes, they're not living organisms.
So let me ask that question again.
Why do you think they're not living organisms?
Well, because they are dependent,
the majority of the functions of the virus are dependent on the host.
So let me do the devil's advocate. Let me be the philosophical devil's advocate here and say,
well, humans, which we would say are living, need our host planet to survive. So you can basically take every living organism
that we think of as definitively living. It's always going to have some aspects of its
host that it needs, of its environment. So is that really the key aspect of why a virus is not living, that dependence?
Because it seems to be very good at doing so many things that we consider to be intelligent.
It's just that dependence part.
Well, I mean, it's difficult to answer in this way.
I mean, the way I think about the virus is in terms of the critical tools that it doesn't have.
So, I mean, in my view, it's not autonomous.
That's how I separate the idea of the living organism, on a very high level.
Yes, between the living organism and...
And, I mean, these are just terms and perhaps they don't mean much, but we have some kind of sense of what autonomous means, and that humans are autonomous.
You've also done excellent work in the epidemiological modeling, the simulation of these things.
So, zooming out outside of the body, doing the agent-based simulation.
So, that's where you actually simulate individual human beings, and then the spread of viruses
from one to the other. How does agent-based simulation work,
at a high level?
All right, so it's also one of these ironies of timing, because, I mean, we've worked on this project for the past five years. And on New Year's Eve, I got an email from my
PhD student that, you know, the last experiments were completed. And, you know, three weeks after
that, we get this Diamond Princess story. And we were emailing each other with the same news, saying like...
So the Diamond Princess is a cruise ship?
Yes.
And what was the project that you worked on?
So the project, I mean,
you know, it started with a bunch of undergraduates.
The code name was Zombies on a Cruise Ship.
So they wanted to essentially model the zombie apocalypse on the cruise ship.
And after having some fun, we then thought about the fact that if you look at the cruise ships, I mean, the infectious outbreak has been one of the biggest
threats to the cruise ship economy. So perhaps the most frequently occurring virus is the norovirus.
And this is essentially one of these stomach flus that you have.
And it can be quite devastating.
So occasionally cruises get, you know, canceled, or they get returned
back to the origin.
And so we wanted to study, and this is very different from the traditional
epidemiological studies where the scale is much larger. So we wanted to study this in
a confined environment, which is a cruise ship. It could be a school. It could be other places, such as a large company, where people are in interaction. And the
benefit of this model is we can actually track that in real time. So we can
actually see the whole course of the evolution,
the whole course of the interaction between the
host and the pathogen, et cetera.
So an agent-based system, a multi-agent system
to be precise, is a good way to approach this problem,
because we can introduce the behavior of the passengers,
of the crew.
And what we did for the first time,
that's where we introduced some novelty,
is we introduced a pathogen agent explicitly.
So that allowed us to essentially model the behavior
on the host side as well as on the pathogen side.
And all of a sudden we can have a flexible model that allows us to integrate all the key
parameters about the infection. So for example, the virus, right? So the ways of transmitting the virus
between the hosts; how long the virus survives on a surface, a fomite; how many viral particles
a host sheds when he or she is asymptomatic versus symptomatic.
And you can encode all of that into this pathogen.
Yeah, just for people who don't know, so agent-based simulation, usually the agent represents a single human being.
And then there's some graphs, like contact graphs, that represent the interaction between
those human beings.
So, yeah.
So, essentially, you know, agents are individual programs that run in parallel.
And we can provide instructions for these agents how to interact with each other,
how to exchange information, in this case, exchange the infection.
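To make the idea concrete, here is a minimal sketch of an agent-based outbreak model in which the pathogen is its own object carrying the infection parameters, separate from the host agents. It is not the model discussed in this conversation: every name and number below (Pathogen, Host, the transmission probabilities, the period lengths, random mixing instead of a contact graph) is a hypothetical choice for illustration, the agents are updated in a simple sequential loop rather than in parallel, and surface survival is left out.

```python
import random
from dataclasses import dataclass


@dataclass
class Pathogen:
    """Hypothetical pathogen agent; all values are illustrative, not measured."""
    p_transmit_symptomatic: float = 0.30   # per-contact chance, symptomatic host
    p_transmit_asymptomatic: float = 0.10  # per-contact chance, asymptomatic host
    asymptomatic_days: int = 3             # infectious but without symptoms
    symptomatic_days: int = 7              # infectious with symptoms


@dataclass
class Host:
    """One person: S (susceptible), A (asymptomatic), I (symptomatic), R (recovered)."""
    state: str = "S"
    days_infected: int = 0


def step(hosts, pathogen, contacts_per_day, rng):
    """Advance the whole population by one day."""
    newly_infected = set()
    for i, host in enumerate(hosts):
        if host.state not in ("A", "I"):
            continue
        p = (pathogen.p_transmit_asymptomatic if host.state == "A"
             else pathogen.p_transmit_symptomatic)
        # Random mixing: each infectious host meets a few random people.
        for _ in range(contacts_per_day):
            j = rng.randrange(len(hosts))
            if j != i and hosts[j].state == "S" and rng.random() < p:
                newly_infected.add(j)
    # Progress existing infections: A -> I -> R.
    total_days = pathogen.asymptomatic_days + pathogen.symptomatic_days
    for host in hosts:
        if host.state in ("A", "I"):
            host.days_infected += 1
            if host.days_infected >= total_days:
                host.state = "R"
            elif host.days_infected >= pathogen.asymptomatic_days:
                host.state = "I"
    # People infected today start their asymptomatic period tomorrow.
    for j in newly_infected:
        hosts[j].state = "A"
        hosts[j].days_infected = 0


def simulate(n_hosts=200, days=60, contacts_per_day=5, seed=0, pathogen=None):
    """Run the outbreak and return the daily count of hosts in each state."""
    rng = random.Random(seed)
    pathogen = pathogen or Pathogen()
    hosts = [Host() for _ in range(n_hosts)]
    hosts[0].state = "A"  # a single index case
    history = []
    for _ in range(days):
        step(hosts, pathogen, contacts_per_day, rng)
        history.append({s: sum(h.state == s for h in hosts) for s in "SAIR"})
    return history
```

Because the infection parameters live in the Pathogen object, swapping in a different virus in this sketch means constructing a different Pathogen and calling simulate again, without touching the host logic.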
But in this case, in your case, you've added a pathogen as an
agent. I mean, that's kind of fascinating. It's kind of a brilliant
way to condense the parameters, to aggregate, to bring the parameters together that represent
the pathogen, the virus. That's fascinating, actually.
So yeah, we realized that, you know,
by bringing in the virus, we can actually start modeling.
I mean, we are no longer bounded
by very specific aspects of the specific virus.
So we started with, you know, norovirus and, of course, zombies,
but we continued to modeling an Ebola virus outbreak, flu, SARS. And because I felt that we needed to add a little bit more excitement for our undergraduate students, we actually
modeled the virus from the movie Contagion, so MEV-1. And, you know, unfortunately, that virus...
and we tried to extract as much information as we could. Luckily, this movie had
a scientific consultant, Ian Lipkin, a virologist from Columbia University, who,
I think, designed this virus for the movie based on Nipah virus, and I think with some ideas from
SARS- and flu-like airborne viruses. And, you know, the movie surprisingly contained
enough details for us to extract and to model it.
I was hoping you would publish a paper
on how this virus works.
Yeah, we are planning
to publish.
I would love it if you did. It would be nice if the, you know,
the origin of the virus... but you're now actually being a scientist and studying the
virus from that perspective.
But the origin of the virus, you know... so this movie is assignment number one in the
bioinformatics class that I give, because it also tells you that, you know, bioinformatics can be
of use. Because if you watch it... have you watched it?
A long time ago.
So, there is, you know, approximately a week from the virus detection, we see a screenshot
of scientists looking at the structure of the surface protein.
And this is where I tell my students that, you know, if you ask experimental biologists,
they will tell you that it's impossible, because it takes months, maybe
years, to get the crystal structure of this, you know, the structure that is represented. If you
ask a bioinformatician, they tell you, sure, why not, you know, just get it modeled.
And, yes, it was very interesting to see that there is actually, you know, if
you do screenshots, you actually see the phylogenetic tree, the evolutionary tree
that relates this virus to other viruses.
So there was a lot of scientific thought put into the movie. And one thing that,
you know, was interesting to learn is that for the origin of this virus, there were two
animals that led to the, you know, the zoonotic origin of the virus: a fruit bat and a pig. So, you know...
So this definitely feels like we're living
in a simulation. Okay. But maybe the big picture: agent-based simulations now, larger scale,
sort of not focused on a cruise ship, but larger scale,
are used now to drive some policy.
So politicians use them to tell stories and narratives
and try to figure out how to move forward
under so much uncertainty.
But in your sense,
are agent-based simulations useful for actually predicting the future, or are they useful
mostly for relative comparison of different intervention methods?
Well, I think both, because, you know, in the case of the new coronavirus, we are essentially learning that the current intervention methods may not be efficient enough.
One important aspect that I find to be so critical, and yet something that was overlooked during the past pandemics, is the effect of the
asymptomatic period.
This virus is different because it has such a long asymptomatic period, and all of a sudden that creates a completely new game when trying to contain
this virus.
It changes the dynamics of the infection.
Exactly.
Do you also, I don't know how closely you're tracking this, but do you also think that there's a different, like, rate of infection for when you're asymptomatic?
That aspect, or does the virus not care?
So, there were a couple of works.
So, one important parameter that tells us how contagious a person who is asymptomatic versus symptomatic is, is
the number of viral particles this person sheds, you know, as a function of time.
So far what I saw is a study that tells us that the person during the asymptomatic period is already contagious, and the person has enough viruses to infect another host.
And I think there are too many excellent papers coming out.
But I think I just saw maybe a Nature paper that said that in the first week, whether you're
symptomatic or asymptomatic, you're the most contagious.
So the highest level of the, like, the plots are over the 14-day period; they collected a
bunch of subjects. And I think the first week is one of the most interesting things.
Yeah, I think, I'm waiting to see sort of more populated studies. There was, again, a very recent one, where scientists determined
that tears are not contagious. So there is no viral shedding done through tears.
So they found one moist thing that's not contagious.
And I mean, there's a lot of,
I've personally been,
because I'm doing a survey paper,
somehow looking at masks.
And there's been so much interesting debate
on the efficacy of masks, and there's a lot of work.
And there's a lot of interesting work
on whether this virus is airborne.
I mean, it's a totally open question.
It's leaning one way right now,
but it's a totally open question
whether it can travel in aerosols long distances.
I mean, do you
think about this stuff,
do you track this stuff? Are you focused on the bioinformatics of it?
Yeah, I mean, I am. I mean, this is a very important aspect for our epidemiology study.
I think, I mean, it's sort of a very simple idea, but I agree with people who say that masks work in both ways.
So a mask not only protects you from the incoming viral particles, it also keeps a potentially contagious person from spreading the viral particles.
Who, when they're asymptomatic, may not even know that they're contagious.
Exactly.
In fact, it seems there's evidence that surgical and certainly homemade
masks, which is what's needed now, actually, because there's a huge shortage, they
don't work that well to protect you.
They work much better to protect others. So it's a motivation for us to all wear one.
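As a rough piece of arithmetic behind the "both ways" point, the sketch below treats a mask on the contagious person and a mask on the susceptible person as two independent filters, so their effects multiply. That independence, the function name residual_exposure, and the example filtration values are all assumptions made for illustration and are not figures from any study mentioned in this conversation.

```python
def residual_exposure(source_filtration: float, wearer_filtration: float) -> float:
    """Fraction of emitted particles that still reaches the susceptible person.

    Assumes (hypothetically) that the two masks act as independent filters.
    Each argument is the fraction of particles that mask blocks, from 0 to 1;
    pass 0.0 for a person who is not wearing a mask.
    """
    for value in (source_filtration, wearer_filtration):
        if not 0.0 <= value <= 1.0:
            raise ValueError("filtration must be between 0 and 1")
    return (1.0 - source_filtration) * (1.0 - wearer_filtration)


# Illustrative, made-up numbers: a homemade mask that blocks more of what
# goes out (0.5) than of what comes in (0.2).
OUTWARD, INWARD = 0.5, 0.2

scenarios = {
    "nobody masked": residual_exposure(0.0, 0.0),
    "only the susceptible person masked": residual_exposure(0.0, INWARD),
    "only the contagious person masked": residual_exposure(OUTWARD, 0.0),
    "both masked": residual_exposure(OUTWARD, INWARD),
}

for label, fraction in scenarios.items():
    print(f"{label}: {fraction:.0%} of particles get through")
```

Under these assumed numbers, masking the contagious person cuts exposure more than masking only yourself, which is the motivation for everyone to wear one when people cannot tell whether they are shedding virus.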
Exactly. Because, I mean, you don't know... you know, about 30%, as far as I remember,
at least 30% of the cases are completely asymptomatic.
Yeah.
So you don't really cough, you don't have any symptoms yet you shed viruses.
Do you think it's possible that we'll all wear masks?
So I wore a mask at a grocery store and you just, you get looks. I mean, this was like a week
ago; maybe
it's already changed, because I think the CDC said that we should
be wearing masks. Like in LA, it's starting to happen. But it just seems like something
that this country will really struggle doing. Or no?
I hope not. I mean, you know, it was interesting. I was looking through the old pictures from
the Spanish flu. And you could see that, you know, pretty much everyone was wearing
masks, with some exceptions. And there was, you know, a sort of iconic photograph of, I think it was San Francisco, this tram conductor who was refusing to let in, you
know, someone without a mask.
So I think, well, you know, it's also, you know, it's related to the fact of, you know,
how much we are scared.
So how much do we treat this problem seriously?
And my take on it is we should, because it is very serious.
Yeah, from a psychology perspective, I just worry about the entire big mess of a psychology experiment that this is, whether a mask will help it or hurt it. You know, masks have a way of distancing us from others by removing the emotional expression
and all that kind of stuff. But at the same time, masks also signal that I care about your well-being.
Exactly.
So it's a really interesting trade-off that's just...
Yeah, it's interesting, right? About distancing. Aren't we distanced enough?
Right, exactly.
And when we try to come closer together,
when they do reopen the economy, that's
going to be a long road of rebuilding trust,
and not all being huge germophobes.
Let me ask, sort of, you have a bit of a Russian accent. Russian or no? Russian accent?
So, were you born in Russia?
Yes. And you're too kind. I have a pretty thick Russian accent.
What are your favorite memories of Russia?
So, I moved first to Canada and then to the United States back in
1999. So by that time I was 22, so, you know, whatever Russian accent I got back then, you know, it's stuck with me for the rest of my life. So by the time the Soviet Union collapsed,
I was a kid, but old enough to realize that there were changes.
And did you want to be a scientist back then?
Oh, yes. Oh, yeah.
I mean, for the first sort of 10 years of my, you know,
junior life, I wanted to be a pilot of a passenger jet plane. So yes, it was like, you know, I was getting ready to go to a college to get the degree. But I've always been fascinated by science, and, you know, not just by math.
Of course, math was one of my favorite subjects.
But, you know, biology, chemistry, physics, somehow I, you know,
I liked those four subjects together.
And, yes, also, so essentially after a certain period of time, I wanted to actually... back then,
there was a very popular area of science called cybernetics.
So it's not really computer science, but it was like, you know, computational
robotics in this sense.
And so I really wanted to do that.
And then, you know, I realized that my biggest passion was in
mathematics. And later, you know, when studying
at Moscow State University, I also realized that I really want to apply the knowledge.
So I really wanted to mix, you know, the mathematical knowledge that I get with real life problems.
And that could be, you mentioned chemistry and biology.
And I sort of, does it make you sad?
Maybe I'm wrong on this, but it seems like it's difficult to be in collaboration to do open big science
in Russia. From my distant perspective in computer science, I don't go to conferences
in Russia. I sadly don't have many collaborators in Russia. I don't know many people doing great AI work in Russia.
Does that make you sad?
Am I wrong in seeing it this way?
Well, I mean, I have to tell you,
I am privileged to have collaborators
in bioinformatics in Russia.
And I think the bioinformatics school
in Russia is very strong.
We have, in Moscow, in Novosibirsk, you know, in my area of research, strong people there.
Yeah, strong people, a lot of great ideas, very open to collaborations.
So perhaps, you know, it's my luck, but, you know, I haven't experienced, you know,
any difficulties in establishing collaborations.
That's bioinformatics, though.
It could be bioinformatics, too.
And it could be, yeah, it could be person-by-person related,
but I just don't feel the warmth and love that I would...
you know, you talk about the seminal people who are French in artificial intelligence.
France welcomes them with open arms. In so many ways, I just don't feel the love from Russia.
I do from the human beings, like people in general, like friends and just cool, interesting people,
but from the scientific community, no conferences,
no big conferences.
And it's, yeah, it's actually, you know,
I'm trying to think, yeah, I cannot recall any big
AI conferences in Russia.
It has an effect on... for me, I sadly haven't been back
to Russia. I should, but my problem is it's very difficult.
So I have to renounce the citizenship.
I mean, I'm a citizen of the United States, and it makes it very difficult.
It's a mess now, right?
I want to be able to travel, like, you know, legitimately.
And it's not an obvious process. They don't make it
super easy. I mean, that's part of it. Like, you know, it should be super easy for me to travel
there.
Well, you know, hopefully, these unfortunate circumstances that we are in will actually promote remote collaborations.
Yes.
And I think what we are experiencing right now is that you still can do science,
you know, being quarantined in your own homes, especially when it comes, I mean, you know,
I certainly understand this is a very challenging time for experimental sciences.
I mean, I have many collaborators who are affected by that, but for computational scientists...
We're really leaning into the remote communication.
Nevertheless, I had to force you to talk to me in person because there's something
that you just can't do in terms of conversation like this.
I don't know why, but in person is very much needed.
So I really appreciate you doing it.
You have a collection of science bobbleheads.
Yes.
Which look amazing.
Which bobblehead is your favorite
and which real world version,
which scientist is your favorite?
Yeah.
So yeah, by the way, I was trying to bring them in, but they are
currently in my office.
They sort of demonstrate social distancing.
So they're nicely spaced away from each other.
But so it's interesting.
So I've been collecting those bobbleheads for the past,
maybe 12 or 13 years, and interestingly enough,
it started with the two bobbleheads of Watson and Crick.
And my last bobblehead
in this collection for now, and my favorite one, because I felt
so good when I got it, was Rosalind Franklin.
And so, you know, when I got it...
Who is the full group?
So I have Watson, Crick, Newton, Einstein, Marie Curie, Tesla, of course Charles Darwin, and
Rosalind Franklin. I am definitely missing quite a few of my favorite scientists. But, you know, if I were to add to this collection, I would
add, of course, Kolmogorov. That's, you know, I've always been fascinated by
his, well, his dedication to science, but also his dedication to educating young people, the next generation.
So it's very inspiring.
He's one of Russia's greats.
The high school that I attended was named after him, and he was a great...
So he founded the school, and he actually taught there.
Is this in Moscow?
Yes.
So, but then, I mean, you know, other people that I would definitely like to see in my collection would be
Alan Turing,
would be John von Neumann.
Yeah, you're a little bit light on the computer scientists.
Yes, I mean, they don't make them.
I am still amazed they haven't made Alan Turing.
And I would also add Linus Pauling.
Linus Pauling.
So who is Linus Pauling?
So this is, to me, one of the greatest chemists
and the person who actually discovered the secondary structure of proteins and was very close to solving the DNA structure. And people argue, but some of them are pretty sure that if not for this, you know,
Photograph 51 by Rosalind Franklin that, you know, Watson and Crick got access to, he would be
the one who would solve it.
Science is a funny race.
Let me ask the biggest and the most ridiculous question.
So you've kind of studied the human body and its defenses and these enemies that are about
from a biological perspective, a bioinformatics perspective, a computer science perspective.
How has that made you see your own life, sort of the meaning of it, or just even seeing what it means to be human?
Well, it certainly makes me realize how fragile human life is.
If you think about it, this little tiny thing can impact the life of the whole of humankind
to such an extent. So, you know, it's something to appreciate and to, you know, to remember that we have to bond together as a society.
And, you know, it also gives me sort of hope that what we do as scientists is useful.
I don't think there's a better way to end it. Dmitry, thank you so much for talking today. It was an honor.
Thank you very much
Thanks for listening to this conversation with Dmitry Korkin, and thank you to our presenting sponsor, Cash App.
Please consider supporting the podcast by downloading Cash App and using code LexPodcast.
If you enjoy this podcast, subscribe on YouTube, review it with five stars on Apple Podcasts, support it on Patreon, or simply connect with me on Twitter at Lex Fridman.
And now let me leave you with some words from Edward Osborne Wilson, E.O. Wilson.
The variety of genes on the planet in viruses exceeds, or is likely to exceed, that in all
of the rest of life combined.
Thank you for listening and hope to see you next time.
Thank you.