Microsoft Research Podcast - 119 - Defending DRAM for data safety and security in the cloud
Episode Date: July 8, 2020. Dynamic random-access memory – or DRAM – is the most popular form of volatile computer memory in the world, but it's particularly susceptible to Rowhammer, an adversarial attack that can cause data loss and security exploits in everything from smart phones to the cloud. Today, Dr. Stefan Saroiu, a Senior Principal Researcher in MSR's Mobility and Networking group, explains why DRAM remains vulnerable to Rowhammer attacks today, even after several years of mitigation efforts, and then tells us how a new approach involving bespoke extensibility mechanisms for DRAM might finally hammer Rowhammer in the fight to keep data safe and secure.
Transcript
So our philosophy is that, rather than describing the solution for Rowhammer, what we would like to see in the standard is a description of extensibility mechanisms so that companies, hardware vendors, can implement their favorite form of mitigation, the one that works best for their particular type of memory, by leveraging these extensions.
You're listening to the Microsoft Research Podcast, a show that brings you closer
to the cutting edge of technology research and the scientists behind it.
I'm your host, Gretchen Huizinga.
Dynamic Random Access Memory, or DRAM, is the most popular form of volatile computer memory in the world,
but it's particularly susceptible to Rowhammer, an adversarial attack that can cause data loss and security exploits in everything from smartphones to the cloud.
Today, Dr. Stefan Saroiu, a senior principal researcher in MSR's Mobility and Networking Group,
explains why DRAM
remains vulnerable to Rowhammer attacks even after several years of mitigation efforts,
and then tells us how a new approach involving bespoke extensibility mechanisms for DRAM
might finally hammer Rowhammer in the fight to keep data safe and secure.
That and much more on this episode of the Microsoft Research Podcast.
Stefan Saroiu, welcome to the podcast.
Thank you, Gretchen. It's great to be here.
Some of my favorite people on the planet are working on making things work for us,
and you're one of those people. So first, thanks. As we begin, though, let's talk about your people
for a minute. You're a senior principal researcher in the mobility and networking group, which isn't
totally separate from systems and networking, but they're not totally the same either. So
give us a verbal Venn diagram of the two groups, why they exist, where they're different, where they overlap, and how in broad strokes each of them is working to make our lives better.
Yes, thank you for the kind words, Gretchen.
So back in the day, Microsoft Research had a single systems and networking group.
And as the group got larger, the group split into several smaller groups like the systems group, the security group,
the distributed systems group, and the mobility and networking group. But we're all systems
researchers at the end of the day, whether we work on operating systems, on networks,
on mobile systems, or on distributed systems. So I'm part of the mobility and networking group.
But over my research career, my work has focused on systems, both in terms of mobile systems and networking
systems. And for the past couple of years, these systems that I've been working on aim to improve
the security of users and the security of infrastructure. Let's get specific and talk
about the work you do within the mobility and networking group now. So sort of in general,
what big problems are you trying to solve as a researcher?
And maybe more importantly, why does the world need you to solve them?
What gets you up in the morning?
So, I do two kinds of work.
The first kind is creative work because I really value creativity very highly.
And I believe it's very difficult to come up with a truly creative idea.
The second kind of work that I do is driven by intellectual curiosity and by revisiting assumptions or turning them on their head. And I strongly believe that the role of an expert is to break preconceived assumptions and rules.
Unfortunately you have to be an expert first. In fact, trying to break
assumptions before understanding deeply an area and a problem is a very bad idea.
So I've been working on secure systems research for
almost a decade now. We built a secure network tracing system that offers very strong privacy.
So for example, network operators can monitor their networks in a way that all the sensitive
data is locked down without anybody being able to subvert it or use it in any way other than
originally intended. We built sensors that can attest their information is correct
and has not been manipulated or changed.
So as a simple example, consider a photo
where one can check whether the photo has been photoshopped
or is indeed captured by a proper camera.
Then I worked on a secure payment system called Zero Effort Payments
that was a little like a precursor of the Amazon Go store.
So our system was a little
different in that you'd pick up the food and you'd go through a cashier who would ring you up,
but you'd not have to do any explicit thing to actually pay. The system would know who you are.
And since you'd have pre-registered with the system, the payment would be processed.
So I've worked on all these things. I've also worked on a firmware TPM, which brings trusted
computing to mobile devices.
And it works in millions of smartphones and tablets today.
But for the past couple of years, I've worked on Azure security in particular.
And we started a project called Project Stema.
Stema stands for secure, trustworthy, and enhanced memory for Azure.
And we've been focusing a lot on Rowhammer attacks.
Well, let's talk about memory and computer memory specifically, since it's the foundational
storage unit for digital data. But there are many kinds of containers, as you well know. So let's do
a quick primer for the flavor that we're really most interested in today, which is DRAM. So how
does it work physically? What are its vulnerabilities, both internally
and externally? And you don't need to get ridiculously granular here, because I saw
your 114-page deck and 100 pages of it is explaining DRAM. No, I'm kidding. Don't be
afraid to get as technical as you need to set the problem up.
Okay. So DRAM is the world's most popular form of volatile memory. Pretty much
every form of computing out there has DRAM. You can find DRAM in smartphones, in tablets, in PCs.
You can find DRAM in cars. You can find DRAM in washing machines. And a DRAM cell stores a zero
or a one. And it does that by using a very simple circuit with one capacitor.
And a capacitor can be charged or discharged, and that can mean a one or a zero. So for example,
if you want to store the value 1010, you just sort of have four cells, and you have one charge
capacitor, one discharge capacitor, one charge, and one discharge, and you encode 1010 that way.
Now, capacitors leak over time. They sort of lose their charge over time. So DRAM has to continuously refresh these capacitors. And the cells are built to maintain their charge for a
small period of time, say something like 64 milliseconds. And the contract is that the hardware has to make sure
that every single cell in its DRAM
is refreshed once within 64 milliseconds.
And in that way, the cell maintains its data, its charge.
Now, DRAM cells are organized in rows and columns.
And when you read a value from DRAM, you read by row.
And the way you read this is by switching some transistors
in such a way that the capacitors are then coupled
with some sensors.
So the sensors sense whether these capacitors
are charged or discharged,
and then they can translate that into data.
Now, unfortunately, what's happening is that when you actually sense the data in the capacitors, it turns out that rows located in the vicinity, in the adjacency of this row you're trying to read, those capacitors also get affected. And they get affected by having them discharge faster than normal.
And this phenomenon is called a DRAM disturbance error, because by causing them to discharge
faster within a 64 millisecond period, you lose the content of that cell.
And in some sense, the bit flips that way.
And the bit that flips is one that you never actually meant to read or access before.
Maybe you don't even have control over it.
Maybe it's some other software component that controls it.
So that's where sort of the concern lies.
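As a purely illustrative aside, here is a tiny toy model in C of the mechanism just described: capacitors that leak, a periodic refresh that tops them up, and row activations that bleed a little extra charge from neighboring rows. Every constant is invented for the sketch; this is not a model of real silicon.

```c
/* Toy model of DRAM disturbance errors -- illustration only, not a
 * faithful device model. All constants are invented for the sketch. */
#include <stdio.h>
#include <stdbool.h>

#define ROWS             8
#define COLS             4
#define FULL_CHARGE    100    /* arbitrary charge units for a stored "1"  */
#define READ_THRESHOLD  50    /* below this, a stored "1" is read as "0"  */
#define DISTURB_LOSS     2    /* extra leakage per neighboring activation */

static int  charge[ROWS][COLS];   /* charge remaining in each capacitor */
static bool stored[ROWS][COLS];   /* the bit we intended to store       */

static void write_row(int r, const bool bits[COLS]) {
    for (int c = 0; c < COLS; c++) {
        stored[r][c] = bits[c];
        charge[r][c] = bits[c] ? FULL_CHARGE : 0;
    }
}

/* The refresh contract: every cell gets topped up within the refresh
 * window (about 64 milliseconds on real parts) so it keeps its value. */
static void refresh_all(void) {
    for (int r = 0; r < ROWS; r++)
        for (int c = 0; c < COLS; c++)
            charge[r][c] = stored[r][c] ? FULL_CHARGE : 0;
}

/* Reading (activating) row r disturbs its physical neighbors a little. */
static void activate_row(int r) {
    for (int d = -1; d <= 1; d += 2) {
        int n = r + d;
        if (n < 0 || n >= ROWS) continue;
        for (int c = 0; c < COLS; c++)
            if (charge[n][c] > 0) charge[n][c] -= DISTURB_LOSS;
    }
}

int main(void) {
    bool victim_bits[COLS] = { true, false, true, false };  /* "1010" */
    write_row(3, victim_bits);        /* victim row                    */
    refresh_all();                    /* last refresh before hammering */

    for (int i = 0; i < 40; i++)      /* hammer the adjacent row       */
        activate_row(4);

    for (int c = 0; c < COLS; c++) {  /* did the victim keep its data? */
        bool read_back = charge[3][c] >= READ_THRESHOLD;
        if (read_back != stored[3][c])
            printf("bit flip in victim row 3, column %d\n", c);
    }
    return 0;
}
```

In this toy, activating a neighboring row enough times inside a single refresh window drains a victim cell below its read threshold, which is exactly the kind of bit flip the rest of the conversation is about.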
In the DRAM space, there is this Rowhammer attack.
And the contract from day one, when you build any system, any software system, anything you want, any computer,
the contract is that if you give me a piece of memory, when I write something to it, I want to be able to read what I wrote. And with Rowhammer, you violate this very
simple contract. You read a different value than the one you wrote. And doing that, you can
basically exploit systems in ways that were unimaginable before.
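To make the violated contract concrete, here is a hedged sketch of the kind of user-space access loop the Rowhammer literature describes for commodity x86 machines, followed by the write-then-read check. The two aggressor offsets below are placeholders, not a real same-bank pair; a real experiment has to pick addresses that map to different rows of the same DRAM bank, and whether any bits actually flip depends entirely on the specific DIMM.

```c
/* Sketch of the classic user-space hammering loop described in the
 * Rowhammer literature (x86, GCC/Clang). Purely illustrative: the two
 * addresses below are arbitrary offsets in one allocation, not a
 * verified same-bank pair. */
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#include <emmintrin.h>   /* _mm_clflush */

#define ITERATIONS (10 * 1000 * 1000)

static void hammer(volatile uint8_t *a, volatile uint8_t *b) {
    for (long i = 0; i < ITERATIONS; i++) {
        (void)*a;                      /* activate "aggressor" row A   */
        (void)*b;                      /* activate "aggressor" row B   */
        _mm_clflush((const void *)a);  /* force the next read to DRAM  */
        _mm_clflush((const void *)b);
    }
}

int main(void) {
    size_t size = 256 * 1024 * 1024;
    uint8_t *buf = malloc(size);
    if (!buf) return 1;

    memset(buf, 0xFF, size);      /* fill victim memory with known data */

    /* Hypothetical aggressor addresses -- placeholders only. */
    hammer(buf + (1 << 20), buf + (9 << 20));

    /* The "contract" check: everything we wrote should still read back. */
    for (size_t i = 0; i < size; i++)
        if (buf[i] != 0xFF)
            printf("unexpected bit flip at offset %zu: 0x%02x\n", i, buf[i]);

    free(buf);
    return 0;
}
```

On most machines this loop will report nothing; on a susceptible part, the cache flushes force every access out to DRAM, so the aggressor rows get activated millions of times within each refresh window.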
Well, since we're talking about Rowhammer right now, let's move into it. As you put it once, it's one of the hottest research topics in the security research community.
So give us a level set.
What is Rowhammer specifically?
And why particularly does it make cloud providers and server farmers nervous?
So I described this DRAM disturbance error effect.
And this effect gets worse as DRAM
gets denser and denser.
And we want DRAM to get denser and denser because that's how we store more capacity.
That's how we build better DRAM.
But the phenomenon gets worse, this DRAM disturbance error.
A Rowhammer attack is an attack in which an adversary generates a workload that exploits disturbance errors
to flip the value of bits that have critical importance to the security of the system.
Like, for example, bits that form a secret key.
And cloud vendors are very nervous because the entire business model is one where you
have multiple parties share your hardware.
In particular, in this case, they share your memory, they share your DRAM.
Well, what if one of these customers becomes rogue?
They themselves get exploited through some other attack.
Can they attack other customers by flipping bits in their memory?
And yes, they can attack it in very devastating ways.
How did Rowhammer get its name?
Is it because of the rows in the DRAM?
I was explaining how when you access a row,
an adjacent row gets affected by that.
And the attack, in order to create this disturbance error,
what you have to do is have to keep accessing that row
over and over and over again.
And that's the term, hammering the row. And the attack got this name, Rowhammer. So if I'm an attacker, am I trying to do something specific
or am I just trying to mess you up? Oh, that's a great question. Depends, right? So as a cloud
provider, the cloud providers are nervous about both scenarios. A Rowhammer attack in general refers to flipping a security-critical bit.
So by flipping that bit, I'm trying to target something specifically.
I'm trying to exploit something.
However, the simpler and in fact, the likelier form of attack is one where I'm just messing
up the bits.
And systems actually today in the cloud,
they're pretty good at detecting
when these bits are messed up.
But if the bits are messed up,
there's just very little you can do about that.
I see these bits and I've encoded enough redundancy
in the data to know that they're messed up,
but I can't recover to where I was before.
And there's really not a good way to solve that problem. Once the bits have flipped, it's like, I can't go back.
And the best thing I can hope for is maybe you reboot the server and let's start all over again.
And that's also very, very bad for a cloud customer because there are a lot of workloads in the cloud that have a lot of data in memory.
They do a lot of computation.
Maybe they train a machine learning model for many, many days. And then, you know, at some point you say, well, sorry, guys, you have to start all over again because we've messed it up.
Before we dive into the technical aspects of your research on the Rowhammer threat,
I find this whole drama really fascinating, and I think it would be good to set the stage
and the cast of characters. We've talked about cloud providers. Who are the other players? Who
provides to the providers, and what's their motivation? Who sets and guards the standards?
And finally, who's got an eye on everybody? It's a fascinating landscape. Microsoft
is a cloud provider, and I started with cloud providers because part of our role at Microsoft
Research is also to make the cloud better. There are several other players, and among the big players
are the companies that sell DRAM. And when the attack was first described or published, which
was in 2014, the hardware vendors jumped to quickly dismiss these
concerns. We knew about Rowhammer, but that's a problem that the older type of memory has.
The newer type of memory, it doesn't have this problem anymore. And in fact,
there are quotes online where vendors claim that DDR4, which is the memory that we all use today,
is Rowhammer-free. And of course, researchers have shown over and over again that DDR4 is not Rowhammer-free.
And then they said, oh, yes, but then you should buy this newer DDR4 that has a form
of defense called TRR.
And just earlier this year, there was a wonderful paper from an academic group in the Netherlands
that showed a huge number of DDR4 DRAM modules with this form of defense, TRR, being vulnerable by slightly changing the form of the attack.
So basically what the vendors have done is they've patched the old way of mounting the attack,
but they haven't told anyone how they patched it,
and you just have to sort of try different things until one of the new things clicks,
and then you can bypass those defenses.
So now we have the cloud providers, the DRAM vendors, and then the security research
community. And there is this feedback loop where the DRAM vendors say, oh, yeah, we knew about it.
The new memory is safe. Give it a year or two. The security research community says, oh, it's not
safe. And sort of the cloud providers and the smartphone manufacturers as well are caught in
the middle. So Rowhammer is a problem. It's a big one. And it hasn't been ignored, but hasn't been solved either.
So when we talked before, you said that more than 40 papers have been published on this subject,
and DRAM still remains as vulnerable as ever. So what has the academic community done to date
to try to solve the Rowhammer problem? And what to date have they got right and got wrong? Right. So there are sort of two bodies of work in academia. One is the
security research community. And then there is the computer architecture community. And to give them
credit, actually, the computer architecture community were the first ones to show this
problem to sort of raise the flag saying, hey, we have a DRAM disturbance error.
And the architecture community has been very good at putting forward a whole bunch of Rowhammer
mitigation proposals. However, all these proposals, they come with trade-offs. And
implementing one of these mitigations inside of DRAM will make that DRAM ultimately more expensive in some way. Maybe it will decrease the density. Maybe the DRAM vendors will have to add extra memory or extra sort of counters to keep track
of who's accessing what.
And the market forces in the DRAM world are in such a way that they need to use every
single piece of real estate they have to just cram more and more cells.
And that's where the security research community comes in, where they keep sort of reverse
engineering and trying different things.
And they found ways to go around those mitigations and show new forms of attack.
So to be fair to them, it's sort of also a business thing.
Like I said, people knew about Rowhammer and there were discussions in their sort of
standardization body. That's an organization called JEDEC, where a lot of
hardware vendors and software companies actually participate. There's been a lot of discussion over
the years on implementing a solution. And in fact, that's what they're doing now.
There won't be a single solution for Rowhammer that will work for every single type of memory
out there and that every single hardware vendor will be willing to actually implement.
So our philosophy is that, rather than describing the solution for Rowhammer, what we would like to see in the standard is a description of extensibility mechanisms so that companies, hardware vendors, can implement their favorite form of mitigation, the one that works best for their particular type of memory, by leveraging these extensions. So that's what we're trying to sort of change and shift.
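To give one concrete, entirely hypothetical picture of what such an extensibility mechanism might look like, here is a C sketch. None of these names or signatures come from JEDEC or from Project Stema; they are invented only to illustrate the separation being described: the host reports heavily activated rows, and the vendor's own logic, the only party that knows the secret adjacency map, decides which physical rows get a targeted refresh.

```c
/* Hypothetical sketch of a DRAM mitigation "extensibility" hook.
 * Nothing here is a real JEDEC mechanism or the project's proposal;
 * every name is invented to illustrate one idea only: the host reports
 * heavily activated rows, and the vendor's private logic -- the only
 * party that knows the row-adjacency map -- decides what to refresh. */
#include <stdint.h>
#include <stdio.h>

typedef struct {
    /* Vendor-private adjacency table: logical row -> physical neighbors.
     * In reality this is a trade secret and never leaves the device.   */
    uint32_t neighbor_lo[16];
    uint32_t neighbor_hi[16];
} vendor_state_t;

typedef struct {
    /* Host-visible hook: "row in this bank crossed an activation-count
     * threshold this refresh window" -- no adjacency knowledge needed. */
    void (*on_heavy_activation)(vendor_state_t *s,
                                uint32_t bank, uint32_t row, uint32_t n);
} dram_mitigation_ops_t;

/* One possible vendor implementation: refresh the two neighbors. */
static void refresh_neighbors(vendor_state_t *s,
                              uint32_t bank, uint32_t row, uint32_t n) {
    (void)n;
    printf("bank %u: targeted refresh of physical rows %u and %u\n",
           bank, s->neighbor_lo[row], s->neighbor_hi[row]);
}

int main(void) {
    vendor_state_t state;
    for (uint32_t r = 0; r < 16; r++) {      /* made-up remapped layout */
        state.neighbor_lo[r] = (r * 7 + 1) % 16;
        state.neighbor_hi[r] = (r * 7 + 9) % 16;
    }
    dram_mitigation_ops_t ops = { .on_heavy_activation = refresh_neighbors };

    /* The memory controller notices row 5 being hammered and calls in. */
    ops.on_heavy_activation(&state, 0, 5, 100000);
    return 0;
}
```

The point of the shape, rather than the details, is that the adjacency table stays on the vendor's side of the interface.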
In light of all that stuff, tell us about your most recent work that involves what you
called an end-to-end methodology to help cloud providers determine if they're susceptible to Rowhammer, because that's that upstream approach that you're
talking about instead of the patch afterwards that's impossible. So in the context of our cast
of characters and against the backdrop of computer memory solutions that have trust issues, tell us
how you are attacking this, what your methodology is, how successful it
is, and what are the key challenges that you face? So what we try to do is we try to help the software
company by building a systematic and scalable testing methodology to test whether your DRAM
is susceptible to a Rowhammer attack. And to build such a methodology, you have to overcome two practical challenges.
You have to devise a sequence of instructions
that your processor executes
that hammers the memory at the fastest possible rate.
You want to create what we call
the worst-case testing conditions for memory.
The second thing you want to do is you want to know where you're hammering. Remember I was telling you how DRAM disturbance occurs to rows
that are adjacent to the row you're hammering. Well, the rows that are adjacent are the worst
affected, but even rows that are nearby are affected. So a row that's sort of two rows away,
like the next-nearest neighbor or something like that, can be affected. But it's very difficult to affect a row
that's very far somewhere inside of your array.
So you have to actually know
what is the row-by-row layout of your DRAM chip.
And this is, in fact, the trade secret.
What we did was we built a hardware fault injector
that allows us to, you can think of it
like short-circuiting the memory in such a way
that we can actually always create these Rowhammer attacks by not letting the memory refresh itself.
So if you hammer a row and the memory never refreshes, you're going to flip bits eventually
because the capacitors will lose their charge. And then you go and study the patterns of how
these bits have flipped.
And that tells you about the layout of the cells inside of the DRAM.
Because guess what?
The row you hammered, most of the bits that flipped are going to be in its adjacent rows.
And then there will be some bits flipped next to the adjacent rows.
And then fewer bits and so on.
So you create these kind of heat maps.
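As a loose illustration of that heat-map idea, with invented data rather than the team's actual tooling, the sketch below tallies bit flips per logical row while a single row is hammered with refresh suppressed; the rows that collect the most flips are inferred to be the hammered row's physical neighbors, whatever their logical numbers happen to be.

```c
/* Toy illustration of the "heat map" idea: with refresh suppressed,
 * hammer one row, log which rows show bit flips, and infer that the
 * rows with the most flips are the hammered row's physical neighbors.
 * The observations below are invented; real data comes from a hardware
 * fault injector, and the logical row numbers need not be physically
 * adjacent at all -- that is exactly what is being reverse engineered. */
#include <stdio.h>

#define NUM_ROWS 16

int main(void) {
    int hammered_row = 5;
    /* Invented fault-injector log: logical rows where flips were seen. */
    int flipped_rows[] = { 12, 3, 12, 12, 3, 3, 12, 9, 3, 12, 14, 3 };
    int n = sizeof(flipped_rows) / sizeof(flipped_rows[0]);

    int flips_per_row[NUM_ROWS] = {0};
    for (int i = 0; i < n; i++)
        flips_per_row[flipped_rows[i]]++;

    /* The rows with the highest counts are very likely the rows
     * physically adjacent to the hammered row, whatever their logical
     * numbers -- here, logical rows 3 and 12 next to logical row 5.   */
    printf("heat map for hammered logical row %d:\n", hammered_row);
    for (int r = 0; r < NUM_ROWS; r++)
        if (flips_per_row[r] > 0)
            printf("  logical row %2d: %d flips\n", r, flips_per_row[r]);
    return 0;
}
```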
So you can really reverse engineer row-by-row adjacency by this form of short-circuiting the memory, sort of suppressing refresh commands. Okay, so you're reverse engineering to find out what's going on. Yes, we have a methodology. Our methodology can reverse engineer every single DDR4 DIMM in the world. And what you actually end up discovering when you reverse engineer is that
these maps change. They change from one vendor to another, and they
can also change from one DIMM to another, depending on the DIMM's revision. That's because of something called post-package repair. So we can actually also measure how many fixes that DRAM has had before
it was shipped to you. Okay. So this methodology has to attack, for lack of a better word, every vendor's particular proprietary chip.
And within the vendors, there's different chips as well.
So you've got a lot of things you have to be looking at.
How's it working so far?
Now, that's a great question.
It's very difficult to test every single chip that a cloud provider has.
So instead, what we're doing is we are mapping the DRAM fabrication process
for different DRAM devices and for different vendors. And then within those buckets we sample
and we test. And what we actually do is we look at the trends. And we want to make sure that the
workloads that we see in the cloud will not generate activations that will actually start flipping bits.
Because I was telling you, the DRAM gets worse over time, not better.
So, in fact, by waving our hands a little bit, we can even predict how many years in the future it's going to be until we see workloads reach a point where, just by using the memory,
they're going to start flipping bits. And it's our job to influence sort of a more principled
approach to fixing the problem rather than a band-aid approach. And also to keep an eye,
not just for Microsoft, but the entire cloud industry as to at which point we'll have to do
something to make sure that the workloads are not going to actually start causing bit flips.
So, for example, one of the things you might want to do is, when you actually detect that a virtual machine starts accessing the memory in a way that might actually induce bit flips,
you could try to slow it down or migrate it to a new DRAM or do something like that.
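Here is a minimal sketch of that kind of runtime mitigation, assuming a per-VM activation counter and made-up thresholds; in reality the counts would come from hardware performance counters and the responses would go through the hypervisor's own throttling and migration machinery.

```c
/* Minimal sketch of the runtime mitigation idea just described: watch
 * each VM's DRAM row-activation rate and throttle or migrate a VM that
 * approaches a rate known to risk bit flips. The thresholds, the
 * counter feed, and the actions are all placeholders for illustration. */
#include <stdio.h>
#include <stdint.h>

#define NUM_VMS 3
#define THROTTLE_THRESHOLD  400000u  /* activations per window (made up) */
#define MIGRATE_THRESHOLD   800000u

typedef struct {
    int id;
    uint64_t activations_this_window;  /* fed by hw counters in reality */
} vm_t;

static void throttle(vm_t *vm) { printf("VM %d: throttling memory bandwidth\n", vm->id); }
static void migrate(vm_t *vm)  { printf("VM %d: migrating to different DRAM\n", vm->id); }

static void end_of_window_check(vm_t vms[], int n) {
    for (int i = 0; i < n; i++) {
        if (vms[i].activations_this_window >= MIGRATE_THRESHOLD)
            migrate(&vms[i]);
        else if (vms[i].activations_this_window >= THROTTLE_THRESHOLD)
            throttle(&vms[i]);
        vms[i].activations_this_window = 0;   /* start the next window */
    }
}

int main(void) {
    vm_t vms[NUM_VMS] = {
        { .id = 0, .activations_this_window = 120000 },   /* benign     */
        { .id = 1, .activations_this_window = 550000 },   /* suspicious */
        { .id = 2, .activations_this_window = 950000 },   /* hammering  */
    };
    end_of_window_check(vms, NUM_VMS);
    return 0;
}
```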
Yeah. So there's more solutions than just fixing the chip.
There's other mitigations.
In fact, I think the solutions will have to span the entire stack.
There'll be some fixing the chip things,
but those fixing the chips have to sort of be programmed
or used by things higher up,
both by the CPU and by the software.
Well, that leads well into the next question.
There's a lot of people that need to be involved in this.
So tell us a little bit about who you're working with as partners
and what kinds of cooperative expertise do you need to solve for X
in the system security equation?
None of the work I've been describing is mine alone.
And in Microsoft Research, I've worked with a small team of very talented people
who have expertise that is very, very different than mine. You know, I come from a computer
science background, and I am not sort of equipped to short circuit memory or anything like that.
And in fact, until a couple of years ago, I really didn't understand how DRAM works very well.
So we have that sort of expertise in Microsoft Research to basically build hardware prototypes that actually can inject these sort of
failures into the hardware, into the memory. And then I also collaborate very strongly with a group
of wonderful engineers in Azure. There is a group called the Next Cloud System Architecture, or NCSA. And these folks have decades of expertise of understanding how DRAM works
and working with JEDEC and working with the memory vendors.
And they've been very good in two ways.
One was in describing to us how memory works in ways that go beyond what the manual can teach you and sort of what the concerns are
and the limitations and the forces that act when you actually build these circuits in practice.
And the second way that they've been very helpful was that when we interact with JEDEC and we try to
sort of make the shift, they've been very good at coaching us on how to put forward that proposal in a way that's more amenable.
Who are your other kind of big partnership associations?
Are you working with other academics? Are you working with other industry?
Are you working with other cloud providers?
We're lucky that we have very strong collaborations with two top academic places.
One is ETH Zurich and the other one is at Max Planck in Germany.
And we are working closely with companies that can be massively affected by Rowhammer.
When security researchers actually go and sort of find a new way to actually attack the memory,
they go through a process that's called responsible disclosure. So what that means is
that they will not make their findings publicly available, but instead they're going to reach out to all the
industry involved and describe their findings and give some time to the industry to form a response.
And once this period has ended, then the research becomes public. So when these new forms of attacks
came about, there was a group of companies formed
that started looking at these problems again. And the first thing that they had on their mind is
like, look, you know, so we have these research results, but can anybody go and independently
validate them on their hardware? So we at Project Stema were the first to actually validate these
findings on server-grade hardware, hardware that is run in the data
centers.
Right.
Well, it sounds like you all have the same goals.
You don't want things to break.
You want things to work well.
Ultimately, you want customers to have safe data and things not to break.
We do.
And in fact, I was mentioning how for Rowhammer and for testing memory, understanding row
by row adjacency is very, very important.
And I also said how DRAM vendors do not want to reveal this information.
In fact, the extensibility mechanisms we proposed for people to build their own forms of Rowhammer mitigations,
from the beginning, we designed them in a way where DRAM vendors do not have to tell
anyone these adjacency maps. In fact, there is a large swath of Rowhammer mitigations that people
have proposed over the past five or six years that all rest on the assumption that the software
company will have complete access to this information. And these companies are very reluctant. So rather
than mandating that or forcing them to do something they don't want to do, instead,
we designed this with the assumption that, hey, you guys don't have to tell us anything.
And we think that these have a much better likelihood of being adopted in practice. And
then again, not mandating the solution, letting people build their favorite solution for the kind of hardware they want to use.
You know, the Rowhammer solution that you build for the DRAM in a server running in Azure is very,
very different than the Rowhammer solution that you should build for an IoT device
that has a little bit of DRAM and runs a little bit of code. We've reached the what could possibly go wrong segment of the podcast where I ask all my
guests what keeps them up at night. So despite the fact that the majority of your work could
actually be classified as a research response to what keeps
us all up at night, sometimes the so-called solutions actually present new problems. So
do you have any concerns about the work you're doing? And if so, how are you addressing them?
My concern is, I was telling you how we have this hardware out there, whether it's DRAM,
whether it's CPUs, whether it's chipsets, whether it's GPUs, so on and so forth.
And this hardware is very, very complex.
And one of the things that we've learned is that in the quest of higher performance, we
have designed this hardware in ways that can be exploitable.
And these exploits are done in such a way that we never thought possible before.
And what keeps me up at night is, I don't really understand everything that's going on in a DRAM
device. What if there's another way out there that you can actually mount these attacks in very,
very simple ways? And unfortunately, with a cloud and with the consolidation of the entire computing power into data centers, I'm concerned that we might have an event that wipes out an entire cloud, that wipes out, you know, a big part of our infrastructure, that shuts down the entire internet.
And, you know, we're going to have exploits and little things here and there.
We've always had those.
We're going to continue to have them.
But we really never had a single wipeout event.
So having a wipeout event would be quite, quite concerning.
So how are you thinking about that as a researcher?
I mean, it's one of those giant problems that you think, I can't possibly solve this.
But are there any strategies that come into your head, given the kind of work you do,
that say, hey, this is how
we might try to mitigate such an event? It's a very hard problem. You're absolutely right.
My part comes when it's about DRAM, because I understand DRAM better than many people.
And my part is to make sure that there won't be a wipeout event happening because of DRAM. From the point
of view of DRAM, I hope that my work plays a role in that. That's a beautiful way to frame it,
Stefan, because as you point out, there's many fronts in this war against, you know, people that
are working to keep things safe and secure and people that are working to tear things down.
So you say, hey, at least on my watch,
the DRAM part is going to be good. That's right. I love that. Well, I don't want to let you go before we talk a little bit about industry standards and the tension between gatekeepers
and practitioners in a time when technical innovation is moving so fast that regulatory
bodies have a hard time keeping up. So what are the key challenges to organizations like JEDEC
in your
field? And how would you frame the role of these kinds of gatekeepers in the future?
So the industry is changing at a fantastic pace. And the role of JEDEC was actually to standardize
how memory is used. At least let's talk about DRAM. They actually standardized things other
than DRAM, but DRAM is a big part of it. And in the 80s and in the 90s, we needed that because we were building PCs and we had a
massive number of stakeholders of people who are actually building all sorts of hardware
components.
And you wanted these hardware components when you put them in a box to all work together.
Now, there is a little less of that.
There is more of data centers and there is no need for the computers that Google puts in
their data centers to make sure they work with the computers that Microsoft puts in their data
centers. All they have to do is to offer a software platform that is common enough that people can
actually use it. But because we see this consolidation, I believe there is less of a need
for standardization happening. Because if a cloud provider buys memory from three different vendors, all four parties can agree on how they build that hardware and use it in their data center.
So I believe we're going to see an increasing amount of fragmentation that way.
Well, so what does an organization like JEDEC have to do to stay alive? I think JEDEC has to shift the way they view themselves from a
specification of what the functionality of the hardware is to one that specifies mechanisms that
are flexible enough to allow increasing amounts of innovation from the different stakeholders in place.
Well, every researcher has a unique life story, and it's time to hear yours.
But I want to preface this by noting that you've gone to the trouble of including what I would call a scholarly genealogy on your personal website, so we can trace your academic ancestors
back to on one side of the family, the 17th century. So tell us your story, Stefan. Upon whose shoulders
do you stand, both personally and academically? And how did you get where you are today from where
you started back in your early years? You know, when I look at it, it feels very humbling. Every
once in a while, I go back and take a pause to reflect because you look at those names and you
go, oh gosh, those are some big
shoes to fill. I had the opportunity to work with two different advisors in my graduate school.
And therefore I have two genealogies. And like you said, one goes back to the 17th century and includes names like Carl Jacobi and John McCarthy. And Jacobi was mostly known for math
and things like elliptic functions and number theory.
And John McCarthy is known for being one of the founders of AI.
John's student was Barbara Liskov, who just won the Turing Award a couple of years ago.
And she's sort of a role model for me.
So, yeah, I feel very, very humbled.
And I look at that periodically to sort of remind myself as to where the bar is.
Right. So tell us a little bit about you then. You've come into this in the 20th century
and have started to make your own mark. How did you get from A to B?
I was born and grew up behind the Iron Curtain in Bucharest, Romania. And I was very lucky to
attend a high school that was very strong academically, especially in mathematics and computer science.
So in the late 80s and early 90s, I was sort of studying things like algorithms and graph theory and programming and languages and so forth.
And we also did a lot of math.
And in some sense, I was very lucky to have this training in math and computer science, but it came at the cost
that I am terrible at anything else. And when I finished high school, my family decided to
immigrate to Canada and they went to Calgary, Alberta. And when I got there, Canada had a
program where immigrants who lacked English skills, they would be enrolled in free English
as a second language kind of form of schooling. And I spent three months going to English school with my mom and dad.
And my mom and I were in the same class, actually. We were in level three. And my dad was in level
one. And I remember very clearly sort of meeting my dad in the hallways during lunch and having lunch together.
Yeah, I was about 19 at the time. A couple of years later, I went to college at the University of Waterloo, which is a great school in computer science in Canada.
And Waterloo has this wonderful co-op program where as part of graduating college, you have to actually go do internships in industry. And I was lucky to do three internships
with Microsoft back in the 90s. And I came to Seattle and I saw Seattle. And when time came
to go to graduate school, one of the places I applied to was University of Washington in Seattle. So I came
to UW. And once I graduated, I wanted to join academia. So I went and took a job as a professor
at the University of Toronto. And at Toronto, I worked with some fabulous students. And then after
a couple of years at Toronto, I kind of started missing the intensity of the West Coast and the
text in here. One of the things I didn't realize before leaving Seattle is that
West Coast is really one of the best places on earth to be a computer scientist because you meet
a lot of people who understand what you're doing and speak your language.
And I talked to some of my friends and they invited me to come interview at MSR. I knew
Seattle and I came and I never left. And I had a wonderful time.
What is one interesting thing we might not know about you? Maybe it's a personality trait,
a defining life moment, hobby, side quest that has impacted your life or career.
A lot of computer science researchers in the U.S. who are not born here,
they came here to do their PhDs. But I came much, much earlier. And I did not come with a plan to actually continue my education or pursue any advanced education.
So I have a lot of immigrant stories.
And I think a lot of those sort of have marked the way I think.
People have had a hard time working with me because I would insist that we work as hard as possible and never have any moment of relaxation or anything like that.
And I really, truly believe that that's actually very detrimental to a researcher.
A researcher has to be a little bit more balanced. And I remember very clearly Steve
Gribble, one of my advisors, telling me over and over again, Stefan, it's not just about working
harder. It's also about working smarter. And it took me a long time to understand what he meant. You have to let time allow you to
have the flow of creativity and see things in ways that maybe others haven't seen them before.
As we close, I'd like you to take a shot at painting a picture of a future world in which
you've been wildly successful. At the end of your career, what do you hope to have accomplished as a scientist
and how will your research have made a difference in our lives? Thank you for asking this question.
I actually thought about this and I'm thinking quite a bit about it. If you had asked me this
question 10 years ago, my answer would have been, I want to make sure that my research work is being used by millions of people.
And I was very fortunate to be able to accomplish that at MSR, not just once. Now, my hope is that we avoid a form of a wipeout event, at the very least one that exploits some form of DRAM. And if, you know, a decade or two from now, we manage to
say, hey, yeah, you know, we've had compromises here and there, but for the most part, the internet worked really well.
Cloud computing worked really well.
You know, AI worked really well.
Then I hope that at least a little part of that
was due to my work as well.
Stefan Saroiu, I, for one, am glad you're doing the job you're doing.
Thank you for that.
And thank you for joining us today on the podcast.
Thank you.
And thank you for your insightful questions.
To learn more about Dr. Stefan Saroiu and the ongoing fight against Rowhammer attacks, visit microsoft.com/research.