Microsoft Research Podcast - 119 - Defending DRAM for data safety and security in the cloud
Episode Date: July 8, 2020. Dynamic random-access memory – or DRAM – is the most popular form of volatile computer memory in the world, but it's particularly susceptible to Rowhammer, an adversarial attack that can cause data loss and security exploits in everything from smart phones to the cloud. Today, Dr. Stefan Saroiu, a Senior Principal Researcher in MSR's Mobility and Networking group, explains why DRAM remains vulnerable to Rowhammer attacks today, even after several years of mitigation efforts, and then tells us how a new approach involving bespoke extensibility mechanisms for DRAM might finally hammer Rowhammer in the fight to keep data safe and secure.
Transcript
So our philosophy is that, rather than describing the solution for Rowhammer, what we would like to see in the standard is a description of extensibility mechanisms so that companies, hardware vendors, can implement their favorite form of mitigation, the one that works best for their particular type of memory, by leveraging these extensions.
You're listening to the Microsoft Research Podcast, a show that brings you closer
to the cutting edge of technology research and the scientists behind it.
I'm your host, Gretchen Huizinga.
Dynamic Random Access Memory, or DRAM, is the most popular form of volatile computer memory in the world,
but it's particularly susceptible to Rowhammer, an adversarial attack that can cause data loss and security exploits in everything from smartphones to the cloud.
Today, Dr. Stefan Saroiu, a senior principal researcher in MSR's Mobility and Networking Group,
explains why DRAM
remains vulnerable to Rowhammer attacks even after several years of mitigation efforts,
and then tells us how a new approach involving bespoke extensibility mechanisms for DRAM
might finally hammer Rowhammer in the fight to keep data safe and secure.
That and much more on this episode of the Microsoft Research Podcast.
Stefan Saroiu, welcome to the podcast.
Thank you, Gretchen. It's great to be here.
Some of my favorite people on the planet are working on making things work for us,
and you're one of those people. So first, thanks. As we begin, though, let's talk about your people
for a minute. You're a senior principal researcher in the mobility and networking group, which isn't
totally separate from systems and networking, but they're not totally the same either. So
give us a verbal Venn diagram of the two groups, why they exist, where they're different, where they overlap, and how in broad strokes each of them is working to make our lives better.
Yes, thank you for the kind words, Gretchen.
So back in the day, Microsoft Research had a single systems and networking group.
And as the group got larger, the group split into several smaller groups like the systems group, the security group,
the distributed systems group, and the mobility and networking group. But we're all systems
researchers at the end of the day, whether we work on operating systems, on networks,
on mobile systems, or on distributed systems. So I'm part of the mobility and networking group.
But over my research career, my work has focused on systems, both in terms of mobile systems and networking
systems. And for the past couple of years, these systems that I've been working on aim to improve
the security of users and the security of infrastructure. Let's get specific and talk
about the work you do within the mobility and networking group now. So sort of in general,
what big problems are you trying to solve as a researcher?
And maybe more importantly, why does the world need you to solve them?
What gets you up in the morning?
So, I do two kinds of work.
The first kind is creative work because I really value creativity very highly.
And I believe it's very difficult to come up with a truly creative idea.
The second kind of work that I do is driven by intellectual curiosity and by revisiting assumptions or turning them on their head. And I strongly believe that the role of an expert is to break preconceived assumptions and rules.
Unfortunately you have to be an expert first. In fact, trying to break
assumptions before understanding deeply an area and a problem is a very bad idea.
So I've been working on secure systems research for
almost a decade now. We built a secure network tracing system that offers very strong privacy.
So for example, network operators can monitor their networks in a way that all the sensitive
data is locked down without anybody being able to subvert it or use it in any way other than
originally intended. We built sensors that can attest their information is correct
and has not been manipulated or changed.
So as a simple example, consider a photo
where one can check whether the photo has been photoshopped
or is indeed captured by a proper camera.
Then I worked on a secure payment system called Zero Effort Payments
that was a little like a precursor of the Amazon Go store.
So our system was a little
different in that you'd pick up the food and you'd go through a cashier who would ring you up,
but you'd not have to do any explicit thing to actually pay. The system would know who you are.
And since you'd have pre-registered with the system, the payment would be processed.
So I've worked on all these things. I've also worked on a firmware TPM, which brings trusted
computing to mobile devices.
And it works in millions of smartphones and tablets today.
But for the past couple of years, I've worked on Azure security in particular.
And we started a project called Project Stema.
Stema stands for secure, trustworthy, and enhanced memory for Azure.
And we've been focusing a lot on Rowhammer attacks.
Well, let's talk about memory and computer memory specifically, since it's the foundational
storage unit for digital data. But there are many kinds of containers, as you well know. So let's do
a quick primer for the flavor that we're really most interested in today, which is DRAM. So how
does it work physically? What are its vulnerabilities, both internally
and externally? And you don't need to get ridiculously granular here, because I saw
your 114-page deck and 100 pages of it is explaining DRAM. No, I'm kidding. Don't be
afraid to get as technical as you need to set the problem up.
Okay. So DRAM is the world's most popular form of volatile memory. Pretty much
every form of computing out there has DRAM. You can find DRAM in smartphones, in tablets, in PCs.
You can find DRAM in cars. You can find DRAM in washing machines. And a DRAM cell stores a zero
or a one. And it does that by using a very simple circuit with one capacitor.
And a capacitor can be charged or discharged, and that can mean a one or a zero. So for example,
if you want to store the value 1010, you just sort of have four cells, and you have one charge
capacitor, one discharge capacitor, one charge, and one discharge, and you encode 1010 that way.
Now, capacitors leak over time. They sort of lose their charge over time. So DRAM has to continuously refresh these capacitors. And the cells are built to maintain their charge for a
small period of time, say something like 64 milliseconds. And the contract is that the hardware has to make sure
that every single cell in its DRAM
is refreshed once within 64 milliseconds.
And in that way, the cell maintains its data, its charge.
Now, DRAM cells are organized in rows and columns.
And when you read a value from DRAM, you read by row.
And the way you read this is by switching some transistors
in such a way that the capacitors are then coupled
with some sensors.
So the sensors sense whether these capacitors
are charged or discharged,
and then they can translate that into data.
Now, unfortunately, what's happening is that when you actually sense the data in the capacitors, it turns out that rows located in the vicinity, in the adjacency of this row you're trying to read, those capacitors also get affected. And they get affected by having them discharge faster than normal.
And this phenomenon is called a DRAM disturbance error, because by causing them to discharge
faster within a 64 millisecond period, you lose the content of that cell.
And in some sense, the bit flips that way.
And the bit that flips is one that you never actually meant to read or access before.
Maybe you don't even have control over it.
Maybe it's some other software component that controls it.
So that's where sort of the concern lies.
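As a purely illustrative aside, here is a tiny toy model in C of the mechanism just described: capacitors that leak, a periodic refresh that tops them up, and row activations that bleed a little extra charge from neighboring rows. Every constant is invented for the sketch; this is not a model of real silicon.

```c
/* Toy model of DRAM disturbance errors -- illustration only, not a
 * faithful device model. All constants are invented for the sketch. */
#include <stdio.h>
#include <stdbool.h>

#define ROWS             8
#define COLS             4
#define FULL_CHARGE    100    /* arbitrary charge units for a stored "1"  */
#define READ_THRESHOLD  50    /* below this, a stored "1" is read as "0"  */
#define DISTURB_LOSS     2    /* extra leakage per neighboring activation */

static int  charge[ROWS][COLS];   /* charge remaining in each capacitor */
static bool stored[ROWS][COLS];   /* the bit we intended to store       */

static void write_row(int r, const bool bits[COLS]) {
    for (int c = 0; c < COLS; c++) {
        stored[r][c] = bits[c];
        charge[r][c] = bits[c] ? FULL_CHARGE : 0;
    }
}

/* The refresh contract: every cell gets topped up within the refresh
 * window (about 64 milliseconds on real parts) so it keeps its value. */
static void refresh_all(void) {
    for (int r = 0; r < ROWS; r++)
        for (int c = 0; c < COLS; c++)
            charge[r][c] = stored[r][c] ? FULL_CHARGE : 0;
}

/* Reading (activating) row r disturbs its physical neighbors a little. */
static void activate_row(int r) {
    for (int d = -1; d <= 1; d += 2) {
        int n = r + d;
        if (n < 0 || n >= ROWS) continue;
        for (int c = 0; c < COLS; c++)
            if (charge[n][c] > 0) charge[n][c] -= DISTURB_LOSS;
    }
}

int main(void) {
    bool victim_bits[COLS] = { true, false, true, false };  /* "1010" */
    write_row(3, victim_bits);        /* victim row                    */
    refresh_all();                    /* last refresh before hammering */

    for (int i = 0; i < 40; i++)      /* hammer the adjacent row       */
        activate_row(4);

    for (int c = 0; c < COLS; c++) {  /* did the victim keep its data? */
        bool read_back = charge[3][c] >= READ_THRESHOLD;
        if (read_back != stored[3][c])
            printf("bit flip in victim row 3, column %d\n", c);
    }
    return 0;
}
```

In this toy, activating a neighboring row enough times inside a single refresh window drains a victim cell below its read threshold, which is exactly the kind of bit flip the rest of the conversation is about.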
In the DRAM space, there is this Rowhammer attack.
And the contract from day one, when you build any system, any software system, anything you want, any computer,
the contract is that if you give me a piece of memory, when I write something to it, I want to be able to read what I wrote. And with Rowhammer, you violate this very
simple contract. You read a different value than the one you wrote. And doing that, you can
basically exploit systems in ways that were unimaginable before.
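To make the violated contract concrete, here is a hedged sketch of the kind of user-space access loop the Rowhammer literature describes for commodity x86 machines, followed by the write-then-read check. The two aggressor offsets below are placeholders, not a real same-bank pair; a real experiment has to pick addresses that map to different rows of the same DRAM bank, and whether any bits actually flip depends entirely on the specific DIMM.

```c
/* Sketch of the classic user-space hammering loop described in the
 * Rowhammer literature (x86, GCC/Clang). Purely illustrative: the two
 * addresses below are arbitrary offsets in one allocation, not a
 * verified same-bank pair. */
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#include <emmintrin.h>   /* _mm_clflush */

#define ITERATIONS (10 * 1000 * 1000)

static void hammer(volatile uint8_t *a, volatile uint8_t *b) {
    for (long i = 0; i < ITERATIONS; i++) {
        (void)*a;                      /* activate "aggressor" row A   */
        (void)*b;                      /* activate "aggressor" row B   */
        _mm_clflush((const void *)a);  /* force the next read to DRAM  */
        _mm_clflush((const void *)b);
    }
}

int main(void) {
    size_t size = 256 * 1024 * 1024;
    uint8_t *buf = malloc(size);
    if (!buf) return 1;

    memset(buf, 0xFF, size);      /* fill victim memory with known data */

    /* Hypothetical aggressor addresses -- placeholders only. */
    hammer(buf + (1 << 20), buf + (9 << 20));

    /* The "contract" check: everything we wrote should still read back. */
    for (size_t i = 0; i < size; i++)
        if (buf[i] != 0xFF)
            printf("unexpected bit flip at offset %zu: 0x%02x\n", i, buf[i]);

    free(buf);
    return 0;
}
```

On most machines this loop will report nothing; on a susceptible part, the cache flushes force every access out to DRAM, so the aggressor rows get activated millions of times within each refresh window.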
Well, since we're talking about Rowhammer right now, let's move into it. As you put it once, it's one of the hottest research topics in the security research community.
So give us a level set.
What is Rowhammer specifically?
And why particularly does it make cloud providers and server farmers nervous?
So I described this DRAM disturbance error effect.
And this effect gets worse as DRAM
gets denser and denser.
And we want DRAM to get denser and denser because that's how we store more capacity.
That's how we build better DRAM.
But the phenomenon gets worse, this DRAM disturbance error.
A Rowhammer attack is an attack in which an adversary generates a workload that exploits disturbance errors
to flip the value of bits that have critical importance to the security of the system.
Like, for example, bits that form a secret key.
And cloud vendors are very nervous because the entire business model is one where you
have multiple parties share your hardware.
In particular, in this case, they share your memory, they share your DRAM.
Well, what if one of these customers becomes rogue?
They themselves get exploited through some other attack.
Can they attack other customers by flipping bits in their memory?
And yes, they can attack it in very devastating ways.
How did Rowhammer get its name?
Is it because of the rows in the DRAM?
I was explaining how when you access a row,
an adjacent row gets affected by that.
And the attack, in order to create this disturbance error,
what you have to do is have to keep accessing that row
over and over and over again.
And that's the term, hammering the row. And the attack got this name, Rowhammer. So if I'm an attacker, am I trying to do something specific
or am I just trying to mess you up? Oh, that's a great question. Depends, right? So as a cloud
provider, the cloud providers are nervous about both scenarios. A Rowhammer attack in general refers to flipping a security-critical bit.
So by flipping that bit, I'm trying to target something specifically.
I'm trying to exploit something.
However, the simpler and in fact, the likelier form of attack is one where I'm just messing
up the bits.
And systems actually today in the cloud,
they're pretty good at detecting
when these bits are messed up.
But if the bits are messed up,
there's just very little you can do about that.
I see these bits and I've encoded enough redundancy
in the data to know that they're messed up,
but I can't recover to where I was before.
And there's really not a good way to solve that problem. Once the bits have flipped, it's like, I can't go back.
And the best thing I can hope for is maybe you reboot the server and let's start all over again.
And that's also very, very bad for a cloud customer because there are a lot of workloads in the cloud that have a lot of data in memory.
They do a lot of computation.
Maybe they train a machine learning model for many, many days. And then, you know, at some point you say, well, sorry, guys, you have to start all over again because we've messed it up.
Before we dive into the technical aspects of your research on the Rowhammer threat,
I find this whole drama really fascinating, and I think it would be good to set the stage
and the cast of characters. We've talked about cloud providers. Who are the other players? Who
provides to the providers, and what's their motivation? Who sets and guards the standards?
And finally, who's got an eye on everybody? It's a fascinating landscape. Microsoft
is a cloud provider, and I started with cloud providers because part of our role at Microsoft
Research is also to make the cloud better. There are several other players, and among the big players
are the companies that sell DRAM. And when the attack was first described or published, which
was in 2014, the hardware vendors jumped to quickly dismiss these
concerns. We knew about Rowhammer, but that's a problem that the older type of memory has.
The newer type of memory, it doesn't have this problem anymore. And in fact,
there are quotes online where vendors claim that DDR4, which is the memory that we all use today,
is Rowhammer-free. And of course, researchers have shown over and over again that DDR4 is not Rowhammer-free.
And then they said, oh, yes, but then you should buy this newer DDR4 that has a form
of defense called TRR.
And just earlier this year, there was a wonderful paper from an academic group in the Netherlands
that showed a huge number of DDR4 DRAM modules with this form of defense, TRR, being vulnerable by slightly changing the form of the attack.
So basically what the vendors have done is they've patched the old way of mounting the attack,
but they haven't told anyone how they patched it,
and you just have to sort of try different things until one of the new things clicks,
and then you can bypass those defenses.
So now we have the cloud providers, the DRAM vendors, and then the security research
community. And there is this feedback loop where the DRAM vendors say, oh, yeah, we knew about it.
The new memory is safe. Give it a year or two. The security research community says, oh, it's not
safe. And sort of the cloud providers and the smartphone manufacturers as well are caught in
the middle. So Rowhammer is a problem. It's a big one. And it hasn't been ignored, but hasn't been solved either.
So when we talked before, you said that more than 40 papers have been published on this subject,
and DRAM still remains as vulnerable as ever. So what has the academic community done to date
to try to solve the Rowhammer problem? And what to date have they got right and got wrong? Right. So there are sort of two bodies of work in academia. One is the
security research community. And then there is the computer architecture community. And to give them
credit, actually, the computer architecture community were the first ones to show this
problem to sort of raise the flag saying, hey, we have a DRAM disturbance error.
And the architecture community has been very good at putting forward a whole bunch of Rowhammer
mitigation proposals. However, all these proposals, they come with trade-offs. And
implementing one of these mitigations inside of DRAM will make that DRAM ultimately more expensive in some way. Maybe it will decrease the density. Maybe the DRAM vendors will have to add extra memory or extra sort of counters to keep track
of who's accessing what.
And the market forces in the DRAM world are in such a way that they need to use every
single piece of real estate they have to just cram more and more cells.
And that's where the security research community comes in, where they keep sort of reverse
engineering and trying different things.
And they found ways to go around those mitigations and show new forms of attack.
So to be fair to them, it's sort of also a business thing.
Like I said, people knew about Rowhammer and there were discussions in their sort of
standardization body. That's an organization called JEDEC, where a lot of
hardware vendors and software companies actually participate. There's been a lot of discussion over
the years on implementing a solution. And in fact, that's what they're doing now.
There won't be a single solution for Rowhammer that will work for every single type of memory
out there and that every single hardware vendor will be willing to actually implement.
So our philosophy is that, rather than describing the solution for Rowhammer, what we would like to see in the standard is a description of extensibility mechanisms so that companies, hardware vendors, can implement their favorite form of mitigation, the one that works best for their particular type of memory, by leveraging these extensions. So that's what we're trying to sort of change and shift.
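To give one concrete, entirely hypothetical picture of what such an extensibility mechanism might look like, here is a C sketch. None of these names or signatures come from JEDEC or from Project Stema; they are invented only to illustrate the separation being described: the host reports heavily activated rows, and the vendor's own logic, the only party that knows the secret adjacency map, decides which physical rows get a targeted refresh.

```c
/* Hypothetical sketch of a DRAM mitigation "extensibility" hook.
 * Nothing here is a real JEDEC mechanism or the project's proposal;
 * every name is invented to illustrate one idea only: the host reports
 * heavily activated rows, and the vendor's private logic -- the only
 * party that knows the row-adjacency map -- decides what to refresh. */
#include <stdint.h>
#include <stdio.h>

typedef struct {
    /* Vendor-private adjacency table: logical row -> physical neighbors.
     * In reality this is a trade secret and never leaves the device.   */
    uint32_t neighbor_lo[16];
    uint32_t neighbor_hi[16];
} vendor_state_t;

typedef struct {
    /* Host-visible hook: "row in this bank crossed an activation-count
     * threshold this refresh window" -- no adjacency knowledge needed. */
    void (*on_heavy_activation)(vendor_state_t *s,
                                uint32_t bank, uint32_t row, uint32_t n);
} dram_mitigation_ops_t;

/* One possible vendor implementation: refresh the two neighbors. */
static void refresh_neighbors(vendor_state_t *s,
                              uint32_t bank, uint32_t row, uint32_t n) {
    (void)n;
    printf("bank %u: targeted refresh of physical rows %u and %u\n",
           bank, s->neighbor_lo[row], s->neighbor_hi[row]);
}

int main(void) {
    vendor_state_t state;
    for (uint32_t r = 0; r < 16; r++) {      /* made-up remapped layout */
        state.neighbor_lo[r] = (r * 7 + 1) % 16;
        state.neighbor_hi[r] = (r * 7 + 9) % 16;
    }
    dram_mitigation_ops_t ops = { .on_heavy_activation = refresh_neighbors };

    /* The memory controller notices row 5 being hammered and calls in. */
    ops.on_heavy_activation(&state, 0, 5, 100000);
    return 0;
}
```

The point of the shape, rather than the details, is that the adjacency table stays on the vendor's side of the interface.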
In light of all that stuff, tell us about your most recent work that involves what you
called an end-to-end methodology to help cloud providers determine if they're susceptible to Rowhammer, because that's that upstream approach that you're
talking about instead of the patch afterwards that's impossible. So in the context of our cast
of characters and against the backdrop of computer memory solutions that have trust issues, tell us
how you are attacking this, what your methodology is, how successful it
is, and what are the key challenges that you face? So what we try to do is we try to help the software
company by building a systematic and scalable testing methodology to test whether your DRAM
is susceptible to a Rowhammer attack. And to build such a methodology, you have to overcome two practical challenges.
You have to devise a sequence of instructions
that your processor executes
that hammers the memory at the fastest possible rate.
You want to create what we call
the worst-case testing conditions for memory.
The second thing you want to do is you want to know where you're hammering. Remember I was telling you how DRAM disturbance occurs to rows
that are adjacent to the row you're hammering. Well, the rows that are adjacent are the worst
affected, but even rows that are nearby are affected. So a row that's sort of two rows away,
like the next-nearest neighbor or something like that, can be affected. But it's very difficult to affect a row
that's very far somewhere inside of your array.
So you have to actually know
what is the row-by-row layout of your DRAM chip.
And this is, in fact, the trade secret.
What we did was we built a hardware fault injector
that allows us to, you can think of it
like short-circuiting the memory in such a way
that we can actually always create these Rowhammer attacks by not letting the memory refresh itself.
So if you hammer a row and the memory never refreshes, you're going to flip bits eventually
because the capacitors will lose their charge. And then you go and study the patterns of how
these bits have flipped.
And that tells you about the layout of the cells inside of the DRAM.
Because guess what?
The row you hammered, most of the bits that flipped are going to be in its adjacent rows.
And then there will be some bits flipped next to the adjacent rows.
And then fewer bits and so on.
So you create these kind of heat maps.
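As a loose illustration of that heat-map idea, with invented data rather than the team's actual tooling, the sketch below tallies bit flips per logical row while a single row is hammered with refresh suppressed; the rows that collect the most flips are inferred to be the hammered row's physical neighbors, whatever their logical numbers happen to be.

```c
/* Toy illustration of the "heat map" idea: with refresh suppressed,
 * hammer one row, log which rows show bit flips, and infer that the
 * rows with the most flips are the hammered row's physical neighbors.
 * The observations below are invented; real data comes from a hardware
 * fault injector, and the logical row numbers need not be physically
 * adjacent at all -- that is exactly what is being reverse engineered. */
#include <stdio.h>

#define NUM_ROWS 16

int main(void) {
    int hammered_row = 5;
    /* Invented fault-injector log: logical rows where flips were seen. */
    int flipped_rows[] = { 12, 3, 12, 12, 3, 3, 12, 9, 3, 12, 14, 3 };
    int n = sizeof(flipped_rows) / sizeof(flipped_rows[0]);

    int flips_per_row[NUM_ROWS] = {0};
    for (int i = 0; i < n; i++)
        flips_per_row[flipped_rows[i]]++;

    /* The rows with the highest counts are very likely the rows
     * physically adjacent to the hammered row, whatever their logical
     * numbers -- here, logical rows 3 and 12 next to logical row 5.   */
    printf("heat map for hammered logical row %d:\n", hammered_row);
    for (int r = 0; r < NUM_ROWS; r++)
        if (flips_per_row[r] > 0)
            printf("  logical row %2d: %d flips\n", r, flips_per_row[r]);
    return 0;
}
```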
So you can really reverse engineer row-by-row adjacency by this form of short-circuiting the memory, sort of suppressing refresh commands. Okay, so you're reverse engineering to find out what's going on. Yes, we have a methodology. Our methodology can reverse engineer every single DDR4 DIMM in the world. And what you actually end up discovering when you reverse engineer is that
these maps change. They change from one vendor to another, and they
can also change from one DIMM to another, depending on the DIMM's revision. That's because of something called post-package repair. So we can actually also measure how many fixes that DRAM has had before
it was shipped to you. Okay. So this methodology has to attack, for lack of a better word, every vendor's particular proprietary chip.
And within the vendors, there's different chips as well.
So you've got a lot of things you have to be looking at.
How's it working so far?
Now, that's a great question.
It's very difficult to test every single chip that a cloud provider has.
So instead, what we're doing is we are mapping the DRAM fabrication process
for different DRAM devices and for different vendors. And then within those buckets we sample
and we test. And what we actually do is we look at the trends. And we want to make sure that the
workloads that we see in the cloud will not generate activations that will actually start flipping bits.
Because I was telling you, the DRAM gets worse over time, not better.
So, in fact, by waving our hands a little bit, we can even predict how many years in the future it's going to be until we see workloads reach a point where, just by using the memory,
they're going to start flipping bits. And it's our job to influence sort of a more principled
approach to fixing the problem rather than a band-aid approach. And also to keep an eye,
not just for Microsoft, but the entire cloud industry as to at which point we'll have to do
something to make sure that the workloads are not going to actually start causing bit flips.
So, for example, one of the things you might want to do is, when you actually detect that a virtual machine starts accessing the memory in a way that might actually induce bit flips,
you could try to slow it down or migrate it to a new DRAM or do something like that.
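Here is a minimal sketch of that kind of runtime mitigation, assuming a per-VM activation counter and made-up thresholds; in reality the counts would come from hardware performance counters and the responses would go through the hypervisor's own throttling and migration machinery.

```c
/* Minimal sketch of the runtime mitigation idea just described: watch
 * each VM's DRAM row-activation rate and throttle or migrate a VM that
 * approaches a rate known to risk bit flips. The thresholds, the
 * counter feed, and the actions are all placeholders for illustration. */
#include <stdio.h>
#include <stdint.h>

#define NUM_VMS 3
#define THROTTLE_THRESHOLD  400000u  /* activations per window (made up) */
#define MIGRATE_THRESHOLD   800000u

typedef struct {
    int id;
    uint64_t activations_this_window;  /* fed by hw counters in reality */
} vm_t;

static void throttle(vm_t *vm) { printf("VM %d: throttling memory bandwidth\n", vm->id); }
static void migrate(vm_t *vm)  { printf("VM %d: migrating to different DRAM\n", vm->id); }

static void end_of_window_check(vm_t vms[], int n) {
    for (int i = 0; i < n; i++) {
        if (vms[i].activations_this_window >= MIGRATE_THRESHOLD)
            migrate(&vms[i]);
        else if (vms[i].activations_this_window >= THROTTLE_THRESHOLD)
            throttle(&vms[i]);
        vms[i].activations_this_window = 0;   /* start the next window */
    }
}

int main(void) {
    vm_t vms[NUM_VMS] = {
        { .id = 0, .activations_this_window = 120000 },   /* benign     */
        { .id = 1, .activations_this_window = 550000 },   /* suspicious */
        { .id = 2, .activations_this_window = 950000 },   /* hammering  */
    };
    end_of_window_check(vms, NUM_VMS);
    return 0;
}
```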
Yeah. So there's more solutions than just fixing the chip.
There's other mitigations.
In fact, I think the solutions will have to span the entire stack.
There'll be some fixing the chip things,
but those fixing the chips have to sort of be programmed
or used by things higher up,
both by the CPU and by the software.
Well, that leads well into the next question.
There's a lot of people that need to be involved in this.
So tell us a little bit about who you're working with as partners
and what kinds of cooperative expertise do you need to solve for X
in the system security equation?
None of the work I've been describing is mine alone.
And in Microsoft Research, I've worked with a small team of very talented people
who have expertise that is very, very different than mine. You know, I come from a computer
science background, and I am not sort of equipped to short circuit memory or anything like that.
And in fact, until a couple of years ago, I really didn't understand how DRAM works very well.
So we have that sort of expertise in Microsoft Research to basically build hardware prototypes that actually can inject these sort of
failures into the hardware, into the memory. And then I also collaborate very strongly with a group
of wonderful engineers in Azure. There is a group called the Next Cloud System Architecture, or NCSA. And these folks have decades of expertise of understanding how DRAM works
and working with JEDEC and working with the memory vendors.
And they've been very good in two ways.
One was in describing to us how memory works in ways that go beyond what the manual can teach you and sort of what the concerns are
and the limitations and the forces that act when you actually build these circuits in practice.
And the second way that they've been very helpful was that when we interact with JEDEC and we try to
sort of make the shift, they've been very good at coaching us on how to put forward that proposal in a way that's more amenable.
Who are your other kind of big partnership associations?
Are you working with other academics? Are you working with other industry?
Are you working with other cloud providers?
We're lucky that we have very strong collaborations with two top academic places.
One is ETH Zurich and the other one is at Max Planck in Germany.
And we are working closely with companies that can be massively affected by Rowhammer.
When security researchers actually go and sort of find a new way to actually attack the memory,
they go through a process that's called responsible disclosure. So what that means is
that they will not make their findings publicly available, but instead they're going to reach out to all the
industry involved and describe their findings and give some time to the industry to form a response.
And once this period has ended, then the research becomes public. So when these new forms of attacks
came about, there was a group of companies formed
that started looking at these problems again. And the first thing that they had on their mind is
like, look, you know, so we have these research results, but can anybody go and independently
validate them on their hardware? So we at Project Stema were the first to actually validate these
findings on server-grade hardware, hardware that is run in the data
centers.
Right.
Well, it sounds like you all have the same goals.
You don't want things to break.
You want things to work well.
Ultimately, you want customers to have safe data and things not to break.
We do.
And in fact, I was mentioning how for Rowhammer and for testing memory, understanding row
by row adjacency is very, very important.
And I also said how DRAM vendors do not want to reveal this information.
In fact, the extensibility mechanisms we proposed for people to build their own forms of Rowhammer mitigations,
from the beginning, we designed them in a way where DRAM vendors do not have to tell
anyone these adjacency maps. In fact, there is a large swath of Rowhammer mitigations that people
have proposed over the past five or six years that all rest on the assumption that the software
company will have complete access to this information. And these companies are very reluctant. So rather
than mandating that or forcing them to do something they don't want to do, instead,
we designed this with the assumption that, hey, you guys don't have to tell us anything.
And we think that these have a much better likelihood of being adopted in practice. And
then again, not mandating the solution, letting people build their favorite solution for the kind of hardware they want to use.
You know, the Rowhammer solution that you build for the DRAM in a server running in Azure is very,
very different than the Rowhammer solution that you should build for an IoT device
that has a little bit of DRAM and runs a little bit of code. We've reached the what could possibly go wrong segment of the podcast where I ask all my
guests what keeps them up at night. So despite the fact that the majority of your work could
actually be classified as a research response to what keeps
us all up at night, sometimes the so-called solutions actually present new problems. So
do you have any concerns about the work you're doing? And if so, how are you addressing them?
My concern is, I was telling you how we have this hardware out there, whether it's DRAM,
whether it's CPUs, whether it's chipsets, whether it's GPUs, so on and so forth.
And this hardware is very, very complex.
And one of the things that we've learned is that in the quest of higher performance, we
have designed this hardware in ways that can be exploitable.
And these exploits are done in such a way that we never thought possible before.
And what keeps me up at night is, I don't really understand everything that's going on in a DRAM
device. What if there's another way out there that you can actually mount these attacks in very,
very simple ways? And unfortunately, with a cloud and with the consolidation of the entire computing power into data centers, I'm concerned that we might have an event that wipes out an entire cloud, that wipes out, you know, a big part of our infrastructure, that shuts down the entire internet.
And, you know, we're going to have exploits and little things here and there.
We've always had those.
We're going to continue to have them.
But we really never had a single wipeout event.
So having a wipeout event would be quite, quite concerning.
So how are you thinking about that as a researcher?
I mean, it's one of those giant problems that you think, I can't possibly solve this.
But are there any strategies that come into your head, given the kind of work you do,
that say, hey, this is how
we might try to mitigate such an event? It's a very hard problem. You're absolutely right.
My part comes when it's about DRAM, because I understand DRAM better than many people.
And my part is to make sure that there won't be a wipeout event happening because of DRAM. From the point
of view of DRAM, I hope that my work plays a role in that. That's a beautiful way to frame it,
Stefan, because as you point out, there's many fronts in this war against, you know, people that
are working to keep things safe and secure and people that are working to tear things down.
So you say, hey, at least on my watch,
the DRAM part is going to be good. That's right. I love that. Well, I don't want to let you go before we talk a little bit about industry standards and the tension between gatekeepers
and practitioners in a time when technical innovation is moving so fast that regulatory
bodies have a hard time keeping up. So what are the key challenges to organizations like JEDEC
in your
field? And how would you frame the role of these kinds of gatekeepers in the future?
So the industry is changing at a fantastic pace. And the role of JEDEC was actually to standardize
how memory is used. At least let's talk about DRAM. They actually standardized things other
than DRAM, but DRAM is a big part of it. And in the 80s and in the 90s, we needed that because we were building PCs and we had a
massive number of stakeholders of people who are actually building all sorts of hardware
components.
And you wanted these hardware components when you put them in a box to all work together.
Now, there is a little less of that.
There is more of data centers and there is no need for the computers that Google puts in
their data centers to make sure they work with the computers that Microsoft puts in their data
centers. All they have to do is to offer a software platform that is common enough that people can
actually use it. But because we see this consolidation, I believe there is less of a need
for standardization happening. Because if a cloud provider buys memory from three different vendors, all four parties can agree on how they build that hardware and use it in their data center.
So I believe we're going to see an increasing amount of fragmentation that way.
Well, so what does an organization like JEDEC have to do to stay alive? I think JEDEC has to shift the way they view themselves from a
specification of what the functionality of the hardware is to one that specifies mechanisms that
are flexible enough to allow increasing amounts of innovation from the different stakeholders in place.
Well, every researcher has a unique life story, and it's time to hear yours.
But I want to preface this by noting that you've gone to the trouble of including what I would call a scholarly genealogy on your personal website, so we can trace your academic ancestors
back to on one side of the family, the 17th century. So tell us your story, Stefan. Upon whose shoulders
do you stand, both personally and academically? And how did you get where you are today from where
you started back in your early years? You know, when I look at it, it feels very humbling. Every
once in a while, I go back and take a pause to reflect because you look at those names and you
go, oh gosh, those are some big
shoes to fill. I had the opportunity to work with two different advisors in my graduate school.
And therefore I have two genealogies. And like you said, one goes back to the 17th century and includes names like Carl Jacobi and John McCarthy. And Jacobi was mostly known for math
and things like elliptic functions and number theory.
And John McCarthy is known for being one of the founders of AI.
John's student was Barbara Liskov, who just won the Turing Award a couple of years ago.
And she's sort of a role model for me.
So, yeah, I feel very, very humbled.
And I look at that periodically to sort of remind myself as to where the bar is.
Right. So tell us a little bit about you then. You've come into this in the 20th century
and have started to make your own mark. How did you get from A to B?
I was born and grew up behind the Iron Curtain in Bucharest, Romania. And I was very lucky to
attend a high school that was very strong academically, especially in mathematics and computer science.
So in the late 80s and early 90s, I was sort of studying things like algorithms and graph theory and programming and languages and so forth.
And we also did a lot of math.
And in some sense, I was very lucky to have this training in math and computer science, but it came at the cost
that I am terrible at anything else. And when I finished high school, my family decided to
immigrate to Canada and they went to Calgary, Alberta. And when I got there, Canada had a
program where immigrants who lacked English skills, they would be enrolled in free English
as a second language kind of form of schooling. And I spent three months going to English school with my mom and dad.
And my mom and I were in the same class, actually. We were in level three. And my dad was in level
one. And I remember very clearly sort of meeting my dad in the hallways during lunch and having lunch together.
Yeah, I was about 19 at the time. A couple of years later, I went to college at the University of Waterloo, which is a great school in computer science in Canada.
And Waterloo has this wonderful co-op program where as part of graduating college, you have to actually go do internships in industry. And I was lucky to do three internships
with Microsoft back in the 90s. And I came to Seattle and I saw Seattle. And when time came
to go to graduate school, one of the places I applied to was University of Washington in Seattle. So I came
to UW. And once I graduated, I wanted to join academia. So I went and took a job as a professor
at the University of Toronto. And at Toronto, I worked with some fabulous students. And then after
a couple of years at Toronto, I kind of started missing the intensity of the West Coast and the
text in here. One of the things I didn't realize before leaving Seattle is that
West Coast is really one of the best places on earth to be a computer scientist because you meet
a lot of people who understand what you're doing and speak your language.
And I talked to some of my friends and they invited me to come interview at MSR. I knew
Seattle and I came and I never left. And I had a wonderful time.
What is one interesting thing we might not know about you? Maybe it's a personality trait,
a defining life moment, hobby, side quest that has impacted your life or career.
A lot of computer science researchers in the U.S. who are not born here,
they came here to do their PhDs. But I came much, much earlier. And I did not come with a plan to actually continue my education or pursue any advanced education.
So I have a lot of immigrant stories.
And I think a lot of those sort of have marked the way I think.
People have had a hard time working with me because I would insist that we work as hard as possible and never have any moment of relaxation or anything like that.
And I really, truly believe that that's actually very detrimental to a researcher.
A researcher has to be a little bit more balanced. And I remember very clearly Steve
Gribble, one of my advisors, telling me over and over again, Stefan, it's not just about working
harder. It's also about working smarter. And it took me a long time to understand what he meant. You have to let time allow you to
have the flow of creativity and see things in ways that maybe others haven't seen them before.
As we close, I'd like you to take a shot at painting a picture of a future world in which
you've been wildly successful. At the end of your career, what do you hope to have accomplished as a scientist
and how will your research have made a difference in our lives? Thank you for asking this question.
I actually thought about this and I'm thinking quite a bit about it. If you had asked me this
question 10 years ago, my answer would have been, I want to make sure that my research work is being used by millions of people.
And I was very fortunate to be able to accomplish that at MSR, not just once. Now, my hope is that we avoid a form of a wipeout event, at the very least one that exploits some form of DRAM. And if, you know, a decade or two from now, we manage to
say, hey, yeah, you know, we've had compromises here and there, but for the most part, the internet worked really well.
Cloud computing worked really well.
You know, AI worked really well.
Then I hope that at least a little part of that
was due to my work as well.
Stefan Saroiu, I, for one, am glad you're doing the job you're doing.
Thank you for that.
And thank you for joining us today on the podcast.
Thank you.
And thank you for your insightful questions.
To learn more about Dr. Stefan Saroiu and the ongoing fight against Rowhammer attacks, visit microsoft.com/research.