Microsoft Research Podcast - 107 - Democratizing data, thinking backwards and setting North Star goals with Dr. Donald Kossmann

Starting point is 00:00:00 We have been programming devices, we've been programming mainframes, we've been programming PCs, we've been programming the web and so on. I think we need to go to the extreme craziness and think that the world is one big computer. I think this is the big North Star goal that we have. You're listening to the Microsoft Research Podcast, a show that brings you closer to the cutting edge of technology research and the scientists behind it. I'm your host, Gretchen Huizinga. Dr. Donald Kossmann is a distinguished scientist who thinks big. And as the director of Microsoft Research's flagship lab in Redmond, it's his job to inspire others to think big too. But don't be fooled. For him, thinking big involves what he calls thinking backwards, a framework of imagining the future, defining progress in

Starting point is 00:00:57 reverse order, and executing against landmarks along an uncertain path. On today's podcast, Dr. Kossmann reflects on his life as a database researcher and tells us how Socrates, an innovative database-as-a-service architecture, is re-envisioning traditional database design. He also reveals the five superpowers of Microsoft Research and how we

Starting point is 00:01:19 can improve science with marketing. That and much more on this episode of the Microsoft Research Podcast. Donald Kossmann, welcome to the podcast. Thanks. Thanks for having me. I like to start by situating my guests. It's such a researchy term. And you are very impressively situated here.

Starting point is 00:01:48 So as a distinguished scientist and the director of Microsoft Research's Redmond Lab, what do you hope to accomplish here? What gets you up in the morning? So what gets me up in the morning are the people. I'm working with an incredible group of people, researchers, engineers, designers, testers, program managers, biz operations people. They are all amazing, and it's an incredible privilege to be given the opportunity to be their advocate. On the research front, what gets me up is democratizing technology. I think banks have democratized money, right? And they've made it for everybody possible to have money, to grow money.

Starting point is 00:02:31 Cars have made it possible for everybody to move around in the world, right? That is democratizing mobility. So databases, which is my background, has democratized data, which has made it possible for everybody to get the best value out of their data. If I want to get value out of my data, I need to get the tools to get the value. If we kind of go back to the metaphor of a bank, right? How do I get value out of my money from a bank? It's also that I combine it with other people and the bank pools it and makes out of the mass something better and bigger

Starting point is 00:03:07 and then lets me participate in that. So my data, my genome data alone is not very useful, but of a large population of people pooling this together, correlating it with things that happen, that is actually very valuable. And I think what we need to still do, and that's where democratization needs to happen, is that I, as an owner of my data, need to control how it is used and how I get the value back. And at the moment, we have just way too few offerings for that. Yeah. How does the cloud change that?

Starting point is 00:03:42 Well, the cloud at the beginning is just like a bank. It's like a vault where you put your data and it's also kind of the opportunity to do something with the data. So it is a platform that then allows everybody at some point to kind of realize their visions and their dreams on what to do with data and how to create value with that data. Prior to MSR, you had a lengthy and notable career in academia. I'm going to ask you more specifically about your life and your path to MSR later. But I think it's worth talking briefly about your time as a professor at ETH Zurich, where you were a database person in the computer science department.

Starting point is 00:04:25 Tell us a little bit about the history of database systems and what the landscape looks like now in the era of cloud computing. One of the things I always say jokingly about databases is that databases are boring and hard, and that's why they make so much money, because nobody wants to do the boring stuff and nobody can do the hard stuff, so it's kind of a good combination.

Starting point is 00:04:45 But essentially, database is a fairly old technology. But it has always been about three things. One thing is value. How do you get the best out of your data? Which is what are the features that you provide, the power of querying the data, of updating it, of correlating it and doing things with the data. The second thing has been security. How do you make sure that the data stays under your control, that you own it and determine what happens with the data?

Starting point is 00:05:14 And the third is, I would call it cost or performance, is making sure that you don't overpay for the data, right? That it's kind of cheap or kind of gets more and more affordable to do what you want to do with your data and control it. All right. So what did you do as a database professor? So one of the waves I was very involved in was the so-called semi-structured data wave. The best way to process data is if it's really structured and you know exactly what it is, right? And you have a schema essentially. And I spent a lot of time working on semi-structured

Starting point is 00:05:51 data, which has some structure that you kind of extract. And that is kind of like getting good value out of all data, not just your structured data, like your bank accounts, but also your email, the books you write, the Word documents you write, getting some value out of that. So that was a big phase of mine. Another big phase of mine was distributed databases and how to optimize them and how to make them perform in a very scalable way. All right. So that's kind of three waves you're riding.

Starting point is 00:06:19 Is there anything that you see out in the ocean right now that's a wave coming in that the database people might be facing new challenges that research could address? I think it's still about value, security and cost and will always be in the database world. But I think what we've seen of the generations or the eras of computing is this pendulum. When we started with a mainframe computer, which is kind of very centralized. Then we got into the PC era, which is kind of decentralized,

Starting point is 00:06:51 where you push to the customer. Then we went to the web, which is again centralized. We went back to the mobile phone and smartphone, which is decentralized. Then we went to the cloud, which is again logically centralized. And now we're hitting back again in this pendulum to what we now call the edge. And I think we haven't in databases even started to think about the edge because the edge for us is kind of like 9 billion new machines.

Starting point is 00:07:18 And nobody has thought about deploying databases on 9 billion machines. We're now at 100,000 or 10,000 of machines in the cloud, but 9 billion is yet a totally different thing. How are you even thinking about that? Well, of course, again, my mental framework is pretty much always the same on the technology side, value, security, and cost. But the best way to think about it is if you believe in this as being the new machine or so, what is the killer application? What do you want to do with this data at the edge? And what are the constraints? So one way to do research is to look at what is happening today and think of one assumption that is going to go away, right? Or that is added, but usually it's an assumption that goes away.

Starting point is 00:08:05 Of course, it has to be driven by some application. So now, if the assumption that goes away is that it's centrally managed, what can you do with this data now, if you have such a system? And then that kind of inspires you to think about how to build such a system. Before you put on your visionary leader hat, I know you wear a lot of hats around here, Donald. I want you to tell us about some of your own research. Let's start with Cipherbase, which is a SQL database system that stores and processes strongly encrypted data. Tell us about Cipherbase and how a database professor got involved in cybersecurity and cryptography. When I was at ETH in the late 2000s, I was working with several companies. Among others, I was working with the Swiss banks. And so there was actually a very big scandal in Switzerland, and that is that the German government paid one of the Swiss database administrators to produce a CD of all German customers that had bank accounts in that bank. And of course, the assumption was

Starting point is 00:09:06 those were all tax evaders, and most of them were. So the problem with that was that in Switzerland, this is illegal. But in Germany, it's actually totally legal. So what happened is that the Swiss bank came to me, because it was a database administrator, and I was a database person. They came to me and they said, OK, Donald, how can we make sure that never happens again? And that kind of created my interest in encrypted databases or protecting data from the administrators, but still letting the administrator do their job.

Starting point is 00:09:39 They do a lot of important things with the database, but they don't have to really look at the data, right? The business needs to look at the data, but not the database administrator. And so we developed a bunch of technology, and we worked on that for two or three years. And at some point, a distinguished engineer from Microsoft visited ETH, and he came to me, and we talked about what I work on. And I told him that story, and he said, oh, actually, we're at Microsoft very interested in that problem. And so I visited Microsoft and MSR and I learned about their solution. And it was actually much, much better than what I had thought out over three years at ETH. So I said, oh, this is great. I want to work

Starting point is 00:10:20 with you. And that was the birth of the Cipherbase project. And that's what then later on became the Always Encrypted feature of SQL. Traditional database architecture has some significant limitations when it hits the cloud. And one of the most exciting projects that you're even currently involved with is an answer to that. It's called Socrates. Tell us about expectations versus reality with the move towards database-as-a-service paradigms in the cloud, and how this new architecture compares with the older, what you call monolithic database architecture? I think this question is best answered if I give an analogy, and that is retail. There's the brick and mortar retail, and then there's the online retail. And both are important, just like both database architectures would be important, but they were designed with different assumptions and different goals in mind.

Starting point is 00:11:34 So the brick-and-mortar, we want to kind of minimize the movement of goods. So you go there, you try your new fancy suit on, it fits, you go home, you have almost zero returns, because logistics are expensive. It is also about a kind of very specific experience that people want to have. It is all together. So this is what traditional databases do. They are designed for a particular experience and having a particular assumption. So moving data around is expensive. And the experience is when I do a query to a database, I want to immediately get the answer and I want it to be fast. Now let's go to online retail.

Starting point is 00:12:19 Online retail has this big logistics problem, but it has some other features, right? Essentially, you have virtually all products in one hand at one fingertip. And if you think about why online retail is so successful, it's because it is cheap, right? That's what got people hooked up, this low cost. And why is it so cheap? Because it never wastes any resources. If you look at a shop, there are people working there that sometimes there are no customers and they are just wasting resources. If you think about an online retailer, there are no wasted resources. All the workers are

Starting point is 00:12:57 constantly working and active. There's nobody standing around. And the same happens in the cloud and that is essentially the Socrates architecture. It is really designed for not wasting any resources. And that's our kind of goal in the cloud to drive down cost. And that's why we separate the resources. And you just use resources and put them together as you need them. All right. So I want you to tell me a little bit more about Socrates technically and how you have achieved this reduction in cost and increase in efficiency with the architecture that Socrates

Starting point is 00:13:35 presents. Yeah. So essentially what it is all about is separating concerns or disaggregating. So traditional databases are monoliths. All functionality is kind of intertwined and mingled together, but very highly optimized to have that experience, just like a shop. What Socrates does is it essentially separates compute, storage, and the log. Essentially, it separates concerns to make sure that we can optimize and can utilize these concerns in the best possible way. When we talk about disaggregation, we typically talk about disaggregation of computing resources.
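Editor's illustration: the separation of compute, storage, and the log described here can be sketched in a few lines of Python. This is a toy model under loud assumptions, not the Socrates implementation: the class names LogService, StorageService, and ComputeNode and all of their methods are hypothetical and exist only to show the idea that compute holds no durable state, the log is the single place where updates become durable, and storage catches up from the log on its own schedule.

```python
from dataclasses import dataclass, field


@dataclass
class LogService:
    """Hypothetical append-only log: the only place updates become durable."""
    records: list = field(default_factory=list)

    def append(self, key, value):
        self.records.append((key, value))
        return len(self.records)  # log sequence number (LSN) of this record

    def read_from(self, lsn):
        return self.records[lsn:]


@dataclass
class StorageService:
    """Hypothetical page store: serves data and catches up by replaying the log."""
    log: LogService
    pages: dict = field(default_factory=dict)
    applied_lsn: int = 0

    def catch_up(self):
        # Apply every log record that has not been applied yet.
        for key, value in self.log.read_from(self.applied_lsn):
            self.pages[key] = value
        self.applied_lsn = len(self.log.records)

    def get(self, key):
        self.catch_up()
        return self.pages.get(key)


@dataclass
class ComputeNode:
    """Hypothetical query node: keeps no durable state of its own."""
    log: LogService
    storage: StorageService

    def write(self, key, value):
        return self.log.append(key, value)

    def read(self, key):
        return self.storage.get(key)


log = LogService()
storage = StorageService(log)
writer = ComputeNode(log, storage)
reader = ComputeNode(log, storage)  # a second compute node sharing log and storage

writer.write("account:42", 100)
writer.write("account:42", 75)
balance = reader.read("account:42")
```

Because the compute nodes in this sketch are stateless, a second one can be attached to the same log and storage without copying data, which is the kind of "put resources together as you need them" flexibility the conversation points to.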

Starting point is 00:14:15 And when we talk about the architecture that does it, we talk about decomposing it into mini-services. So there's a mini-service that runs queries. There's a mini-service that logs all the updates that happen. And then there's a mini-service that serves the data. In the retail environment, there's a mini-service that gives you the catalog and presents the goods to you. There's a mini-service that is the warehouse that ships the products to you. And then there's a mini-service that does the payment. And it's kind of like the analogy here. All right. So where is this in the pipeline? Because we've got huge legacy systems and now you've got this new idea that's optimized for the cloud. In some sense, the good news is that we don't have to change the API. Another analogy is if you buy an electric vehicle, you don't have to relearn how to drive, right? You've changed the engine and you've done something really big underneath. And that's one of the big achievements of the engineering effort of Socrates is that we didn't change the

Starting point is 00:15:17 API. So it's all under the hood. Failure has many faces and not all of them are ugly. You've had some experience with failure. Sometimes you've even called it miserable failure. So tell us about some work that you've done that didn't work out and what lessons you learned while you were at it. My favorite story is that of a failed startup. So I've also done startups. That's kind of for me is part of the academic experience that you do startups. And I had a grandiose failure of a startup. I'm not laughing at you. Amazon kind of started with the cloud, and I started a company that built a semi-structured database for the cloud. And it was a semi-structured database because I had been working on semi-structured databases, and it was the cloud because I thought the cloud was really cool and was going to be a

Starting point is 00:16:16 game changer, and I wanted to be the first there. The problem was that these were two big bets. It was a big bet on the cloud, and it was a big bet on semi-structured data. And if I had just made the cloud bet, I would have been great. I mean, the ideas of separating compute and storage, that's exactly what I did at that time. And the semi-structured bet, I think it's still going to pan out. It's still really important. It just didn't pan out at the same time. And with a startup, if you make two bets, they need to all pan out at the same time. And that's just not going to happen, right? So bet on one miracle rather than two or three. Finding the one miracle, that is the art of doing a startup, but also doing a research project. Let's talk about superpowers. You wrote a blog post, which I loved, where you compared and contrasted superpowers of academia, of product groups, of startups, and Microsoft Research. So give us a superpower breakdown of these various institutions and entities and where you land personally on what we might call the value proposition of Microsoft Research. Essentially, what I believe the five superpowers that the company gave us. This is really when Bill Gates kind of founded Microsoft Research. This is freedom.

Starting point is 00:17:37 We can freely collaborate with everybody in the company. We are not tied to any organizational structure. The second one is we have time. We don't have product deadlines, shipping deadlines. And so we have time to really think things through. The third one is we take risks. We can fail fast. We don't have legacy. If we find that an idea is stupid, we just kill it and we just stop working on it, right? This is different for product groups. Creativity is a big part of our culture. We generate ideas constantly. This is kind of part of our job. And the fifth one is we build stuff, we execute. And of course,

Starting point is 00:18:17 we do that with the product groups. And so coming to your question, I think every kind of organization has a different mix of these. I mean, academia is creative. Our product groups are creative. Startups have some of these. But this combination is unique. And so if we want to innovate, which is kind of our mission and what we want to achieve, that's how we create value to the company. We have to use these five superpowers.

Starting point is 00:18:44 We were talking about some projects like Always Encrypted or Cipherbase. That's exactly something that academia cannot do, because academia doesn't have the execution part. They just don't have the resources to do it. A startup cannot do it either, because these projects take time, and the time to do this, a startup just doesn't have. And so that's what we're looking for. And it's actually amazing in this time how many projects really need exactly this combination of superpowers. There's been a longstanding debate between what I might call pure research purists and another group that I would call team tech transfer, who are entrepreneurs. And the argument stems around

Starting point is 00:19:26 the purpose of research and how you measure it. And one side is always yelling science and the other is always yelling impact. But you've actually argued that the argument is becoming moot. Why? Well, because it's both, right? And so I would have to kind of drill down a little bit what I think a good research project does. And it has essentially three components. It has scientific insight, right? Some ideas, some secret sauce. The second piece, it has execution.

Starting point is 00:19:56 It executes on something. It creates something. And the third one is, I call it marketing. But what it really means is having clarity on the impact. And the interesting thing is that the execution and marketing make the science better. I cannot explain it, but it's happening right now. When we do science and we execute, that actually is a feedback loop to our science. We see things that we wouldn't have seen if we hadn't executed on it. Or creating the clarity on how this is going to change the world makes us kind of question assumptions that we

Starting point is 00:20:31 might have not done if we had just stayed in the scientific world and actually makes the science much more interesting. This is what I find so amazing about the job that I have and about Microsoft Research. If I see how researchers kind of get this insight and they say, yes, the execution makes my science better and the impact makes my science better, this is kind of like really deeply gratifying. Most people think somewhat logically that in order to innovate, we need to think forward or think ahead. And you suggest in another provocative blog post that we actually need to think backwards. Tell us what you mean

Starting point is 00:21:26 by thinking backwards and then unpack why we need to do it, why it's hard to do it, and what happens when we do it. Yeah. So I wrote this blog post as a reaction to comments that I heard often. Well, we'll cross that bridge when we get there. And often what happens is you never get to that bridge, or when you get there, you're really stuck. Totally unprepared. Unprepared, and you don't know what's going on. So what thinking backwards does is it starts with what we call at Microsoft defining a North Star goal, a really good North Star goal.

Starting point is 00:22:04 And then not immediately jump, oh, what is the best direction to that North Star goal, but kind of creating landmarks. And I call them landmarks because milestones are kind of like forward thinking, milestone one, what is your milestone one? But I actually think about landmark N minus one, because really what we do is we navigate uncertainty. We don't know where we will go. But if we know, oh, there is somewhere there, I need to get there. I don't know exactly what will happen on the path, but I know the dimensions. I know I can go west and east. I can go north and south. I know essentially how I can maneuver. And if I know the landmarks, right, then I can

Starting point is 00:22:44 get there. And if I do get stuck, it kind of helps me not to get frustrated. So if I know this is my landmark and I get stuck, I hit a dead end, which happens to all of us, I will find a solution to get to the landmark or I will redefine the landmark. It gives me much more clarity to deal with these situations. Whereas if you move forward and you hit a dead end, you're stuck. And then you often give up and you get frustrated. Well, Donald, we've reached the part in the podcast where I always ask my guests what could possibly go wrong. And I do this because every line of research that has potential for great

Starting point is 00:23:19 good also has potential for great risk or great harm. And as a leader, you don't only have to worry about your own stuff. You have to worry about the stuff of all the people that you shepherd and supervise. So what, if anything, keeps you up at night metaphorically? And what responsibility do you have to identify and then try to mitigate the potential risks of the work that you do and the work that the people here do? So as a high-order bit, I'm an optimist. And I just, well, move forward. We're just talking about thinking backwards. But actually, I always think there is a way out.

Starting point is 00:23:54 And one of the reasons why I think that is because in the bad situations I've been in life, there was always somebody with me. I've always managed to never be alone because if things get bad, it's much better not to be alone. I have also, I have this, I think one of my biggest strengths is that I can detach myself from myself. So sometimes if things go really wrong, I can look at myself and say, well, Donald, you really screwed up. Okay. And then I have a different perspective and it helps me to move on. In Microsoft Research, we are about risk-taking.

Starting point is 00:24:30 We've created something called Failure to Lunch, which is a seminar series where people of the lab talk about their failure. And we celebrate the kind of what we call smart risk-taking. But usually there's something to be learned and we celebrate failure. And that is great, I think. All right. So let's move over and not talk about failure. Let's say that you succeed wildly in some of these technologies that you're chasing that are your North Star goals.

Starting point is 00:25:01 And they have unintended consequences. How do you mitigate that? That, of course, is a great question. And I think when I started computer science, we were innocent. I remember writing grant proposals and the question about ethical concerns. It was a no-brainer. And now everything we do has an ethical side. We are dealing with technology that is dangerous, and we know it, right? It can all be misused in many ways. As scientists, we have a responsibility to think about how our technology can be misused. And we have a big, big responsibility to educate society and do our best to explain the technology and possible misuses of technology.

Starting point is 00:25:47 If we do that, we kind of do the right thing. And we also play our role. It is not our decision how our technology is used. We just need to be responsible and develop technology that makes the world a better place. We should think about the positive sides. But again, I'm an optimist. All right. It's story time. We've heard a bit about your academic and professional life. Let's rewind a bit and hear how you got there. How I got into computer science is not a heroic story. I didn't know anything else what to do. It's kind of more random than really by design. I come from a family of lawyers. And so I wanted to always to become a chartered accountant, which is also, I mean,

Starting point is 00:26:33 database is just as boring as that. And so I went to the Harvard Summer School and I did a course on programming and it just infected me. I got the bug and I love programming. And so that kind of changed my plans. I studied computer science and I think I just got lucky. So from there, you went to be a professor. You got a PhD somewhere in the mix there. You're a professor at ETH Zurich. Connect the dots for us. Here's the story. So I had been working on this Cipherbase project with MSR, and probably I was somehow on the watch list, and I had visited, and I got an offer to join MSR.

Starting point is 00:27:21 And for me, it was actually pretty clear that I would decline the offer. Unfortunately, not for my wife, or fortunately, not for my wife. So my wife literally said, Donald, you can stay happily senile at ETH, or you can start from scratch. And I thought, well, actually, both of those options are pretty bad, right? I don't want to start from scratch, but I don't want to be senile either. And so what we ended up deciding that I start from scratch because why have one career if you can have two careers? And so now I'm in my second career, started from scratch and has been quite a ride. Tell us something interesting about yourself that we might not know, whether it's a characteristic, a life event, a side quest, something you've done,

Starting point is 00:28:08 and how it has affected or impacted your life or career. And if it didn't even affect yours, maybe somebody else's. One thing I did this summer, I wrote a small book. It's called Wunder Informatik, The Miracle of Computer Science. So I actually wrote it in German because I have four children, three daughters. I wrote it essentially for my daughters because they're kind of asking me all these questions. Why did you study computer science? How did you get there? And I never had a good answer. I felt like I weaseled my way through a whole career as a professor without knowing why to study computer science or why I was so lucky, I had to reflect on this. And what is it that makes computer science so special?

Starting point is 00:28:54 I wrote it. I evangelized it a little bit in Switzerland. That's why I wrote it in German, because I was invited to give a commencement speech at one of the high schools in Switzerland. And I used that opportunity to kind of advertise the book. And so I got a lot of feedback from that. It has been great because, as it always is, when you teach, you learn more than anybody else. Maybe you could give our listeners a little primer on how they could go about setting their own North Star goals, including what they should do if they run into one of those dead ends that you talked about earlier. What I believe my biggest problem as a lab director or as a researcher is defining the right goal. That is the biggest problem of all. Essentially, it defines our ambitions. Getting the right level of ambition and rising because the opportunity is rising, that's a very difficult task for a researcher. So I think that the thinking backwards framework is actually great

Starting point is 00:29:58 to define a North Star goal and to get to the right level of ambition. And the way it works is if you don't find a path backward from your goal to where you are, your ambition is probably too high, right? Starting civilization in space, right, is probably too big of an ambition because we cannot execute on it, right? But if you kind of have not interesting landmarks, if they are boring, if they are not inspiring you, then your ambition was probably too small. And so the framework allows you to, first of all, reason what your goal is and kind of dream of it and the implications that it has and the impact. But it is also a way to kind of keep you honest and validate things. As we close, I want to give you the last word. And as the leader of the MSR Lab in Redmond, you're in a unique position to offer some advice and inspiration to our listeners.

Starting point is 00:30:57 What's the next big North Star goal for Microsoft Research? Yeah, so, of course, now I have to think big and I have to think beyond to be inspiring. So I think we have been programming devices. We've been programming mainframes. We've been programming PCs. We've been programming the web and so on. I think we need to go to the extreme craziness and think that the world is one big computer. I think this is the big North Star goal that we have. And I think to break it down, and we were talking about the edge and the cloud, we are kind of making the world programmable by injecting computers or microcontrollers into everything. And that way we make the world programmable. But at the moment, we're still doing that in isolation. And I would love us to think of it as one big system that

Starting point is 00:31:50 we should program. And of course, we should think about, again, what are the things that we can enable? What are the killer applications of that computer? What are the ways to optimize it kind of in the same way as Socrates? How to secure it? If everything is connected, how do you draw lines? And essentially how to program it in an efficient way so that everybody can take advantage of this world computer. Again, it ties into our superpowers. We have the freedom to work on this. We have the skills to execute, maybe not on the complete vision, but on pieces, important pieces, once we have clarity about the landmarks. We have time to do that.

Starting point is 00:32:30 We can take risks, right? Some of the things will fail on that path. We have all the ingredients here that you need to address these really, really big dreams. Donald Kossmann, thanks for taking time away from your own North Star goal and coming in. Thank you so much. To learn more about Dr. Donald Kossmann and how thinking backwards is moving us forward, visit Microsoft.com slash research.
