Disseminate: The Computer Science Research Podcast - High Impact in Databases with... Raghu Ramakrishnan

Episode Date: June 17, 2024

In this High Impact episode we talk to Raghu Ramakrishnan. Raghu is CTO for Data and a Technical Fellow at Microsoft. Tune in to hear Raghu's story and learn about some of his most impactful work. The podcast is proudly sponsored by Pometry, the developers behind Raphtory, the open source temporal graph analytics engine for Python and Rust. Hosted on Acast. See acast.com/privacy for more information.

Transcript
Starting point is 00:00:00 Hello and welcome to Disseminate, the computer science research podcast. The podcast is brought to you by Pometry. Pometry are the developers behind Raphtory, the open source temporal graph analytics engine for Python and Rust. Raphtory supports time traveling, multi-layer modeling, and comes out of the box with advanced analytics like community evolution, dynamic scoring, and temporal motif mining. It is blazingly fast, scales to hundreds of millions of edges on your laptop, and connects directly to all your data science tooling, including Pandas, PyG, and LangChain. Go check out what the Pometry guys are doing at www.raphtory.com, where you can dive into their tutorial for the new 0.8.0 release. Today is another installment of our High Impact series and we're going to be
Starting point is 00:01:07 talking to Raghu Ramakrishnan. Raghu is the CTO for Data and a Technical Fellow at Microsoft. He previously worked at Yahoo, where he served as the Chief Scientist for the Portal, Cloud and Search Divisions, and before that he was a Professor at the University of Wisconsin-Madison. He's won many awards across his career as well, and this is just the highlight reel, but the Codd Award, the SIGKDD Innovations Award, and the SIGMOD Contributions Award. And I'm sure a lot of our listeners will be familiar with the book that Raghu co-authored with Johannes Gehrke, the Database Management Systems book, which is also affectionately known as the Cow Book. So welcome to the show, Raghu. Thanks for having me, Jack.
Starting point is 00:01:47 The pleasure is all ours. So it's customary to start off by, I've obviously given a highlight reel there, the high level sort of broad strokes. So could you kind of color between those lines for us and tell us more about your journey so far in your own words and kind of, I guess, at the core of that, why did you become a researcher? So, you know, I went to Texas to get my PhD right after I graduated from IIT Madras. And when I got to Texas, there was this organization called MCC, Microelectronics and Computer Consortium.
Starting point is 00:02:27 A bunch of companies got together to fund research. And that was to compete with the Japanese Fifth Generation project, which you may not have heard of, but back in the day, it was seen as a threat to our technical aspirations. So all these companies got together, and they funded this lab with shared IP that they could then commercialize. From my perspective as a lowly graduate student, this brought to town a group of really amazing world-class researchers. Not to mention the fact that the cafeteria there was well ahead of its time. Not only were drinks free,
Starting point is 00:03:15 but tuxedo waiters served it to you on contest. So my advisor, Avi Silberschatz, was very generous. He sponsored me to go connect with folks at MCC. I hooked up with François Bancilhon, Catriel Beeri. And so for me, research was something that I fell into quite naturally by being around people like these, and this energy, the vision that, hey, we have a huge opportunity, a huge challenge, and the idea that researchers were on the front lines. It was all very exciting. So, you know, I jumped into the world of research and haven't looked back.
Starting point is 00:04:06 Yeah, that's awesome. Yeah. So what are you working on currently today? What's your current sort of research focus? So to be honest, I would say at this point, there are a lot of people in my team who do the heavy lifting. And my job now is to bring the refreshments. Help them stay
Starting point is 00:04:28 on the job. Occasionally say, hey, how about looking there or looking here. As CTO, the fun thing is I have a big lever. The challenging thing is knowing where to apply it because
Starting point is 00:04:44 it's really impossible to be deep across the breadth of what we do in Azure Data. It's a group that drives everything, that owns everything from SQL Server to all our Azure services to some internally facing data services. It's a broad portfolio. So at any given time,
Starting point is 00:05:08 I tend to go deeper into one or two things. And these days, looking at scale-out queries and transactions, looking at what Gen AI means for all our data portfolio, those two right there. And data governance. The world where governance meant GRANT and REVOKE statements in SQL.
Starting point is 00:05:33 It's no longer keeping up with the reality that most companies have many, many instances of databases spread across clouds and on-prem. How do you govern the whole?
Starting point is 00:05:53 And that's not just an academic question because people are realizing how valuable data is, how intrusive it can be in terms of privacy. So there are all kinds of regulations. So how do we architect governance platforms that allow people to ensure that data is being properly used? So those are a few of the things that keep me. Yeah, that's awesome. There's some interesting things there. The thing that piqued my interest there when you were talking is the scale-out transactions. Obviously, that did pique my interest. But anyway, cool. So let's talk about, because obviously this is the High Impact series, let's do a little bit of a retrospective of your career then and talk about some of your high impact work that you've done across your career.
Starting point is 00:06:38 Obviously, there's lots of it. I mean, if I just look based off your top citations, obviously that's a very coarse metric for how to view things. But we've got papers like BIRCH, PNUTS, Incognito, all these sorts of names. But I kind of want to ask you, what are you most proud of in your career? And does that correlate with the work that's had the most impact? That's a good question. I'd say there is a correlation, but probably not an exact one. And in terms of the things that stick in my memory, at least, some things are pure sentiment. Early papers I wrote with François, with Catriel, with folks in Texas, when I was working on my PhD, they mean a lot because they meant I learned how to do research, how to get into this field.
Starting point is 00:07:30 And then when I became a professor, some of the early papers I wrote then mean a lot because now I was learning a whole bunch of different things, how to work with a group of students, how to maybe share some research directions without getting in their way. I never anticipated how much I would learn from them. Then I did a startup. Then I came to Yahoo. I transitioned to the world of industry from academia. I guess what I'm trying to say is, for me,
Starting point is 00:07:59 while there are some papers that I can point to as having big impact, like you mentioned PNUTS, the YCSB benchmarking paper was another. Early work on magic sets, or the BIRCH clustering algorithm. While you can point to things like this, for me, it's all about the people. I think of my work as I travel back in time more than the technical results. The thing that comes alive for me is the people who were involved in the work I did. And I have always been lucky, I guess, in that regard to have the chance to work with some really smart people. And really nice people that I enjoy being with. And speaking of work, I would be remiss if I didn't say something about some of the work.
Starting point is 00:08:50 We talked about scale-out transactions. So, in fact, we just had our paper accepted at the upcoming SIGMOD. So, you know, the whole notion of how to support transactions when the underlying data is optimized for analytics, columnar images spread out across a bazillion machines, with some rigorous multi-table ACID semantics, like snapshot isolation in our case. How do we support this? And that's something that I've had the good luck to work on with some amazing people here
Starting point is 00:09:28 at Microsoft. But net-net, for me, my career, I see through the lens of the people I work with. The thing I'm probably the most proud of is my students. Most of them are smarter than I am. Many, many of them have done much more impactful things than I have. And it's all because of me. I take full credit. That's fantastic.
Starting point is 00:09:52 That's a really nice touching message. I like that. That's really cool. I did want to touch on, because I think this probably has had, of all your work, probably the biggest impact on me, because we kind of used the book, the
Starting point is 00:10:06 textbook for our database module at university, and that was the book that you wrote. So, I mean, tell us more about, I guess, the book in terms of the background of it and kind of what you think about it today. Is it still kind of, I don't know, relevant in some sense, or does it need updating? But for me, it's kind of, it's a bit timeless, right? I mean, the lessons in there kind of will stand true for a long time. But yeah, I just kind of wanted to get your opinion on the book and kind of what you thought about it today. Thanks, Jack. No, that's, that book cost me so much. The origins of the book really are in the courses
Starting point is 00:10:46 we taught at Wisconsin. So really, while my name is on the book, all of the people who taught courses there, David DeWitt, Mike Carey, Yannis Ioannidis, Jeff Naughton, right? They all had some part to play.
Starting point is 00:11:01 DeWitt and Carey in particular shaped some of the hands-on software that we used. So a characteristic of the Wisconsin way of teaching databases was we built systems. We built hands-on systems, and our courses were built around course projects, right? Minibase, Minirel. And the book itself grew out of the course notes and the lectures, and it went through a couple of beta editions, an alpha edition, before it eventually made it out in that cow jacket.
Starting point is 00:11:35 You know, we even had a t-shirt competition for people who guessed what the cover meant. Oh, really? If you go and look at it carefully, there is a tree and there's a bee buzzing around. So, of course, it's a B-tree. Like any well-designed B-tree, where are the keys? Hanging off leaves. And how do the bees get there? They start at the root.
Starting point is 00:12:02 Fantastic. I never knew that. That's brilliant. You look next to the tree, there's a pool. But of course, it's a buffer pool with fragments of
Starting point is 00:12:12 tables in there, which are relational tables with grids and rows and columns. So I had a lot of fun, as you can tell, working terrible puns into that cover.
Starting point is 00:12:24 And the number of cows corresponded to the edition. And why cows, you ask? Wisconsin is the Dairy State. Okay, see, I did not know that, but there we go. Yeah. Yes, deep significance that you absolutely have no use for in a line of that color. You know, I think the book still is entirely valid for many, many topics. For many years, I resisted revising it. And then the area got away from me so fast, I didn't have time to revise the book.
Starting point is 00:13:05 If I look at the last 10, 15 years, Hadoop, big data, scale-out systems, cloud, separation of state and compute, you know, so much has changed. Earlier, we talked about how the nature of governance has changed from database-centric to data estate-centric. The idea that database machines were dead is dead. The world has changed in so many ways. And in fact, the last couple of years, Johannes and I, Johannes is also a Technical Fellow at Microsoft now.
Starting point is 00:13:44 He, I, a few others with Wisconsin roots, and some others, we have gotten together. We have been teaching internal courses. And in the course of doing this, it's given us a chance to pull together a lot of material that is missing in the house. And we talk about updating the Cow Book. And we'd like to do it actually as something in the public domain. We are working to get all the legalities lined up, but if we can,
Starting point is 00:14:18 we'd like to create this as an open source resource that other instructors can contribute to as they see fit, and that can be a resource for our community. We'll see if that works out. Yeah, well, you heard it here first, listeners. There's going to be a sequel. Fantastic. Maybe there's a SQL sequel joke in there as well, I don't know. But anyway, cool. So kind of going off that, I mean, I can maybe preempt the answer to this question given what you said about sort of the high impact work you've done over your career and about the people. But I want to ask you kind of a few things. What inspires you and who or what has had the most impact on your career?
Starting point is 00:14:59 What inspires me? I don't know. I enjoy engaging with the problem, solving puzzles. But also over the years, if I think back, there was a time when I published extensively in the theory side of databases. And over time, I've shifted to where most of what I do is much more grounded in systems. And I think deep down, even when I did theory, it was database theory. I wanted it to be tied to something that was practically relevant. And that pendulum has swung.
Starting point is 00:15:35 So I moved to Yahoo in part because of that. I had a chance at some point to build things that were directly going to be engaged with by billions, hundreds of millions of people. That appealed to me. So I'd say that, like most people, the things that excite me have been some things that have stayed constant. I like challenges. I like puzzles.
Starting point is 00:15:59 But some things have also evolved, right? Over time, things like working on things that have a more practical impact, working on things that directly translate into products that affect a lot of people. And these are not all always compatible with each other. Sometimes you make trade-offs. And I made that transition. I can see that. If you think about where I am today, I'm CTO for a product division. There's an applied research group that I still have. But frankly, the bulk of my time is spent thinking about a commercial
Starting point is 00:16:38 product rather than research papers. So I have come the full arc of that journey, I guess. In terms of the people, I find it hard to just name a few, but I'll just, without a rank order, just off the top of my head, a few, for very different reasons why they influenced me. Jeff Ullman has always been a mentor for me, and some of the papers he wrote on magic sets and the like were what led to my thesis research. Avi Silberschatz, my advisor, has always been one I looked up to. So people like that, François Bancilhon, Catriel Beeri, in my formative days they mattered a great deal and I learned a lot. When I came to Wisconsin, if I think about the faculty there,
Starting point is 00:17:32 not only were they great colleagues, great researchers. You hear this, and unfortunately it's very often true, that research universities in the US don't place that much importance on teaching. This was not true in the CS department
Starting point is 00:17:55 at Wisconsin. All of my database colleagues were better classroom teachers than me. And I tried hard, but these guys were just too damn good. We had something called the Cow Award, where students elected a professor as Professor of the Year
Starting point is 00:18:14 and gave them a cow. In all my 20 years at Wisconsin, for 19 of those years, I never came within sniffing distance of the cow. The last year, I think students rigged it. The nation present, and I won it.
Starting point is 00:18:33 So, if you think about it, the group at Wisconsin taught me the value of teaching. Students have always mattered to me. Every one of my students, starting with Sudarshan,
Starting point is 00:18:50 my first born, mattered to me. And I will always remember that they were huge. The people I got to work with afterwards, here, like I said, I could keep naming names. To me,
Starting point is 00:19:06 that's been the single biggest gift in my professional career, the chance to work with some really good people and really nice people. Lots of names, I could just go through that. But I'll stop here. In terms of papers, there were a number of theory papers
Starting point is 00:19:21 from back in the day, papers by Chandra and Harel, papers by Ron Fagin that influenced me, that I looked at. Paris Kanellakis and his papers. In later years, more applied papers, as you might imagine. Some of the benchmarking papers that David wrote led to my own benchmarking work. The DynamoDB paper that came out of AWS led to my interest in distributed systems, led to PNUTS. Lots of things.
Starting point is 00:19:53 Yeah. Yeah. We'd be here a long time. Okay. One last question real quick then. And this would be kind of the focus of the podcast in some sense, and you're in a good place to answer this question because you've had a foot in both camps, and that's the interaction between academia and industry and bridging the gap
Starting point is 00:20:14 and what your current position is on it and how we can be better as a research community engaging with industry and vice versa and what that kind of looks like at the moment and what your take is on that? It's a great question, Jack. And there's a related question, so I'll give you a two-for-one. Should I do a master's or a PhD? Okay. The thing about all these questions is there's no one right answer. The only version of these questions that's ever made sense to me is, what should I do?
Starting point is 00:20:48 Right? And the answer is different for every individual, right? And if you take industry versus academia, in academia, you definitely have the ability to work on a broader range of things. You can work on pretty much anything. But with a master, if you're working on very system-y topics, you typically need resources. And grant funding is not always easy to come by. So that de facto constrains you. In fact, one of my reasons for moving from academia
Starting point is 00:21:20 was I worked on internet-scale systems. And rather than writing big grant proposals, I could get as many machines as I needed. But yeah. But also in industry, you have access to real problems, customers, data, things that even with the best of will, you can't share back with academia for privacy reasons. So that said, I will say this.
Starting point is 00:21:48 I see a trend where people from academia are moving to industry. And fewer people are moving from industry to academia. I hope we reach a balance. It's good for people to move back and forth. But at the end of the day, if we don't have that strong bridge... I know people in virtually every CS department in the US, and in most places around the world I have friends. And that's one of the things that I think makes me useful, that I know people
Starting point is 00:22:22 in academia. Vice versa, there are people in academia that know people in industry, and that makes them very valuable. And I think part of it is we serve as bridges. Our students need to be educated in the things that matter commercially because what's happening commercially defines the bow wave of our field in many ways, not always, but in many ways. And what's going on in academia can often disrupt what is going on. So think of Spark, came out of Berkeley,
Starting point is 00:22:55 made a material difference in the path that Hadoop and big data systems took. Hadoop itself originated at Yahoo. After the very early days, Doug Cutting came to Yahoo. That in turn influenced a lot of the work being done in academia. This rich back and forth is part of what makes our community so vibrant.
Starting point is 00:23:17 So I think people should do what moves them. What moves them will vary over time, and that's okay. Don't be afraid to break out of your box. But I certainly hope that balance helps. I think that's a great place to leave it. Raghu, thank you so much for taking the time to speak with me, and we'll see you all next time. Thank you.
