Disseminate: The Computer Science Research Podcast - High Impact in Databases with... Raghu Ramakrishnan
Episode Date: June 17, 2024. In this High Impact episode we talk to Raghu Ramakrishnan. Raghu is CTO for Data and a Technical Fellow at Microsoft. Tune in to hear Raghu's story and learn about some of his most impactful work. The podcast is proudly sponsored by Pometry, the developers behind Raphtory, the open source temporal graph analytics engine for Python and Rust. Hosted on Acast. See acast.com/privacy for more information.
Transcript
Hello and welcome to Disseminate, the computer science research podcast.
The podcast is brought to you by Pometry.
Pometry are the developers behind Raphtory, the open source temporal graph analytics engine for Python and Rust.
Raphtory supports time traveling, multi-layer modeling, and comes out of the box with advanced analytics like community evolution, dynamic scoring, and temporal motif mining. It is
blazingly fast, scales to hundreds of millions of edges on your laptop, and connects directly
to all your data science tooling, including Pandas, PyG, and LangChain. Go check out what
the Pometry guys are doing at www.raphtory.com, where you can dive into their tutorial for the new 0.80
release. Today is another installment of our High Impact series, and we're going to be
talking to Raghu Ramakrishnan. Raghu is the CTO for Data and a Technical Fellow at Microsoft.
He previously worked at Yahoo, where he served as the Chief Scientist for the Portal, Cloud, and
Search divisions, and before that he was a professor at the University of Wisconsin-Madison. He has won many
awards across his career; this is just the highlight reel, but they include the Codd Award, the SIGKDD
Innovation Award, and the SIGMOD Contributions Award. I'm sure a lot of our listeners will
be familiar with the book that Raghu co-authored with Johannes Gehrke, Database Management
Systems, which is also affectionately known as the Cow Book. So welcome to the show, Raghu.
Thanks for having me, Jack.
The pleasure is all ours.
So it's customary to start off by... well, I've obviously given a highlight reel there, the high-level,
broad strokes.
So could you color between those lines for us and tell us more about your journey
so far in your own words, and, I guess, at the core of that, why did you become a
researcher?
So, you know, I went to Texas to get my PhD right after I graduated from IIT Madras.
And when I got to Texas, there was this organization called MCC, Microelectronics and Computer Consortium.
A bunch of companies got together to fund research.
And that was to compete with the Japanese Fifth Generation project, which you may not have heard of, but back in the day
it was seen as a threat to our technical aspirations.
So all these companies got together and funded this lab with shared IP that they
could then commercialize. From my perspective as a lowly graduate student, this brought to town
a group of really amazing world-class researchers, not to mention the fact that the cafeteria there
was well ahead of its time.
Not only were drinks free,
but tuxedoed waiters served them to you.
So my advisor, Avi Silberschatz, was very generous.
He sponsored me to go connect with folks at MCC.
I hooked up with François Bancilhon and Catriel Beeri.
And so for me, research was something that I fell into quite naturally
by being around people like these. The energy, the vision that, hey, we have
a huge opportunity and a huge challenge, and the idea that researchers were on the front lines:
it was all very exciting. So, you know, I jumped into the world of research and haven't looked back.
Yeah, that's awesome.
Yeah.
So what are you working on currently today?
What's your current sort of research focus?
So to be honest, I would say at this point,
there are a lot of people in my team who do the heavy lifting.
And my job now is to bring the refreshments.
Help them stay
on the job.
Occasionally say, hey,
how about looking there or looking here.
As
CTO, the fun thing is
I have a big lever.
The challenging thing is knowing where
to apply it because
it's really impossible to be deep
across the breadth of what we do in Azure Data.
It's a group that drives everything,
that owns everything from SQL Server
to all our Azure services
to some internally facing data services.
It's a broad portfolio.
So at any given time,
I tend to go deeper into one or two things.
And these days,
I'm looking at scale-out queries and transactions,
at what Gen AI means for our whole data portfolio,
and, beyond those two, at data governance.
The world where governance meant
GRANT and REVOKE statements in SQL
no longer keeps up
with the reality that
most companies have
many, many instances
of databases spread across clouds
and on-prem.
How do you govern the whole?
And that's not just an academic question because people are realizing how valuable data is,
how intrusive it can be in terms of privacy.
So there are all kinds of regulations. So how do we architect governance platforms that allow people to ensure that data is being properly used?
So those are a few of the things that keep me busy. Yeah, that's awesome. There are some interesting
things there. The thing that piqued my interest when you were talking was the scale-out
transactions. Obviously, that would pique my interest. But anyway, cool. So let's talk about,
because obviously this is the high impact series.
Let's do a little bit of a retrospective of your career then and talk about some of your high impact work that you've done across your career.
Obviously, there's lots of it. If I just look at your top citations, which is obviously a very coarse metric for how to view things,
we've got papers like BIRCH, PNUTS, Incognito, all these sorts of names. But I want to ask you, what are you most proud of in your career?
And does that correlate with the work that's had the most impact?
That's a good question.
I'd say there is a correlation, but probably not an exact one.
And in terms of the things that stick in my memory, at least, some things are pure sentiment.
Early papers I wrote with Passewag, with Petrielle, with folks in Texas when I was working on my PhD mean a lot because that's where I learned how to do research, how to get into
this field.
And then when I became a professor, some of the early papers I wrote then mean a lot because now I was learning a whole bunch of different things,
how to work with a group of students,
how to maybe share some research directions without getting in their way.
I never anticipated how much I would learn from them.
Then I did a startup.
Then I came to Yahoo.
I transitioned to the world of industry from academia.
I guess what I'm trying to say is, for me,
while there are some papers that I can point to as having big impact,
like PNUTS, which you mentioned; the YCSB benchmarking paper was another;
early work on magic sets, or the BIRCH clustering algorithm.
While you can point to things like this, for me, it's all about the people.
When I travel back in time and think of my work, more than the technical
results, the thing that comes alive for me is the people who were involved in the work I did.
And I have always been lucky in that regard to have the chance to work with some
really smart people, really nice people that I enjoy being with. And speaking of work, I would be remiss if I didn't say something about some of the work.
We talked about scale-out transactions.
So, in fact, we just had our paper accepted at the upcoming SIGMOD.
So, you know, the whole notion of how to support transactions when the underlying data is optimized for analytics,
columnar images spread out across a bazillion machines
with rigorous multi-table ACID semantics,
like snapshot isolation in our case.
How do we support this?
That's something that I've had the good luck to work on with some amazing people here
at Microsoft.
But net-net, for me, my career, I see through the lens of the people I work with.
The thing I'm probably the most proud of is my students.
Most of them are smarter than I am.
Many, many of them have done much more impactful things than I have.
And it's all because of me.
I take full credit.
That's fantastic.
That's a really nice touching message.
I like that.
That's really cool.
I did want to touch on the book,
because I think, of all your work,
it has probably had the biggest impact on me:
we used the textbook you wrote for our database module at university.
So tell us more about the background of the book. What do you think about it today?
Is it still relevant, or does it need updating? For
me, it's a bit timeless, right? The lessons in there will stand
true for a long time. But yeah, I just wanted to get your opinion on the book
and what you think about it today. Thanks, Jack. No, that book
cost me so much. The origins of the book really are in the courses
we taught at Wisconsin.
So really, while my name is on the book,
all of the people
who taught courses there,
David DeWitt, Mike Carey,
Yannis Ioannidis, Jeff Naughton,
right?
They all had some part to play.
DeWitt and Carey in particular
shaped some of the hands-on
software that we used.
So a characteristic of the Wisconsin way of teaching databases was we built systems.
We built hands-on systems, and our courses were built around course projects, right?
Many ways, many roles.
And the book itself grew out of the course notes and the lectures, and it went through
an alpha edition and a couple of beta editions before it eventually made it out in that cow jacket.
You know, we even had a T-shirt competition for people who guessed what the cover meant. Oh, really? If you go and look at it carefully,
there is a tree and there's a bee buzzing around.
So, of course, it's a bee tree.
Like any well-designed bee tree,
where are the keys?
Hanging off leaves.
And how do the bees get there?
They start at the root.
Fantastic.
I never knew that.
That's brilliant.
If you look next to the tree,
there's a pool.
But of course,
it's a buffer pool
with fragments of
tables in there,
which are relational tables
with grids
of rows and columns.
So I had a lot of fun,
as you can tell,
working terrible puns
into that cover.
And the number of cows corresponded to the
edition. And why cows, you ask? Wisconsin, the Dairy State. Okay, see, I did not know that, but there
we go. Yeah, yes, deep significance that you absolutely have no use for.
You know, I think the book still is
entirely valid for many, many topics.
For many years, I resisted revising it.
And then the area got away from me so fast,
I didn't have time to revise the book.
If I look at the last 10, 15 years: Hadoop, big data,
scale-out systems, cloud, separation of storage and compute;
you know, so much has changed.
Earlier, we talked about how the nature of governance
has changed from database-centric to data estate-centric.
The idea that database machines were dead is dead.
The world has changed in so many ways.
And in fact, over the last couple of years, Johannes and I (Johannes is also a technical fellow at Microsoft now), along with a few others with Wisconsin roots and some others,
have gotten together.
We have been teaching internal courses.
And in the course of doing this,
it's given us a chance to pull together a lot of material
that is missing from the book.
And we talk about updating the Cow Book. And we'd like to do it actually as something
in the public domain. We are working to get all the legalities lined up, but if we can,
we'd like to create this as an open source resource that other instructors can contribute to
as they see fit, and that can be a resource
for our community. We'll see if that works out. Yeah, well, you heard it here first, listeners: there's
going to be a sequel. Fantastic. Maybe there's a SEQUEL/SQL joke in there as well.
I don't know. But anyway, cool. So, moving on, I can maybe preempt the answer to
this question given what you said about the high impact work you've done over your career and about the people.
But I want to ask you a few things.
What inspires you, and who or what has had the most impact on your career?
What inspires me? I don't know. I enjoy engaging with the problem, solving puzzles.
But also over the years, if I think back, there was a time when I published extensively in the theory side of databases.
And over time, I've shifted to where most of what I do is much more grounded in systems.
And I think deep down, even when I did theory,
it was database theory.
I wanted it to be tied to something
that was practically relevant.
And that pendulum has swung.
I moved to Yahoo in part because of that.
I had a chance at some point to build things
that were going to be directly engaged with by hundreds of
millions of people, and that appealed to me.
So I'd say that, like most people, the things that excite me have been some things that
have stayed constant.
I like challenges.
I like puzzles.
But some things have also evolved, right?
Over time, things like working on things that have a more practical impact, working on things that directly translate into products that affect a lot of people.
And these are not always compatible with each other.
Sometimes you make trade-offs.
And I made that transition.
I can see that.
If you think about where I am today, I'm CTO for a product division. There's an applied research
group that I still have. But frankly, the bulk of my time is spent thinking about a commercial
product rather than research papers. So I have come the full arc of that journey, I guess. In terms of the people,
I find it hard to just name a few, but I'll just, without a rank order, just off the top of my head,
here are a few who influenced me for very different reasons. Jeff Ullman has always been a mentor for me, and some of
the papers he wrote on magic sets and the like were what led to my thesis research.
Avi Silberschatz, my advisor, has always been one I looked up to. So people like that, and François
Bancilhon and Catriel Beeri: in my formative days they mattered
a great deal, and I learned a lot. When I came to Wisconsin,
if I think about the faculty there,
they were great colleagues and great researchers.
You hear, and unfortunately it's often true, that research universities
in the US
don't place that much
importance on
teaching. This was not true
in the CS department
at Wisconsin.
All of my database colleagues
were better classroom teachers than me.
And I tried hard, but
these guys were just too damn good.
We had something called the Power Award,
for which students elected a professor
as Professor of the Year
and gave them a cow.
In all my 20 years at Wisconsin,
for 19 of those years,
I never came within sniffing distance of the cow.
The last year,
I think students rigged it,
and I won it.
So,
if you think about it,
the group at
Wisconsin taught me the value of
teaching.
Students have always mattered to me.
Every one of my students,
starting with Sudarshan,
my first born, mattered to me.
And I will always remember
that they were huge.
The people I got
to work with afterwards,
here, like I said,
I could keep naming names.
To me,
that's been the single biggest gift
in my professional career, the chance to
work with some really good people
and really nice people.
Lots of names, I could just go through that.
But I'll stop here.
In terms of papers,
there were a number of theory papers
from back in the day,
papers by Chandra and Harel, papers by Ron Fagin that influenced me, that I looked at.
Paris Kanellakis and his papers.
In later years, more applied papers, as you might imagine.
Some of the benchmarking papers that David wrote led to my own benchmarking work.
The Dynamo paper that came out of Amazon led to my interest in distributed systems,
and that led to PNUTS.
Lots of things.
Yeah.
Yeah.
We'll be here a long time.
Okay.
One last question real quick, then.
And this is kind of the focus of the podcast
in some sense, and you're in a good place to answer this question because you've had
a foot in both camps: the interaction between academia and industry, and bridging the gap.
What's your current position on it, how can we be better as a research community at engaging
with industry and vice versa, and what does that look like at the moment?
It's a great question, Jack.
And there's a related question, so I'll give you a two-for-one.
Should I do a master's or a PhD?
Okay.
The thing about all these questions is there's no one right answer.
The only version of these questions that's ever made sense to me is, what should I do?
Right? And the answer is different for every individual, right? And if you take industry
versus academia: in academia, you definitely have the ability to work on a broader range of things.
You can work on pretty much anything.
But if you're working on very system-y topics,
you typically need resources.
And grant funding is not always easy to come by.
So that de facto constrains you.
In fact, one of my reasons for moving from academia
was that I worked on internet-scale systems.
And rather than writing big grant proposals,
I could get as many machines as I needed.
But yeah.
But also in industry, you have access to real problems,
customers, data, things that even with the best of will,
you can't share back with academia for privacy reasons.
So that said, I will say this.
I see a trend where people from academia
are moving to industry.
And fewer people are moving from industry to academia.
I hope we reach a balance.
It's good for people to move back and forth.
But at the end of the day, we need that
strong bridge. I know people in virtually every CS department in the US and most places around the
world; I have friends there, and that's one of the things that I think makes me useful, that I know people
in academia. Vice versa, there are people in academia who know people in industry,
and that makes them very valuable.
And I think part of it is that we serve as bridges.
Our students need to be educated in the things that matter commercially
because what's happening commercially defines the bow wave of our field
in many ways, not always, but in many ways.
And what's going on in academia can often disrupt what is going on in industry.
So think of Spark: it came out of Berkeley and
made a material difference in the path that Hadoop and big data systems took.
Hadoop itself originated at Yahoo.
After the very early days,
Doug Cutting came to Yahoo,
and that in turn influenced a lot of the work
that's been done in academia.
This back and forth, this rich interplay,
is part of what makes our community so vibrant.
So I think people should do
what moves them.
What moves them will vary over time,
and that's okay.
Don't be afraid to
break out of your box. But I certainly hope we reach that balance. I think that's a great place to
leave it. Raghu, thank you so much for taking the time to speak with me, and we'll see you all next time. Thank you.