Microsoft Research Podcast - 114 - Project Orleans and the distributed database future with Dr. Philip Bernstein
Episode Date: April 8, 2020. Forty years ago, database research was an “exotic” field and, because of its business data processing reputation, was not considered intellectually interesting in academic circles. But that didn't deter Dr. Philip Bernstein, now a Distinguished Scientist in MSR's Data Management, Exploration and Mining group, and a pioneer in the field. Today, Dr. Bernstein talks about his pioneering work in databases over the years and tells us all about Project Orleans, a distributed systems programming framework that makes life easier for programmers who aren't distributed systems experts. He also talks about the future of database systems in a cloud scale world, and reveals where he finds his research sweet spot along the academic industrial spectrum. https://www.microsoft.com/research
Transcript
It's very unusual to have to build a data management service where it could be a blob store, a JSON store, a relational database.
It could be anything, and it could be any one of many products for each of those data structures.
And yet, you only want to have to build this feature once, and then have it run successfully no matter what underlying storage system you're
plugging in.
You're listening to the Microsoft Research Podcast, a show that brings you closer to
the cutting edge of technology research and the scientists behind it.
I'm your host, Gretchen Huizenga.
Forty years ago, database research was an exotic field, and because of its business data
processing reputation, was not considered intellectually interesting in academic circles.
But that didn't deter Dr. Philip Bernstein, now a distinguished scientist in MSR's data management,
exploration, and mining group, and a pioneer in the field. Today, Dr. Bernstein talks about his
pioneering work in databases over the years and tells
us all about Project Orleans, a distributed systems programming framework that makes life
easier for programmers who aren't distributed systems experts.
He also talks about the future of database systems in a cloud-scale world and reveals
where he finds his research sweet spot along the academic-industrial spectrum.
That and much more on this episode of the Microsoft Research Podcast.
Phil Bernstein, welcome to the podcast.
Thank you.
You're a distinguished scientist and a bit of an OG at Microsoft Research,
and you've been at the forefront of innovation in database technology for several decades.
But currently, you're working at MSR under the umbrella of the Data Management,
Exploration, and Mining Group, or DMX.
Before we dive deeper on you and your specific work, give us an overview of DMX
and how database research is situated in
the broader framework of Microsoft Research today. Well, Microsoft has a huge database business.
And the database business in general, from the very beginning in the 70s, was largely driven
by research. And so research has always been a very important ingredient in improving database products.
There's this need to innovate all the time.
And that comes both from the engine side, building the core technology to manipulate large amounts of data, complex data, but also the tools to make it possible to design the database, to be able to manage it.
And the DMX group covers both. It covers
base engine technology for manipulating data, for building cloud services, and then also for tools
to integrate data, to find ways to reduce the cost of ownership by reducing the level of effort
on the part of database administrators. So it's the full range. And
that's where the name comes from: data management; exploration, being able to look around and understand your data;
and mining, to really get into the tools to analyze the data afterwards.
Well, let's situate you now. How would you describe your research identity in terms of
what gets you excited about the work you do and what gets you up in the morning?
Well, I look for high impact.
I'm trying to figure out what to work on that's going to make a difference and also where my incremental value is going to be high because there aren't enough people working on it or paying attention to the problem.
There are two technical areas where I've focused mostly over the decades. One is
transaction processing, which is how to build systems like online retail or banking systems,
money transfer systems, those sorts of things. And that stuff is very low level. You're very
deep into the database engine. And then on the flip side, much higher level, the integration of data. So the data is in the database. You've got to manage it. How do you tie it together? And I've worked on both and I've kind of flip-flopped back and forth between them, depending on the problem of the day and where the short and medium term opportunities tend to be.
Well, I want to take it back for a minute because you just mentioned a couple of topics that I think are important.
You've done some seminal work in transaction processing and distributed databases.
So let's go back several years. Give us a snapshot of the computing landscape when you started.
And then tell us what changes you've seen over the years,
what things look like for researchers in the cloud era, and why understanding the past is helpful
when innovating for the future. Well, it's been a long road. I mean, I started my research in the
mid-1970s, so it's over 40 years ago. At that point, the database business was small; it was barely a business. And hardly any of the fundamental issues had really been explored at all,
and the ones that had been explored certainly not in any depth.
So the opportunities were everywhere.
And in fact, early in my career, I had to stop working on some problems,
not because they weren't interesting, but because there were just too many problems to work on. I had to focus more in order to get something done.
Also, those were the mainframe computers. I mean, there was no distributed anything.
We knew it was coming, but it hadn't come yet. Database management was all in a glass
house. It was a glass-enclosed, air-conditioned room used for business data processing, period.
No personal computers.
You talk to people about working on computing, and that was considered very exotic.
Now everybody's got one, at least, and has a pretty good feel for what they all do.
In their pocket.
Yeah, it's really different.
Okay, so when did you start seeing changes, and how did that impact what you were doing as a researcher?
There have always been changes, so it's hard to say that there was any given point where the changes were really big.
I started out looking at database design for my Ph.D. research, but as soon as I left and embarked on my own career, I got involved in distributed databases,
which seemed like one of the next big things. And I worked on it for many years. I stopped for a
while. And then with cloud computing, it all came back and I'm working on it again. So this is sort
of a pendulum. These topics come and go depending on what the workloads are that are needed, what the computing environment
is that has to support the data management. Well, the cloud presented a massive jump in scale
for distributed systems writ large. So you say you kind of came back into it. Was it because,
hey, this is a big new nut to crack and I want to be in on it? Certainly I wanted to be in on it, but I was also asked to be in on it.
Help!
Microsoft was starting to think about changing its database strategy,
its base products, to work on commodity hardware,
to scale out on large numbers of inexpensive machines running in the data center.
And they kind of looked around and said, you know, who is it that we have on staff who knows
something about this? And I was one of the people that they tapped. And so we
developed a new strategy, and it took many, many years for that to unfold. This
was back in 2006, so it's nearly 15 years ago. But where we are now with the products, we actually landed roughly where we were trying to get to from the beginning.
Well, okay.
So rounding out this four-part question that I kind of just laid out and I'm still walking through with you because I'm really interested in this.
Bill Buxton was on the show and he talked about the long nose of innovation and things come and go, and the idea that you can't innovate for the future unless you really understand the past.
So why, from a database perspective, is understanding the past helpful when you're
trying to innovate for the future? The set of mechanisms that we use to solve database problems,
they don't change very fast. Back in the early days, we were learning
about certain base technologies for the first time, but now there's this repertoire of ingredients
that you put into solving a database problem. I'm very sympathetic to graduate students who
are trying to learn this stuff because, you know, I learned it slowly over a period of many years as it was unfolding, but people getting into the field learn it
in a very compressed amount of time, and they don't necessarily have a deep understanding of
why things are the way they are. And so when they encounter a problem, they're trying to solve it
just based on an understanding of the problem and then trip over some approach that they think, oh, I'll bet that would be helpful.
But then they don't realize this is actually a variation on something that has been applied in several other contexts before. Well, let's get specific and talk about some of your current work.
And there's a project you've been working on called Orleans,
which you've called somewhat generally a distributed systems programming framework
or a programming model and runtime for building
cloud-native services. Both are pretty high level. So tell us what is Orleans and what's
the motivation behind it or the pain point that prompted it? So maybe we should start with what's
a programming framework. So it's a form of middleware. That is to say that it's generic
software. It's not application specific, but it's not a low-level
platform either. Generally, a framework takes a bunch of services that are available from operating
systems, networking, distributed systems, and packages them up to be easier to use by integrating
them in some nice way. And so Orleans is a programming framework. That's what it does, is this integration of lower-level services.
The problem it's addressing is that of building distributed applications that run in data centers, in the cloud, on large numbers of machines.
And the reason why this is a problem is that mainstream programmers who have learned how to build applications are generally not distributed systems experts,
and there are many ways to go wrong when you try to carve up an application
and get it to run on a lot of servers.
It needs to be elastic.
That is to say, without changing the application,
you need to be able to add servers if the workload increases
or reduce the number of servers if you don't have so many customers using it.
It needs to be reliable because these machines are relatively inexpensive
and they actually fail at a significant rate,
and so you don't want the whole thing to come tumbling down
every time you lose a server.
So if you're going to spread the workload on multiple machines,
maybe you don't do it so well and one of the machines becomes a bottleneck.
And then the whole thing grinds to a halt because this one machine is being overtaxed.
So these are the kinds of problems that an application developer faces, and Orleans is basically trying to factor them out so that you don't have to worry about them at all.
The framework does all that.
You just focus on building the
application. All right. So that's kind of like the problem statement of why it exists. Tell us
how you would define it. Maybe from a practical standpoint, it'd be good to just mention the kinds
of applications that you would use it for. And these are what are sometimes characterized as stateful,
interactive services. What do I mean by that? Well, maybe easiest to see by example,
Internet of Things, games, telemetry to monitor some other system, typically a computer system,
social networking, mobile computing. In all of these cases, the application is managing information
about something going on in the world. That's the main application function. And so the
second characteristic is that these applications are all object-oriented in the technical sense,
like object-oriented programming language. In Internet of Things, the objects are, well,
they're things. You know, they're sensors, they're devices of various kinds. In games, they might be things
like players, games, scoreboards, and the like. Obviously, for mobile computing, they're mobile
devices. So in all these cases, your application that's running in the cloud has objects that are
surrogates or models of the physical thing or logical thing,
in the case of games, that are out there in the world. And so what you're doing is the application
is spreading its workload across servers by spreading the objects around. Now, if you want
those objects to be spread around on multiple servers, they better not share memory because
they may not be co-located on the same server,
which means that the only way they're going to be able to interact
is to send messages to each other.
And another decision that was made in Orleans
is to have the objects be single-threaded,
that there's no internal parallelism in these objects.
And the reason for that is that programming with internal parallelism is much more
challenging for application developers, because now you've got parallel activities that are going
after the state of this object, and they can trip over each other. And so they need to synchronize,
and engineers historically have a hard time getting that right. Conceptually, it doesn't sound that bad,
but when you actually have to write programs that run at high speed
and they access the shared state of the object,
it actually is quite hard to get it right in all cases.
So Orleans said, no, we're just not going to allow that.
So objects are single-threaded, and they don't share memory,
and they communicate by exchanging messages.
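The constraints just described, single-threaded objects with private state that interact only by messages, can be sketched in a few lines. This is a toy illustration, not the Orleans API; all names here are made up:

```python
import queue

class Actor:
    # A toy actor: private state, a mailbox, and strictly one message
    # processed at a time. No shared memory, no internal parallelism.
    def __init__(self, name):
        self.name = name
        self._state = {}              # private; never touched by other actors
        self._mailbox = queue.Queue()

    def send(self, message):
        # The only way in: enqueue a message.
        self._mailbox.put(message)

    def process_one(self):
        # Single-threaded by construction: messages are handled one
        # after another, so handlers never race on self._state.
        kind, payload = self._mailbox.get()
        if kind == "set":
            key, value = payload
            self._state[key] = value
        elif kind == "get":
            key, reply_to = payload
            reply_to.send(("value", self._state.get(key)))

# Two actors interacting purely via messages, never via shared memory.
scoreboard = Actor("scoreboard")
player = Actor("player")
scoreboard.send(("set", ("score", 42)))
scoreboard.send(("get", ("score", player)))
scoreboard.process_one()
scoreboard.process_one()
print(player._mailbox.get())  # ('value', 42)
```

Because each actor drains its mailbox one message at a time, no synchronization primitives are needed inside the handler, which is exactly the programming burden Orleans removes.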
Now what's new in Orleans is something called the virtual actor model and the characteristics
that I just described of single threading, no shared memory, message-based communication,
in the technical literature that's often called an actor. It's just another word for an object that has these characteristics.
And in the virtual actor model,
the application developer does not control
when the object is instantiated, when it's activated,
where it's placed on machines.
All of that is handled by the framework.
What Orleans does in that case is that it will
first look around to see if the object is running, and if it is, then it will
perform the function that was requested. If it's not running, then Orleans will pick a server
on which to activate the object, will spin up the object on that server, and then will do the invocation that was requested by the application,
and will remember where the object's located so that future calls can go to that copy that's already running.
If the object isn't used for a while, Orleans will notice that also, and it will deactivate the object and free up its resources. So it's sort of
like a paging system in operating systems where you bring in pages of memory as needed
and then evict them when they're no longer needed. It's sort of the same thing here but
it's being done with objects. And this was a new concept when Orleans was developed.
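The lookup, activate, remember, and evict cycle described above can be pictured as a toy directory. This is a sketch under illustrative names, not the real Orleans runtime:

```python
class VirtualActorRuntime:
    # Toy sketch of the virtual actor model's directory: on invocation,
    # find a running activation or create one on the least-loaded server,
    # remember where it lives, and evict activations idle too long.
    def __init__(self, servers, idle_timeout):
        self.load = dict.fromkeys(servers, 0)  # server -> activation count
        self.directory = {}                    # actor id -> (server, last_used)
        self.idle_timeout = idle_timeout

    def invoke(self, actor_id, now):
        if actor_id not in self.directory:
            # Not running: activate on the least-loaded server.
            server = min(self.load, key=self.load.get)
            self.load[server] += 1
        else:
            server, _ = self.directory[actor_id]
        self.directory[actor_id] = (server, now)  # remember for future calls
        return server                             # route the call here

    def collect_idle(self, now):
        # Like paging in an operating system: evict what isn't being used
        # and free its resources.
        for actor_id, (server, last) in list(self.directory.items()):
            if now - last > self.idle_timeout:
                del self.directory[actor_id]
                self.load[server] -= 1

runtime = VirtualActorRuntime(["s1", "s2"], idle_timeout=60)
runtime.invoke("player/7", now=0)   # first call activates the actor
runtime.invoke("player/7", now=10)  # later calls reuse the activation
runtime.collect_idle(now=100)       # idle past 60s: deactivated
```

The application never calls any of this directly; that is the point of the virtual actor model, where placement and lifetime belong to the framework.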
And when exactly was Orleans developed? The
project started in like 2008-2009 in there. Let's drill in a little more
technically. You've alluded to several of the things that I think are important
about the project itself, so unpack some of the big challenges you've
addressed, like scalability and reliability in a cloud-scale
world.
Reliability and scalability are natural consequences of the
virtual actor model. So let's look at scalability. Remember that if you invoke an object and
it's not running, Orleans will place it on a server. So it's up to Orleans to balance the load
across all these servers. Ideally, when you activate an object that was not running,
you want to put it on a lightly loaded server so that you don't overload any other servers.
So Orleans is in charge of keeping the load balanced across the servers, and that enables
scalability. Let's look at the reliability
part. Suppose a server fails. Well, obviously all the objects that were running on that server are
immediately gone. But the next time any of those objects are invoked, Orleans will recognize the
fact that they're not running anymore, and so it will just resurrect the object. It will just activate it on one of the servers that is healthy. That is something that previous actor-oriented systems
all exposed to the application developer.
Orleans lets you forget about that.
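The failure story above can be sketched the same way: when a server dies, its activations simply vanish from the directory, and the next call transparently resurrects the actor on a healthy server. Again, the names here are illustrative only:

```python
class FailoverDirectory:
    # Toy sketch of the reliability behavior: a failed server takes its
    # activations with it, and the next invocation reactivates elsewhere.
    def __init__(self, servers):
        self.healthy = set(servers)
        self.location = {}   # actor id -> server

    def invoke(self, actor_id):
        server = self.location.get(actor_id)
        if server is None or server not in self.healthy:
            # Dead or never activated: resurrect on any healthy server.
            server = sorted(self.healthy)[0]
            self.location[actor_id] = server
        return server

    def server_failed(self, server):
        # All the objects running on that server are immediately gone.
        self.healthy.discard(server)
        self.location = {a: s for a, s in self.location.items()
                         if s != server}

d = FailoverDirectory(["s1", "s2"])
first = d.invoke("game/1")   # activated on "s1"
d.server_failed(first)       # "s1" crashes; the activation is lost
second = d.invoke("game/1")  # transparently resurrected on "s2"
```

The caller never sees the failure; it only sees the call succeed, which is what the transcript means by Orleans letting you forget about it.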
But there is one consequence of this, which is that when an object is activated, what
state is it in?
What does it know about itself?
And that is an application programming problem
because at the moment that that server fails
and the objects go away,
their state in main memory is lost.
And so when the object is reactivated on another server,
it's going to be entirely up to the application program
for that object to reinitialize its state. And state is another
word for data, and reading data to initialize an object is just another way of saying it needs to
do data management. And that's how I got into this game, was that I said, gee, I think you folks
could use some help, because this is a pretty big
burden on the application developer to figure out how to do all this state management.
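That state-reinitialization burden might look like this in miniature, with a plain dict standing in for cloud storage. The hooks below are hypothetical, not the Orleans persistence API:

```python
# Toy cloud storage: actor id -> persisted state.
STORAGE = {}

class StatefulActor:
    # Sketch of the state problem: in-memory state dies with the server,
    # so each activation must reload its state from durable storage.
    def __init__(self, actor_id):
        self.actor_id = actor_id
        self.state = None

    def on_activate(self):
        # Reinitializing from storage is the application's job.
        self.state = dict(STORAGE.get(self.actor_id, {"score": 0}))

    def add_points(self, points):
        self.state["score"] += points
        STORAGE[self.actor_id] = dict(self.state)  # write through

first = StatefulActor("player/7")
first.on_activate()
first.add_points(10)

# The hosting server fails; `first` and its memory are gone. A fresh
# activation elsewhere recovers the score because it was persisted.
second = StatefulActor("player/7")
second.on_activate()
print(second.state["score"])  # 10
```

Reading that state back in is the data management problem the conversation turns to next.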
Well, talk about how this open source project has evolved and grown over the last few years.
How have you added to the work and why have you moved in those directions?
Well, we've gained a lot by being open source.
Orleans was one of the first projects that went open source.
As I said a little while ago, I got into this because I could see that application developers
had to do a lot of state management and that the standard abstractions that are part of the database repertoire are relevant to building these sorts of applications.
So maybe I can just start adding them, you know, add indexing, add transactions, add geo-distribution, replication,
and just make it easier for the application developer. I wasn't even sure if
this was research, because it was just applying what I knew about data management to yet another
product, if you will. But it turned out that it was research, which I didn't really see going in.
And it's research for two reasons. One is that it uses storage that's actually cloud storage.
It's not storage that's running on the server with the application.
That's very unusual.
When you build a data management system, you expect to be able to control storage.
I mean, that's such an important ingredient in doing data management.
But here, the storage is a service. And the second is,
because it's plug-in, it can be anything. Again, it's very unusual to have to build a data
management service where it could be a blob store, a JSON store, a relational database.
It could be anything. And it could be any one of many products for each of
those data structures. And yet, you only want to have to build this feature once and then have it
run successfully no matter what underlying storage system you're plugging in. And that
is a pretty unique challenge. It's not something I had ever seen done before. So it has required us to rethink these abstractions from the beginning.
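One way to picture that constraint is a single feature written once against an abstract storage surface, with a blob-like store and a relational store plugged in underneath. The interface names are made up for illustration:

```python
from abc import ABC, abstractmethod
import json
import sqlite3

class StateStore(ABC):
    # One abstract surface, many backends: the feature is built once
    # and must run no matter which storage system is plugged in.
    @abstractmethod
    def write(self, key, state): ...
    @abstractmethod
    def read(self, key): ...

class BlobStore(StateStore):
    def __init__(self):
        self._blobs = {}
    def write(self, key, state):
        self._blobs[key] = json.dumps(state).encode()  # opaque bytes
    def read(self, key):
        return json.loads(self._blobs[key].decode())

class SqlStore(StateStore):
    def __init__(self):
        self._db = sqlite3.connect(":memory:")
        self._db.execute("CREATE TABLE state (k TEXT PRIMARY KEY, v TEXT)")
    def write(self, key, state):
        self._db.execute("REPLACE INTO state VALUES (?, ?)",
                         (key, json.dumps(state)))
    def read(self, key):
        row = self._db.execute("SELECT v FROM state WHERE k = ?",
                               (key,)).fetchone()
        return json.loads(row[0])

def checkpoint(store: StateStore, actor_id, state):
    # Written once, runs against any backend.
    store.write(actor_id, state)
    return store.read(actor_id)

for store in (BlobStore(), SqlStore()):
    assert checkpoint(store, "player/7", {"score": 10}) == {"score": 10}
```

The hard part, as the transcript notes, is that the abstraction cannot assume anything the weakest backend does not provide.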
So what have you done additionally? When you build a transaction system, you have to keep track of which transactions have succeeded, which is called committing the transaction, and which ones have not.
And that's generally done in a log, and that log is in storage. And the rate at which you can run the transactions is heavily dependent on the rate at which you can record that information in the log.
So it's a good idea to have one log and be able to simply append these descriptions of transactions that start and commit in this log.
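The bookkeeping just described reduces to appending records to one log and, on recovery, scanning it for commit records. A toy version, using an in-memory list as a stand-in for the append-only storage:

```python
class CommitLog:
    # Sketch of a single append-only transaction log: throughput is
    # bounded by how fast records can be appended, and recovery is a
    # scan for commit records.
    def __init__(self):
        self._log = []             # stand-in for an append-only file

    def append(self, record):
        self._log.append(record)   # the only operation the log needs

    def begin(self, tx_id):
        self.append(("start", tx_id))

    def commit(self, tx_id):
        self.append(("commit", tx_id))

    def committed(self):
        # A transaction took effect iff its commit record reached the log.
        return {tx for kind, tx in self._log if kind == "commit"}

log = CommitLog()
log.begin("tx-1")
log.commit("tx-1")
log.begin("tx-2")        # crashed before committing
print(log.committed())   # {'tx-1'}
```

That append-only shape is precisely what, as the conversation goes on to explain, cloud storage does not hand you for free.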
But there's a problem here, which is that cloud storage doesn't offer a log. And every database
system I know of has a log. And here we're going to implement transactions, and there is no log.
What are we supposed to do? And, you know, so we said, well, we've got plenty of storage, so I guess
we're going to have to do our own log on top of cloud storage, which is what we did. And that worked, but it created some complexity in the system that our
customers didn't like very much. And we had to go back and do it again a different way because they
didn't really want this custom log we had built. And so what we did was we redid it so that we managed the state of
the transaction as part of the state of the object. So we piggyback our own log information
on the storage that's used by the object. Interesting. And that was something we hadn't
seen done before. And how was that received? They like
it a lot. That one stuck and that's what's shipping. Every research project has at least one
"not yet." I'm putting that in air quotes, probably a lot more than one, meaning things that we don't
support yet, we can't do yet, that aren't on the map yet. What are the open problems that you still
face in this arena?
And how do you think you're getting closer to,
or at least thinking about getting closer to solving them?
Well, one of the big things that we don't do that we want to do
is what's now called serverless operation.
What that means is that when you develop the application and deploy it,
you're unaware of the fact that there are many servers out there.
That's not reality today with Orleans
because Orleans is simply a programming framework.
And when you develop your application,
you have to explicitly reserve servers in Azure
and then deploy your application on those servers.
So you're very much aware of the fact that there are servers
and they're your servers.
You've reserved them to run your application.
Now, what we'd like is to have this be a serverless service
where you don't know about any servers.
You still write your application in the way you always have
and you just drop it in the hopper and press a button,
and our infrastructure on Azure then just grabs that code, and we take care of all this server
stuff of provisioning the server and uploading the code to those servers and deal with the failures
and add servers and reduce the number of servers and all of that stuff in a way that's completely transparent
to the person or the group that's running this application operationally.
So serverless operation is a big one.
And the other is kind of related, which is just automating system management,
capacity planning.
Let's say you're a game developer and you've got thousands, tens of
thousands perhaps of gamers playing. Just monitoring it, figuring out what's going wrong,
looking at the behavior of the users. Right now that's all part of the application. And yet it's
something that every application developer faces. So why should everybody have to do this
in a custom way on their own?
Can't we do something to automate it?
And then third is, I mean,
there's still data management abstractions
which are not built into Orleans
that I would be interested in adding someday.
You know, we've added some.
We've got transactions, indexing,
geo-distribution are in there.
But there are certainly others
that we could add over time depending on the need of the applications and competing priorities.
Phil, you've been in every situation along the research spectrum from academia to industrial
research to product and back.
And in that sense, you're kind of a walking, talking example of human tech transfer yourself.
Talk about your experiences in each of these areas and what the value is having had
experience in each of them as you've landed here at Microsoft Research. Sure. Well, you know, I have
done all this other work. I was a professor. I did a startup. I was working for a hardware company
for some years in product development. Now I'm in industrial research in a software company.
But I've been at Microsoft for 25 years,
so that says something about which one I prefer.
But let me talk about some of the common features
in all this work.
New ideas are the coin of the realm.
I mean, your job is to come up with new ideas to solve problems,
or in other cases, to solve a problem
that maybe others have identified.
The highest impact on this kind of work is generally done in teams. So you're always
working with other people. Customer pain points are generally good motivation for research.
Partnerships are often worth nurturing. There are many activities that you're required to do as a
researcher.
It's not just doing research, but it's also participating in the research community.
It's writing research papers, it's reviewing research papers written by others, and everybody
feels under time pressure to do well at all of these things.
And so learning how to align your activities so they're all pointing at the same goal is important.
So all that is true for everybody.
But beyond that, academic research, product development, and industrial research are different in many ways.
Academic research tends to be entrepreneurial, in that the professor is generally running their own research group.
That means you're writing grant proposals.
You're expected to teach.
There are committees. Everybody's got to do their share of committee work.
So it's a very complex job, but when it works out well, it's super exciting. It's really like
running your own company, although you're doing it in the context of a research group.
Product development is quite different, and so is industrial research, because
you end up doing a larger fraction of the work yourself.
In product development, you're writing specifications, you're writing a lot of code.
Speed is a virtue. You've got to be willing to live with the fact that there's often insufficient time to do the complete solution you want
because the product's got to go out the door at a certain time. And if you're not ready with your piece, well, the train's going to leave the station whether you're on board or not.
And so in order to ship products, you have to learn how to steer a path to the right technical compromises
of what goes into the product and what gets saved for the next version.
And when you're in academia, you don't have to do that.
You just basically include everything you want to include,
and it doesn't have to be product quality, so it's okay.
On the other hand, shipping to a large audience is really a kick.
I mean, getting feedback from grateful customers,
it's a unique emotional experience that is really wonderful
when it all works and makes working long hours super worthwhile because you've really done
something that has a tangible effect in the world. Now, where does industrial research fit in this?
It's somewhere in between, right? It's research, but you're doing it in an industrial setting.
Well, the main thing is that we have more time. We are not
under the same time constraints, so we can actually work out the details.
We have more control over selecting our problems, and so we can identify problems maybe the product
group isn't even ready to think about yet. And as I said, de-risk them, you know, get it to the point
where the product group can pick it up and feel like they can put it on a schedule and they know
how long it's going to take and have lots of confidence that it's actually going to work in
the end. We talked about what gets you up in the morning, Phil, but now it's the time on the
podcast where I ask what keeps you up at night. So what kinds of things keep you up at night and
what are you and your colleagues doing about it? What I think about most is, am I working on the
right problem? Really problem selection is everything in research. If I solve it, is it
going to have high impact? Is it likely to be something that a product group is going to pick up? You know, what is the barrier to actually making it real?
Maybe I understand the nature of the problem very well,
but I don't have any really brilliant research idea on how to solve it.
And sometimes I worry, you know, whether I'm just too far ahead of my time,
which is a unique thing about industrial research.
You know, we don't tend to work on problems that the product group can solve. They're every bit as smart as we
are, and they have a lot more people. And so anything that they're going to do in the next
couple of years, it's really not a good idea for us to work on. We have very little added value.
And we don't really want to be working on stuff that's 10 years out. That's
a good thing in a university, but you've got to pay the bills. So we tend to work in this two to
five year range. And there are times that I just get it wrong. I just, I think this is going to be
an important thing four or five years from now. And two years into it, it still seems like it's
going to be four or five years.
And it's just the goalposts keep moving out.
And I think maybe this was not the ideal place to be.
So that's probably the biggest thing that I worry about outside of doing the work itself.
You have a long and varied path in high tech.
We've alluded to it a bit in our conversation.
But I'd love to hear your story. Tell us about your roots, your journey, and your ultimate path. You know,
give us the Reader's Digest version. Is Reader's Digest even a thing anymore?
Give us the Twitter. I'm old enough to know. Give us the tweet version. Sure. And it really is a
journey. And when I look back on it, there are so many forks in the road where, if I had taken the other path, it would have turned out very differently. And I had no idea how. So I got a PhD in computer science, and I had a choice between a research lab and a university. I went to a university. I became a professor at Harvard. That all sounds very impressive, except that at the time, Harvard's computer science department was not very good.
And so it's...
You made it good, Phil.
It was impressive to people in the real world, but in the computer science world, it's like, why would you go there?
And I had done a lot of consulting on the side, partly to enrich my understanding of real problems and partly because universities
don't pay very well. And that led to a gig with a startup doing computer development,
and they ultimately offered to put me in charge of their whole software operation. And so I left
academia, I became vice president at a startup for two years. And after about a year and a half, I decided I
really hated it. And that was not the right place for me. And I actually went back to a university
for a couple of years, completed some research that I had been doing before that sojourn at a
startup. And then they shut down the university, which was a bit of a shock. But it was a startup university. It was
called Wang Institute of Graduate Studies. Its goal was to create a professional degree program
in software engineering, which is still a very good idea, much like a law school versus a
philosophy department or a medical school versus a biology department. Anyway, I had to go do
something else. So I went to work for a hardware
company, Digital Equipment Corporation, and worked on their transaction processing products for a
while, and then their middleware for data integration for a while. And then they started
unraveling. I seem to have a history of this. And I don't think I was a cause, but because of this work I had done
on metadata management and integration, I got a call from Microsoft to be architect for a
development that they were doing in this area, Microsoft Repository. And so I took it, and that's
what brought me to Microsoft. I worked on that product for four years, at which point it became clear to me that there were just other things that the company thought were more important.
And so I moved back into research.
And that's where I've been ever since.
Are you an East Coast kid?
I grew up on the East Coast, yeah, in New York.
And then went to school in Canada, University of Toronto. And then
after that, I moved to Boston and I had this long string of jobs in Boston, Harvard, the startup,
back to another university, then Digital Equipment Corporation. So all living in the same place.
You made the big jump to the West Coast with Microsoft.
Yeah.
Tell us something we don't know about you.
I've been asking this in the context of whether it's a personal trait or a defining life moment that may have influenced a career in research.
But if I'm honest, I actually just want to know what goes on in the lives of researchers outside the lab.
So however you want to answer it, Phil.
Something about me that you wouldn't ordinarily know.
I am fascinated by finance,
investing. Now you might think that, oh boy, you know, he really likes to make a lot of money and
I'd love to make a lot of money. I'm actually really bad at it. I mean, it's like I don't
manage my own savings. I delegate that to a professional. But what I like about it,
it's endlessly complex. It's always changing, and there's one success metric.
There's no way to fake it.
Either you're making money or you're not.
So I'm just totally hooked.
I mean, I read a lot about it, and it's a hobby.
I get no really personal benefit.
My wife makes a joke that I sound very good.
You know, people ask me about investments, and I sound extremely knowledgeable and all.
And then she looks at me and she says, but how come you can't make any money?
Well, you know, you can't have everything.
Well, Phil, it's time to wrap up.
Before we go, I want to give you a chance to offer some parting advice to our listeners.
And many of them are just getting started on their path to high-tech
research. And you're a veteran in the field, so you're in a unique position to impart some wisdom.
Knowing what you know, and having done what you've done over the course of your career,
what thoughts would you share with our audience? I'll give you the last word.
Thanks for the opportunity. I actually have strong opinions on this one.
I think the most important thing is to know what you're optimizing.
And I think there are only four possibilities, money, power, fame, or personal happiness.
Now, everybody wants all four, but if you don't prioritize one of them over the others,
you might not get any of them at the level that you really
want. There'll be many forks in the road along the way. And if every time you face that fork in the
road, you choose based on a different optimization criterion, you're lowering the chances that you're
going to get the one that you want most. But beyond that, there are many other little snippets of
advice. I'll try to do them quickly.
Early in your career, choose your research area for the long term.
It's so easy to pick something because it's a hot topic, but if you want to succeed in
a big way, you want to be an expert at something that's going to be super important 15 years
from now.
When you've gotten past that apprentice-journeyman stage, you're now
considered to be an expert. And now this thing is super important. And you've had 15 years to
really become one of the best people working in that area. So choose a topic where your incremental
value is higher, which means probably it's going to be an unpopular topic, which means you have to
be brave. Exploit what you're good at, but also work around what you're not good at and look for opportunities to grow.
Also, you want to exploit synergies with your environment.
Based on what's around you, you can get research leverage from the fact that your company is really good at a certain something,
and therefore you have
a competitive edge in working in that area.
But despite all of this, you still want to be flexible.
Opportunities will show up randomly, and that may turn out to be the most important thing
in terms of your long-term success: that you grab the right opportunity at the right
time, which might have meant leaving behind something that you had actually invested quite a bit of time in. And then finally, a piece
of advice I got very early in my career as a researcher, which is if you want to be good,
write a lot of research papers. If you want to be great, never publish a weak one because
you want people, when they see your name on a paper,
they want to say, oh, his or her papers are always super interesting. I got to read this one.
If only every third paper you write is like that, it's much less likely to get their attention.
I'll stop there.
I could stay here for a long time because you're giving me some advice that I could use.
These are great.
Phil Bernstein, thank you for coming on.
My pleasure.
Thank you, Gretchen.
To learn more about Dr. Philip Bernstein
and the latest research in database management,
exploration, and mining,
visit Microsoft.com slash research.