ACM ByteCast - Luiz André Barroso - Episode 20
Episode Date: September 27, 2021
In this episode of ACM ByteCast, Rashmi Mohan hosts 2020 ACM-IEEE CS Eckert-Mauchly Award recipient Luiz André Barroso of Google, where he drove the transformation of hyperscale computing infrastructure and led engineering for key products like Google Maps. Luiz is a Google Fellow and Head of the Office of Cross-Google Engineering (XGE), responsible for company-wide technical coordination. Prior to that, he was Vice President of Engineering in Google Maps and led the Core team, the group primarily responsible for the technical foundation behind Google's flagship products. Prior to Google, Luiz was a member of the research staff at Digital Equipment Corporation and Compaq, where his group did some of the pioneering work on multi-core architectures. He co-authored The Datacenter as a Computer, the first textbook to describe the architecture of warehouse-scale computing systems. Luiz is a Fellow of ACM and AAAS. In the interview, Luiz looks back on growing up in Brazil, and how family played a part in his early affinity for electrical engineering, which progressed to computer engineering. He recalls his master's advisor, who stimulated his fascination with Local Area Networks and queuing theory, and how this got him interested in computer science. Luiz also talks about his first job in computing at IBM Research in Rio de Janeiro, and his PhD days at USC in Los Angeles, which got him involved in computer architecture and gave him an early taste of both research and practice in memory systems. He shares his unique experiences in moving from hardware to software engineering at Google and from areas of high professional expertise to “areas of ignorance,” and how an engineering education prepared him to scale new heights.
Transcript
This is ACM ByteCast, a podcast series from the Association for Computing Machinery,
the world's largest educational and scientific computing society.
We talk to researchers, practitioners, and innovators
who are at the intersection of computing research and practice.
They share their experiences, the lessons they've learned,
and their own visions for the future of computing.
I am your host, Rashmi Mohan.
Having extensible skills in our industry is always a boon, but rare is the individual
that straddles the world of hardware engineering and software programming with elan and expertise.
Our next guest started out as a hardware engineer, pioneering significant areas of computer architecture,
and then seamlessly went over to the world of computer programming to scale and conquer new heights.
Luiz Barroso is a Google Fellow and the head of the Office of Cross-Google Engineering,
which is responsible for company-wide technical coordination.
He drove the transformation of the company's hyperscale computing infrastructure
and has also led engineering for key products like Google Maps. He's a published author,
a prolific researcher, and most recently, the winner of the 2020 ACM-IEEE CS Eckert-Mauchly Award for leading the design and development of warehouse-scale computing in the industry. Luiz, welcome to ACM ByteCast.
Oh, thank you very much, Rashmi.
I'm delighted to be here.
We are super excited to have you.
And I'd love to start with a simple question that I ask all my guests, Luiz.
If you could please introduce yourself and talk about what you currently do, and also
give us some insight into what brought you into the field of computing.
Sounds good.
Yes, I am Brazilian by birth.
I've been in the US for over 30 years now.
I came to the US for my PhD, like so many of us immigrants in this field.
What I currently do, I worked at Digital Equipment Corporation before I came to Google.
I've been at Google for just about 20 years now.
As you mentioned, I'm currently leading the office
of Cross-Google Engineering.
And this is a relatively new office
in which my responsibilities are
to coordinate the technical roadmaps
that actually need to be consistent
across all of our products.
So if you think about technology behind Gmail
or photos or search or assistant,
there are bits and pieces of it that are common to all of our products. And those are the kinds
of things I tend to focus on, the building blocks that all our developers internally at Google use.
And in particular, this year, I've been focusing on two key areas for us. One is, of course,
machine learning infrastructure, the other one being privacy and security. And I should tell
you what drove me into this field of work, to answer the second part of your question. I always,
since I was eight years old, I guess I knew that I wanted to become an engineer and actually an
electrical engineer. My grandfather,
who was a physician in the Navy, for some reason had a hobby in amateur radio. In Brazil,
if you had an amateur radio back in the day, you had to be a little bit of an electronics hobbyist because those things were breaking all the time. So I spent some time with him. I was
fascinated by what he was doing. Very early on, I decided that that was really cool. And that's what I wanted to do.
And from then on, it went from electrical engineering to then sort of computer engineering
and then from computer science over the years throughout grad school and a couple of companies.
That's great.
You know, I mean, it's nice to have that inspiration at home and definitely somebody to sort of
tinker with.
As you got into the world of computing, Luiz, as you studied computer science or electrical engineering in college, were there other role models that sort of influenced
your interest, especially in the field of distributed computing?
Yeah. I mean, of course, lots of role models across. I was lucky to be able to work on very early efforts on local area networks during my undergrad,
and I found that fascinating.
The idea that relatively inexpensively at the time, you could actually have multiple PCs sort of talking together
and coordinating to resolve a task.
There was a project in my university that was looking at some categories
of local area networks using CSMA/CD, which is a collision-based arbitration protocol for access,
but we had interest in real time. So we wanted to actually have limits so that even when the
network was very congested, we still could guarantee that some packets would get through by a given deadline.
I found that fascinating.
It went from there to a class in queuing theory.
And I think queuing theory was the area that just got me infatuated with computing itself as opposed to electrical engineering. I had a wonderful master's advisor in Brazil,
Professor Daniel Menascé,
who probably is very much responsible
for my going to this field
because he was not only a great mentor,
but a wonderful teacher.
So he probably was the one who got me going
in this general area.
I just found queuing theory magical.
You could write these equations that you knew were based on principles that real computing systems could absolutely not adhere to. And yet, somehow they worked. They predicted
what would happen, say, when a particular wide area network was getting congested or not,
and helped to design systems that performed well by just using math. I just thought that was just mind-blowing.
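Luiz doesn't walk through a particular model here, but a minimal sketch of the kind of prediction he is describing is the textbook M/M/1 queue, which assumes Poisson arrivals at rate lambda and exponential service at rate mu, assumptions real networks rarely satisfy exactly:

    \rho = \frac{\lambda}{\mu}, \qquad T = \frac{1}{\mu - \lambda} = \frac{1/\mu}{1 - \rho}

With purely hypothetical numbers, a link that can serve 1,000 packets per second and receives 900 packets per second (rho = 0.9) has a mean response time ten times the unloaded service time; at 990 packets per second it is a hundred times. That blow-up as rho approaches 1 is how the math predicts congestion before you ever build or overload the system, which is the "magic" Luiz describes.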
That certainly sounds like something that, yeah, it's incredibly fascinating. And to see the
application of that into what might be a real-world problem that the industry might be facing.
What was that journey like, Luiz? I mean, so you were a student and you were working on these
how do I actually apply this into the real world? I mean, did you get a job straight out of college
or did you stay in academia for some amount of time? Yeah. So during my master's in Brazil,
Daniel actually got me an internship at the IBM Rio Research Center. And that was my first, I guess, computing job, so to speak.
And it was a real privilege to actually have access
to the kind of computing and not just computing,
but just the IBM library.
You know, I could actually read papers
that were otherwise very hard to get in Brazil.
So it was a tremendous boost for my enthusiasm
in computer architecture to be part of that group
during my master's. When my master's ended, you know, as opposed to the situation today, where Brazil is booming with entrepreneurial efforts in computer science and, you know, lots of great universities graduating lots of great students, and really interesting jobs in Brazil for computer scientists. Back in the day, it wasn't quite as much. It was very early.
And the idea of actually going for a PhD seemed to be the most fun thing to do because the jobs
that were going to be available for me at that time in Brazil didn't seem quite as exciting.
So I applied for PhD programs in the US. And with the help of Daniel as well, I ended up going
to get my PhD at USC in Los Angeles.
Got it. Yeah. And what did you do during your PhD? What were kind of some of the problems that you were solving then?
Yeah. So during my PhD, I think that's when I really began to get interested in computer
architecture. We had a strong computer architecture group at USC at the time. And my advisor,
Michel Dubois, was working on a lot of interesting problems in memory systems,
in particular, cache consistency models, cache coherency protocols. You have to remember, those were the times when we were beginning to think that shared memory multiprocessors were going to be the way to go. And for shared memory multiprocessors to shine, we needed to solve coherence and consistency problems.
So much of my PhD was working both on the practice as well as the research aspects of
memory systems. My PhD thesis was on different kinds of snooping cache coherency protocols
for non-bus-based systems at the time, but I also was part of an NSF-sponsored project to build a multiprocessor emulation platform,
which was possibly one of the first large-scale emulations using FPGAs, which is a technology
that's very useful and very widespread today.
It was quite exotic at the time.
So I had the chance at USC both to work on real system design
as well as to do some research, which was a treat. Absolutely. I think that kind of the opportunity
to do that sort of work, which is not just research-based, but also the collaboration
with the real world problem is definitely something that validates what you're doing,
as well as gives you real data to actually take your research forward.
Yeah, I think you're right.
I have a feeling we probably may talk a little bit more about this,
that research was my way to find interesting problems to work in computing.
And yet my passion was always more in the very, very applied end of research and really engineering.
So the opportunity to be able to actually put in practice as opposed to sort of only do the theoretical work was really, really important to me.
Yeah, no, that makes a lot of sense.
And I would say that's probably one of the main reasons
why we thought you would be an excellent guest
on our podcast,
because that's what we try to focus on as well.
Like, how do you bring that research into practice?
Most of our audience are also practitioners
who are saying, okay, how do I, you know, there are these great problems that
are being solved in the academic world, but here are some of the, you know, on the ground problems
that I have and how can I apply that research to make my life easier? But from there, Luiz,
I mean, you know, obviously some of the most prolific work that you're famous for is the
warehouse style computing. I was wondering if you could just explain that concept to our audience
and how did you get into it?
Sure.
We're fast-forwarding now to around maybe 2004, 2005,
when the size of the minimum Google data center, if you will,
was beginning to be much bigger
than any third-party co-location
facility could provide us. So we had to build our own data centers. When we began doing that,
we suddenly were a very vertically integrated company, because we were now in the position where we had already been designing our own servers. We were beginning to design our own networking
switches. We were designing our own storage systems, our own sort
of distributed file system appliances as well. So at that stage, I don't know if it was the first
time in history, probably not, but certainly it was a unique time for us in which we were designing
just about every piece of that thing, of that data center, from the building shell to cooling infrastructure to the power substations
and power distribution, UPS systems, emergency generators, and of course, the servers and the
hardware and the software. And the interesting thing about Google scale at the time is that products like Google search or Gmail didn't run on one or two machines or on one or two racks of machines.
These were such large scale deployments at the time that Gmail is a software
that runs in the building and search is a piece of software that runs in the
building.
The moment you kind of realize that,
it becomes pretty obvious that that building is indeed the hardware in which your software,
say search, is running on. And when you begin to look at the design of that whole facility from the perspective of a computer architect, you realize that that is your design. Your design is not this machine that you then wire up with networking to other machines.
The computer you are designing is that entire building.
Opportunities for efficiency, both in terms of performance, cost, and then energy efficiency, appear from everywhere.
Because it was an area that really didn't have that much prior work, and we knew of no other team that was taking such a holistic approach, that was a really exciting moment, because we really had blank
sheets of paper to think through. A lot of engineers with very little expertise, but not
short of confidence. It was a fun time. Yeah, that's great. I actually listened to the six-part series that you put out on Google's
data centers. I thought it was fascinating. It was definitely something I'd recommend to the
others. But I think I heard you say in one of those interviews that you were thinking about
being thrifty. So as a company, maybe you were early enough that you didn't have the massive bank balance that you probably do now, where, you know, money is of no consideration. But how did that play into some of the design choices that you made?
That's a great question, Rashmi. Just to be clear, money is absolutely important today.
Okay, I take that back. It's a very competitive field.
But you are absolutely right in that there is a,
Sergey likes to say that, I don't know if that's his
or he's quoting someone else, that scarcity breeds clarity.
And certainly it was the case for us, right?
We just could not afford to build the kinds of data centers or servers or buy the kinds
of networking gear that other people were buying.
We just didn't have that kind of money for the scale we had.
I joined Google during a time where we actually were not making a profit at all, right?
So we really were forced to be very, very thrifty. And honestly, even before I
joined Google, I think the people that were there before me were already pioneering this idea of
thrift. We were already beginning to put together our own motherboards for our servers using desktop
class components as opposed to server class components, which were much more expensive.
Because we just realized we couldn't afford the server class components. So we had to make
the cheaper desktop class components work. That's just one of them. We couldn't afford the really
fancy storage appliances that you could buy at the time. So we decided to put just regular desktop
class disk drives in every server and instead create a distributed file system, which we later published a paper about, called GFS, and which has evolved into something called Colossus at Google today, because we just could not afford to buy the fancy storage appliances
that existed at the time. And the same thing happened with networking. Networking today,
I think, is at a price point that is probably much
more reasonable than it was at the time, especially high-performance data center class networking.
Buying the kind of bandwidth we needed within the data center using the vendors at the time
was just something we could not afford. So we decided to see if we could build distributed
switching fabric that was built out of inexpensive switching
components as well. So a lot of it was driven by the fact that we really didn't have another
option. We didn't have the money to do anything else. Got it. I appreciate that very much. So
the question I have then, Luiz, is: today there are smaller startups or smaller companies in the position that you were in.
Do you see that hunger? Do you see that innovation coming from those areas as well? And also, like
you said, it's still a strong consideration, even at a place like Google. Do you feel like that
drives you to constantly improve and build more efficiencies, either from a cost perspective or
as an energy perspective? Or what are the other key considerations that you keep in mind as you keep looking at this
problem to look for more innovation?
Yeah, this is very much still part of our DNA, Rashmi.
You know, many of our teams brag about, you know, two, three, four percent performance
improvements, right, all the time.
We have many ways internally to actually recognize
people that work on performance and efficiency. So this kind of work is still very much elevated
at Google. And it's something that I think it's going to continue to be part of our success going
forward. Even though we actually have a little bit more cash reserves these days than we had
sort of back in the day, the headwinds coming from, you know,
the end of Dennard scaling and all of that
are very significant.
So for us to continue to have viable,
high performance, very compute intensive services
that do more and more amazing things,
it requires an obsessive focus
on improving our efficiency at the hardware and the software level
just about every month, right? And to the point that I think, every once in a while, we may even overcorrect, in that we take on so much complexity at times to get the next half a percentage point that we look at it, scratch our heads, and say, you know, maybe that actually was too much, because we may have sacrificed other things in order to get that bit of efficiency. So if anything, I'll say that the problem we have today
is to make sure that we don't over-optimize things to a point that we create a complexity
that makes our systems actually difficult to evolve.
Got it.
Yeah, no, that makes sense.
Do you think, Luiz, that, obviously, the scale of compute that Google sees is probably not what the average business out there is going to see?
Is this a problem for other companies as well, other than the handful that sees the kind
of scale that you do?
What should everybody else be thinking about?
I guess that's my question.
Yeah.
So I think there are two classes of companies out there, right?
The companies that were not born in, if you will, the digital era, so to speak, right?
And those will probably continue to be more your typical enterprise, your bank or your grocery store or your drugstore.
Those probably are less likely to have the kinds of computing requirements that a company like Google has.
They have different kinds of requirements that are very important for their businesses.
But we increasingly see a very large number of what some
people call digital first companies, companies like Snap, for example, or companies like Twitter.
And there's a longer tail of companies of that caliber that begin to have rather significant
needs for computing capacity. And in that case, I think that over time, there's going to be a larger number of companies, more than the top four or five large technology companies today, that will actually have the same kinds of needs. And that's why the infrastructure we built for Google's own use is something we're seeing over time becoming more and more useful for our cloud customers.
And it'll be really exciting to be able to continue to offer more and more of those.
We already do this today with things like Spanner, for example.
And so that, I think, is a really great opportunity in which these companies can take advantage of these kinds of problems, but they don't have to actually solve the entire data center building and efficiency problem from scratch.
They can take advantage of a solid infrastructure provided by good cloud providers.
Absolutely.
Yeah, I think that helps them focus on the business problems that they're trying to solve without having to worry about the underlying hardware or just managing of infrastructure.
But one of the other questions I had, Luiz, as I was listening and trying to do more research on the kind of work that you do: we've heard about Moore's Law and how it may not be as valid today, or is probably on the decline, if you will.
What are some of the strategies to continue to build efficiencies? I know that in some cases,
building efficiency in the software side is considered a way to extend Moore's law, if you
will. What are your thoughts around that? Yeah, this is really important. I mean, to be clear,
Moore's law is ancient history, right? As originally articulated. I happen to be about the same age as Moore's Law. Moore's Law was
articulated pretty much when I was born. I was rather sad to see it go away, but I got over it
because that happened like 15 years ago. The kinds of improvements we see coming from the circuit
level today are nowhere near anything that I experienced for
the majority of my professional life, which is scary and exciting because it's a new challenge
for all of us. You mentioned one of the aspects of it, which is how can we now work on efficiencies
at the various layers of the software to continue to squeeze continuing improvements in performance
over time.
And we love to do that at Google.
Like one of the things that a lot of our infrastructure teams do,
say teams that build database systems or data processing systems, for example,
or say machine learning accelerators, is to say,
how can we recreate that magic from the 80s and 90s
where you could write a program on an x86 machine, and you could go to
sleep for three years, and you wake up and you run that program again, without having to do anything, maybe without even recompiling the program, and the program will run at least twice as fast. That is something that the hardware alone, the circuits themselves, are not giving us anymore.
So we need to find other ways of doing that. I mentioned one of them, improvements in the
software stack. The other ones are a rich area of exploration for us, which is hardware acceleration
for some specialized kinds of computing. We started the TPU program, which is a set of chips that Google built to run deep learning
applications several years ago, which is one way that we have been able to achieve, well, I don't know about several orders of magnitude, but certainly 30, 40x improvements in overall performance or energy efficiency for these kinds of applications, by building hardware that's very dedicated to that. And this acceleration, hardware acceleration
trick, if you will, it's not a panacea, but there are still quite a few areas that we can still
apply them. More recently, Google talked about a chip that accelerates encoding for video processing, so that video conferencing systems like the one we're using today become much more efficient to run at the data center level, because the processing of audio and video streams is optimized by special accelerators.
Finding the right things to accelerate is tricky because if you want to build hardware for something,
it has to meet these two criteria.
It has to really be specialized enough for something
that it has a big edge over general-purpose computing.
But it also needs to have a big enough market
because otherwise, you know, you don't want to build something
that's only going to require 10 chips, right?
You want to build an accelerator for something that's going to be very broadly used.
When you take the intersection of these two factors, the number of things that can
really benefit from hardware acceleration goes from potentially on the ideas level,
a very large number to a more modest number that I think is still quite interesting and
we're still exploring it. It is actually a high bar for big gains in hardware acceleration.
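As a rough, back-of-the-envelope way to see why both criteria matter together, here is a small sketch with entirely hypothetical numbers (none of these figures come from the interview): an accelerator has to beat general-purpose hardware by a wide margin and be deployed widely enough to amortize the cost of designing the chip.

    # Hypothetical illustration of the two criteria for hardware acceleration:
    # a big edge over general-purpose computing AND a big enough market.
    def accelerator_pays_off(nre_cost, cost_per_chip, chips_needed,
                             general_purpose_cost_per_chip_equivalent, speedup):
        """Return True if serving the workload on accelerators is cheaper than
        serving it on general-purpose machines."""
        # One-time engineering cost plus the accelerator fleet itself.
        accelerator_total = nre_cost + cost_per_chip * chips_needed
        # The same workload on general-purpose hardware needs 'speedup' times
        # as many machine-equivalents of capacity.
        general_purpose_total = (general_purpose_cost_per_chip_equivalent
                                 * chips_needed * speedup)
        return accelerator_total < general_purpose_total

    # A 30x speedup does not pay off if only ten chips' worth of work exists...
    print(accelerator_pays_off(50e6, 2_000, 10, 5_000, 30))        # False
    # ...but the same chip wins easily at data-center scale.
    print(accelerator_pays_off(50e6, 2_000, 100_000, 5_000, 30))   # True

The only point of the toy model is that the break-even depends on the product of the speedup and the deployment size, which is why the list of worthwhile targets shrinks from many ideas to a more modest number.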
Yeah, no, that's an incredibly valuable point that you bring is really to identify
what are the areas where that acceleration is going to be valuable as well as needed in the
future. In some ways, you also have to predict where this is going to go and start to sort of
think about those problems ahead of time. But going back to your introduction, Luiz,
you're one of those few people that actually navigates the world of data center and infrastructure
optimization, and also Google Maps and Earth. So what drew your interest? I mean, how did you
sort of build skills in both these sort of somewhat diverse areas of computing?
You know, I'm generally driven by areas of ignorance.
I don't know. I think some of us are just wired that way.
In fact, when I came to Google, you know, I was a micro architect.
I was designing a chip at Digital.
I came to Google making a rather late career change to learn
how to be a programmer. And a programmer, of all things, for internet services, something that I
was certifiably incompetent at. So I tend to kind of work that way every four to five years.
If I begin to feel like I'm finally kind of really understanding what I'm doing,
the world is so interesting and the computing and engineering is so interesting that I'm
suddenly much more interested in poking an area of complete ignorance than continuing
to build my expertise in an area that I have already been sort of working on for a while.
So that was the story with Maps.
It was actually a double career change in the sense that I both went to an area of complete technical ignorance, but I also went from being an engineer, an individual contributor, to managing a team, to becoming then a VP of engineering in GEO, which is our internal name for the Google Maps team.
And boy, it was just a fascinating area.
Fascinating technically, and we can talk a little bit about that. Fascinating
because of the sense of mission that the team had, the energy on the team. This is a group of people
who really understood how important the tools and technology they were building were to people all
over the world. And you could see that in the enthusiasm that they brought to work sort of every day.
You know, imagine the idea of being able to make it easy to find local shops so that, you know, the overall vitality of our small downtown sort of all over the world, if you will, is preserved in the Internet age.
You can still find your local shop, you can still find your local deal. That's one of the things that Google Maps as a product does, which I think is truly inspiring.
And in the case of Google Earth, it's a team that is really interested in storytelling and getting people to more deeply understand how amazing our planet is and what are the things
we can do to keep it this way.
So if you go to Google Earth these days, it went from being a platform for GIS, geo-nerds,
to being a platform for storytelling about the world, while still actually being a great platform for mapping nerds.
It was an incredible area to be in. From a technical standpoint, it was an
amazing challenge because we wanted to map the entire world. And it's kind of expensive to do
that. Imagine sort of driving every street in the world every other
day to see what's changing. It's kind of a big planet. And in particular, how do you do that
in an economically viable way, even in areas
where Google's business was not very strong, say parts of South America where I come from, for
example. So there was a challenge to make it, you know, to go back to thrift, to make it cheaper and
cheaper and more efficient, to create a real time accurate representation of the world that helped you discover what is around you.
And we went into this journey of being probably one of the most aggressive adopters of machine learning technology
to automate this process of understanding the world,
whether it's from imagery we took from space or from cars, or from contributions from Google Maps users,
and using ML to make that process maintainable
to keep the map up to date and increasing its coverage.
And this mission of making sure that everybody in the world
has a rich digital map experience
that helps them navigate and discover things in the world
anywhere they live or work,
was just something that really drove us.
It was a fantastic time to be a part of that journey.
You know, the sheer passion that you have for the product as well as the mission
definitely comes through in your answer.
And I have to say, for me personally as well, and I'm sure that's for many users,
Google Maps is one of my favorite products as well,
simply because, you know, for the amount that, you know, I used to travel, just the feature of
having an offline map to be able to navigate an unknown area is fascinating. And I have to say
that, you know, even Google Earth, like you said, I think it opens up the world for you. I volunteer
with an organization that works in education, especially in India. We use Google Earth there
to basically, you know, introduce
like the Taj Mahal to somebody, you know, in a remote village in India who may have never seen
it, and literally brings history to life. So, you know, I completely understand the larger purpose
behind working on a project like that. What I do also want to ask you, though, Luiz, is, I mean, there are two parts to your answer that I want to sort of dig into a little deeper. One is when you
make a switch like that in your career, right, that comes at a risk. I mean, you've built your credibility in a certain
area for some amount of time, and you're now going back to being completely a novice in a certain
area, you're going to be the person asking the dumbest of questions, even if there isn't such
a thing. How do you prepare yourself for something like that? I mean, I would say most of us want to
do that. We want to try a variety of roles. But there is a certain fear of like, you know, am I giving up all that
I have built over all these years? Yeah, you're right. There's no two ways about it,
right? It is scary. And it's also not the only recipe for career progression at all, right? I
mean, both of us probably know so many people who are the most successful people in our field who actually continue to work on one particular area of engineering or computer science.
And they are unbelievably successful at that.
But it's fun to trust that, you know, somehow you're going to figure out how to do things. And, you know, I think it requires a little bit of suspension of disbelief, and a bit of humility. And you overcome these things when the reasons why you switched were based on excitement and passion. If you're doing career switches because you feel like it's going to be the, you know, important step in advancing your career, you use, you know, a given kind of value system and considerations for it.
And people should absolutely think about those. When you make switches that are more based on,
I'm really excited about this. This thing looks really interesting. Then I think you are able
to tolerate, if you will, the fact that you are going back in many ways and being the least
informed person in the room for a while. The magic that happens sometimes though, not all the time,
is something I've seen when we hire new senior people in particular at Google. There's a magic that happens when somebody who is clever and has experience comes into a new area with a fresh pair of eyes and can see things that those of us who think of ourselves as experts in those areas
may miss. And every once in a while, I think I was able to contribute a little bit in some of
these career switches by being that guy that's going, wait, wait, wait, but why are we doing
it this way? And it just turns out that the people who had been living in that area hadn't really
thought about questioning that. And sometimes it's something that was based on a good reason
three years ago, and that reason had a sell by date, and it's time to do something else.
So that's the part of it that I think is reassuring: the experience you bring to a new area means you are probably not dumb, just ignorant about that area. And you can contribute by bringing
a different perspective. Yeah, that's great. I think you just articulated the overall business
case for diversity in your team, right? And we talk about, of course, gender and other forms
of diversity, but really diversity of background as well is so critical to uncover some of these challenges that
maybe the team that's currently been very close to the problem hasn't seen. And I think the other
thing that I really enjoyed about what you said is that, you know, we typically tend to think that the risk
is lower earlier in your career to make these sort of switches. But from what you're saying,
it sounds like you're kind of, you probably are, you know, there are certain traits that you develop over a period of time
in your career that better set you up for making these switches and still contributing and adding
value. So thank you. Thank you for sharing that. I think you're right, by the way. And when I talk
to interns coming to Google, I do think that, you know, sometimes students and interns get so anxious about the decisions they make
super early in their careers, right?
I don't think they're actually that important.
So I think that even at that level, even if you probably don't want to be doing a huge
amount of job hopping earlier on, those early decisions are not that consequential in the
long run.
Got it.
Yeah.
And that's very liberating, Luiz.
So thank you. Thank you for saying that. The other part of your answer from prior questions
that I wanted to tap into was, you know, in one of your interviews, you spoke about teamwork
being very integral as a quality for today's engineers, right? I mean, always, we all understand
the value of teamwork. But you also spoke about the interdisciplinary nature of your team when
you were building the data center infrastructure, leading to the success of that.
I'd love to hear more about that, because as I was hearing you talk about the fact that
you had, you know, engineers from various disciplines contributing towards building
the solution, it just sounded like there was some magic happening there.
Yeah, there are two aspects to your question.
First is teamwork itself, and the second one
is looking at things across layers of the stack or even across engineering disciplines.
Let me talk about the teamwork first. When I joined the field a while ago,
I think that there was this vision that computer science and programming and research,
or sometimes even hardware design, was more like an individual sport. Certainly programming, and maybe hardware design at that time,
wasn't quite that way anymore. For the things that we do at Google today,
the complexity of what we're tackling and the scale of the things we're building are inconsistent
with individual superstars just carrying the day. We love to have individual
superstars everywhere, but they are superstars because of their contributions to a team.
Because things at Google don't get accomplished by Luiz or anybody else. They get accomplished
by teams that are incredibly high performing teams. So we begin to judge our people and ourselves
more and more, not by their individual brilliance, but more so by, you know, when they are on the
team, is the team better? And when they are not on the team, does the team suffer? I think this
is a key aspect that I think many disciplines go through. I think if you look at certainly particle physicists these days, these are huge teams,
right?
A lot of things in biology, these are huge teams because of the complexity and scale
of the things that they're solving.
And I think computing has gotten there.
So the importance of knowing how to build high-performing teams is much more important
today than it was when I graduated.
So that's one aspect of it. The other aspect of it is how fun it is when things that seem impossible to solve
at one level of the stack become possible if you actually have access to the level, say, above or below it. I'll give you a couple of very quick examples. We were building servers at the time, and the efficiencies of our power supplies were terrible, like they were across the industry.
You feed a power supply that transforms AC to DC with 100 watts, and you lose 20, 25 watts
just on the power supply itself. That was a bit of a bummer. And we realized that there was one way
to make power supplies more efficient without making them more expensive, which is to say,
well, the power supplies that we bought at the time, because of the standards they were built on,
had to provide power at various voltage levels, you know, 5, 3.3, minus 3.3, 12 volts, right? All of these power rails,
because these were building blocks that could be used in sort of any kind of computer.
Well, we knew exactly what computer we were building and we didn't need all these power
rails. And it turns out that if you just say, let's build a power supply that just gives us 12
volts, nothing else, you can actually simplify the design of that
power supply and achieve efficiencies that weren't available elsewhere. But you can only do that if
you're designing a motherboard as well, because now you have to design a motherboard where there's
no 3.3 volts coming from the power supply, so if you need that, you have to actually create it
yourself. So by designing these two things at the same time, with a power electronics engineer on one side and a hardware engineer on the other, suddenly,
because they're working together, they find a way to create efficiencies that wouldn't be possible
otherwise. Does that make sense? Absolutely. Yeah, it makes a lot of sense. But you know,
the follow up question to that is, not every organization has the wherewithal maybe to have all of those disciplines within the organization.
So do you see opportunities for this being like industry-wide collaborations?
Yeah, I think that those can be certainly industry-wide collaborations.
The Open Compute area is one in which we collaborate with industry, for example.
And that's one way of doing it. But even in companies that, say, are not dabbling with hardware design, the kind of story that I told you could actually happen
in the software layers between, say, a database layer and an application layer of a given team
in a completely software company. That if those folks suddenly begin to talk to each other,
they might be able to find opportunities for optimization that otherwise would not have been available if they were just trying to think about solving
their problems in isolation.
Yeah, no, you're absolutely right.
I mean, and sometimes it just takes us taking a step back from, you know, just delivery
on a specific roadmap to then think about a larger problem that we're trying to solve
and come together as maybe a consortium of people
working in different disciplines within the company, but to solve a more sort of maybe
futuristic problem that will actually bring all of these pieces together and look for ways in
which we can collaborate. Yeah. The other example I'll give you that does go across companies is
when we were trying to figure out how to make data centers more energy efficient. And we did that in several dimensions, in the cooling systems and the power distribution systems. But in this case, we were trying to figure out how to make the actual chips, the actual computing part of the data center, more
energy efficient. And for that, we needed help from microprocessor design companies and from memory
manufacturers. And what we began to do at the time was some studies at Google to understand
what are the features we need from microprocessors, for example, to make the whole data center more
energy efficient as a whole. And this went into something we called energy proportionality,
which is something that we needed help from the microprocessor manufacturers to be able to
understand and engage with. So we wrote papers about it. We had great
relationships with the manufacturers at the time, and we still do. And across these companies, we were actually able to make significant advances in energy proportionality, which is one way of
achieving energy efficiency in the data center. Excellent. Thank you. That's a great example of
that collaboration that happens across those boundaries of a company.
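The concept Luiz mentions, energy proportionality, is easiest to see with a toy model. The sketch below uses a simple linear power model and hypothetical numbers (it is not Google data or code): a server that draws half its peak power while idle delivers far less work per joule at the low-to-moderate utilizations common in data centers than a more proportional design would.

    # Toy model of energy proportionality (hypothetical numbers, not Google data).
    def power_watts(utilization, peak_watts=500.0, idle_fraction=0.5):
        """Linear power model: constant idle power plus a load-dependent term."""
        idle = peak_watts * idle_fraction
        return idle + (peak_watts - idle) * utilization

    def relative_efficiency(utilization, **model):
        """Work delivered per joule, normalized to 1.0 at full utilization."""
        if utilization == 0.0:
            return 0.0
        return utilization * power_watts(1.0, **model) / power_watts(utilization, **model)

    for u in (0.1, 0.3, 0.5, 1.0):
        print(f"utilization {u:.0%}: "
              f"non-proportional {relative_efficiency(u, idle_fraction=0.5):.2f}, "
              f"more proportional {relative_efficiency(u, idle_fraction=0.1):.2f}")

In this toy model, at 30% utilization the non-proportional server delivers less than half of its peak work-per-joule, while the more proportional one delivers over 80%. Pushing idle power down, which is roughly what energy proportionality asks of processor and memory designers, is what lets the whole building do more work per watt at the utilizations servers actually run at.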
You know, Luiz, I'd love to know from you for our final bite, what is it that you're most excited about in the field of distributed computing or in the cross-Google engineering organization that
you're currently a part of? What are you most looking forward to in the next few years?
I'll give you two things related to the two things I'm spending the majority of my time today. I think we're in the early days of taking advantage of what machine
learning can do for society. And we are super excited about making machine learning more
efficient, easier to use, more applicable to different domains, eventually to make it to a
point where somebody who has no training in programming whatsoever can build an intelligent
system based on machine learning, which I think would be an amazing way to democratize the
technology. I'm very excited about that entire kind of area. The other area that I'm fascinated
by that I think is really meaningful for us as a society is the issue of online safety.
I think that all of us live a lot of our lives online,
and many, if not all of us, are a little bit nervous about this.
It seems to be a scary world for us,
especially for those of us who have children, right?
And this is something that's front of mind, right?
So one of the things that we're dedicated to doing at Google is to make sure that the online experiences we have with any of our products are the safest places you can be on the internet.
And this is a combination of amazing advances in security, advances in privacy, and advances in understanding the quality
of content and in particular misinformation that are, to me, a grand challenge for our
era, in which companies like Google and computing as a whole have a very significant role to
play.
And I am very much motivated by the work that we are doing at Google to make sure that the
safest place you could possibly be online for yourself and for your family will continue to
be Google for years and decades to come. That's very inspiring, Luiz, especially because I think, you know, given that in the last year and a half during the pandemic, the number of people who have suddenly come online has obviously exploded.
And I think the average age of somebody getting on the internet has also come down significantly because that's been our primary way of sort of communicating with each other, of learning,
of doing business.
So yeah, I'm super excited.
I think we're all looking forward to hearing more about all the amazing things that you
do.
Thank you so much for talking to us at ACM ByteCast.
Oh, it's a real pleasure, Rashmi.
Thank you.
ACM ByteCast is a production of the Association for Computing Machinery's Practitioners Board.
To learn more about ACM and its activities, visit acm.org.
For more information about this and other episodes, please visit our website at learning.acm.org.
That's learning.acm.org.