Software Misadventures - War stories from early days of engineering at LinkedIn | David Henke (LinkedIn, Yahoo)
Episode Date: January 4, 2024

At the personal request of Reid Hoffman to emerge from early retirement, David joined LinkedIn in 2009 during a period of rapid growth to help stabilize the chaos, cultivating a much-needed culture of... “Site Up and Secure.” Before this, David served as SVP of Engineering and Operations at Yahoo!, overseeing their Search Marketing organization and the Production Operations infrastructure for the entire company. Throughout his career, David has held multiple leadership positions and is recognized as one of the top operations executives. David’s intensity, passion, courage, and commitment to work have always been deeply admired by his colleagues, and his wisdom, well captured in one-line axioms better known as Henkeisms, is still echoed at LinkedIn.

This episode was first published almost 3 years ago and we are sharing it again because it’s been one of our favorites :) Hope you like it too!

Segments:
[00:01:37] “This is my freaking site” poster
[00:04:10] David’s first 2 retirements and starting at LinkedIn
[00:09:41] IC to Management
[00:17:20] Site-Up Culture
[00:21:58] Re-architecting LinkedIn’s release process
[00:27:23] War stories from Yahoo: The 10G Massacre
[00:32:06] “Go to work every day willing to be fired”: Project Panama at Yahoo
[00:43:33] The power of Individual Contributors

Show Notes:
David Henke on LinkedIn
Learning to Lead - David Henke’s talk on leadership, which he delivered at his alma mater, UCSB
Project Inversion at LinkedIn

Stay in touch:
👋 Let us know who we should talk to next! hello@softwaremisadventures.com
Transcript
There's an old adage, quality, schedule, features, pick two.
Okay, engineers typically pick quality and schedule.
Not always, but typically.
Product people typically pick schedule and features.
So there's this natural tension
between quality, schedule, features, pick two.
Of course, when I asked Jeff Weiner this question,
what do you think he said?
Welcome to the Software Misadventures podcast. We are your hosts, Ronak and Guang.
As engineers, we are interested in not just the technologies, but the people and the stories behind them. So on this show, we try to scratch our own itch by sitting down with engineers,
founders, and investors to chat about their path, lessons they've learned, and of course,
the misadventures along the way. And for all the new listeners, we wanted to make sure that
y'all didn't miss a favorite conversation of ours with David Henke. David is an amazing storyteller
and has many amazing stories to tell from his time running engineering and operations at LinkedIn,
Yahoo, and Silicon Graphics. Without further ado, let's get into the conversation.
David, it's such an honor to have you with us today.
Welcome to the show.
My pleasure.
Good to be here.
So while preparing for this conversation, I was reading about you.
And when I went on Google and searched David Henke, one of the first hits I got was an
image of you pointing at a screen with some metrics and graphs.
You were pointing at the screen, but you were also looking at the camera.
And it appeared that you were screaming at the camera.
So I thought we would start with asking you about the story behind that picture.
Okay, well, you're referring to a poster that said, this is my freaking site.
And you can substitute freaking for whatever word you
would like, but I can assure you it's nastier than freaking. By the way, just so you know,
in India, they made us change it to, it's my precious site, because they were offended by
freaking. What happened was, site up, which is something that's important, especially to a
company the size of Yahoo, with so many properties and so many important pages, was not going well. And they wanted to bring
attention to the site. And at the time, I was running production operations, which is really
the data centers, the networks, the 28 POPs, and the 35 data centers, and also the 1 million
computers. But the sites were actually run by different
groups like Messenger and Search and Mail and so forth. And things weren't going well. And they
wanted me to bring attention to the site. So I pointed to something that matters to me greatly,
which is sponsored search, which was my first job at Yahoo, which is how we made half of our money.
And if you know anything about calculus, the area under the curve is money.
Right? And we don't want to lose money. And we were losing money because we had some sponsored search problems. And I was trying to make everybody or remind everybody, it's your
freaking site. And site up does matter because that's the business we are in.
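A hedged way to write down that calculus quip (the symbols are illustrative, not from the episode): let $r(t)$ be the revenue rate a healthy site would earn and $\hat{r}(t)$ the rate actually earned during an incident running from $t_0$ to $t_1$. Revenue is the area under the curve, and what an outage costs is the area between the two curves:

```latex
R = \int_{t_0}^{t_1} r(t)\,dt,
\qquad
\Delta R = \int_{t_0}^{t_1} \bigl( r(t) - \hat{r}(t) \bigr)\,dt
```

For example, if the healthy rate is a constant $r$ per minute and the site is fully down ($\hat{r}(t) = 0$) for 60 minutes, then $\Delta R = 60r$.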
Yeah, well, site up does matter. And it is something that we are going to dig a lot into.
But remind me, was this poster also kind of stuck on walls
in various buildings of the company?
This poster, unfortunately, I don't really,
I'm not a very photogenic person
and I'm not a very good speaker,
but I will tell you my ugly mug was in every floor
of every building at Yahoo
and it was in every data center.
And people who did not know me would go, is that you? Man, you are one ugly son of a gun.
I would disagree that you're not a good speaker. I think you're a great speaker,
as we are seeing here today. Well, thank you for that. I appreciate it.
So you've had an amazing career in tech, and you've had a huge impact on companies like LinkedIn and Yahoo, AltaVista, Silicon Graphics.
One thing that while I was researching, I learned that you retired three times throughout your career.
Can you tell us more about these times?
I can. I'm trying to think how to do this nicely. So I left Silicon Graphics after eight years.
And Silicon Graphics was arguably the greatest company I ever worked for. Jeff Weiner sometimes
gets mad at me because LinkedIn really is a great company. But Silicon Graphics, just the fact that
I was there for eight years, I thought we had the best engineers on the planet. And they left to go
to companies like Google, like Netscape.
But when I left after eight years, it was because we had just bought Cray Computers,
and that was becoming the largest player in a dying market, because Google had shown us that
a large number of small computers was better than this mammoth supercomputer. So I just quit.
And I had enough money. I always made money. I was good at that. And I just said, I'm done. And I … Times and the Mercury, San Jose Mercury.
And he wanted me to run operations.
And I had never run operations before.
I'd always been an engineer.
So I said, what the heck?
I'm sitting on my butt.
Might as well go get another job.
So I went to work for this little company called Zip2, Elon Musk's first company.
We sold it for $310 million, which got him into the business.
But also, I made some dough.
We sold it to Compaq, which at the time owned AltaVista.
Okay.
And that's how I got into AltaVista.
And then after four years of that, I retired.
That's retirement one.
I retired because my children were the age of high schoolers, freshman and sophomore in high school.
And it's the one time your kids do not want to be with you.
Young people won't know this yet, but trust me, they don't want to be with you.
But I wanted to be with them, so I retired for three and a half years.
And I did not work.
I took karate.
I took Spanish.
I took real estate courses.
But I also spent a lot of time with my kids.
And then I got
another call, and this was from Yahoo. And that's when Yahoo asked me to come back and help them
with their sponsored search, which was Overture, a company they had bought. And that was responsible
for half their dough. And it was a hell of a thing, and we can talk about that later. But I
came down and did that for two years, and I did production operations for Yahoo for two years.
And at that point, Yahoo capitulated on search and search monetization,
which I was very disappointed with.
And so I retired again.
And then within a few months, somebody I had worked with at Yahoo,
a guy named Jeff Weiner, who was the CEO of LinkedIn
and is now chairman of the board of LinkedIn.
He called me and said, I want you to hear about this company.
And I said, you know what, Jeff, I think I'm not going to be working anymore.
Yahoo kind of left a bad taste in my mouth.
And he brought me in and I talked to Reid Hoffman.
I talked to all the engineers.
But the most important guy I talked to was the CFO.
He said, let me explain this business that we have based on this data that we have,
this incredible set of data and what we're trying to do.
And it just blew me away.
So I thought, heck, I might as well try it one more time.
And that was arguably the best business decision I ever made.
And I think looking back, LinkedIn is the best company I've worked for.
We are extremely grateful that you chose to work at LinkedIn. I think Austin and I here have a job
probably because of you, because you built that site-up culture at LinkedIn when you came in.
And before we go there, you mentioned that you were always on the engineering side and
operations is not something that you had at least led before.
So when you went to Silicon Graphics, or sorry, Zip2 at the time, did you know you would enjoy this?
Or were you passionate about operations engineering?
Or did you just want to take a chance?
I actually didn't know if I would enjoy it.
And in fact, I did not enjoy it.
Like many of you, I don't like it
when things do not work. And I will tell you, things did not work at Zip2. It was one of those Microsoft shops. It didn't scale. I was a good enough programmer and coder that I actually
reviewed changes before they got on the site. Because at that point, I was like, this isn't
getting onto my site. I remember Elon Musk himself trying to write code to get on the site.
Not a good engineer. A very great entrepreneur,
but not a good programmer. And I will say, I didn't know a lot about data centers. I didn't
know a lot about networks and networking. I didn't know a lot about the scale and the issues.
The beauty of AltaVista, when we took
that over, was that it was a large problem. It was a large search engine before Google, but they used
the big Alpha-chip DEC machines, and that wouldn't scale because of cost. And so you learn a lot real
fast. And I realized that engineering and operations really need to work very closely together if you're going to scale on the Internet.
And that's a lesson a lot of little companies need to learn, including LinkedIn, when I joined them.
Yeah, it's pretty amazing that you learned all of that on the job.
Also, you mentioned that for the first several years of your career, you were a programmer.
But then you went from the IC
track to the management track. How did that happen? Well, that's a long story. So I was a founder in my
two startups, and I was the principal programmer. In the first startup, I wrote 73% of the code.
Oh, wow. And we can talk about programming back then if you're interested in that. When I got to Silicon Graphics, I thought I
was a real hotshot programmer. And then after that, I would say out of the 200 programmers that
were there, I would rank myself 197. And it was a little humbling. And I thought, well, I can look
at this one of two ways. I can feel bad about that, or I can feel really good that I can learn
from these other 196 really smart people that are way better programmers than I am.
And that's what I chose to do.
I worked for the tools group and also compilers.
So I was involved with the C++ compiler.
I was the one who brought Java over from Sun, what a piece of shit Java was at that time, 100 times slower than C.
We brought Purify into the company, from Pure Software.
But at the time, I was still an individual contributor. And we were supposed to get to a
64-bit computing model. We were the first ones to do it among the major computer makers.
And everything worked except the tools and the compilers. And our group was choking on the tools.
I wasn't part of that exercise.
So they came to me as an individual contributor, and they said, we need you to work on this
problem. I said, well, what if I don't want to work on it? And they said, then you're fired.
And that's the way Silicon Graphics was. So they made me manage a group of people for the first time in my
life, and I put them into two shifts working around the clock
to get our debuggers and our performance analysis tools
and our C++ front ends and all these systems
to work with the 64-bit computing.
You would think this would be an easy problem,
coming from 32 bits to 64 bits.
It's a really hard problem.
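One common reason the move is harder than it looks, offered as a general illustration rather than a description of SGI's actual bugs, is 32-bit code that assumes a pointer fits in a 32-bit integer. The sketch below simulates that assumption with Python's standard `struct` module; `store_in_int32`, `store_in_int64`, and the sample addresses are hypothetical.

```python
import struct


def store_in_int32(address: int) -> int:
    """Simulate C code that stashes a pointer in a 32-bit unsigned int.

    On a 32-bit platform this round-trips; with 64-bit addresses the
    high 32 bits are silently discarded, as a C cast would do.
    """
    low_bits = address & 0xFFFFFFFF        # all a 32-bit slot can hold
    packed = struct.pack("<I", low_bits)   # 4 bytes, like uint32_t
    return struct.unpack("<I", packed)[0]


def store_in_int64(address: int) -> int:
    """The 64-bit-clean version: keep the address in an 8-byte slot."""
    packed = struct.pack("<Q", address)    # 8 bytes, like uint64_t
    return struct.unpack("<Q", packed)[0]


if __name__ == "__main__":
    addr_32 = 0x1000_2000                  # fits in 32 bits
    addr_64 = 0x0000_7FFF_1000_2000        # needs more than 32 bits
    for addr in (addr_32, addr_64):
        ok32 = store_in_int32(addr) == addr
        ok64 = store_in_int64(addr) == addr
        print(f"{addr:#x}: int32 round-trip ok={ok32}, int64 round-trip ok={ok64}")
```

The first address survives both paths; the second comes back from the 32-bit path as a different, still plausible-looking address, which is exactly the kind of bug that makes debuggers and performance tools hard to port.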
And we worked two shifts around the clock,
day and night for three months,
created a minimum viable product, and all of a sudden
Silicon Graphics could ship their 64-bit computers.
And I realized, as an individual contributor,
I can do this amount of work,
but I just got these gentlemen and these ladies to work really hard
on this problem, and we were heroes.
and Silicon Graphics was really good about taking care of their heroes.
They sent me to Hawaii for five days in a five-star hotel with presents on my desk every day,
plus bonuses, of course.
That sounds amazing.
A very good company to work for that way.
But anyway, that's when I became a manager and decided that I could do more with that.
Now, if you ask a person like me,
and especially Kevin Scott, who's the CTO of Microsoft and whom I hired for LinkedIn,
he would rather be programming in Python right now than doing anything else. So once you're an engineer,
you're an engineer, and you have to give up something to be a manager or leader. So don't
forget that when you're making that transition. So as you got into management, I'm sure at some point you would also miss writing code.
But what aspects kind of got you going to continue on the management route?
Well, let's get back to the code part.
I still coded because, so for example, when we did the transition to 64 bits, I wrote all the test cases.
When I was the manager of the group that moved to Purify to get Purify to work, I wrote the Purify torture test, everything you could do wrong in C programming.
And it was so good that the pure software people made me an honorary member of their engineering team.
Now, I was the manager at the time, but I was writing the test code. Even when I was at LinkedIn, we had Hack Day, and I wrote a hack that basically
scraped all the members of LinkedIn. At the time, there were 140 million. And I could tell you
their names and their titles and their companies. And I scraped it without having access to LinkedIn
directly. And then I handed it as the hack to my security team,
who worked for me, of course, and they were quite embarrassed.
And then they wrote a thing called Sentinel, which fixed this problem
so dummies like me couldn't scrape LinkedIn.
So I kept my hand in, but at the end of the day,
you have to decide, you know, you're going to trust your team,
and I always trusted the team.
And now, you know, at this point, even when I left LinkedIn, everybody was smarter than I was, which is great. That's the
way it should be. Yeah. Well, so going from the code part, what got you going on the management
track? Like, what did you like about it? What I liked about it is the ability to handle and do
many more things. So I could get a lot more work done and accomplish a lot more if I could direct
traffic. And at that point,
I became like a coach. I don't know if your audience knows this, but I'm a big fan of the
Los Angeles Lakers. Sorry about that, Warriors fans, but I grew up here and I have season tickets.
Well, you make a championship team not by just having LeBron and AD on the team. You make it
by having rebounders and defensive specialists. And I assembled those teams. At LinkedIn, when they did the going away for me, there were 22
LinkedIn employees in the room. Out of those 22, five of them were there before I got there. The
other 17 I hired personally, including Kevin Scott and Bruno. And that was my team. And that's what I realized. With that kind
of a team, you're going to win championships. You're going to win. I love that.
Makes sense. So I'm very eager to jump into your time at LinkedIn. But before I do that,
you mentioned that when you went to AltaVista, operations was something that you
learned kind of on the job, like learning about networks, data centers.
Well, first of all, it sounds challenging.
And throughout your career, like you were still writing code and just staying close to the ground while managing a big organization.
What did that learning on the job about data centers and networks look like?
If I imagine, it just sounds hard to do all of that.
Well, when you realize that at the end of the day,
it's either hardware or software,
and what you'll find,
and this is where I have friends in this business.
So my first startup, we wrote software
to design integrated circuits and PC boards.
So I have some background in hardware engineering,
but the people that know about hardware and software combined make for good people.
The old way of doing operations, where you had engineers and they would hand it over the fence to the operations team
and the operations team would run it, doesn't work on the Internet.
It doesn't scale.
It's not fast enough.
It doesn't deal with issues
fast enough. And so I was actually a good candidate to be somebody to learn about operations. I
actually find the best operations personnel are people that were engineers first and then became
operations personnel. If you're strictly operations and you don't know how the code works, then you
don't know how the internet works. Oh, yeah. A huge plus one to that.
So talking about your time at LinkedIn,
you mentioned the third time you came out of retirement
was when you came to LinkedIn
and led engineering and operations.
Yes.
Every SRE at LinkedIn has heard this,
that the number one priority after talent is site up. Yeah,
that took a while to get that right. Yeah, and every SRE who joins LinkedIn, from the get-go,
from the boot camp, whether they go to a postmortem after an outage or they are speaking
with a team about a new design, site up is a culture that's talked about a lot, and it's something that's attributed to you. Many people
at LinkedIn who have been there since your time talk about site up: that comes from
David Henke. And I was reading about you, and it seemed like that's true for your time
at Yahoo as well. And I realize we probably can't go into all the details, but I would love to know, what did it take to bring that culture to a company like LinkedIn?
Like you mentioned, it took a while to get there.
Like what major changes needed to be happening?
I think that you used the right word, culture.
So let me tell you what happened. I come from Yahoo, and Yahoo and Google know how to run at scale on the Internet. Good for them. And that was a good experience. Come to LinkedIn, the fucking site is down every day. Every day. Okay? They had a word that I never heard of called throttling. You know what throttling means
where I come from? It means you're on your motorcycle and you're pulling back and you're
gassing it, right? That's what throttling means to me. That's not what it means to these guys. It means
you're throwing bits on the floor because you can't handle this many requests. Never heard of this term before. Okay? This happened every day.
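In the sense described here, throttling is load shedding: when requests arrive faster than the service can handle them, the excess is rejected instead of queued. A token bucket is one common way to implement that. The sketch below is a generic, hypothetical illustration, not LinkedIn's code; the class name, rate, and capacity are assumptions.

```python
import time
from typing import Callable


class TokenBucket:
    """Serve about `rate` requests per second, allowing bursts of up to `capacity`.

    Requests that arrive when the bucket is empty are rejected (throttled)
    rather than queued.
    """

    def __init__(self, rate: float, capacity: int,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate               # tokens refilled per second
        self.capacity = capacity       # largest burst we will absorb
        self.tokens = float(capacity)  # start with a full bucket
        self.clock = clock             # injectable so tests can control time
        self.last = clock()

    def allow(self) -> bool:
        """Return True to serve this request, False to throttle it."""
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


if __name__ == "__main__":
    fake_now = [0.0]  # a controllable clock for a deterministic demo
    bucket = TokenBucket(rate=5, capacity=5, clock=lambda: fake_now[0])
    print([bucket.allow() for _ in range(8)])
    # [True, True, True, True, True, False, False, False]
    fake_now[0] = 1.0  # one second later, five tokens have been refilled
    print(bucket.allow())  # True
```

Throttling like this keeps an overloaded service from falling over completely, but every `False` is a user request thrown on the floor. That is why it happening every day is a symptom to fix, not a steady state.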
And I'm like, this is not good.
This is not a good outcome.
Why is this happening?
And then you start exploring it.
So the first thing we did at LinkedIn was we had a daily operational meeting.
And the beauty of a daily operational meeting is I get to hear everything that's wrong.
And unfortunately, there are many things that are wrong at this point. But with respect to Jeff Weiner, who's probably the
greatest QA person that LinkedIn ever had, not all 30 of the things that he reported that day are
going to kill me. But these three things are. And let's go over these three things and figure out
what we're doing. At that point, you're bringing attention to the problem. Then you're getting people to realize, you know, if we weren't having so many problems,
maybe we could spend more time working on things that are of more interest to us
so we can get out of some of these problems.
The other problem was just the sheer culture of the organization.
LinkedIn was a product company.
Reid Hoffman's a wonderful guy, and he's a product guy.
Jeff Weiner's a wonderful CEO, product guy.
Deep Nishar, a very technical guy, but he was the head of product at LinkedIn from Google.
And so you have three of the most powerful guys at LinkedIn, and they're all product guys.
There's an old adage, quality, schedule, features.
Pick two.
Engineers typically pick quality and schedule.
Not always, but typically.
Product people typically pick schedule and features.
So there's this natural tension between quality, schedule, features. Pick two.
And of course, when I asked Jeff Weiner this question, what do you think he said?
I want all three.
That's not the deal, Jeff.
That's not the problem we're trying to solve for here.
Bottom line, LinkedIn didn't treat site up
as important, and now we did.
We had to, because if you can't keep the site running
and the service running, who gives a damn about this next feature?
I'm a big believer in growing fast,
and I think that's a lesson I
learned from Mr. Hoffman, Reid Hoffman. Go as fast as you can, grow as fast as you can,
but the site's got to work. The other thing that was important to me was security. And, you know,
when, you know, 500,000 of your users have the password 1, 2, 3, 4, 5, 6, that's not a very
secure system. And you want to start building that out as well.
Now, we weren't moving money at LinkedIn like a bank, but we still wanted to make it as
secure as possible.
So I wanted site up and security to be on top after our talent.
Talent always comes first because without the people, you can't do anything.
Then site up and security, and then everything else.
Now, that doesn't mean we spent all our time working on site up and security. What it means is when push came to shove, that took
priority. And that was a difficult and time consuming argument with Jeff, and with Deep,
and with Reid, and with the rest of LinkedIn, because LinkedIn wasn't used to that.
But we figured it out, and it was in our best interest.
And at the end of the day, it really, really paid off.
So I want to touch on what you said: quality, schedule, features, pick two.
It's a challenge, especially during high growth times.
Like if you're spending too much time getting the perfect technological solution,
but not for the right product, you won't survive.
On the other hand, if you build the right product but, like you said, if the site is not working, that doesn't work either.
So while you want to grow as fast as you can but still want to keep the site up, how have you seen successful teams manage this?
Well, what you realize is you look for the things that are
killing you. So if you're constantly fighting
the site, then you really don't
have time to add new features, right?
You're just not going to do that. The other thing
that was killing LinkedIn was
the release process. Again,
before your guys' times, but
we used to release every two weeks
very badly.
It was a huge Java monolith.
Remember, I'm the guy that ported Java from Sun to Silicon Graphics when it first came out.
What a piece of shit.
And the memory management model was horrific.
It's much better now.
Bottom line is that we shipped this big piece of junk every two weeks, and we had to fix it forward.
Sometimes we were there till the next morning, sometimes till the next afternoon, just trying to get our site to work again.
And it was horrific, right?
So we finally went to the product people and to the rest of LinkedIn, the engineers, and said, we want to redo this.
We want to re-architect how we deploy. And we want to do this in a way that at the end of the day,
we will all be better off. And we had four principles. The project was called Inversion.
Everything at LinkedIn is "In" something. It's kind of a nice name: Inversion. So we had four
principles. One, trunk development. Everybody checks into the
trunk. This is the way it should be. This is the way we did it at SGI. By the way, if you broke
the build at Silicon Graphics, you're fired. Okay? So anyway, don't break the build because
we're all checking into trunk. Two, got to be 24-7 testing against the trunk. We're constantly
testing against it because we're all using it as the common base.
Three, canary in the coal mine deployment. Instead of deploying to all 100 nodes, I'm deploying to
one. If that works, three. If that works, five. If that works, 10. If that works, 30. If that works,
70. If that works, all 100 nodes get deployed to. At any moment, if it does not work, undo. You have to be able to undo.
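The canary rollout in principle three, together with principle four's undo, can be sketched as a loop over the stages listed above (1, 3, 5, 10, 30, 70, then the whole fleet). This is a hypothetical illustration: `deploy_to`, `is_healthy`, and `rollback` are placeholder callbacks, not Inversion's actual tooling.

```python
from typing import Callable, Sequence

# Stage sizes for a 100-node fleet, as described: 1, 3, 5, 10, 30, 70,
# and then every remaining node.
STAGES: Sequence[int] = (1, 3, 5, 10, 30, 70)


def canary_rollout(
    nodes: Sequence[str],
    deploy_to: Callable[[str], None],
    is_healthy: Callable[[Sequence[str]], bool],
    rollback: Callable[[Sequence[str]], None],
    stages: Sequence[int] = STAGES,
) -> bool:
    """Deploy in widening stages, checking health after each one.

    If any check fails, undo the release on every node deployed so far.
    Returns True if the release reached the whole fleet, False if it was rolled back.
    """
    deployed: list[str] = []
    # The final stage always covers the entire fleet.
    for target in [*stages, len(nodes)]:
        target = min(target, len(nodes))
        # Deploy only to the nodes that are new in this stage.
        for node in nodes[len(deployed):target]:
            deploy_to(node)
            deployed.append(node)
        if not is_healthy(deployed):
            rollback(deployed)  # "You have to be able to undo."
            return False
    return True


if __name__ == "__main__":
    fleet = [f"node-{i:03d}" for i in range(100)]
    live = set()
    ok = canary_rollout(
        fleet,
        deploy_to=live.add,
        # Simulate a bug that only shows up once 10 nodes take traffic.
        is_healthy=lambda deployed: len(deployed) < 10,
        rollback=lambda deployed: live.difference_update(deployed),
    )
    print(ok, len(live))  # False 0
```

The design choice that matters is that a bad release is caught while it touches a few nodes, and that undo is a routine code path rather than an all-night fix-forward.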
And with those four principles and the machinery behind that called Inversion,
we changed how code was delivered, deployed, tested, undone, and the pace at which this was all done at LinkedIn. People could release anything
at any moment. And if it didn't work, we undo it. This makes the engineers happy.
They can go as fast as they can. This makes the ops people happy. If it doesn't work,
we're undoing it. This makes the product people happy. We're shipping more features than
we were before. This makes the salespeople happy. We're making more money. Everybody's happy. And I
will tell you, that was a very good thing for us to do. Yeah, it's really interesting to see
how the project inversion came about. And I can imagine that this was the right thing to do for LinkedIn
at that time. And it's not easy to make that call and say like, hey, this is what we have to do.
And a lot of, I can imagine a lot of engineers coming in and saying, well, I thought I was going
to work on, you know, like the latest and greatest technology. Why am I being put to do these other
tasks, which I thought would be done already. And I can imagine someone in your position has to be able to keep
such a large group of engineers motivated.
Okay, so to your point, think about not just the engineers,
but think about the product people.
Right.
Their whole world is new features, new product development,
and they were effectively put on hold largely for six months.
Right.
So what I'm curious about is like, how, how were you able to influence them to kind of
convince them and keep them motivated to be like, Hey, this is, we have to do this.
And this is how it's going to pay out.
Like in the short term, I understand this is going to really, really suck, but you're
going to really enjoy it, you know, after this time period.
That's right.
And first you go to the leadership, right?
Make no mistake.
We went first to Deep Nishar, the VP of product.
And the good news is Deep's a pretty good engineer.
And so he understood the mess we were in.
Then we went to Jeff Weiner.
And the nice thing about Weiner is
he spoke every two weeks in front of the whole crew. Every two weeks. And I always asked him why.
And he says, because you have to say something 42 times before anybody will remember it. That's one.
And two, he said, we got a lot of new people. They've never heard any of this shit. We got to
do it over and over again. Well, in this case, he's going to speak to the whole company about
inversion because we need
him to. We need to get buy-in from the product people, from the finance people, from the sales
people, as well as from the engineers. And we did. Nice. So I want to pivot now. So we love
discussing stories about production outages on the show, and also the lessons learned. I imagine that leading up to
Project Inversion, and probably during it, there were many of those. And you've seen
many production outages, not just at LinkedIn, but also at other companies. Could you share maybe one
or two of these war stories from your experience? Yeah, some of these actually bring back very bad memories for me.
So if I start crying at some point, you'll understand.
So probably the greatest outage in the history of Yahoo that I was on board with
was what I call the 10G massacre.
So 10G is Oracle, Oracle 10G. And at the time, again, we had a legacy system that
was failing, that was responsible for half of Yahoo's money. And for the 10G massacre,
we were going to change all of our databases. And we were going to go from SPARC to Intel, single computer to RAC,
32 bits to 64 bits,
Solaris OS to Linux OS,
Big Endian to Little Endian
in the byte representation,
EMC to NetApp storage,
and finally Oracle 9i to Oracle 10g.
This was our migration path.
Not all at once, though, right?
Remember, I had been at Yahoo for one month
just trying to keep this site going.
And this had all been tested.
I was assured by everyone this was all tested.
And we had certifications and all this stuff.
And like anything else,
when you make that many changes all at once,
it's probably not going to work. So I called it at the end of the day, the 10G massacre.
We loaded it up on a weekend, which was typically our non-traffic time. And by the morning,
when Asia started coming online on Monday morning, things turned to shit in a hurry, and it was bad.
And effectively what we learned was this version of Oracle on Linux in this environment was beta at best, and it was crashing constantly.
I now know what an ORA-600 is.
Never knew that before, but you've got to know that because that's basically you're screwed and you don't know why.
But I'm going to preserve the data if I can.
That's an Oracle fatal error.
Anyway, it took us two months to get out of this nightmare.
Two months.
During this time, I would get up at 7 in the morning, walk to work, and I would leave every morning at 2.
So I got five hours of sleep, max, and I was living in Pasadena at the time, because that's where Overture was.
And at 2 a.m. in the morning in Pasadena, there are only two groups of people that are out there.
And they would ask me, can I have a cigarette?
Because I used to smoke at the time.
Sure.
And then they would ask me, why are you out here so late?
I said, well, it's hard to explain, but I'm responsible for half of the money at a company,
and we've just shot ourselves in the foot, if not the head.
And then they started asking me personal questions, like, what's my name?
Because they would see me every night.
After about three weeks, they said, David, how's it going today?
Any better?
And I said, well, you know, we stayed up for most of the day and so forth.
After five weeks, they said, we have a good feeling about this, David.
At this point, I'm buying them beer and cigarettes and shit. And remember, I see the same crew every
night at 2 a.m. After two months, we sorted all this out with a lot of help from Oracle and experts
from a lot of places, including a lot of people flown down from headquarters in Sunnyvale for Yahoo.
And at the end of it, I went to a liquor store, bought five cartons of cigarettes and as many
bags of booze as I could carry. And I took it out there at 2am to my friends. I said,
I will never see you again, I hope. And I never did. That's the 10G massacre.
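One step in that migration path, big-endian SPARC to little-endian Intel, shows why so many simultaneous changes are risky: any code or data that treats raw bytes as integers reads different values on the two architectures. The sketch below uses Python's standard `struct` module to show the effect; it is a generic illustration, not a reconstruction of what actually broke at Yahoo.

```python
import struct

value = 0x0A0B0C0D

big = struct.pack(">I", value)      # big-endian byte layout, as on SPARC
little = struct.pack("<I", value)   # little-endian byte layout, as on x86

print(big.hex())      # 0a0b0c0d
print(little.hex())   # 0d0c0b0a

# Bytes written on a big-endian machine but decoded as little-endian:
misread = struct.unpack("<I", big)[0]
print(hex(misread))   # 0xd0c0b0a

# Decoding with an explicit byte order gives the same answer everywhere.
assert struct.unpack(">I", big)[0] == value
```

Stacked on top of a new CPU, word size, operating system, storage vendor, and database version, a byte-order mismatch like this is one more variable that makes it hard to tell which change caused a crash.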
Yeah, that sounds fairly horrifying. And I'm glad that you guys were able to get it back up at
some point. So you wrote a series of blog posts, probably quite related to this, on the LinkedIn
blog titled Every Day is Monday in Operations. As I was reading through one of those posts,
you talked about the Panama Project, where you wrote one of the axioms, which goes like, go to work every day willing to be fired. Can you elaborate more on
this or share any related stories? I can. So Panama is, as you know, the Panama Canal is
probably one of the greatest construction efforts in the 20th century. It's an amazing story. I
recommend to all your listeners to read it. It's like the
thousand-page book in one of my bibliographies on this topic. But we called our project to rebuild
sponsored search for Overture in Yahoo Panama. And it was very similar in terms of many things.
One, in the Panama Canal, they had to keep the workforce alive because of yellow fever
and malaria, and they didn't know the cause. In our case, we just had to build a team. When I got to Yahoo,
there were 27 people working on Panama. By the end, there were 500 people working directly
on Panama. The second problem was how to engineer the solution, and we had to create brand-new
engineering solutions and infrastructure to make this work, just like they did on the Panama Canal.
For those of you who don't know it, the Suez Canal connected two seas, but the Panama Canal connects two oceans.
And they literally had to build a lake at the top of Panama, get the water in there and use it to raise and lower the boats in the locks.
And that's the engineering solution that
worked. But the other thing to remember is that the Panama Canal was a
long-term project. The French had started it. They dug up one-third of it and quit.
And it almost bankrupted France. And then the Americans took over for military
reasons. And Teddy Roosevelt was smart enough to know that we needed this, if for nothing else, for our
military. But you can't give up. So there's a famous part where they're cutting what's called
the Culebra Cut in the Panama Canal, and it's a very mountainous area. And once again, because
it's a tropical rainforest, it fills in with mud and water. And the engineer goes to the chief engineer and says, what do I do now?
The trench, the Culebra Cut, just filled up again. By the way, culebra means snake in Spanish.
What are you going to do? And the chief engineer says, what do you think
you're going to do? Dig. Well, that's what we had to do on the Panama Project. It took us one and a
half years to do this. Now, getting back to go to work every
day willing to be fired: right near the end of the Panama Project, the boss shows up. The CTO of
Yahoo, who shall remain nameless, was my boss. And I got all my lieutenants in
there. And I had written this 25-page spec of acceptance criteria for Panama.
Right?
And he says, we got to ship now.
I said, but we're not, we haven't checked off everything on the 25-page spec.
So he starts taking things off my list.
And I start getting angry.
And this is not, not good.
And I will say, I didn't behave well. I took my badge
and I threw it at the big boss and I walked out of the room and I quit because he was trying to
undo my list in front of my staff. At the end of it, he came and we talked and we both apologized
to my staff because neither of us handled that one very well. Bottom line,
we didn't relax the criteria for releasing it and we did a hell of a job. And I'm still proud
of that project and very proud of the people that worked on it. Yeah, that's really great to hear.
You mentioned that you worked with many engineers even at that time. And I imagine a big part of your role has been
to grow them as well, and to see the key qualities where you're like,
this is what I want to see in a really good engineer.
What key qualities have you discovered, across a variety of
engineers, that tell you they're going to go places and do great things in
the future?
Well, again, obviously you've got to have the smarts, but that's not enough.
Necessary, not sufficient.
So believe it or not, culture once again comes into play, right?
You could be the smartest guy or gal on the planet, and if you can't get along or figure
out how to get along with this team, you're off the team.
Some engineers also have a certain knack for exploring or thinking about what could be.
And you're always looking to them because those are the ones that are going to take it to the next level.
And that's another trait that I look for. By the way, I interviewed almost every engineer we hired in the old days.
I spent 35% of my time at LinkedIn hiring. Imagine that. Remember, I was there all day,
all night, but still 35% of my time hiring people. And I would talk to junior interns,
as well as senior leaders. It didn't matter to me, because for everybody who joins this company,
hiring the best and the brightest is the number one priority for us,
but so is cultural fit.
That does matter.
Since you talked about your boss a little bit
earlier, when Austin asked about
the Panama project:
I was looking at your LinkedIn profile,
and Kevin Scott, CTO of Microsoft,
actually has a recommendation for you.
And one thing that he says is that
you are the best boss he's ever had.
Yeah, that actually pisses off all his other bosses, by the way.
That's okay. I'll take it.
I have only one other recommendation that I've posted, by the way.
That's from a guy named Chi Lu.
Yeah, and I have a question on that one too.
Go ahead.
Okay, go ahead.
I just want to make sure we mention him because everybody's got a hero in this business.
And my hero is Chi Lu.
Okay.
Why don't you tell us why?
When I got to Yahoo, Yahoo was what I call the loose confederation of warring tribes.
You had a lot of really smart people working on a lot of very different things, not necessarily together.
Chi just wanted to do search and search monetization, and he was really good at it, and he attracted people to do it.
So his team was very loyal to him, and I eventually worked for him at the end of the day in search and search this. I even had a cron job to do this
because I was too lazy to stay up just to see how he would respond. He responded at all but four hours of the day.
Anyway, that's Chi. So I'll get back to my question about Kevin Scott's
recommendation about you being the best boss. But since you mentioned hard work, both of these recommendations, from Kevin Scott and Chi Lu, speak to it, and I actually
want to read out just a couple of lines from each. So this is from Kevin
Scott: I knew within a day of working with him that David's passion and commitment to his
work and to his employees were almost superhuman. And Chi Lu says something along similar lines: ferocious intensity is another hallmark
of Henke.
I still remember the days when Henke could fight off a major site outage with his enormous
willpower.
So you mentioned Chi Lu was one of the hardest-working folks, or the hardest-working guy, you
worked with.
My question for you is: these two people you respect a lot are saying that the intensity
you came in with and the passion you have are amazing. So how did you develop the passion
and intensity that you brought to work every day? Well, that's a good question. By the way,
I'm not like Chi Lu and Kevin Scott. They're both introverts. They're both a lot smarter than I'll ever be. But I don't like it when shit doesn't work. It really bugs me. And I don't like to fail. Do you love winning or do you hate losing? Ask yourself that question sometime. I hate losing
more than I love winning. Doesn't mean I don't love to win. I just hate losing. And I hate it
when things don't work. And, you know, the nice thing about LinkedIn is there was no end to it
there. And at Yahoo and Overture, no end to it there either. So let's go fix that. And I was often brought in to fix things like that. And I know
what to do. As for the willpower to overcome it, that's not me. That's the team.
What I had to do was get the team in place, and make sure everybody knew what they were going to
do. And once you get enough people marching in the right direction, you'll solve any problem. Well said.
And coming back to my question about Kevin Scott's recommendation for you being the best boss,
I know I've repeated that three times now.
I don't know how many people that has pissed off.
I can't speak to all the reasons he might have said that.
But you can interview Kevin someday and maybe ask him that question.
Oh, yeah.
We would love to have Kevin on the show as well.
He's quite a character. We'll just leave it at that.
Okay. So my question actually is, what do you think makes for a good boss?
A good boss is, first of all, somebody that listens to you. Listening is a very underrated
skill. You want somebody that actually hears what you're saying,
right, but also can speak it back to you. So this gets back to one of my deficiencies of
communication. Email is not communication. Texting is not communication. IRC chat is not communication.
Communication is when you and I are talking to each other, I say something to you, you can say what I said
back to me, and I have to agree that you got it right. That's communication. I had to learn that
the hard way in marriage counseling, and that didn't work. But anyway, bottom line is,
communication is really hard. And believe it or not, I would use this trick with some of my teams.
Okay, person A, you need to talk to person B, say back what he said,
and make sure he agrees that you've got it right. You would be amazed how hard this is.
And that's something we need to focus on. The other thing that I used to do as the boss was,
I wanted to know what my employees wanted to do. Look, what is it you want to be when you grow up?
What is it you want to do next?
What is it you want to work on?
So I don't wait six months or a year to have a talk like that with them.
I wait a week at the most, and we'll hash it out.
And by the way, there are usually three things.
No one wants to hear about more than three things to work on, trust me.
Mention a fourth thing, and their eyes glaze over and they couldn't give a shit.
So pick three things, and instead of saying you suck at this, say, what if you did this instead?
Maybe we can talk about that.
What if this, for the three things that we're trying to work on?
Because there's always something to work on, and that's true for all of us.
Yeah, yeah.
That is absolutely true.
Yeah. So you've been in a lot of leadership roles at this point and worked
with many, many leaders as well. I'm pretty sure there are a ton of parallels with being
a good boss, but I imagine there are individual contributors
who also have the leadership attribute,
just not in a managerial position.
So what, to you, makes a great leader, beyond being a good boss?
Well, let's talk about individual contributors for a second.
I'm going to turn your question a little bit because it's a good LinkedIn story as well.
So when I got to LinkedIn on day two, I asked the VP of operations at the time,
show me the DR site. Tell me about it. How long will it take me to cut over if I lose this data
center? He says, I don't think you understand. I said, I don't think you understand. It's a
simple question. Is it going to take four hours, eight hours, 12 hours, a day? He says, I don't
think you understand. I said, I don't think you understand.
He said, well, we have 90% of the computing we need in the second data center, the DR site.
I said, that's fine. We can buy the other 10%. I said, what else? He says, well, we don't have
any of our software on those computers. We don't have any data replicated on those
computers, and we don't have any configuration parameters set up.
Basically, they're just machines cooking in the machine room.
I said, so basically what you're telling me is we don't have a DR site.
Is that what you're telling me?
And he goes, that's right.
So, by the way, that's not good.
I had to go right to the CEO and right to the board of directors and say, if we lose this data center, we are out of business.
Everyone, nod your head. Everyone understands this, right? Okay. So fast forward, we built a DR site. Thanks
a lot to Neil Pinto and a whole bunch of people. And we built it from scratch. And it worked.
We know it worked because we cut over to it in about eight hours. Pain in the ass, but we cut over to it.
Then we wanted to build Multicolo. Now, Multicolo is a lot harder because you got to flip traffic,
but the hardest part for LinkedIn was all the data sources had to be replicated so that they were
consistent. And that was a really hard problem. So I took the best engineer I could think of at
LinkedIn that we hired from Yahoo, a guy named Swee Lim, individual contributor, not a manager,
not a director, blah, blah, blah, but a real smart engineer. And I said, Swee, we've got to make
Multicolo work, just like we used to do at Yahoo, just like they do at Google. Flip traffic,
who gives a shit if we lose a data center, right?
Everybody in the company at that point worked for Swee Lim,
the individual contributor.
He became the leader.
Jeff Weiner got in front of the entire company and said,
if this gentleman comes into your office and asks for something,
do what he says.
And we built Multicolo in one year, roughly. Now, it runs in many, many
data centers. And I remember the last time I was in the NOC at LinkedIn, it was green, green, green,
red. Red was one data center down; the other three data centers were up. And they said, Mr. Henke,
isn't this a beautiful thing? We don't give a shit; it works perfectly. And I said, that is a beautiful thing. But it took an individual contributor, to your point, to lead
all of LinkedIn on a cross-functional project, which was a real pain in the ass, but we did it.
And I imagine Swee, like you said, is extremely smart, and I'm assuming that
another big part was that he was able to bring other people along, which I think is something
that a lot of engineers slowly start to develop over time, and which made him so
useful in that particular position. That's right. People want to be part of something. They want to follow
people that are very good. They want to learn
from people that are very good.
Shit, I learned a tremendous amount from Swee.
How does this work? How does that work?
What's the biggest problem we are faced with
to get this replication problem to work?
And by the way, I think
he just left LinkedIn and he's over at Databricks now.
Good for them.
Very good guy.
But it just shows you that you don't have to be a manager or director or VP.
You could lead as an individual contributor.
You do need the backing of the other leadership.
And the nice thing is, we had the backing of the CEO of the company.
Again, Jeff Weiner would stand up every two weeks and say,
this is Swee Lim.
If he walks into your office, do what he says.
I love all of that. It's amazing that Jeff and the entire company supported Swee on this project
of Multicolo. And I mean, today, we see that. LinkedIn runs out of multiple data centers. We
do traffic shifts. And it's just amazing. That's one of the first things I was extremely impressed by when I came to LinkedIn:
oh, we just click a button and it all happens.
It's like magic.
But what does it take for an IC to build that trust,
that level of trust with the leadership? Like it's not easy.
Well, part of it is that I have to talk to, once again, the CEO and the product people and the salespeople. So, you know, the nice thing
about talking with Mr. Weiner is he likes philosophy. So, we talk about things like
existential issues. Existential is basically death or life, right? So, he gets the fact that if we really can't make this work, we're just going to be in a
bad spot. And the nice thing is we also came from a company that understood how to do this. So
people know it can be done. Google, it's not a problem. They do it all the time. Facebook,
they do it all the time. But LinkedIn did not, and Yahoo knew how to do it. So it's important to say it can be done.
Once it's done, everyone's going to breathe easier.
Believe it or not, in my four years there, until we had Multicolo, I did not breathe easy.
I would assume a lot of startups that are on this accelerated path of growth are probably still working towards
developing that Site-Up culture, which they want, but it's not there yet. What advice would you have
for them, either teams or early-stage companies? Well, one thing is you're in a better spot now, right? So when I started at LinkedIn, AWS was just coming out of Amazon.
Google didn't have the cloud.
Microsoft didn't really have Azure.
But AWS was starting, and we knew some of the first users of it because they were ex-Yahoo guys that went over to Netflix.
And Netflix was going to go all in with this.
And believe it or not, at that time,
this was years and years ago,
it wasn't reliable.
It wasn't elastic.
It wasn't cost effective.
The system administration tools sucked.
The security system sucked.
And it was like, much as we wanted to use it,
we wouldn't use it for all of those reasons.
Now, it's great. And you can go to
AWS, you can go to Google Cloud, you can go to Azure, and you can be pretty sure that you don't
have to deal with data centers and networks and the computing. That doesn't mean your software
doesn't have to be resilient, your monitoring systems don't have to be excellent, and your scaling systems don't have to be knowledgeable.
And so you have to build that in, at least architect for that, up front.
That's one of my suggestions.
I know you want to go as fast as you can.
And I'm a big believer in Mr. Hoffman's blitzscaling.
That's something I had to learn the hard way.
The faster, the better. But that's
why you invest in things that help you go faster, help you scale. Because you don't want to be in a
position of what I call the going out of business business. Let's say your success is so good,
but you can't keep up with the demand because you can't scale it fast enough.
Even by throwing computing at it, you can't scale it fast enough.
Then you're in a bad spot.
And if you're moving money, that's what I like to talk about, you know, like the encryption guys and the Bitcoin guys and the banks versus LinkedIn.
LinkedIn had to run at three nines.
We did not at first, but my goal was three nines, 99.9% uptime.
The money guys can't settle for that.
They have to run at four nines or better because they're moving people's money.
And if they screw it up, they're out of business.
And I think it's important to grasp that as you're running as fast as you can
because it seems like they're at odds with each other.
They don't have to be.
You just have to engineer and architect your solutions to scale.
Always think 10x.
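To put the uptime targets above in concrete terms, here is a minimal Python sketch of the yearly downtime budget each one allows. It assumes a 365-day year and reads "three nines" as 99.9% and "four nines" as 99.99% availability; the function name is illustrative, not anything from the episode.

```python
# Yearly downtime budget implied by an availability target.
# Assumption: a 365-day year (8,760 hours); leap years are ignored.
HOURS_PER_YEAR = 365 * 24


def downtime_hours_per_year(availability_pct: float) -> float:
    """Hours of downtime per year allowed at the given uptime percentage."""
    return HOURS_PER_YEAR * (1 - availability_pct / 100)


if __name__ == "__main__":
    for label, pct in [("three nines", 99.9), ("four nines", 99.99)]:
        hours = downtime_hours_per_year(pct)
        print(f"{label} ({pct}%): {hours:.2f} hours/year, about {hours * 60:.0f} minutes")
```

Under those assumptions, three nines leaves about 8.8 hours of downtime a year, while four nines leaves about 53 minutes, which is why each extra nine is so much harder to hit.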
Yeah, makes sense. So we're starting to wrap up and I have a few more questions for you.
I want to make sure we respect your time. But before I go on to these questions,
are there any other war stories you would like to share with our listeners?
I have many. Maybe if people like them, I literally could go on forever.
And so I will not do that. I will go to your questions. Okay. Well, we will save that for
another time. I heard that you read a lot and you also like gifting books to
your staff, to the people who work with you. What are some of the
books that you've gifted the most? So I not only gift books, but it's required reading.
Okay. So if you work for me directly, you have to read these books. And
actually, I'll just lay some, I'm just going to look for a list but I can't
find it. So here's some of the
books that I really recommend.
But I have a bibliography and
your readers can go find it on
the internet. It's a leadership
talk that I've given maybe 30
times. It was recorded
by UC Santa Barbara
and I managed not to use the F word in that
thing because it was my
alma mater. At the end of the deck for this leadership talk, there's a bibliography
of all these books. Okay, we'll add that to the show notes for sure. And definitely encourage your
readers to look at the bibliography, because they're good reads. But I have many, many good
reads. Some of them go way back, you know, like the
Mythical Man-Month is a software book. Six Hats is kind of a fun read. But some of the more
important books are, one of them is a philosophical book, but written by the Toltecs, who predated the
Aztecs. And it's called The Four Agreements. And it's very important to me. The Four Agreements
are, do your best, which you guys all do, which is great.
Be impeccable with your word.
That's number two.
Impeccable comes from the Latin for without sin.
Don't assume anything.
Well, that's Ops 101, right?
And the hardest one, don't take anything personally.
I love this one because that may be the hardest one
for most people to appreciate,
but I'll give you an example.
David Filo, the founder of Yahoo, smart guy, engineer, and I worked for him at one point when the CTO left and we were looking for a new CTO.
And he called me an idiot about 500 times while I worked for him.
And I said, David, I have the largest group in the company and I have the largest budget because I run the data centers and the computing and the networks, and you've given me all this responsibility, and I'm an idiot.
What does that make you?
And I'm thinking, he must have just kicked his dog that day or something.
I don't know.
Bottom line is, don't take it personally.
I don't think he really thought I was an idiot.
Maybe he did, maybe he didn't, but I don't really worry about it.
So that's a very good book for me because you can use it in your life as well as
in work. Go to the bibliography, because, you know, it has books on things like how they
built the Panama Canal. It's a hell of a read. You want to see how something's done, a big project's
done, because not everything is a quick and dirty project anymore. Some things are much harder to build than others.
The graph at LinkedIn, the network,
we did it three times when I was there.
That's a hell of a thing.
And it's a hell of a thing to get right.
And it may be the most interesting data structure we have.
And we just released a new database to store that graph.
You did?
Well, I didn't know that.
See, you're on iteration number N, where N is greater than I remember.
Yeah, well, thank you for that recommendation. And David, we could go on and
on with you. And we could ask you a lot more questions, but probably we'll save that for
another time. It's been such a pleasure and an honor to speak with you. Thank you so much for
taking the time. Anytime. And if you work at LinkedIn, you're probably part of the best SRE group
in the universe, at least in our universe. And the reason I know that is ex-Googler,
Kevin Scott, now CTO of Microsoft, knows how good Google is at this. And he said,
Bruno, your team's better. So there you go.
We'll take that. Thank you so much again. We really appreciate it.
Take care.
Hey, thank you so much for listening to the show. You can subscribe wherever you get your podcasts
and learn more about us at softwaremisadventures.com. You can also write to us at hello at softwaremisadventures.com.
We would love to hear from you.
Until next time, take care.