Y Combinator Startup Podcast - Things That Don't Scale, The Software Edition
Episode Date: October 27, 2022
Dalton Caldwell and Michael Seibel on software hacks that don't scale. Companies discussed include Google, Facebook, Twitch, and imeem. Watch the first video on doing things that don't scale here: https://youtu.be/4RMjQal_c4U Apply to Y Combinator: https://www.ycombinator.com/apply/
Transcript
We'll get a founder that's like, oh, how do I like test my product before I launch to make sure it's going to work?
And I always come back and tell the founders the same thing.
Like, if you have a house and it's full of pipes and you know some of the pipes are broken and they're going to leak, you can spend a lot of time trying to check every pipe, guess whether it's broken or not and repair it, or you can turn the water on.
And like, you'll know, like, you'll know exactly the work to be done.
Hey, this is Michael Seibel with Dalton Caldwell.
Today we're going to talk about what does it mean to do things that don't scale, the software
edition.
In this episode, we're going to go through a number of great software and product hacks that
software companies used to figure out how to make their product work when perhaps they didn't
have time to really build the right thing.
Now, Dalton, probably the master of this is a person we work with, a guy named Paul Buchheit,
who invented this term, the 90/10 solution.
He always says something like,
how can you get 90% of the benefit for 10% of the work?
Always.
This is what he always puts to people when they tell him
it's really hard to build something or it'll take too long to code it.
He'll just always push on this point.
And, you know, founders don't love it.
Right? Would you say that's a fair assessment, Michael?
That's a fair assessment. Yes, founders hate it.
But tell them, tell the audience why it's worth listening to the guy. Like, why does he have the credibility to say that to
people? Well, PB is the inventor of Gmail, and as kind of a side project at Google, he invented
something that 1.5 billion people on Earth actively use. And he literally did it doing things that
don't scale. So I'll start the story and then please take it over. So as I remember it,
PB was pissed about the Gmail product, a Google product he was, sorry, the email product he was using.
And so Google had this newsgroup product.
The first version of Gmail, he basically figured out how to put his email into this Google Groups UI.
And as he tells the story, kind of his eureka moment was when he could
start reading his own email in this UI. And then from that point on, he stopped using his old
email client. And what I loved about this is that as he tells the story, every email feature that
any human would want to use, he just started building from that point. And so, you know, he would
talk to the YC Batch. And he's like, and then I wanted to write an email. And so I built writing
emails. And if you know PB, like, he could have gone a couple days reading emails without replying at all. So, like,
he didn't need writing emails to start.
I remember him telling the first time he got his, like, co-worker,
like literally like his deskmate or something to try to use it.
And his deskmate's like, this thing's pretty good.
It loads really fast.
It's really great.
The only problem is, PB, it has your email in it.
And I wanted to have my email.
And PB was like, oh, shit.
Okay.
Well, I got to build that.
I'm like, I got to build that.
Perfect 90/10 solution.
And so then it started spreading throughout Google.
And do you remember when it broke?
No.
What happened?
Oh, so he told this story where, like, one day PB came in late to work, which is, you know, knowing PB.
Every day.
You know.
And everyone was looking at him really weird.
And they were all like a little pissed.
And he got to his desk and someone came over to him and was like, don't you realize that
Gmail's been down, like, all morning?
And PB was like, no, I just got to work.
I didn't know.
And so he's, like, trying to fix it, trying to fix it, and then his co-workers see him
grab a screwdriver and go to the server room. And they were like, oh God, why do we trust
PB with our email? Like, we're totally screwed.
And I think he figured out there was a corrupted hard drive. And I remember at
that point of the story, he says, and that day I learned that people
really think email is important and it's got to always work.
It's like, perfect. That is...
I think the reason, I think the reason he did it, man, is because he liked to run Linux on the desktop.
And he didn't want to run Outlook.
Like, the Google, like, suits were trying to get him to run Outlook on Windows.
And he was like, I don't really want to run Windows.
But, yeah, it was the dirtiest hack.
And as I recall in this, you know, final part of the story, it was hard for him to get Google to release it because they were afraid it was going to take up too much hardware.
And so there was all these, there was all these issues where there was a good, there was a decent chance, I think,
it never would have been released.
Well, this part was that everyone thought Gmail's like invite system was like some cool, like growth hack.
Virality hack.
Like virality hack.
It's like, oh, you got access to Gmail.
You got, I think, four invites to give someone else.
And these are like precious commodities.
And it was, it was another product.
It was just another version of things that don't scale.
They didn't have enough server space for everyone.
They didn't have enough servers.
So they had to build an invite system.
Yes.
There was not an option to not, but basically there was no option other than building an invite system.
It was not like genius PM growth hacking.
It was like, yeah, well, we saturated.
The hard drives are full, so I guess we can't invite anyone else to Gmail today.
That's it.
That's it.
So you had another story about Facebook early days that is similar in this light.
So let me paint the picture.
Back when you started a startup a long time ago, you had to buy servers and put them in a data center,
which is a special room that's air-conditioned that just has other servers in it.
And you plug them in and they have fast internet access.
And so being a startup founder, until AWS took off, part of the job
was to drive to the suburbs, or whatever, drive to some data center,
which is an anonymous warehouse building somewhere, go in there,
and, like, plug things in.
And what was funny is when your site crashed,
it wasn't just depressing that your site crashed.
It actually entailed getting in your car.
Like part of being a startup founder was waking up at 2 a.m.
and getting in your car and driving to like Santa Clara.
Yep.
Because your code wedged and you had to physically reboot the server.
And your site was down until you physically rebooted the server.
So anyway, I'm just trying to set the stage for people.
So this was what our life was like.
Okay.
And so my company, imeem, we had a data center in Santa Clara,
and there was a bunch of other startups there as well.
And so something that I liked to do was to look at who my neighbors were, so to speak.
There was never people there.
It was just their servers.
And there would be a label at the top of the rack.
And you could see their servers and you can see the lights blinking on the switch.
Okay?
So this is what it was like.
And so our company was in the data center in this data center in Santa Clara.
And then one day there's a new tenant and, oh, new neighbors.
So I look at it.
And the label at the top of the cage next to ours, you know, three feet away,
the label said thefacebook.com.
And I remember being like, oh, yeah, I've heard of this.
Like, cool, like, sounds good.
And they had these super janky servers.
I think there was maybe eight of them when they first moved in.
And they were, like, super cheap, like Supermicro servers.
You know, like the wires were hanging out.
Like, you know, I'm like, cool.
But the lights were blinking really fast.
Okay.
And so what I remember was that there was labels on every server.
And the labels were the name of, you know,
a university. And so at the time, one of them, one of the servers was named Stanford,
one of them was named Harvard, you know, like, and it made sense because I was familiar with the
Facebook product at the time, which was like a college social network that was at like eight
colleges. Okay. So then I watched every time we would go back to the data center, they would
have more servers in the rack with more colleges. And it became increasingly obvious to me
that the way they scaled Facebook was to have a completely separate PHP instance
running for every school that they copy and pasted the code to,
they would have a separate MySQL server for every school,
and they would have, like, a memcached instance for every school.
And so you'd see like the University of Oklahoma,
you'd see the three servers next to each other.
And the way that they managed to scale Facebook
was to just keep buying these crappy servers.
They would launch each school,
and it would only talk to a single school database,
and they never had to worry about scaling a database
across all the schools at once.
Because again, at the time, hardware was bad.
Okay, MySQL was bad.
Like, the technology was not great.
If they had to scale a single database, a single users table,
to hundreds of millions of people, it would have been impossible.
And so their hack was the 90/10 solution like PB used for Gmail,
which is like, just don't do it.
And so at the time, if you were like a Harvard student and you wanted to log in,
it was hard-coded: the URL was harvard.thefacebook.com, right, man?
And so if you tried to go to stanford.thefacebook.com,
it would be like, you know,
error. Like that was just a separate database. And so then they wrote code so you could bounce between
schools. And it actually took them years to build a global users table, as I recall,
and avoid this hack. And so anyway, the thing they did that didn't scale is to copy and paste
their code a lot and have completely separate database instances that didn't talk to each other.
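To make the per-school setup concrete, here is a minimal Python sketch of the idea as Dalton describes it: the subdomain picks a completely separate database, and nothing is shared across schools. This is only an illustration under stated assumptions, not Facebook's actual code (which was PHP and MySQL); the function names, the in-memory SQLite databases, and the two-school list are all hypothetical.

```python
import sqlite3

# Hypothetical list of launched schools; each one gets its own isolated database.
SCHOOLS = ["harvard", "stanford"]


def make_school_db() -> sqlite3.Connection:
    """Create one standalone database with its own users table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT UNIQUE)")
    return conn


# One database per school. There is no global users table anywhere.
DATABASES = {school: make_school_db() for school in SCHOOLS}


def db_for_host(host: str) -> sqlite3.Connection:
    """Pick the database from the subdomain, e.g. harvard.thefacebook.com."""
    school = host.split(".")[0].lower()
    if school not in DATABASES:
        raise LookupError(f"no instance running for school {school!r}")
    return DATABASES[school]


def sign_up(host: str, name: str) -> None:
    """Register a user in the database of the school named by the host."""
    db = db_for_host(host)
    db.execute("INSERT INTO users (name) VALUES (?)", (name,))
    db.commit()


def user_exists(host: str, name: str) -> bool:
    """Look the user up in that one school's database only."""
    row = db_for_host(host).execute(
        "SELECT 1 FROM users WHERE name = ?", (name,)
    ).fetchone()
    return row is not None


sign_up("harvard.thefacebook.com", "alice")
print(user_exists("harvard.thefacebook.com", "alice"))   # True
print(user_exists("stanford.thefacebook.com", "alice"))  # False
```

Launching a new school in this sketch means adding one more entry to the dictionary, which mirrors buying one more set of servers. The cost is that a user simply does not exist outside their own school's database.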
And I'm sure people that work at Facebook today, I bet a lot of people don't even know the story.
But, like, that's what it took. That's the real story behind how you start something big like that
versus what it looks like today.
So in the case of Twitch, all, if not all, like most of the examples
of this came from this core problem, and it's why I tell people to not create a live video
site. A normal website, even a video site, on a normal day will basically have peaks and troughs
of traffic.
And the largest peaks will be
2 to 4x the steady-state traffic.
So you can engineer your whole product
such that if we can support
2 to 4x the steady-state traffic
and our site doesn't go down, we're good.
On a live video product,
our peaks were 20x.
Now, you can't even really test 20x peaks.
You just experience them and fix what happens when 20x
more people than normal show up on your website because some pop star is streaming something.
And so two things kind of happened that were really fun about this. So the first hack we had was
if suddenly some famous person was streaming, on their channel, there'd be a bunch of dynamic
things that could load. Like your username would load up on the page or their channel and the view
count would load up and a whole bunch of other things that would basically hit our application
servers and destroy them if 100,000 people were trying to request the page at the same time.
So we actually had a button that could make any page on Justin.tv a static page.
All those features would stop working.
Your name wouldn't appear.
The view count wouldn't update.
Like, literally a static page that loaded our video player, and you couldn't touch us.
We could just cache that static page, and as many people as wanted could look at it.
Now to them, certain things might not work right.
But they were watching the video.
Chat worked because that was a different system, the video worked, that was a different system.
And we didn't have to figure out the harder problems until later.
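The "static page button" can be sketched in a few lines of Python. This is a hypothetical illustration, not Justin.tv's code: the `PageServer` class, its attribute names, and the HTML strings are assumptions made for the example. The point it shows is that once the switch is on, every request is answered from one cached page and the application logic is never called again.

```python
class PageServer:
    """Toy channel-page server with a static-mode kill switch (hypothetical)."""

    def __init__(self) -> None:
        self.static_mode = False      # the "button"
        self.app_server_calls = 0     # how often the expensive path ran
        self._static_cache = {}       # channel -> pre-rendered static page

    def _render_dynamic(self, channel: str, username: str, view_count: int) -> str:
        # Expensive path: per-user and per-request data hits the app servers.
        self.app_server_calls += 1
        return (
            f"<h1>{channel}</h1><p>Hi {username}</p>"
            f"<p>{view_count} viewers</p><video-player/>"
        )

    def _render_static(self, channel: str) -> str:
        # Cheap path: nothing user-specific, just the video player.
        return f"<h1>{channel}</h1><video-player/>"

    def serve(self, channel: str, username: str = "guest", view_count: int = 0) -> str:
        if self.static_mode:
            # Render once, then serve the same cached page to everyone.
            if channel not in self._static_cache:
                self._static_cache[channel] = self._render_static(channel)
            return self._static_cache[channel]
        return self._render_dynamic(channel, username, view_count)


server = PageServer()
print(server.serve("popstar", "michael", 10))   # dynamic page with the username
server.static_mode = True                       # someone presses the button
for _ in range(100_000):
    page = server.serve("popstar", "dalton", 99_999)
print(server.app_server_calls)                  # still 1
```

In a real deployment the cached page would more likely live in a CDN or reverse proxy than in process memory; the sketch keeps it in a dictionary only to stay self-contained.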
Later, actually, Kyle and Emmett worked together to figure out how to cache parts of the page
while making other parts of the page dynamic.
But that happened way, way later.
Dude, that reminds me, let me give you a quick anecdote.
Yes.
Remember Friendster before Myspace?
Yeah, of course.
Every time you would log in, it would calculate how many people were two degrees of separation
from you.
And it would fire off a MySQL thread when you would log in, it would look at
your friends and then would calculate your friends' friends and show you a live number of how big your
extended network was. And the founder, you know, Jonathan Abrams, he thought this was like a really
important feature. I remember talking to him about it. Guess what MySpace's, uh, do things that don't
scale solution was. If they were in your friends list, it would say, so-and-so
is in your friends list. And if they weren't, it would say, so-and-so is in your extended network.
And so Friendster was like trying to hire engineers and scale MySQL, and they'd
run into, like, too many threads on Linux issues and, like, updating the kernels.
And MySpace was like, uh, so-and-so is in your extended network.
That's our solution.
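The difference between the two approaches is easy to see in code. The Python sketch below is a hypothetical illustration with a made-up five-person friend graph; neither function is taken from Friendster or MySpace. The first walks the graph on every login to count everyone within two degrees, the second does a single membership check and otherwise just claims the person is in your extended network.

```python
# Made-up friend graph for illustration only.
FRIENDS = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "dan"},
    "cat": {"ann", "dan", "eve"},
    "dan": {"bob", "cat"},
    "eve": {"cat"},
}


def extended_network_size(user: str) -> int:
    """Friendster-style: count everyone within two degrees, recomputed per login."""
    direct = FRIENDS[user]
    two_hop = set()
    for friend in direct:
        two_hop |= FRIENDS[friend]   # one more lookup per friend
    return len((direct | two_hop) - {user})


def relationship_label(user: str, other: str) -> str:
    """MySpace-style: one set lookup, no graph traversal at all."""
    if other in FRIENDS[user]:
        return f"{other} is in your friends list"
    return f"{other} is in your extended network"


print(extended_network_size("ann"))        # 4
print(relationship_label("ann", "bob"))    # bob is in your friends list
print(relationship_label("ann", "eve"))    # eve is in your extended network
```

The first function's cost grows with the number of friends and their friends, which is what makes it painful at login time for millions of users. The second is constant work per lookup, at the price of not computing anything real.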
Anyway, carry on.
But that's the same deal.
So our second one was, it always happened with popular streamers.
So our second was, if you imagine, if someone is really popular and there's 100,000 people
who want to watch their stream, we actually need multiple video servers to serve
all of those viewers. So we would basically propagate the original stream coming from the
person streaming across multiple video servers until it was on enough video servers to serve
all the people who were viewing. The challenge is that we never had a good way of figuring out
how many video servers we should propagate this stream to. And if a stream would slowly
grow in traffic over time, we had a little algorithm that could work and like spin up more
video servers and it'd be fine. But what actually happened was that a major celebrity
would announce they were going on, and all their fans would descend on that page.
And so the second they started streaming, 100,000 people would be requesting the live
stream.
Bam, video server dies.
And so we were trying to figure out solutions, solutions, solutions, and like, how do we,
how do we model this?
How do we, like, there were all kinds of, like, overly complicated solutions we came up with.
And then, once again, Kyle and Emmett got together, and they said, well, the video system doesn't
know how many people are sitting on the website before the video stream, before it
starts trying to start video. But the website does. All the website has to do is communicate
that information to the video system and then it could pre-populate the stream to as many
video servers as it would need and then turn the stream on for users. So what happened now in
this setup is that some celebrity would start streaming. They would think they were live.
No one was seeing their stream while we were propagating their stream
to all the video servers that were needed.
And then suddenly, the stream would appear for everyone
and would look like it worked well.
And like the delay was a couple seconds.
It wasn't that bad, right?
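Here is a small Python sketch of that pre-population trick, under stated assumptions: the per-server capacity of 1,000 viewers, the `Stream` class, and the server names are all hypothetical. It shows the two steps in order: the website's count of waiting viewers decides how many video servers get the stream, and only after that fan-out does the stream become visible.

```python
import math

VIEWERS_PER_SERVER = 1_000  # assumed capacity of one video server


def servers_needed(waiting_viewers: int, per_server: int = VIEWERS_PER_SERVER) -> int:
    """Size the fan-out from the number of people already sitting on the page."""
    return max(1, math.ceil(waiting_viewers / per_server))


class Stream:
    def __init__(self, channel: str) -> None:
        self.channel = channel
        self.servers = []                  # servers currently carrying the stream
        self.visible_to_viewers = False    # broadcaster thinks they are live either way


def go_live(stream: Stream, waiting_viewers: int, available_servers: list) -> Stream:
    needed = servers_needed(waiting_viewers)
    if needed > len(available_servers):
        raise RuntimeError("not enough video servers for this audience")
    # Step 1: propagate the stream while nobody can see it yet.
    stream.servers = available_servers[:needed]
    # Step 2: only now turn the stream on for viewers.
    stream.visible_to_viewers = True
    return stream


pool = [f"video-{i}" for i in range(200)]
stream = go_live(Stream("popstar"), waiting_viewers=100_000, available_servers=pool)
print(len(stream.servers))   # 100
```

The few seconds of delay Michael mentions correspond to the time spent in step 1, before the flag flips.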
But like dirty, super dirty.
But it worked.
And honestly, that's going to be kind of the theme
of this whole setup, right?
Super dirty, but it worked.
You had a couple of these at imeem, right?
Yeah, there was a couple that we had at imeem.
So one of them,
so at the time, again,
like to set the stage, the innovation of showing video in a browser without launching
RealPlayer, no one here probably knows what that is.
But it used to be, to launch a video, it would launch another application in the browser
that sucked and it would like crash your browser and you hated your life.
Okay.
So one of the cool innovations that YouTube, the startup YouTube, had before it was acquired
by Google, was to play video in Flash in the browser that required no external dependencies.
It would just play right in the browser.
At the time, that was like awesome.
Like it was like,
it was a major product innovation to do that.
And so we wanted to do that for music at imeem.
And we were looking at the tools available to do it.
And we saw all this great tooling to do it for video.
And so rather than rolling our own tools that were music-specific,
we just took all of the open source video stuff and hacked the other video code that we had
so that every music file played on imeem was actually a video file.
It was a .flv back in the day.
And it was actually a Flash video player.
And it was basically we were playing video files that had like a zero bit in the video field.
And it was just audio.
And we actually were transcoding uploads into video files.
You know what I'm saying?
Like the entire thing was it was a video site with no video.
I don't know how else to explain it.
And I do think this is a recurring theme is a lot of the best product decisions
are ones made kind of fast and kind of under duress.
I don't know what that means.
But it's like when it's like 8 p.m. in the office and the site's down,
you tend to come up with good decisions on this stuff.
So we had two more at Twitch that were really funny.
The first one, talking about duress, was our free peering hack.
So streaming live video is really expensive.
Back then, it was really expensive.
And we were very bad fundraisers.
That was mostly my fault.
And so we were always in the situation, we didn't have enough money to stream as much video.
And we had this global audience of people who wanted to watch content.
And so we actually hired one of the network ops guys from YouTube who had figured out how to kind of scale a lot of YouTube's early
usage, and he taught us that you could have free peering relationships with different ISPs
around the world, so that you wouldn't have to pay a middleman to, say, serve video to folks in
Sweden. You can connect your servers yourself, you go... I forgot what they're called.
It saves you money and it saves them money. That's why they want it. Yeah, and there were these
massive, like, switches where you could basically, like, run some wires to the switch and bam, you
can connect to the Swedish ISP. Now the problem is that some
ISPs wanted to do this free peering relationship where basically you can send them traffic for free,
they can send you traffic for free. Others didn't. They didn't want to do that or like they weren't
kind of with it. And so I think it was Sweden, but I don't remember. Some ISP was basically not allowing
us to do free peering and we were spending so much money sending video to this country and we were
generating no revenue from it. It's like we couldn't make a dollar on advertising. And so what we did is that after 10
minutes of people watching free, free live video, we just put up a big thing that blocked the
video that said, your ISP is not doing a free peering relationship with us so we can no longer
serve you video. If you'd like to call to complain, here's a phone number and email
address. And that worked. How long did it take for that to work? I don't remember how fast. I just
remember it worked. And I remember thinking to myself, it's almost unbelievable. Like that
ISP was a real company. Like, we were like a website in San Francisco. And, and hey, that worked. And
then the second one was translation. So we had this global audience. And we would like call these
translation companies and we'd ask them like, how much would it cost to translate our site into
these like 40 different languages? And they were like infinite money. And we're like, we don't have
infinite money. And so I think we stole the solution from Reddit. We were like, what happens if we just
build a little website where our community translates everything. And so basically it would just
like serve up every string in English. And it was like served to anyone who came to the site who
wasn't from an English speaking country. And it was like, do you want to volunteer to translate the
string in your local language? And of course, you know, people were like, well, what if they do
a bad job to translate? I was like, well, the alternative is it's not in their language at all.
So like, let's not make the perfect the enemy of the good. And I think we had something where like
we would get three different people to translate it and match, but like that happened later.
We basically got translation for a whole product for free.
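A community translation system along these lines can be sketched briefly in Python. Everything here is a hypothetical illustration: the `TranslationBoard` class, the string IDs, and the threshold of three matching submissions are assumptions based on Michael's recollection that three different people had to translate a string and match. The sketch shows both stages: first show whatever the community has submitted rather than English, and later mark a translation as verified once enough volunteers agree.

```python
from collections import Counter, defaultdict

REQUIRED_MATCHES = 3  # assumed: three volunteers must submit the same text


class TranslationBoard:
    """Toy store of volunteer translations, keyed by (string_id, language)."""

    def __init__(self) -> None:
        self._submissions = defaultdict(list)

    def submit(self, string_id: str, lang: str, text: str) -> None:
        self._submissions[(string_id, lang)].append(text.strip())

    def best(self, string_id: str, lang: str):
        """Most frequently submitted translation, or None if there is none yet."""
        subs = self._submissions.get((string_id, lang))
        if not subs:
            return None
        return Counter(subs).most_common(1)[0][0]

    def is_verified(self, string_id: str, lang: str) -> bool:
        """True once enough independent submissions match each other."""
        subs = self._submissions.get((string_id, lang))
        if not subs:
            return False
        return Counter(subs).most_common(1)[0][1] >= REQUIRED_MATCHES

    def render(self, string_id: str, lang: str, english: str) -> str:
        # An unverified translation still beats no translation at all.
        return self.best(string_id, lang) or english


board = TranslationBoard()
print(board.render("login_button", "es", "Log in"))   # Log in
for text in ["Iniciar sesión", "Entrar", "Iniciar sesión", "Iniciar sesión"]:
    board.submit("login_button", "es", text)
print(board.render("login_button", "es", "Log in"))   # Iniciar sesión
print(board.is_verified("login_button", "es"))        # True
```

The fallback in `render` is the "don't make the perfect the enemy of the good" part: a string with a single unverified translation is still shown in the viewer's language.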
Maybe to end, because I think this might be the like maybe the funniest of them all,
tell a Google story because I think this one's like the like, really?
So look, for the Facebook story, that was firsthand where I personally witnessed the servers with my own eyes.
So I'm 100% confident that is what happened because it was me, right?
This Google story is secondhand, and so I may get some of the details wrong.
I apologize in advance, but I'll tell you this was relayed to me by someone that was there.
All right?
You ready?
So, look, the original Google algorithm was based on a paper that they wrote, which you can go read, PageRank.
It worked really well.
It was a different way to do search.
Okay.
It worked.
They always didn't have enough hardware to scale it because remember, there was no cloud back then.
You had to run your own servers.
And so as the internet grew, it was harder and harder to scale Google.
You still with me?
Like there were just more web pages on the internet.
So it worked great when the web was small, but then they kept having more web pages really fast.
And so Google had to run as fast as they could to just stay in the same place.
Just to run a crawl and re-index the web was like a lot of work.
And so the way it worked at the time is they weren't re-indexing the web in real time constantly.
They had to do it in one big batch
process back in the day. Okay. And so there was some critical point, this was probably
in the 2001 era again, this is second hand, I don't know exactly what it was, but there was
some critical point where this big batch process to index the web started failing.
And it would it would take three weeks to run the batch process. It was like the you know,
re-index web.sh. You know, it was like one script that was like do Google, you know, and it started
failing. And so they tried to fix the bug and they restarted and then it failed again. And so the story
that I heard is that there was some point where for maybe three months, maybe four months,
I don't remember the exact details. There was no new index of Google. They had stale results. So anyone,
any user of Google, they didn't know that, you know, the users didn't know this, any user of
Google was seeing stale results and no new websites were in the index for quite some time. Okay.
And so obviously they were freaking out inside of Google.
And this was the genesis for them to create MapReduce, which they wrote a paper about,
which was a way to parallelize and break into pieces all the little bits of crawling and re-indexing the web.
And, you know, Hadoop was created off of MapReduce, and a bunch of different software uses it.
And I would argue every big Internet company now uses the descendants of this particular piece of software.
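For readers who have not seen the pattern, here is a toy Python sketch of the map, shuffle, and reduce steps applied to building a search index. It is a single-process illustration of the general idea only, not Google's implementation and not Hadoop's API; the function names and the two sample pages are made up. The property that matters is that each page is mapped independently, so one failed piece can be retried without rerunning a three-week batch.

```python
from collections import defaultdict


def map_page(url: str, text: str) -> list:
    """Map step: emit (word, url) pairs for one page, independent of all others."""
    return [(word, url) for word in set(text.lower().split())]


def shuffle(pairs: list) -> dict:
    """Group every emitted value under its key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_word(word: str, urls: list) -> tuple:
    """Reduce step: collapse one word's URLs into a sorted posting list."""
    return word, sorted(set(urls))


def build_index(pages: dict) -> dict:
    pairs = []
    # In a real cluster each map_page call could run on a different machine.
    for url, text in pages.items():
        pairs.extend(map_page(url, text))
    return dict(reduce_word(word, urls) for word, urls in shuffle(pairs).items())


pages = {
    "a.example": "cats and dogs",
    "b.example": "dogs and birds",
}
index = build_index(pages)
print(index["dogs"])   # ['a.example', 'b.example']
print(index["cats"])   # ['a.example']
```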
And then it was created under duress when Google secretly was completely broken
for an extended period of time because the web grew too fast.
But I think this is the most fun part about this story.
When the index started getting stale, did Google shut down the search engine?
That's the coolest part.
Like people just didn't realize it.
And did they build this first?
Again, in terms of do things that don't scale, did they build MapReduce before they had any users?
No.
Like they basically made it this far by just building a monolithic
product and they only dealt with this issue when they had to.
You know, I think this is like such a common thing that comes up when we give startup advice.
You know, we'll get a founder that's like, oh, how do I like test my product before I launch to
make sure it's going to work? And I always come back and tell the founders the same thing.
Like if you have a house and it's full of pipes and you know some of the pipes are broken
and they're going to leak, you can spend a lot of time trying to check every pipe,
guess whether it's broken or not and repair it,
or you can turn the water on.
And like, you'll know, like, you'll know exactly the work to be done
when you turn the water on.
And I think people are always surprised that that's basically all startups do.
It's just turn the water on, fix what's broken, rinse and repeat.
And, like, that's how big companies get built.
It's never taught that way, though, right?
It's always taught as, like, oh, somebody had a plan and they wrote it all down.
It's like, never, never.
And you earn the privilege to work on scalable things by making something people want first.
You know what I think about sometimes with Apple is picture like Wozniak hand soldering the original Apple computer and like those techniques compared to like whoever it is that works on Apple to design AirPods.
Like it's the same company, but like Wozniak hand soldering is not scalable.
But, you know, because that worked,
they earned the privilege to be able to make AirPods now.
And because Google Search was so good,
they earned the privilege to be able to create super scalable stuff
like MapReduce and all these other awesome internal tools they built, right?
Yes.
But if they had built that stuff first,
they wouldn't be Google, man.
And so to wrap up kind of what I love about things that don't scale
is that it works in the real world, right?
The Airbnb founders taking photos,
or the DoorDash folks doing deliveries.
It also works in the software world, right?
Like, don't make the perfect the enemy of the good.
Just try to figure out any kind of way to give somebody something that they really want
and then solve all the problems that happen afterwards.
And you're doing way better.
All right.
Thanks so much for watching the video.
