Screaming in the Cloud - How Scaling Turns Rare Occurrences Into Common Ones with Jason Cohen
Episode Date: February 22, 2024

Today Corey Quinn is joined by Founder and Chief Innovation Officer at WP Engine, Jason Cohen. Jason breaks down the biggest issues he has seen throughout his career hosting millions of websites, including why seemingly rare problems should be expected at scale, how moving on after attaining a “good enough” metric can save time and money, and what it means to be proud of your work in the world of cybersecurity. Check it out!

Show Highlights
(00:00) - WordPress popularity and outsourcing engineering tasks
(07:28) - Web hosting and scalability
(11:01) - Server reliability and quality control
(14:18) - Scaling infrastructure and prioritizing customer value
(26:20) - Website speed and optimization
(28:17) - WordPress scalability and deployment in a cloud environment
(36:14) - Customer profitability and service limitations
(38:54) - Security measures for ethical decision-making
(47:19) - Balancing free speech and decision-making in online content moderation

About Jason
Founder of unicorn WP Engine (200,000 customers, 1,200 employees). Previously founder of bootstrapped Smart Bear (sold 2008; re-sold in 2021 at ~$2B) and ITWatchDogs (sold 2004). Original mentor and angel investor with Austin-based Capital Factory since 2009. Written about startups for seventeen years, most recently at https://longform.asmartbear.com; Twitter: @asmartbear.

Links Referenced:
Personal Website: https://longform.asmartbear.com/
WP Engine: https://wpengine.com/
LinkedIn: https://www.linkedin.com/in/jasoncohen/
Transcript
That's another thing that's true of engineering.
You can do anything if you really want,
and you can write stuff in any language if you really want.
Doesn't mean you should,
doesn't mean it's a good fit, but okay.
Welcome to Screaming in the Cloud, I'm Corey Quinn.
Periodically, I will have people
from a variety of different companies
doing different things for different reasons
come on the podcast.
Every once in a while, I like to track down, okay, who's a vendor that I've used a lot of
and often don't necessarily think about? And when I started framing it that way,
today's guest became relatively obvious. Jason Cohen is the founder of WP Engine.
Jason, thanks for joining me.
It's great to be here.
I have a painful history with running websites at even small scale, then medium scale, then large scale.
And WordPress has been sort of a thing that has taken over the world. It felt like the late 90s.
Now there's still a disgusting percentage of the world that runs on top of WordPress. I've run it myself. It was a terrific demo app for teaching people how to use
Puppet. It touches a whole bunch of different things. And when it came time to decide I should
have a website that probably is useful to work with, the first iteration that I went with
personally was building my own custom thing on serverless. This was a bad idea. When it
became an actual real business, I went with WordPress and figured, huh, who can I wind up
finding to run this for me that isn't me? And the answer was WP Engine pretty quickly. So I've been
your customer for something like seven years now. Thanks for not going down during that time.
It's appreciated.
Oh, sure. Thanks for giving us money. That's what we like. Yeah, I mean, WordPress currently powers 43 percent of every domain on
earth, which, as you say, is a staggering and unbelievable number, but there are many different
data sources that all point to that. And yeah, it's because it's open, it's because there's
a community, it's because I think it is true that once you have some success and momentum there,
it builds on it, because people know it and then they build another site, just like you
said.
And so there's also a big set of design agencies that use WordPress.
So they are almost essentially like a sales force for WordPress.
WordPress is free.
So that's in quotes.
But, you know, hey, if you're going to build your agency or freelancing business off of it, clearly you're going to advocate for
that. So I think you have this set of things like that, which made it so successful. Even 14 years
ago, when I started WP Engine, WordPress was already 11 or 12 percent of the web, which is
already kind of infinity for a new company to be able to sell to a tenth of the web. That's huge.
And so, yeah, it's only grown from there, which is hard to believe.
People love to talk smack about it in engineering circles. It's PHP. Who wants to work in PHP?
Great. Cool. I don't want to think about PHP. I don't have to. I care about the content.
What the technology stack that powers my website is, is never going to be a determining factor in
whether my company succeeded or not, from where I sit.
Yeah, but here's the thing: engineers hate whatever is popular. Whenever
a language becomes too popular, everyone hates it. At first, Java was cool because
it was new and weird, even though it was slow and actually kind of bad. Then Java became the most
popular language, and everyone hates Java. Okay. I mean, you just hate whatever it is. Whatever bug tracking system you use,
you hate it. I almost guarantee you hate it. Okay. I just feel like this is the
standard thing that we do. Also, there's a general thing in engineering where it's not
necessarily the highest-quality, best thing that wins. There are other factors, like it being easy to
try, easy to troubleshoot, easy to understand, easy to
dig in under the covers, and so forth. So you have things like open source projects that have all
those attributes. Are they as good as some commercial things? In many ways, no, but they have
those attributes, so they win anyway. WordPress has all of these things because it's open source,
it's easy, and it's accessible for lots of people to use, and so on and so forth. There's an old article called "Worse Is Better" on this. It shows
stuff like some of these text formats for moving stuff around. They're inefficient; we
should use other formats. I know, but the thing with text is you can write it, you can read it, you
can look at it in your packet dumps, you can mess with it easily, you can use grep, you can dump it to a log, and with the other formats all of that is harder. So, right, it's
worse, but it's better because it's more accessible, it's more observable, and so
forth. So I feel like there's a lot of things like that. And then when those things become popular for good
reason, because those other attributes are good, engineers like to say "I hate it because," and then
they list the attributes that are missing, and they're not wrong that those attributes are missing or
not optimized for. But I just feel like this is very common in many things in engineering,
so it doesn't bother me. In fact, maybe it means you're winning.
For me, the big reason to go WordPress is not because I have some deep-seated love affair with
it. If anything, just the opposite. Before they wound up dying slash being absorbed by GoDaddy, I worked on large-scale hosted WordPress at Media Temple
for about a year, year and a half. And that was enough to teach me I didn't want to run
WordPress myself if I could possibly avoid it. Because running it on a laptop or in a container,
or, who knows, someone's probably running it on Kubernetes these days, is probably not that challenging.
But running anything at scale introduces an entire series of separate problems.
Right.
Yeah, we run it on Kubernetes.
You can run it everywhere if you want.
Of course, that's another thing that's true of engineering.
You can do anything if you really want.
And you can write stuff in any language if you really want.
Doesn't mean you should.
Doesn't mean it's a good fit, but okay.
But yes, doing anything at scale obviously is hard.
Yeah, when I got started, I built a bunch of serverless stuff to run the website and power the blog and the newsletter.
Then I realized that, at least for the website piece of it, other people could be much
better at doing a lot of these things than I could, and I didn't want all the engineering to be
bottlenecked on, at the time, the four people on the planet who understood these technologies that came out last week. But with WordPress,
you swing a dead cat and you'll hit 15 people who know how to work on this, basically no matter
what room you happen to be in.
Yeah, that's part of why the spoils continue to go to the winner,
those kinds of things, like what you just said. Well, whatever it is, we definitely can hire people,
full time or contractors or part time or flex, we can definitely do it.
Also, will the cloud support the tech that's behind it?
Yeah, of course.
It's 43% of the web.
What, are they going to not support it?
That's crazy.
So it's those kinds of things where you go, okay, well, let's just do that.
So exactly as you say, is WordPress or your marketing website in general, is it incredibly
core to your business that it be unique? And the answer is almost always, no, that's not what makes us unique. What CMS we use
and how the marketers mess about with an article, that is really far away from what
makes the product unique for almost every company. Okay. So for almost every company,
you should spend the least amount of time on this and you should spend enough money
only so that bad things don't happen.
Like the site goes down, the site is slow, the site is hacked.
Okay, yeah, we need to spend enough money to where that's not happening because that is bad.
But beyond that, there's no additional benefit.
Therefore, outsourcing it to us or a competitor of ours for that matter, just simply makes
sense.
It's not where you're going to get a comparative advantage.
So why are you spending your time on it other than the base, the core is needed for it to be functional and do its job.
One of the things that I think is lost on a lot of folks is the idea of scale as being its own
particular skill set. As you say, you have an awful lot of competitors. You are not the only
company in the world that provides managed WordPress hosting. And you are also by a landslide,
not the least expensive.
The trouble with just getting the results of all the various companies that do these things and
sorting by price from low to high is that there's a universe of folks out there who,
well, I ran my own website for a couple of years on WordPress, didn't seem that hard. Oh,
it has a multi-tenancy option. I'm going to go ahead and spin that up and then I'll start making money by offering that to other folks. That starts to fall apart extremely quickly. I wanted to trust a company that has
been there before when there's something that's going on and the website goes non-responsive for
some reason. Okay, there are people who know what they're doing looking at this. It'll be a minute
or two and it'll come back as opposed to having to wake someone up in the middle of the night because they didn't realize that that's how computers work.
It's interesting. Scale is an interesting topic. It's also interesting to be expensive.
If you're used to GoDaddy and you pay $2.90 per month for your website, then paying us $29 a
month is 10 times as much. It sounds very expensive. Now, okay, you get what you pay for,
you get service, you get it's fast, it's scalable, blah, blah. But it is expensive too. So okay,
fine. On the other hand, we have tens of thousands of larger customers. For them,
we're the low-cost alternative to what they would otherwise use for website development,
which is things like Adobe Experience Manager or Drupal or Sitecore, these kinds of things,
which are millions and millions of dollars to build a website and then millions of dollars to host it and millions of
dollars every time you're going to do marketing campaigns. So for them, we are 10 times cheaper
and we're the low cost alternative as opposed to the GoDaddy side of the market, the other end of
the market where we're the expensive, we better be great at that price. And so it's very interesting
since it's the whole internet and we're at a scale, we have
200,000 customers.
So we're at a scale where we do see every kind of person.
And so it's interesting, like, are we expensive?
It depends who you ask and what's going on.
And it's interesting that there's that complexity to it.
But yeah, the scale is interesting because I think engineers who haven't done it before
have this in mind.
They say, look, what we do is we write code, and we have these tools that help us automate things, in particular
infrastructure. And so with CloudFormation or Ansible or, you know, Docker containers,
we have all these tools to say, I want something that looks like this, or I push a button and
it creates a set of services that are connected. And if I can do that once, then I can just keep pushing that button
and do it 10 times,
or write code that pushes the button.
And now I have a thousand servers,
10,000 servers.
I have to pay more money
to allocate the physical resources,
but the scale takes no effort is the thought.
So why is that wrong?
It is wrong, but it's not obvious why it's wrong.
Like it's computers.
I should just keep pushing the button and it works.
It becomes super obvious the second time, but the first time it completely catches people by
surprise.
Yeah. But why is that? What are we missing? So here's the answer. Let's say
you have a laptop and let's say it's pretty high quality and pretty stable. And so it only crashes
once every four years, not bad. Like it locks up and you're like, eh, you have to reboot it.
What was that? Who knows? Some really odd thing, something crashed.
The operating system has a bug.
A cosmic ray hit it.
Who knows?
Something that rare.
You're not going to diagnose it.
You don't even care to diagnose it because you're like, whatever.
Okay, I reboot it every couple of years.
Who cares?
Like, this is pretty good.
That would be a high quality laptop.
Now, we have 17,000 servers.
Okay. So let's say they're all this good, that they only crash once every four years, randomly, unpredictably. Can't prevent it. Can't
say when, because it's some weird thing. What happens when there's 17,000 of them?
And by the way, our servers are doing way more stuff than your laptop. So by all rights,
they should crash a lot more than that. But let's suppose. Well, four years is what, like 1,200-ish, 1,300 days? And 17,000 servers.
So you start doing the math and you're like, we should have totally random, unpredictable,
unpreventable crashes like 10 times a day. Oh, wait. Yeah. Like, crap is going to be blowing up constantly. And we just said you
could never predict or prevent it. Wait, what? Yeah. And so then you might say, okay, well,
fine. We'll reboot them. Yeah, I know you will. And then how many customers will get mad about
that? Well, yeah, but it's only this tiny, tiny fraction of our customers. Right. But let's
say hundreds and hundreds of customers a day have downtime from your weird, unpredictable thingy.
So what do they do?
They all call support and you have a thousand support tickets a day just from this one thing.
Wait, what?
Or how many go to Twitter and say you suck?
I don't know.
Every day?
What?
Very few people take social media to say good things about companies, but something goes wrong,
oh, it's all over the place. The best outage detector I've ever found.
But we just agreed. Well, we didn't agree, but I'm pretending like you're agreeing.
This is totally normal and expected. We could be the greatest ever, and this is just going to
happen. So how do you summarize that?
That's the story that shows why it is in fact true.
And you go, oh, okay.
So I summarize that by saying rare things become common.
Rare being hard to detect, hard to prevent, hard to,
and they become common automatically
simply because there's a lot of them.
So if you roll dice enough, then things happen, right?
Kind of like million monkeys sort of thought.
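The laptop-versus-fleet arithmetic above can be checked with a few lines of Python. This is only a rough sketch: it assumes every server fails independently at the illustrative rate from the conversation (once every four years), and the function name is made up for the example. Four years is roughly 1,460 days, so the result comes out a little above the "like 10 times a day" estimate.

```python
# Back-of-the-envelope check of the "rare things become common" arithmetic.
# Assumption: each server crashes independently, on average once every
# four years (the illustrative figure from the conversation).

def expected_crashes_per_day(servers: int, years_between_crashes: float) -> float:
    """Expected number of crashes per day across the whole fleet."""
    days_between_crashes = years_between_crashes * 365
    return servers / days_between_crashes

one_laptop = expected_crashes_per_day(1, 4)
fleet = expected_crashes_per_day(17_000, 4)

print(f"one machine:    {one_laptop:.5f} crashes/day")
print(f"17,000 servers: {fleet:.1f} crashes/day")
```

Nothing about any single machine changed between the two lines of output; only the count did.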
We also see tens of billions of web requests per day across our platform.
So what kind of quality percentage would you need to not see any errors?
It's like, I don't know.
That's a lot of zeros.
I don't know.
Yeah.
It's like something impossible, clearly impossible.
So impossible, that doesn't sound very nice.
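To put a number on "that's a lot of zeros", here is a small sketch of the success rate needed to expect fewer than one failed request per day. The 10 billion figure is an assumption taken from the low end of "tens of billions", and the helper names are hypothetical.

```python
import math

def max_error_rate(requests_per_day: float, allowed_errors_per_day: float = 1.0) -> float:
    """Highest per-request error rate that keeps expected errors within the allowance."""
    return allowed_errors_per_day / requests_per_day

def nines(error_rate: float) -> float:
    """Number of nines of success implied by an error rate (1e-5 means five nines)."""
    return -math.log10(error_rate)

REQUESTS_PER_DAY = 10_000_000_000  # assumed: low end of "tens of billions"

rate = max_error_rate(REQUESTS_PER_DAY)
errors_at_five_nines = REQUESTS_PER_DAY * 1e-5  # a 99.999% success rate

print(f"error rate for under one error per day: {rate:.0e}")
print(f"nines of success required: {nines(rate):.0f}")
print(f"errors per day at five nines: {errors_at_five_nines:,.0f}")
```

Under these assumptions, even a 99.999% success rate still leaves on the order of a hundred thousand failed requests every day.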
Now, a couple of things to take away. One is, okay, so when we talk about quality, it's just a whole other level by which I mean, orders of magnitude different, really,
really different. So is that going to make us have very different development processes and
procedures and what does testing mean and blah, blah, blah, blah, blah. Yes. it means those are going to have to be quite different. Not because small companies are dumb.
The small companies would actually be dumb if they implemented all that heavyweight process
while they're small. That's wrong too, because that's not a problem for you right now. But if
you're at scale, it is. And so the big companies that do all that stuff, that's not dumb. It's
mandatory because everything's multiplied by powers of 10. And so
things appear that were there. You just didn't see them often enough to do anything about it,
rightly so. So yeah, your processes have to get better because you do need more, you know,
percentages of quality or however you'd like to measure, you know, different ways of measuring
that. But the other thing is, but it's never going to be perfect. And at sufficient scale,
stuff's going to happen. And so you also need this different mindset of, well, given that it's going to happen for sure, then what? Oh, well, then our
reaction time has to be faster. The reaction has to be automated. Remember, it's like this kind of
meta second layer: prevent, prevent, prevent, but knowing that complete prevention is impossible
and scale means that failures will be common, what kinds of detection do we have?
So you start getting to these numbers like
mean time to detect, mean time to recover,
as opposed to how many incidents.
Of course you do both,
but the number of incidents you want smaller and smaller
as a percentage of everything.
But smaller as a percentage of something that's growing
is still an absolute number growing,
and so you still need to know,
do we detect and recover in like a minute or two, versus an hour and it takes a human? That's a big difference. But it's a totally different question,
detecting and recovering automatically, than preventing in the first place, which of course is quote-unquote
better, but it's impossible for it to be good enough. So that affects how you allocate your time or
investment, you might say, across these things. And then we haven't even gotten to security, which is a whole other thing and often hurts things like performance and uptime,
et cetera. So that's another thing that can be at odds with scale is security. So I don't mean
to overcomplicate it, but it just goes to show these are not only things that you don't think
about at first, you shouldn't think about it at first. It would be a waste of your time. It would
be premature optimization. So you shouldn't do that at first.
But on the other hand, if later you're not doing it, that's bad.
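The detect-and-recover point from earlier in this answer can be made concrete with a rough model: daily downtime is the number of incidents multiplied by the time to detect plus the time to recover. Every number below is hypothetical, chosen only to mirror the "a minute or two versus an hour" contrast, and the function names are illustrative.

```python
def incidents_per_day(servers: int, crashes_per_server_per_day: float) -> float:
    """Absolute incident count: per-server rate times fleet size."""
    return servers * crashes_per_server_per_day

def downtime_minutes_per_day(incidents: float, mttd_minutes: float, mttr_minutes: float) -> float:
    """Total downtime: each incident lasts detect time plus recover time."""
    return incidents * (mttd_minutes + mttr_minutes)

# Hypothetical: the fleet doubles while per-server reliability improves by 25%.
before = incidents_per_day(17_000, 1 / 1460)
after = incidents_per_day(34_000, 0.75 / 1460)

# Hypothetical response times: automated (about two minutes) vs. human (about an hour).
automated = downtime_minutes_per_day(after, mttd_minutes=0.5, mttr_minutes=1.5)
manual = downtime_minutes_per_day(after, mttd_minutes=15, mttr_minutes=45)

print(f"incidents/day before: {before:.1f}, after: {after:.1f}")
print(f"downtime/day automated: {automated:.0f} min, manual: {manual:.0f} min")
```

In this sketch the incident rate per server falls but the absolute count still rises, which is why the response time ends up dominating the total.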
And it applies at all layers of the stack too. Easy example. You said a few minutes ago that
you have 17,000 servers. Okay, great. That is a significant point of scale. You can almost
certainly get some incredible discounts from Dell or HP, whoever's making servers these days. Super Micro's been on the rise for a while.
But you're almost, if not entirely,
based on AWS and Google Cloud,
based upon what I've seen over the years
of various service offerings you have.
I get to sometimes pick
which one of those two providers I'm hosted out of,
which, cool, fantastic.
I don't have a strong preference,
believe it or not, for my corporate website.
Why don't you run your own servers?
You certainly have enough that people can do basic arithmetic and say,
okay, if a server costs this much,
a calculator tells me this much,
and wow, that's a lot of money on instances.
Do you just hate money?
No, we love money, but Google and Amazon know that.
And so they simply set their prices for us
such that it would be more expensive to move,
to rebuild and move, and then there's ongoing management.
Let's not forget that.
It's not the price of the servers.
Of course, that's less.
It's not that; it's managing them.
And as you say, everything I just said, cross-apply to the physical layer.
So you have to be ready for that now. But you could outsource that.
I know, but all that's expense.
So you take the total cost of all of that stuff, and then what Google does is they know that,
and so they set their prices such that we go, okay, if we were to do that, maybe we could save this
much per month, but we would have to do this and that, and it's a distraction. And exactly what you
just said about WordPress is how we feel about infrastructure. How exactly those SSDs get
racked and powered does not affect our customers. It needs to exist and have high uptime. Beyond
that, our customers don't care how that happens. So we would do it if we could save tons of money,
but they just simply set the prices for us so that it's not worth it. So as we spend more and
more with them, it becomes more and more economical
for us to do it on our own.
But then they change the price so that it's not.
Discounting at scale is very much a thing.
I've yet to find an AWS environment
that's built out anywhere other than at a startup
where the infrastructure costs more
than the people working on the infrastructure.
It's not that it's hard to reliably replace SSDs
at scale in a data center.
It's that it's hard to be able to afford the people to do that until you're
at a certain inflection point.
And again, you folks are terrific at running WordPress at scale.
I don't know, for example, that you would be nearly as effective at remembering to do
generator maintenance on a consistent schedule, and only one at a time so you aren't taking
down both power
rails in various ways and causing site-wide outages. And I wish I were making that one up.
It's just not an expertise that we have. And so you could choose to
build that expertise or perhaps acquire something, et cetera. But then you start asking the
normal strategic questions. Is this good for our customers? Does this make us more differentiated in the market? Does this add some innovation that keeps ahead of trends or
does something valuable? And the answer is no to all. The best thing it could possibly do is save
us money, which is a good reason. That is a good reason to do something. It's just the least
strategic thing you can do, to save money, right? Anything you can do for your customers is more strategic, whether you're charging them more for it or maybe capturing that value
somewhere other than in price, through things like retention or advocacy. There's many, many ways to trade
value with your customer. I like to say what you should do is create more value for the customer
and decide how to split it with them. It could be a higher price, you know, but like there's many
ways. And anyway, just create more value. That's number one. And then split it. That's the business
side. Fine. Saving money is none of that. It's good for us and customers don't care. So we should
do it; as you say, it's stupid to burn money for nothing. But again, since the
vendors know that, they just simply set the line such that it isn't a good use of our
time. You know, you hear stories like,
oh, with Dropbox, they did this and that with disks.
Right, because at some point,
at some level, at some scale,
for some companies, it's a good idea.
Of course, of course.
No such thing as a law of physics
that's true everywhere.
And there are a lot fewer companies
with specific large scaled out workloads
that are running into capability barriers
at that scale
than there are people
who look at that and say, yeah, we've got several hundred of these things now. We should definitely
build a data center. No, please don't. No, no. It's hyper-specialized to want to do that.
If you're Facebook, it makes sense to have data centers in Iceland for long-term storage. That
does make sense. At some scale, in some situation, it makes sense. But for almost
all of us, including us, and we're a hosting company, so if anyone should, it's us, right?
It makes no sense. Again, because at best, it's a cost savings, and P.S., it isn't.
And there's value to understanding the market you operate in. A couple of years ago, I was profiled in
the New York Times, which was great. But when I called into WP Engine in
advance to let you folks know, the response was not, okay, good for you, you just called to gloat
or what? No, there was none of that. They understood: oh, great, so you're going to
start potentially seeing some scale. Here's what we can do to mitigate that and make sure the site
doesn't go down at the worst possible moment because they're not going to run the profile
a second time. And there were processes and procedures set up. There was a migration from shared to dedicated
for a four-hour span,
but things still stayed up during that time.
It was clearly communicated.
And everything just worked.
That's the sort of thing you only learn to do really well
by doing it really poorly a few times first.
Yeah, we've done it so many times.
You know, another thing that happens with scale is people.
What does not scale is one person doing a thing.
What scales is teams of people who do things and they have their policies and procedures
and training and teaching each other and so on, so that the whole system is of higher
quality and you have checklists.
And if one person leaves the team, the team progresses
anyway. That's the kind of thing that you build to scale with humans. And so that's what
you're describing too with service. So that's also true. And of course, we can do that because
it's amortized over 200,000 customers, and no one customer can do it because for them it's not amortized
over 200,000 customers. It makes no sense for them to try to become an expert. It just doesn't
make any sense. So this is true of many things. Like you said, like it's true for us in the cloud.
Like we treat Amazon like you're saying you treat us, right?
Like we all treat the next layer down on the stack
as a, oh, that's not my business.
That's necessary.
I need it to be high quality,
but that's not my business.
It's not what my customers are.
That's not how I differentiate it.
It's not how I'm going to win.
Therefore, it's not strategic.
So I need something good.
And that's it.
It's like an SLO.
And it's in the Google SRE.
Like I need it to be, I need to hit the SLO and then stop.
Don't, I don't need more.
I don't need to pay more.
I don't even need you to deliver more past the SLO.
We're done here.
If it goes below the SLO, we have a problem.
But if it goes up to the SLO or
at the SLO, then you say, well, the whole point of having an SLO line is when it's above the line,
we all breathe a sigh of relief that there's no current problem. Great. And we agree not to
further invest because that's not giving us value. We need to invest in whatever does, which is
company specific, obviously. And so that's how we think
about the cloud. It's how you're thinking about us as a WordPress provider. And it's correct.
That's the correct attitude towards these things. Another way to look at it is this. There are
things in the company that you want to maximize, meaning there's no such thing as good enough.
Revenue is one. Profit might be one, but revenue certainly is one. Gross margin is certainly one.
But there's many kinds of things, like what kind of customer value you're delivering; there's no such thing as too much. One of our core value propositions is performance, site performance. There's no such
thing as a site that's too fast. Jeff Bezos famously talked about how no one will
ever say the delivery was too fast, "I ordered and it came too quickly." No, the faster the better,
probably, roughly speaking. So there are a few, not a lot, but there's a few things in the company
that you want to maximize again, because it's strategic or important in some big way like that.
Good. That's where you should be investing. That's kind of what that means. Most things in the
company, even very important things are things you want to satisfy, not maximize. Once they get to
some threshold, some level, some whatever,
going beyond that is not that valuable.
Either it's not valuable at all,
or just diminishing returns or otherwise,
like it's not a good use of our time or money,
whether because the actual return is diminished,
like a diminishing return,
or simply the business value of it is not enough.
If my website load time improves by 200 milliseconds
from start to finish, great.
I don't make another dime in consulting revenue. I don't get one more sign up for the newsletter,
none of it. It's all, at this point, it checks a box. For me, one of the big values of going to
you folks is that I come from a background where I used to run these things. I do have the engineering
mind where it's fun on some level. I want to set up WordPress and run it across this small cluster of things,
but it adds zero value to my business and it's not what I need to focus on.
So please take it off my plate.
Exactly.
Fun's a whole different thing, right?
Like, oh, fun, fun.
You can throw away everything I just said if you want to do fun.
So many of us learn this stuff on open source software in our spare time,
in the evenings and
weekends or when we're students. And then money is very dear and hard to come by, and our time
distills down to basically free. In business, that turns on its head, and some people
have trouble with that transition. I did when I started.
No, that's absolutely right. Time is
certainly the most expensive thing, there's no doubt. So it's really important that you be
working on the highest priority thing. What is that? Obviously, it's going to be very dependent. But almost for sure,
screwing around with attaching storage is not it. It's almost certainly not the top three,
top one most important thing. And so almost everything in the business should be something
you're satisfying in that model. And so that means outsourcing to something good. Like again, that threshold for
what's good enough to be satisfied can be high. You can set that up really high and say, for
example, website speed does matter to me because I rank higher in SEO if my site's faster. And that
does equal more dollars at the end of the day for a media company or an e-commerce company, or perhaps even for a
consulting company. And there's a lot of data about e-commerce that shows that faster sites,
more people check out and even, and I don't know why this is, but even have higher average
transaction sizes, like put more in the cart. I don't know why, but there's a lot of data,
like lots of studies.
Yeah. We saw SEO improve, when we did the analysis, after we improved website speed by optimizing some things.
And then we checked that.
We got it to a point where, yep, this is awesome.
There is not believed to be any discernible benefit if we push load times down yet further; we're already getting A's on all the grades that the tests spit out.
It's, okay, is this where we want to really focus our time?
That's right.
So once you get to that, so you might say, I have a high bar for performance because I've seen what happens when it's not. And it really does help our
business when the performance is high. So I have a high bar, but then saying like, I want to spend
10 times more to push it a slight bit is like, well, no, not that. Like I'm setting a bar and
maybe the bar is low for some things in the business. They're, they just need to exist,
but not very good. Sometimes the bar is very high, no worries, but still it just needs to be satisfied. And then we need to move on. And that should be most things
in the business because we don't have the time and number of people. None of us have the time
and people to do more than that for most things in the business. The bars might be high or low,
or wherever they are, but after we meet them, we need to move on to other things,
especially the things where there is no limit of how good it is.
And then it's OK if you pour forever and ever into that, like Amazon pours forever and ever into delivery times or inventory and that kind of thing.
Yeah. My experience with my own website is that it is far slower than I would generally find acceptable.
And the reason behind that is that whenever I'm logged into the admin portal and moving around the site, you do a whole bunch of cache busting. It is going direct. Everything is slow and latent
because one of the worst problems in the world is, oh yeah, you fixed the issue, but it's cached
somewhere. So it looks like it wasn't. And then you mindlessly destroy your own website, iteratively
trying to improve it. Been there, got the entire wardrobe, let alone the t-shirt from those problems.
So it's like, yeah, this is slow. Why aren't people complaining? Wait, I'm logged in. Okay.
Just to test it, log out, boom, things are loading almost before I click. And okay, good work.
It's always fun when that catches you by surprise. Right, right, right, right.
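The logged-in slowdown described here is typical of page caching in front of WordPress: requests that carry a login cookie skip the cache and hit the origin every time, while anonymous visitors get the cached copy. A minimal Python sketch of that behavior follows. The `PageCache` class and the `render` callback are hypothetical names for illustration, and the cookie-prefix rule is an assumption about how such a cache might be configured, not a description of any particular host's setup.

```python
from typing import Callable, Dict

# Assumption: the cache treats any cookie with this prefix as "logged in".
# WordPress login cookies use this prefix; the bypass rule itself is illustrative.
LOGGED_IN_COOKIE_PREFIX = "wordpress_logged_in_"


class PageCache:
    """Tiny in-memory page cache that bypasses itself for logged-in visitors."""

    def __init__(self, render: Callable[[str], str]) -> None:
        self._render = render            # slow path: build the page from scratch
        self._store: Dict[str, str] = {}
        self.origin_hits = 0             # how many times the slow path ran

    def _is_logged_in(self, cookies: Dict[str, str]) -> bool:
        return any(name.startswith(LOGGED_IN_COOKIE_PREFIX) for name in cookies)

    def get(self, path: str, cookies: Dict[str, str]) -> str:
        if self._is_logged_in(cookies):
            # Logged-in requests go straight to the origin, every time.
            self.origin_hits += 1
            return self._render(path)
        if path not in self._store:
            # First anonymous request renders once and fills the cache.
            self.origin_hits += 1
            self._store[path] = self._render(path)
        return self._store[path]


cache = PageCache(render=lambda path: f"<html>{path}</html>")
admin_cookies = {"wordpress_logged_in_abc123": "admin"}

for _ in range(3):
    cache.get("/blog/", admin_cookies)   # each request renders at the origin
for _ in range(3):
    cache.get("/blog/", {})              # one render, then served from cache

print(cache.origin_hits)
```

The site owner, always logged in, only ever sees the slow path, which is why the site can feel far slower to them than to everyone else.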
One last topic I want to get into a bit. You mentioned you were running WordPress entirely on top of Kubernetes. WordPress, the last time I looked at it in any seriousness, which was about 15 years ago,
it was a product of its times coming from the 90s and PHP. It is the era of servers being
physical things. Virtualization was looked at very skeptically in the few places it was deployed in.
It is one of the least cloudy packages I've seen in a while.
It assumes you're going to have
permanent named pet servers
running this thing forever,
trying to get it to work in a cluster
where it can sustain the outage
of one of the nodes,
storing assets in object stores
rather than on disk,
requires a whole bunch of ridiculous patching.
So my question for you,
given that you are the authoritative experts
on running this stuff in the modern era
in a cloud at scale,
how vanilla is the WordPress that you folks deploy
versus how heavily have you had to either patch
or completely fork the thing
in order to get it to do the stuff you want?
So in terms of the PHP code in WordPress,
it's vanilla and there's no forking, both just, you could say, selfishly,
in managing the thing, because there's always changes and there's plugins and, like, there's
all kinds of things that otherwise would break. But also, we are a product of the WordPress
community. We have benefited so much, and always have, from the WordPress community, and it's one of
our core values, actually, to give back. That means several things to us. One of them is to give back to the WordPress
community that gave us and continues to give us so much. It's part of what our DNA is.
So we're not interested in forking it and I don't know, somehow, whatever. We're interested in
helping the community, which means that the product it is. However, everything outside of
that is super custom. And of course, that's our secret sauce.
So that's what people are paying us for.
And everyone else is free to do the same, by the way.
It's not like, you know, right.
Oh, in my era, we had so many management scripts for WordPress that were all written in the
most obfuscated Perl that you've ever seen.
It was awful.
Not because we tried to, because we're bad at it and we needed something in a hurry.
Written in obfuscated Perl, or as we also say, Perl.
Yeah, that's an unnecessary adjective.
You can remove it.
The meaning is already implicit.
It really is.
My joke is always like,
no one ever admits they know Perl
because then they're going to be the one,
oh, can you look at this?
And the answer is no, no, no, no, no.
Perl to me kind of looks like
when they picked up the modem,
your mom picked up the modem
and then it went,
like that's what a Perl script looks like to me,
you know? Yep, suddenly your terminal's
sprinting out complete garbage. Yeah, it's the world's only
write-only language. Yeah,
absolutely. Write once, read never.
Yeah, so everything outside
of that is hyper-customized, so
you can do things in Kubernetes. You can mount
disk that's read-write. You can
recover entire things.
You can make a set of containers that
act like a sort of VM, but still using containers for things like each of the processes. So that's
easier to manage and test and deploy and all the normal reasons why one containerizes things.
You can do that and have it move around like a little, I don't want to say cluster because of
course Kubernetes cluster means something else. So this is possible. Now you can also do what you said, which is to try
to make WordPress much more natively 12-factor is what I would say, right? That's also a good idea
in terms of scale. But as you say, it takes a lot of effort and commitment on the part of the site
owner. Because as you say, you have to write the site with that in mind,
like object storage,
really using cache probably,
thinking about how the database works
and how much you're going to hit it,
like how much you're going to abuse it
if it's not going to be local,
making sure, of course,
that the disk is read-only
and only used to deploy code
and doesn't have media.
Part of the problem
with the WordPress plugin ecosystem
is so many of them are written in ways
that are disastrous if you implement them either at any kind of scale or
in anything other than exactly the scenario that the plugin author was envisioning. That's right.
So like a lot of the plugins aren't available to you. So there's a lot of things you have to accept.
If you accept them, then WordPress can be 12 factor and there's plenty of sites that do that.
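The constraints listed in this exchange (media in object storage, real use of caching, a database that may not be local, a read-only disk used only for code) can be written down as a simple checklist. The Python sketch below is only an illustration: `SiteSetup`, its field names, and `twelve_factor_gaps` are hypothetical, and the four checks are just the ones mentioned in the conversation, not a complete 12-factor audit.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SiteSetup:
    """Hypothetical description of how a WordPress site is deployed."""
    media_in_object_storage: bool   # uploads live in an object store, not on local disk
    uses_cache: bool                # caching sits in front of PHP and the database
    db_access_is_frugal: bool       # queries are written assuming the database is not local
    disk_is_read_only: bool         # disk is used only to deploy code


def twelve_factor_gaps(site: SiteSetup) -> List[str]:
    """Return the constraints from the discussion that this site does not meet."""
    gaps: List[str] = []
    if not site.media_in_object_storage:
        gaps.append("move media from disk to object storage")
    if not site.uses_cache:
        gaps.append("put a cache in front of the site")
    if not site.db_access_is_frugal:
        gaps.append("reduce how hard the site hits a non-local database")
    if not site.disk_is_read_only:
        gaps.append("make the disk read-only and code-only")
    return gaps


# A classic single-server style site fails every check in this sketch.
legacy = SiteSetup(False, False, False, False)
for gap in twelve_factor_gaps(legacy):
    print("todo:", gap)
```

Any plugin that writes to local disk or assumes a local database would show up as a gap here, which matches the point that a lot of plugins stop being available once a site commits to this model.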
In fact, we also have a product line
called Atlas, which is headless WordPress
explicitly. Like, we're
running your node in
Kubernetes and also running your
WordPress so that the
whole thing is just what you'd hope,
I guess you could say.
The node is running as fast as it can, things are
cached really intelligently, but it's also talking
to WordPress, which is local to it.
So it's very fast.
So it's a very fast, very scalable thing that uses all the new things like Node.js and blah, blah, blah.
All the new hotness and headless sites.
So we have that too.
So we have sort of the whole gamut between, like you would say, kind of the old-style monolithic WordPress thing,
which is running in Kubernetes, but in a situation where you're like, it's in Kube, but it's kind of like not, right? It's like, yes, that's exactly right.
It looks like to WordPress, like it's not, but it is in Kube. And you might say like,
what's the point of that? And there's actually a lot of points. And one of them is exactly what
we were just talking about. Mean time to recovery. So if a Kubernetes node goes down,
of course, Kubernetes will reconstitute the containers and move the traffic and blah, blah, blah.
And it does it pretty damn fast, much faster than any kind of VM thing with detection, da, da, da, da, da.
And much more reliably and at scale better as well, especially with things like GKE or other things where that's managed for you.
So even making a, you might say, like thing on Kube gains you things like some of
these benefits of scale.
You also have things like there's all these advantages of containerization.
You get those, there's like, there's various benefits anyway.
Plus we have products though that go down the line to, okay, wait, are you willing to
really make 12-factor-like headless apps?
Because if so, we've got a product for you that takes advantage of all that. So, welcome. And so again, we have so many customers. If it was a startup, you'd say you
have to focus, you can't do all these things, that's crazy. But we have thousands of employees, we have
been around for 14 years, and slowly we've built that from something simple and focused to, okay,
we're going to layer on this product line, but we're going to have 50 people working on it. We're going to layer on this other kind of customer, as I mentioned, but we're going
to have hundreds of people between sales and marketing and support to pay attention to that
customer segment so that it's in addition to another customer segment, not instead of or
amalgamated. So if you're doing that in the right way, then you can layer on these other
things and it's okay. So now we have all that stuff and it's all right. But yeah, you're right.
It's one of those things. But once again, it's one of those things where it's hard and in some
ways unnatural, but if we solve it, which we do, then we have this competitive edge and we have a
product that's useful. Sometimes companies tackle things that are hard and there's not really a big advantage on the other side.
It's just hard.
And you go, oh, well, that sucks.
That sounds like just a hard business.
In this case, doing that hard thing earns us something.
Oh, this is a high uptime, high speed, whatever.
Like you say, WordPress.
Oh, well, and then since it's hard, a lot of competitors
won't be able to do it or won't be willing to do it. So it'll be somewhat differentiated, let's say.
And you're like, oh, okay, so if we do the hard thing, there's these rewards. Oh, okay,
that's worth doing the hard thing in that case. You've also nailed the pricing as well in that
you don't have one of those $4 a month website offerings that I've ever seen, which means that a lot of those very small dollar customers tend to need an awful lot of handholding as they're getting something up and running.
And when I was at Media Temple, one of the things that we finally started doing was letting go of the bottom X percent of our customers every year just because you're spending $5 a month or $10 a month, whatever it was at the time, and you're expecting 80 hours of engineering support in a month and that the juice and the squeeze don't align.
It's about meeting the right customers and solving the right problems for them.
No, you're right.
You're right.
And there is this ironic inversion where, like, the less they pay, the more they want.
So it's like, wait a second.
Although there's a U there, because then if they pay a lot, they also want to literally be on the phone every week with your product managers and maybe you
should. But there is this interesting, I wouldn't say it's ethical dilemma, but it's a business
dilemma, but it's not obvious what to do, which is, of course, you're going to have some customers
that are more profitable and some customers that are medium and some that are unprofitable like
you're describing. To what extent is that just okay okay it's the cost of doing business and there's value in having a brand that just says
we always help no matter what that even those people who are in that situation they're going
to go and say that to others and there's this momentum and brand reviews twitter reference
ability case studies.
There's all this stuff that like,
if you, the more people are happy with you,
you almost want to say karma works.
Not in like precise mathematical way,
but in just a hand wavy, it kind of does work.
Okay?
To what extent do you want to keep that magic?
Even though you can't measure that
and you're never gonna have metrics, like I get that.
But there's a truth to it. To what extent is it like, look, they're
unprofitable? Well, some are unprofitable. Now, if that gets too much, then we've got a business model
problem, fair enough. Or maybe some are so crazy on the edge, like so, so, so crazy, that no, you've
broken the argument now. You know, that just has to be against our terms of service somehow, like
we got to write that into our AUP or something.
You just can't do that. Okay, so maybe
there's this extreme. Yeah, when you have people go,
you're humane about it. You help them transition
somewhere else, but you make it clear that you're not
able to serve them in the way they need to be
served. Even on the consulting side,
we have the same policy.
We found someone paying us $100 a month, and they were the
number one bandwidth user. Okay, well,
you can't do that. There's a limit to where it's like, you can't do that.
And you're in cloud too, so bandwidth is not, it's very dear. It's very dear. It was not good.
So do you want to trim the tail? Those cuts, if you imagine a graph, which we've made and maybe
you did too at Media Temple, of basically the customers by their net profit per customer,
and of course there's this tail that's bad, like you say.
Okay, so the absolute, absolute worst ends of the tail you trim.
Fine.
And trim doesn't necessarily mean you make them quit.
In the case of this customer, for example, they had been doing something really dumb with their site.
We helped them fix it and they could remain a customer.
But you do have to fix this thing.
You can't just, you know, fix it. But maybe they have to leave. Maybe not. Maybe they are willing to pay more. Maybe
they can fix it. So, okay. It's worth a conversation. It's a data point. It is not
an answer in and of itself. Yeah. It's like, Hey, we can't let this continue, but like,
there's lots of ways forward here. Kind of a thing, right?
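The graph described here, customers ranked by net profit with a bad tail at one end, is easy to sketch. The Python below is a minimal illustration with made-up numbers: `worst_tail` and the customer names are hypothetical, and the result is meant as a list of conversations to have, in line with the point that a data point is not an answer in and of itself.

```python
from typing import Dict, List, Tuple


def worst_tail(net_profit: Dict[str, float], tail_size: int) -> List[Tuple[str, float]]:
    """Return up to `tail_size` unprofitable customers, most unprofitable first."""
    ranked = sorted(net_profit.items(), key=lambda item: item[1])
    # Only customers who actually lose money count as part of the tail.
    return [(name, profit) for name, profit in ranked[:tail_size] if profit < 0]


# Made-up monthly net profit per customer, in dollars.
example = {
    "acme": 420.0,
    "blog-co": 35.0,
    "bandwidth-hog": -2300.0,
    "tiny-shop": -12.0,
}

for name, profit in worst_tail(example, tail_size=1):
    # Flag for a conversation (fix the site, pay more, or move), not automatic removal.
    print(f"review {name}: net {profit:.2f}/month")
```

Keeping `tail_size` small reflects the choice to trim only the absolute worst end of the tail while leaving mildly unprofitable customers alone.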
The only ones I was glad to see leave were the ones that were just abusive to the
support reps. That was awful. I made a point to at least once a month be on the support
floor, taking calls myself
just because, but that's experience.
What the actual, what other people are seeing
and talk to customers.
Imagine that, it helps with things.
Yeah, abuse, you can't allow abuse.
The way I look at it is if you do,
sometimes you do that in the name of
we love our customers, we care about customers,
customer's never wrong, that kind of stuff.
And while that's generally the right attitude,
it's the right default attitude, let's say,
like in the absence of other information, yes. But do we love our employees less? Do we respect and trust and love and care about our employees less than our customers? Well, if you allow abuse,
the answer is yes, you do. And I think the right way to do business is the answer is no,
they're all people. And none of them
should be abused, neither our customers nor employees. And so if you're abusing our employees,
okay, we'll tell you, etc. But if you can't stop, that's not acceptable. I don't care what you're
paying, I don't care what your profit margin is, you just can't do that, because it's not like we
don't care about our employees, you know? And so I think that's just the right attitude. That's just a
good relationship. These are all human beings. Let's have a good, more or less respectful, more or less
safe, more or less professional relationship. Right. I mean, that just seems like a good idea,
but, but how much do you trim? So you could argue, trim it all the way up to the people,
you know, blah, blah, blah. And that's not a bad idea. Like that you're, you're,
you will have a profitable company for sure.
And nothing wrong with that.
And I'm fine.
There's really nothing wrong with it.
But I do think there's a magic there that you might want to be careful
before you snip it,
especially because I don't know
that you can measure it.
At least I don't know how.
And so I think there's an art there
to like what happens over there.
It's not obvious,
but I think it's quite interesting. Yeah, customer basis is always incredibly important. I
mean, especially when you're a hosting company, you have a whole category of
problems that many companies just don't realize exist. Stolen credit cards, because
effectively, even before cryptocurrency came out: great, I can use this to send spam to two billion inboxes.
And I can do this to run command
and control attack sites
and all kinds of other nonsense.
People will say all the right things.
For better or worse, I have not
yet found a way for my consulting projects
to be turned into something
actively harmful to the rest of the world.
But I'm sure someone enterprising will come up with one
sooner or later. It's true.
That is a constant worry and a constant challenge for us.
One way is bad guys taking control of other people's websites.
And that could be tech or it could be social engineering, by the way.
There's also, like you say, if you can run code, you can do whatever.
All the things you said and more.
So using a stolen credit card to get a website and then do stuff because you're uploading code and doing stuff. And so that can be arbitrary stuff,
even just as simple as bouncing through the site to something else, just to cover your tracks some
more. I mean, whatever, or going to a site and injecting something in the, in the, like some
JavaScript in their site. So it's not quite so obvious that they've been hacked, but it's doing
whatever click fraud or whatever the heck it's doing in there. So yeah, a constant thing. So we, security has always been a critical thing
that we've had to invest a lot in. And one of the reasons why people pay to be with us is that,
of course, there's no such thing as quote unquote perfect security, but there is such a thing as
having layer after layer, thing after thing. And, and, you know, either you do or don't do all that
stuff so that you're at least not being negligent and you're doing as much as you can.
That is true.
So we have everything,
and we have a whole security department,
and we're SOC 2 and ISO,
which is not exactly security,
as any security person will tell you,
but it does show that we're trying to be organized
and thoughtful about our processes
and access control and so on.
But we do things like every year,
everyone at the company has to go through a security training,
including social engineering training, especially for sales and support.
It's easier for that to happen.
And of course, all of our code goes through all these reviews and there's automation as well as humans on that.
And I mean, just everywhere you look, there's like stacks of things that have to do with security.
So, yeah, security is definitely like performance.
There's no such thing as we're done.
We've done all the things that needed to be secure. Hooray, we're finished.
But if you're like, well, I installed a firewall, but there's, but, but I've never thought about social engineering. That's negligence. I mean, when you're first starting out, it's not,
it's whatever, but like at some scale at something, or especially if you're promising
that security is one of your features or benefits or whatever you'd like to call it.
Okay. Well, then if you're not doing social engineering training, then that is negligence. If you don't have keylogger detectors
on the laptops, that's negligence. When we installed that, by the way, it must have been a decade ago,
we found like a tenth of the laptops had a keylogger on it. Oops. Like, it's there. It's definitely, like,
all this shit is happening for sure. No doubt.
The only question is like,
are you looking,
do you know to look,
are you doing something about it?
So if you're doing a ton of stuff like we are,
and there's still some crazy side route where this thing happened,
it's like,
right.
I mean,
that's not good.
We want to do something about it,
but it's not negligence.
There's a big difference.
Right.
You don't inadvertently take something down.
That's important.
Just because someone doesn't have a full understanding of what's going on. All this stuff is complicated at massive scale. The way
I look at it is, if something happens, I want the story of how it happened to be ludicrous. It's like,
they did this and then that, and they used this thing, and we don't even know about that,
and then the customer wrote their own code, and that's where they got in in the first place. And
then it's like, okay. Again, that doesn't mean there's nothing we want to do about it. There may
be things for us to do about it, there may be lessons to learn, no
problem, no worries there. But we can rest easy going, like, well, geez, if that's what it takes,
then we're doing our job. Yeah, we've successfully raised the bar of what's required to get into something.
Okay, now it requires active, unlikely misbehavior or choices made on behalf of customers. Yeah. Yeah. And,
and,
and like not obvious.
So like,
uh,
there was a security bug in some library we use and it was reported a month
ago and we still haven't upgraded and they got in that way.
Okay.
That's,
that's negligence.
Why didn't we update it in a certain timeframe?
Right.
Another way is a zero day bug was just reported.
We patched it within 12 hours,
but it was already exploited in hour one. Now is it negligence? No. We're doing actually way more
than any of our customers would have ever done for themselves. Far, far more. And we prevented
from nearly every single customer. And you have the telemetry and the organization around it to
be able to track that down and not say, well, we have no evidence of any compromise. Yeah. Cause you don't have
logs turned on. Right. Right. Right. So it's like, well, we protected, you know, 99.996%
of our customers from it, but a couple of them before it's like, well, then we're really again,
then, okay, that's back to it. We have a thing that we use in our values that again, I keep,
I keep coming back to that because we actually have them.
And you know that because you keep referring to them and using them to make decisions.
So one of them I really like is it's called do the right thing, which doesn't mean anything by itself.
That's just nonsense.
But what it says next is to define it is if it's right for the customer and right for the company and you're proud of your decision, then you've done the right thing.
And this discussion right now with security is an interesting application of this. So when something happens, you look at it and you say, are we proud of this,
or are we kind of not proud about this? Well, if it's some crazy ass whatever thing, you're like,
yeah, we're fine. As opposed to, oh my God, you guys, we should have gotten that.
How did we miss? Like we, that should not have happened. I'm not proud of that happened. I,
so what's funny is on the one hand, being proud of something sounds so subjective.
How is that an objective measure? But what's funny is it isn't subjective at all. You know
immediately when you're proud. It's hard to codify in a handbook somewhere in a way that
you're going to be able to distort into something that fits in a contract.
But you know. Exactly right.
If it's in a contract, never mind.
Right.
But like just from one person to another, are you proud of this?
You know the answer immediately.
And so it actually works really well, in fact, because something happens.
Everyone's like, oh, everyone knows, we're already agreeing here.
It is objective. You know, not that there's no gray areas ever, but you know, you get the idea.
So I'll give you another example where this is a fun application. Another thing you get in hosting is our customers put all sorts of stuff on the web. Is it okay? Are we okay with it being on the
web? How much are we looking? How much can we look? These are all kinds of things where you
deploy these same kinds of responses to abuse reports, but that doesn't mean you're basically crawling
everyone's website in your spare time to see what they've got.
Yeah.
So there's things like, you know, Cloudflare goes through this kind of famously all the
time where there's somebody and they're saying stuff and it's really offensive to some people.
And some people say Cloudflare should shut them down because this is over the line.
And some people say Cloudflare should not shut them down because it's the Internet and they shouldn't make those decisions.
It shouldn't be up to Cloudflare.
And I think both of them have a point.
Hence the dilemma.
It's like they both have a point to make.
Cloudflare has a way of finding themselves in the middle of that debate over and over and over again in a way that other providers never seem to.
Well, so much of the Internet goes through there and people use that to keep their websites up. So
I mean, we have that too. It's not quite as in the news, but we have that as well.
So here's my attitude. And of course, you don't have to agree with me, but here's my attitude
on that. You don't have to agree with me, but let's say as someone who's had to wrestle with
such things, and we as a company have had to figure out procedurally how do we wrestle with such
things, and we really struggle with it, is this: I want to see the organization struggle. I want
to see them go, on the one hand this, and we value that, and on the other hand this other thing,
we value that too. You know, a lot of us don't like what they say, but that can't be the right
way to do it. And we do believe in free speech.
We do believe in the internet, blah, blah, blah, blah.
And that makes sense.
And yeah, who are we to make that decision?
And yet we do have to make the decision because we're here.
We do have that.
And yet, yeah, should we even?
And well, we have this in our terms of service and maybe we should invoke it.
But you could read it this other way because, of course, there's gray areas.
Some things are black and white, but some things are not. None of these
conversations are simple. Yeah. And like, so I want to see that. I want to see them going, ah,
but we really value this, we value that. Yeah, I want to see the struggle. And then
I don't care how they just resolve it. I want to see them, because that means they're trying to do the
right thing, whatever that means to them. And none of us will always agree on what someone
else ends up deciding in a thing like that. None of us will agree with each other a hundred percent.
So that can't be the metric or that can't be the thing that decides whether they're trying to do
the right thing, but they're proud of that decision. So to me, if I see our team
struggles and struggles and then comes up with an answer, I'm proud of that. I'm proud that we tried really hard
and we came up with something. We had some rationale. Of course, not everyone will agree.
I'm proud of that. I'm proud of that way of deciding. I think that has to be good enough.
I mean, it has to be, of course, genuine, right? If they genuinely, humans, tried to figure it out. And, like, are we
getting a million of those a day, or have we improved and improved and improved our AUP such
that only the hardest, craziest things are still hard and everything else is known? Because
again, if the answer is no, oh, well, then we're being negligent about having a good
policy. But if the answer is yes, yeah, almost everything is handled and this is just one of those very few things that still fell
in that gap, again, I'm proud of that. Then I'm proud of our policy. I'm proud that this is
rare. And then we struggle. Good. Like, I mean, at least that's my approach. So
there was this, I don't remember when it was, but I remember
one of these times with Cloudflare, they put out this big letter explaining their struggle.
And I remember reading the letter and just thinking, there's the struggle.
That's what I, personally, I'm like, see, so I'm happy whichever way they went, because
I feel like they're trying to do the right thing.
They cared enough for it to bother them.
Yeah.
Yeah. And then they cared enough to tell everyone, like, it's just, and so of course people are
like, no.
And some people are like, yay, you know, whatever.
What can you do?
So that's, of course, that's the outcome.
So I really want to thank you for taking the time to speak with me.
If people want to learn more about how you view this and so many other things, where's
the best place for them to find you these days?
Sure. So for me personally, it's asmartbear.com, like the animal.
We both like animals, I guess. My previous company was called Smart Bear. That's why it's called that, because it's my online identity from long ago. And then of course, WP Engine is wpengine.com.
And we'll of course put links to both of those things in the show notes. Thanks for having me. And I hope you see I didn't duck any of the questions.
No, you did not. It's appreciated. Thanks again for agreeing to do this. I really appreciate you
taking the time. It was fun. Great topics. Jason Cohen, founder of WP Engine. I'm cloud
economist Corey Quinn, and this is Screaming in the Cloud. If you enjoyed this podcast,
please leave a five-star review on your podcast platform of choice.
Whereas if you hated this podcast,
please leave a five-star review
on your podcast platform of choice,
along with an angry, insulting comment
that won't get published correctly
because your platform of choice
decided to run its own WordPress instance instead.