PurePerformance - 013 Pat Meenan (Google and WebPageTest) on Correlating Performance with Bounce Rates
Episode Date: September 26, 2016

Pat Meenan (@patmeenan) is a veteran when it comes to Web Performance Optimization. Besides being the creator of WebPageTest.org, he has also done a lot of work recently on the Google Chrome team to make the browser better and faster. During his recent Velocity presentation on “Using Machine Learning to determine drivers for bounce and conversion” he presented some very controversial findings about what really impacts end user happiness: that it was not rendering time but rather DOM load time that correlates with conversion and bounce rates. In this session we dig a bit deeper into which metrics you can capture from your website and present to your business side as an argument for investing in faster websites. Find out which metric you really need to optimize in order to “move the needle”.

Related Links:
* Using machine learning to determine drivers of bounce and conversion - Velocity 2016
  https://www.youtube.com/watch?v=TOsqP16jnDs
* WebPageTest
  https://www.webpagetest.org/
* WPO-Foundation GitHub repository for machine learning
  https://github.com/WPO-Foundation/beacon-ml
Transcript
It's time for Pure Performance.
Get your stopwatches ready.
It's time for Pure Performance with Andy Grabner and Brian Wilson. Hello everybody and welcome to episode lucky 13.
That's right, this is episode 13, so I suspect that before the end of the day, one of us will be dead.
With that, my name is Brian Wilson. Andy, hello, how are you doing?
I'm good, I'm still laughing about your comment.
I hope it's not going to be me because I have plans on the weekend.
So I would really be happy if I could actually take my trip down to South America and not have to die today.
It would be a waste of money on the airfare.
I think all of us would be happy if we didn't die today.
So wait, if that's the case, you're pushing it either on me or our very special guest today.
So that's a little selfish.
Oh, that's tough.
Yeah.
You know, we don't want Pat to die either.
So Brian.
I guess I'll take one for the team.
Okay.
This is my last episode, everybody.
Thank you so much.
All right.
And Andy, would you like to introduce?
I would love to. Yeah. Our guest speaker today, Pat Meenan.
Well, I assume if I say the name Pat Meenan in the performance world, especially when it comes to front-end performance, people will be excited, because I'm excited every time I see Pat speak or just meet him and bump into him at different conferences.
But, Pat, maybe you want to introduce yourself, who you are in case people don't know who you are, why you are, as I believe, a big shot in web performance and front-end performance optimization.
Because I know you've been with different companies over the years, so kind of take us through what you've done.
Sure.
I love talking web performance all day long every day,
so I'm happy to be here with you guys chatting about it.
I'd counter the big shot side of things.
I'm just the guy who has a website.
You're being humble, but you're a very well-known name.
I get paid to do stuff I enjoy every day.
I'm spoiled that way.
So right now I currently work at Google on the Chrome team,
largely trying to make the web faster in general,
both by making Chrome faster in cases where I can and on the website by trying to get websites to be faster as much as possible. And a lot of that's based on the work that I do with WebPageTest and
WebPageTest.org, which is an open source web performance testing tool, if you would, that a lot of people use to measure the performance of
their websites. And it's very developer focused in that it tries to give you as much detail about
why your site is performing the way it is so that you can hopefully go and optimize it and end up
with a much faster website. And so I've been at Google for going on, I think, six years now.
Originally joined as part of the Make the Web Faster team.
So in my time there, I've been largely doing the same things,
just now it's within the Chrome team.
And before that, I was at AOL for 10, 11 years, give or take, something like that.
Also working on performance.
Originally started working there on the connectivity team, working on the networking stack.
And this is, I mean, it's a little scary to think, but it's, what, 15, 16, 17 years ago.
But at the time when I joined, it was all dial-up networking. And I was working
on the dial-up networking stack and routing IP through AOL's proprietary network and out to the
web. And then trying to figure out how to make that as fast as possible, and AOL's internal web browser, as the web became more popular, as fast as possible.
And it just sort of evolved there.
And AOL is actually where I was when WebPagetest was created.
I created it there internally initially as a tool that we were using to measure the performance.
And then they were kind enough to let me open source it.
And it just sort of grew a life of its own after that.
Well, you have obviously, you know, you can call yourself now a big shot because you basically
lived through the whole growing up phase and now the kind of, you know, parental phase
or whatever you want to call it of the web.
So you know all the details in the beginning on a network perspective.
And now obviously you focus a lot on front-end performance, helping web developers to build better websites.
It's phenomenal what you've done and thanks for WebPageTest.
So I can just encourage everybody, go to WebPageTest.org, enter your URL, choose a test location, and then see how fast or slow, hopefully fast, your website loads.
And not to embarrass you by singing your praises, but I spent about 10 years in load performance and we started getting interested in front end, the impact and reading stuff that people
like you and Steve Souders put out really gave us a new life and gave us enthusiasm
to get back into being enthusiastic
about performance. So just a great thanks there. And I think, um, you might've missed it cause I
don't think I had the audio routed, but you mentioned starting up in dial-up days, but the
very beginning of the intro is a, uh, dial-up modem. So sorry, I didn't have that routed to
give you some nightmares. Anyhow.
Flashbacks.
We want to talk today about a really, really interesting presentation you gave with Tammy Everts back at Velocity a couple of months back about machine learning in terms of conversions and bounces.
That was very, very fascinating.
Andy, I think you were there live.
I watched it on the internet and we'll put the links
to some of this stuff on the page.
Andy, you were there, correct?
I was there.
Yeah, that was back in Santa Clara.
And I know that, Pat, you said,
I mean, once this show airs,
you will have been doing
also an updated version of it
in New York in the upcoming Velocity.
But yeah, I was there.
And I think I need
to look it up now. But I think on my blog, on blogdanoschweiss.com, I actually wrote a little
summary and kind of tried to capture also the skepticism and the critical feedback that came
out of your initial findings about which metrics actually correlate to conversion and bounce rates.
And yeah, it was a very interesting talk.
But yeah, I want to hear more about it and maybe also for the audience, get some background
on what you actually did, which data you used, and what it actually was that you tried to
find out.
Yeah, no, that'd be great.
I mean, it was controversial for me as well.
When I first saw the results, it was driving me insane. But we'll get to that in just a second.
So what we started out doing is we partnered up with SOASTA. I'm sure I'm going to pronounce it
wrong. I'll call them SoStuff right now.
They've got a whole lot of rich performance data from the field for their customers, as well as business data, both around bounce and conversion.
And we wanted to take machine learning, apply it to their data set,
and see if we could find out what were the key drivers for bounce and conversion
and how much did they impact bounce and conversion.
And so you can do it, I'm sure, with all sorts of classic stats and analysis
and regression analysis and everything else.
But it was, for me, not being an analysis geek and being more of a code monkey,
it was a whole lot easier to just throw all the data at machine learning,
let it run, figure out the relationships, what matters, and how things work,
and then just ask it for what the results look like.
It turns out, and this won't be surprising to anyone who's deep into machine learning, that second part of getting the answers back out is actually something that machine learning is horrible at. And it's actually designed not to be; you're not really supposed to be able to introspect and figure out why the machine figured out the relationships it did.
It's more about teaching it something, and then it can predict based on that.
But we did find different ways.
And the full presentation and video will have the details on how the neural network was set up.
And then we actually ended up using a random forest the first time around, because we could get the data back out. And the controversial parts are, it turns out, so I'm a huge fan of render and user-facing metrics. That's a lot of what WebPageTest has pushed for, start render being my main metric, or the main easy-to-measure metric.
This is the first time the user sees anything.
If they're not seeing anything, you might as well not be serving a page, right?
It's this huge metric in the web performance space.
And at the end of the day, what the machines told us were,
yeah, that metric doesn't matter at all.
It's way out here at the tail, almost no impact on the results.
And the big drivers were the DOM content ready, which is effectively the same as DOM content loaded,
but it's their sort of polyfill version that also works on browsers that don't support navigation timing,
and the page load metric.
And so those two were the main metrics from a performance perspective that drove bounce and conversion. And in the case of bounce, those were the two metrics that drove basically all of bounce.
Once you got past those two metrics,
almost nothing else mattered.
In the case of conversion,
it actually ended up being
that there were a whole lot of
sort of page structure features
that mattered more than performance.
Things like number of images on a page,
number of scripts,
number of DOM elements.
But I'll get to why I don't think
that's necessarily accurate in a second. But since the initial findings and, you know,
what we talked about in Santa Clara, we've done a whole, well, I've done a whole lot more of sort
of the machine learning and figuring out. And it ends up being,
you really need to know your data set really well.
And a lot of browsers don't support measuring start render.
So a lot of the beacons didn't have a render time
included in them,
which is why it ended up not being important for bounce. So it ended up being, I think, something like 30% of the beacons actually had a start render time, because you can't get it from Safari and Firefox.
And so it ended up not being useful figuring out if the session would bounce or not because it
wasn't there a large percentage of the
time. And when I re-ran the data, filtering it down to just sessions that did include start render time,
it ended up being almost as important as DOM content ready, but it still wasn't
more important than DOM content ready. So the findings in the initial results are still, as far as what matters, are still important.
As far as the controversial, well, render doesn't matter at all, I'm glad I actually understand why now in the data set.
And it's not the case.
So render still matters.
The world is still good.
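For readers who want to try this on their own beacon data, here is a minimal sketch in Python of the kind of random forest approach Pat describes, feeding per-session RUM metrics in and reading feature importances back out. It is not the actual beacon-ml pipeline; the file name and column names (dom_content_ready, page_load, start_render, and so on) are hypothetical placeholders for whatever your beacons contain.

```python
# Minimal sketch (not the actual beacon-ml pipeline) of training a random
# forest on per-session beacon metrics and reading back which features
# drive the bounce prediction. File and column names are hypothetical.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

beacons = pd.read_csv("beacons.csv")  # one row per session (hypothetical file)
features = ["dom_content_ready", "page_load", "start_render",
            "num_images", "num_scripts", "num_dom_elements"]

# Drop sessions missing a metric (e.g. start_render on Safari/Firefox),
# otherwise the model can end up learning the browser mix instead of
# user behavior, which is exactly the issue described above.
data = beacons.dropna(subset=features + ["bounced"])

X_train, X_test, y_train, y_test = train_test_split(
    data[features], data["bounced"], test_size=0.2, random_state=42)

model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)

print("holdout accuracy:", model.score(X_test, y_test))
for name, importance in sorted(zip(features, model.feature_importances_),
                               key=lambda pair: -pair[1]):
    print(f"{name:>20}: {importance:.3f}")
```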
I remember you were pretty upset when you were making that announcement.
I think, and we'll probably be getting into this, so I don't want to jump the gun, but it almost seems like a lot of it might also pertain to the verticals that this is being run in, right?
So this test was your initial test, and I would assume possibly your follow-up reanalysis was all based on that commerce system. And you even mentioned in the talk,
it'd be interesting to see how this would look
for a content-only type system
or different kinds of applications
if any of these things would change.
Because some of the conversion rates,
like number of scripts and all that,
seem to maybe apply more to something
where it is commerce.
And yes, there might be more scripts
towards the end when someone's checking out.
Right. And so the data set itself was actually a fairly good mix of e-commerce and publishing, um, which is good, but it's also broad and varied. Um, the conversion results are
definitely for e-commerce only because those were the only
sets of data that we had conversion for. But that's sort of another reason I tend to like
the bounce side of the data better because it applies both to published content. You generally
want people to consume more of your content. Just don't put up the stupid slideshows where you need to click every page to force it. But I think the main caveat for me is, this is how it looked for a broad set of sites; for your specific site, you know, the typical 'your mileage may vary' applies.
There may be certain aspects to your user base or to your site that are kind of washed away a
little bit in the large aggregation. So I do highly recommend you take your own beacon data
and rerun the same kind of analysis on your specific data set
so that you can see if the same things hold true for you.
I'd be surprised, at least on the bounce side of things,
if that wasn't the case just because effectively what we ended up measuring
was that DOM content ready was the most important proxy
or most important impact on bounce. And that's sort of the closest
proxy for user experience you have that works across all of the browsers. It includes sort of
both usually most of the initial page structure is rendered by then and the page is functional
at that point. And then fully loaded is sort of the tail,
the typical, hey, this is my page load time. And both of those ended up being really critical.
And we'll have some graphs that show up in the New York version of the presentation where we can
actually, with the deep learning neural net, we can plot out the probability of bounce as either one of those varies.
And it's interesting how quickly it ramps up.
Like in the DOM content loaded event, from 100 milliseconds to one second,
I think it's like a really steep ramp in the probability to bounce.
And once you get past one second out to about six seconds,
it tails off a little bit, but it's still a very linear relationship
where the probability to bounce almost doubles going from one second to six seconds.
And then after that, once you're past six seconds for DOM content loaded,
it almost doesn't matter.
You're so slow that they're more likely than not going to bounce anyway.
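The bounce-probability curves Pat describes can be approximated with a simple partial-dependence-style sweep: hold every other feature at its median and vary only DOM content loaded through the trained model. This sketch reuses the hypothetical model, data, and features objects from the earlier example and assumes the timing columns are in milliseconds.

```python
# Sketch of a one-feature partial-dependence curve: hold other metrics at
# their median, sweep DOM content loaded, and plot the predicted bounce
# probability. Reuses `model`, `data`, and `features` from the sketch above.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

dcl_values = np.linspace(100, 10000, 100)   # 100 ms up to 10 s (assumed units)
baseline = data[features].median()          # typical values for everything else

rows = []
for dcl in dcl_values:
    row = baseline.copy()
    row["dom_content_ready"] = dcl
    rows.append(row)
grid = pd.DataFrame(rows)                   # same columns, same order as training

bounce_prob = model.predict_proba(grid)[:, 1]   # P(bounce) at each DCL value

plt.plot(dcl_values / 1000.0, bounce_prob)
plt.xlabel("DOM content loaded (s)")
plt.ylabel("predicted probability of bounce")
plt.show()
```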
Or these are just like I remember I had a conversation with the guys from Nordstrom a couple of weeks ago.
And they were actually saying, which is now also very controversial to your findings.
They said, you know, they have total different numbers because they have a total different user base.
Their users are typically very loyal and they know what they want.
They shop, quote unquote, regardless of performance and render time. I mean, they saw impact, obviously, but they say they saw different patterns than what
is out there in the industry, you know, published by folks like you and the Amazons and the Facebooks of the world.
Because as you said, you have to look at your data because you are not like everybody else.
You have something unique, and you might be a high-value brand.
And people that go to your site, they probably go to your site because they want to shop with you. And then they have a different user behavior than the regular user that wants to buy something on a low-budget e-commerce site where he has 50 other options.
Yeah, and I mean I could see definitely where that's the case.
They're going to convert whether or not it's slow.
I mean to a certain point,
I expect. But even in those cases, there are probably situations where, like, the number of cart items or the number of pages that they go through looking through stuff or
randomly exploring other features of the site before they go to the checkout, where they'll see more engagement the faster it gets.
Even though at the end of the day, it's a dedicated user who absolutely is going to buy it from Nordstrom's,
how much they engage with the Nordstrom site as they're doing that is probably, I expect they're going to find a very direct relationship. So that's actually an interesting point, which is, well, did you also look at
like the number of steps it takes to go from A, from start to conversion and how that
translates into higher conversions across the different pages?
No, we haven't done that kind of analysis yet. You know, a lot of this, hopefully, we open sourced, and I mean,
to say we open sourced it is, there's not a lot of code involved in doing machine learning with
this kind of data. I'm hoping we get a lot more people looking sort of at a lot of different
aspects of their metrics and providing information on what they're seeing in their environment.
And for some people, that kind of business metric is something that they track,
the number of steps, the number of cart items, the shopping cart size, that kind of stuff.
None of that is sort of at least available to us in the broad aggregate set of data that we had that we were looking at.
I think some of our customers... So we also have a RUM solution.
We call it Dynatrace UEM, User Experience Monitoring.
And what we do now is we encourage everyone that uses our solution to work with your model
and try to figure out the correlation between the metrics.
We also see a lot of our customers using the data, feeding it into Elasticsearch and then making sense out of it.
And because we have the context... so we see every single user and every single click along the way on the website, whether they had this type of conversion or another type of conversion.
So there's a lot of interesting conclusions that we can draw from the data, obviously.
But as you said, it's always hard to get the right answers out of the system, even if you have a machine learning system.
You need to ask the right questions.
And you have to be really careful about how you ask them. So, like, in the case of the latest run, even on the bounce data, what I ended up doing is brute forcing the importance. Since there are so few metrics that we're throwing at it, the neural nets for this kind of analysis are really easy and light. It's not like language processing or anything like that. So we can actually independently train on every separate metric and then see, okay, which one predicted best. And from there, pick the next
metric and see which combination predicted best. And if you're watching it as it's going, you start learning more about why it's discovering relationships, and you go, okay, I'm going to explicitly deny you access to this metric.
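The brute-force importance procedure Pat outlines is essentially greedy forward feature selection: score each metric on its own, keep the best one, then try each remaining metric alongside it, and so on. A rough sketch, again using the hypothetical data and features from the earlier examples:

```python
# Sketch of greedy forward selection: train on each metric by itself, keep
# the best predictor, then try adding each remaining metric, and so on.
# Reuses the hypothetical `data` and `features` from the earlier sketches.
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def greedy_metric_ranking(data, features, target="bounced", max_picks=3):
    remaining = list(features)
    picked = []
    while remaining and len(picked) < max_picks:
        best_metric, best_score = None, -1.0
        for metric in remaining:
            cols = picked + [metric]
            model = RandomForestClassifier(n_estimators=50, random_state=0)
            score = cross_val_score(model, data[cols], data[target],
                                    cv=3, scoring="accuracy").mean()
            if score > best_score:
                best_metric, best_score = metric, score
        picked.append(best_metric)
        remaining.remove(best_metric)
        print(f"picked {best_metric}: cv accuracy {best_score:.3f}")
    return picked

ranking = greedy_metric_ranking(data, features)
```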
And for me, the most clear case of that is we have in the data set, we had median, max,
and average, I think for each one of the metrics. And so the median DOM content
ready time was the most important metric. And then if you take that and also look at all of
the other metrics together, it turns out that the max DOM content loaded time was the second
most important metric. And combined with those two,
it could 99% accurately predict
if a session bounced or not.
And it becomes pretty obvious
when you look at it and go,
oh, you know what?
If those two numbers are different,
there were obviously more than one page load
in the session,
so it clearly wasn't a bounce.
So you can just do a simple, hey, is the median and the max different? If so, it wasn't a bounce; if not, it was. So you have to be careful kind of throwing all of the raw data at it, because it might find relationships that are more sort of along the lines of what you were measuring as the output rather than user behavior.
And so the perfect case for that, and this is why I'm much more comfortable with the bounce data
than I am with the conversion data, is I am not entirely convinced that the machine learning model
didn't discover and learn what checkout pages look like and go, okay, well,
if the max number of scripts on this session was 200 or 300 or some fixed number that matches all
of the checkout pages in all of the data set, it could go, hey, there was a conversion page somewhere in this session. The conversion page always has more scripts than all of the others. This session obviously
converted. So I haven't had a chance to do it yet. One of the things I want to do is I want to filter
and instead of looking at sort of aggregates across the entire session,
I want to look at just the arrival page. When someone first landed
on the site, look at all of the metrics for that page only, and then the end result of the session
if they bounced or converted, so that there's sort of no influence of this is what a converted page
looks like or a checkout page looks like on the data set. But it's definitely illuminating as you sort of work
through it and look as it's picking out individual features. You have to have sort of a mental filter
that you're applying and you have to go, okay, no, you're not allowed to look at those because
of what you picked the first go around. And I think SSL was another one that you had to eliminate from conversions,
correct? Yes. That's an easy predictor, right? Yeah. So that was one that came out really
obviously as if there was SSL anywhere in the session, well, checkout pages are more likely
to be SSL. They didn't convert because it was SSL. It was detectable as a conversion because
there was SSL.
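Both of these examples (median versus max DOM content ready revealing multi-page sessions, and SSL revealing checkout pages) are forms of target leakage. A sketch of the kind of sanity check and landing-page filtering Pat describes, with hypothetical column names, might look like this:

```python
# Sketch of the leakage checks described above, with hypothetical columns.
# If "median != max DOM content ready" predicts bounce almost perfectly,
# it is just re-encoding "there was more than one page in the session",
# not telling you anything about user behavior.
multi_page = (beacons["median_dom_content_ready"]
              != beacons["max_dom_content_ready"])

# How often does that single flag agree with "did not bounce"?
agreement = (multi_page == ~beacons["bounced"].astype(bool)).mean()
print(f"median != max agrees with 'did not bounce' {agreement:.1%} of the time")

# One way out: keep only the landing page's own metrics plus the session
# outcome, so nothing about later pages (checkout scripts, SSL pages, ...)
# can leak the label into the features.
landing_features = ["lp_dom_content_ready", "lp_page_load", "lp_num_scripts"]
landing_only = beacons.dropna(subset=landing_features + ["bounced"])
```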
And both of these, the tree and the machine learning, those are on GitHub, I believe, right?
So we'll put those on.
It kind of sounds like, you know, I'd say early but still current days of genome sequencing, right?
As you mentioned, there's not much code involved, but figuring out the right way to look at it, the right way to process the
right inputs, that's where all the magic and all the work comes into it. So that kind of sounds
like a lot more of the challenging part of this. And I think for anybody who's looking to try this
out, it's, you know, don't jump up and down after your first run. It sounds like it's going to have
to be, you know, a lot of really analyzing
what you're feeding in and thinking about why you might have had some of those outputs and outcomes that you're seeing.
And be... Yeah, be very skeptical about your results and try to tear them apart yourself first. Because you're going to discover a lot as you're doing that.
But yeah, I totally expect this is the very early days.
And I really expect anyone who has any idea what they're doing with stats and analysis is just going to look at me and laugh because I'm sure they have all sorts of better ways for doing a
lot of this. For me, the machine learning is working really, really well, and I don't have to
know all of the rest of the ways to do it.
I think it also starts that conversation too. You know, if someone does see a better way or ways to build upon it, well, at least you're putting this out there and getting this rolling.
Cause I think it's a, it's a brand new way of, of analyzing. And I think a lot of it's going to
come down to, as you were saying, it's going to be individual to your organization, to your
application. There probably will be, especially on the bounce side, some commonalities that can start coming out as industry best practices, but I think we'll likely find little tweaks and individualities in all of them. So it's going to be interesting seeing what comes of all this. Andy, what were you going to say there? Sorry.
Yeah, I want to say, kind of as a summary and also like a list for people that would like to try this: what are the minimum metrics that you suggest people need to capture in their beacons so that they can actually start digging into this? I know we will post the links to the video and the GitHub repository, but just so people know, hey, can I actually do this, because I have the metrics or I don't have them?
And if I don't have them, what do I need to do to get them?
So kind of the minimum set of metrics that you believe they have to capture?
To see whether the same metrics apply to your site, really all you need are DOM content loaded time and page load time, and then you can see how accurate those are at predicting bounce or not, to give you an idea, is this really the set of metrics that are needed for my site? In the case of the deep learning model we trained, between those two metrics, it could predict a bounce to roughly a 90% accuracy. So if you're in that ballpark,
then yes, those are the two metrics that matter for you. And then you can plot out the probability
distribution for bounce from the model. So you can see, for me, the really exciting graphs are to see,
okay, how much do we need to improve performance by, and how much should our bounce rate improve as a result?
So basically that then allows teams to say, hey, if we move the needle by 500 milliseconds to the left,
meaning 500 milliseconds faster, that based on the model should increase conversion rate or drop bounce rate by 15%.
And are we willing to invest in those 500 milliseconds, for that kind of value?
Yeah, that's perfect.
Cool. Yeah, that's awesome, because obviously you always have to have the business justification as well. We all know we want to make the websites faster, but if you actually can give numbers and say, this is the predicted benefit we have on our end users.
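One hedged way to turn the trained model into that kind of business number is to re-score the data with the timing metric shifted by the proposed improvement and compare the predicted bounce rates. This sketch reuses the hypothetical model, data, and features from the earlier examples and assumes timings are in milliseconds; the 15% figure mentioned above is just an illustration, not an output of this code.

```python
# Back-of-the-envelope sketch: score every session as-is, then again with
# DOM content loaded shifted 500 ms faster, and compare the mean predicted
# bounce probability. Reuses the hypothetical `model`, `data`, `features`.
import numpy as np

current = model.predict_proba(data[features])[:, 1].mean()

faster = data[features].copy()
faster["dom_content_ready"] = np.maximum(
    faster["dom_content_ready"] - 500, 0)   # 500 ms faster, floored at zero
improved = model.predict_proba(faster)[:, 1].mean()

print(f"predicted bounce rate: {current:.1%} -> {improved:.1%} "
      f"({(current - improved) / current:.1%} relative drop)")
```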
And then.
Right.
And then you're competing in the same game as the marketing teams that are trying to
add tag managers and everything else, right?
Where you have, hey, here's the revenue impact of this work or even the revenue cost.
If we add this tag manager, it's going to slow this metric down by 200 milliseconds.
It's going to cost you X in bounce or conversions.
And then it's so much easier to communicate to the business side and say, well, we need to invest this much money
in order to make it faster because that gives you in the end so much more money.
So are you willing to make that technical investment?
I think that's also great.
Right.
And yeah, the competing for me really is the competing for those technical investment dollars,
where you've got the product teams, the marketing teams, everyone's got their features
that they're throwing in the bucket. And it's always about, well, we only have so many engineering
resources, where do we focus the time and performance usually ends up getting dropped,
because, hey, we have these shiny new buttons that we need. And they've communicated that
these shiny buttons are going to increase the conversions or whatever.
So now you have sort of a stake in the game where you can say, hey, focusing on the infrastructure to improve metrics by this will result in these dollars.
And you're sort of on the same playing table as that.
I love the mindset of, hey, this shiny new slow feature is going to increase conversions.
That's always kind of from the marketing side sort of thing.
Yeah, but on the other side, I mean, I want to counter you here, even though that may open up a total new topic on its own.
If you are delivering something fast that is not relevant for the end user and appealing, then it doesn't help you either.
So, I mean, the whole thing we miss out here
is obviously the relevancy of the content,
of the products we sell on that page, right?
And it's obviously something we can't measure.
But the reason why I want to bring it up,
I was at the DevOps Days in Boston last week
and there was one guy, I don't remember his name,
he had a very interesting example where he said,
from his engineering perspective,
he had to implement
a change that did not make sense at all to him as an engineer, colors changing and something
like that.
But he was proven wrong, because once the change went in, the conversion rate jumped by 10%.
And so he said, the business sometimes doesn't understand us, what we want to do, but sometimes
we also don't understand the business.
So there is obviously – if we can prove with our numbers that we can move the needle, that's great.
But I think there are some things, like content relevancy, that sometimes we, being too technical, too deep in the weeds, also don't understand why they impact the end user,
because we are not always our end users and don't know how they react. Does this make sense?
Yep.
Oh yeah, no, absolutely. Yeah, if you don't have the content for them to consume or the stuff that they want to buy, there's no reason.
So the question is, if we could have an additional metric that we could add to the mix, it would be something like relevancy for the end user.
But that metric can only be, I guess, captured by asking the end user.
But yeah, nothing we can do automated.
It just kind of brings up the idea.
I think a lot of people, when we talk about DevOps kind of things, look at how does the technical team open the eyes of the business?
And kind of what you're... it sort of sounds like a one-way street, like the technical team is going to reach out and, you know, help out all the other teams, but it's never thought of, until kind of what you're saying there: how is the business team going to reach out and educate the technical team?
Exactly.
Interesting. Anyway, that's a whole different topic. Any other thoughts on this topic before we wrap up here?
No, I mean, I'm just looking forward to seeing what other people can do with it and certainly challenge even all of the stuff that sort of I've been putting out there, we've been putting out there, because that only makes the results better, right? As we're seeing with the second iteration on the data and understanding, like,
why render didn't show up the first time around. All right, well, thank you. This will wrap up part
one of our conversation with Pat Meenan. Tune back in, we're going to have part two. It should
be published at the same time, I believe, but we will be discussing
some more performance optimization with Pat Meenan coming up very soon.