PurePerformance - 013 Pat Meenan (Google and WebPageTest) on Correlating Performance with Bounce Rates

Episode Date: September 26, 2016

Pat Meenan (@patmeenan) is a veteran when it comes to Web Performance Optimization. Besides being the creator of WebPageTest.org, he has also done a lot of work recently on the Google Chrome team to make the browser better and faster. During his recent Velocity presentation on “Using machine learning to determine drivers of bounce and conversion” he presented some very controversial findings about what really impacts end user happiness: that it was not rendering time but rather DOM load time that correlates with conversion and bounce rates. In this session we dig a bit deeper into which metrics you can capture from your website and present to your business side as an argument for investing in faster websites. Find out which metric you really need to optimize in order to “move the needle”.

Related Links:
* Using machine learning to determine drivers of bounce and conversion - Velocity 2016: https://www.youtube.com/watch?v=TOsqP16jnDs
* WebPageTest: https://www.webpagetest.org/
* WPO-Foundation GitHub repository for machine learning: https://github.com/WPO-Foundation/beacon-ml

Transcript
Starting point is 00:00:00 It's time for Pure Performance. Get your stopwatches ready. It's time for Pure Performance with Andy Grabner and Brian Wilson. Hello everybody and welcome to episode lucky 13. That's right, this is episode 13, so I suspect that before the end of the day, one of us will be dead. With that, my name is Brian Wilson. Andy, hello, how are you doing? I'm good, I'm still laughing about your comment. I hope it's not going to be me because I have plans on the weekend. So I would really be happy if I could actually take my trip down to South America and not having to die today.
Starting point is 00:00:56 It would be a waste of money on the airfare. I think all of us would be happy if we didn't die today. So wait, if that's the case, you're pushing it either on me or our very special guest today. So that's a little selfish. Oh, that's tough. Yeah. You know, we don't want Pat to die either. So Brian.
Starting point is 00:01:13 I guess I'll take one for the team. Okay. This is my last episode, everybody. Thank you so much. All right. And Andy, would you like to introduce? I would love to. Yeah. Our guest speaker today, Pat Minen. Well, I assume if I say the name Pat Minen in the performance world, especially when it comes to front-end performance, people will be excited because I'm excited every time when I see Pat speak or just meet him and bump into him at different conferences.
Starting point is 00:01:57 But, Pat, maybe you want to introduce yourself, who you are in case people don't know who you are, why you are, as I believe, a big shot in web performance and front-end performance optimization. Because I know you've been with different companies over the years and kind of take over for what you've done. Sure. I love talking web performance all day long every day, so I'm happy to be here with you guys chatting about it. I'd counter the big shot side of things. I'm just the guy who has a website. You're being humble, but you're a very well-known name.
Starting point is 00:02:26 I get paid to do stuff I enjoy every day. I'm spoiled that way. So right now I currently work at Google on the Chrome team, largely trying to make the web faster in general, both by making Chrome faster in cases where I can and on the website by trying to get websites to be faster as much as possible. And a lot of that's based on the work that I do with WebPageTest and WebPageTest.org, which is an open source web performance testing tool, if you would, that a lot of people use to measure the performance of their websites. And it's very developer focused in that it tries to give you as much detail about why your site is performing the way it is so that you can hopefully go and optimize it and end up
Starting point is 00:03:20 with a much faster website. And so I've been at Google for going on, I think, six years now. Originally joined as part of the Make the Web Faster team. So in my time there, I've been largely doing the same things, just now it's within the Chrome team. And before that, I was at AOL for 10, 11 years, give or take, something like that. Also working on performance. Originally started working there on the connectivity team, working on the networking stack. And this is, I mean, it's a little scary to think, but it's, what, 15, 16, 17 years ago.
Starting point is 00:04:02 But at the time when I joined, it was all dial-up networking. And I was working on the dial-up networking stack and routing IP through AOL's proprietary network and out to the web. And then trying to figure out how to make that as fast as possible. And AOL's internal web browser as the web became more popular as fast as possible. And it just sort of evolved there. And AOL is actually where I was when WebPagetest was created. I created it there internally initially as a tool that we were using to measure the performance. And then they were kind enough to let me open source it.
Starting point is 00:04:45 And it just sort of grew a life of its own after that. Well, you have obviously, you know, you can call yourself now a big shot because you basically lived through the whole growing up phase and now the kind of, you know, parental phase or whatever you want to call it of the web. So you know all the details in the beginning on a network perspective. And now obviously you focus a lot on front-end performance, helping web developers to build better websites. It's phenomenal what you've done and thanks for WebPageTest. So I can just encourage everybody, go to WebPageTest.org, enter your URL, choose a test location, and then see how fast or slow, hopefully fast, your website loads. And not to embarrass you by singing your praises, but I spent about 10 years in load performance and we started getting interested in front end, the impact and reading stuff that people
Starting point is 00:05:39 like you and Steve Souders put out really gave us a new life and gave us enthusiasm to get back into being enthusiastic about performance. So just a great thanks there. And I think, um, you might've missed it cause I don't think I had the audio routed, but you mentioned starting up in dial-up days, but the very beginning of the intro is a, uh, dial-up modem. So sorry, I didn't have that routed to give you some nightmares. Anyhow. Flashbacks. We went to the talk today about a really, really interesting presentation you gave with Tammy Everts back at Velocity a couple months back about machine learning in terms of conversions and bounces.
Starting point is 00:06:23 That was a very, very fascinating. Andy, I think you were there live. I watched it on the internet and we'll put the links to some of this stuff on the page. Andy, you were there, correct? I was there. Yeah, that was back in Santa Clara. And I know that, Pat, you said,
Starting point is 00:06:37 I mean, once this show airs, you will have been doing also an updated version of it in New York in the upcoming Velocity. But yeah, I was there. And I think I need to look it up now. But I think on my blog, on blogdanoschweiss.com, I actually wrote a little summary and kind of try to catch and capture also the skepticism and the critical feedback that came
Starting point is 00:06:58 out of your initial findings about which metrics actually correlate to conversion and bounce rates. And yeah, it was a very interesting talk. But yeah, I want to hear more about it and maybe also for the audience, get some background on what you actually did, which data you used, and what it actually was that you tried to find out. Yeah, no, that'd be great. I mean, it was controversial for me as well. When I first saw the results, it was driving me insane. But we'll get to that in just a second.
Starting point is 00:07:34 So what we started out doing is we partnered up with SOSTA. I'm sure I'm going to pronounce it wrong. I'll call them SoStuff right now. They've got a whole lot of rich performance data from the field for their customers, as well as business data, both around bounce and conversion. And we wanted to take machine learning, apply it to their data set, and see if we could find out what were the key drivers for bounce and conversion and how much did they impact bounce and conversion. And so you can do it, I'm sure, with all sorts of classic stats and analysis and regression analysis and everything else.
Starting point is 00:08:23 But it was, for me, not being an analysis geek and being more of a code monkey, it was a whole lot easier to just throw all the data at machine learning, let it run, figure out the relationships, what matters, and how things work, and then just ask it for what the results look like. It turns out, and this won't be surprising to anyone who's deep into machine learning and stuff that second part of getting the answers back out is actually something that machine learning is horrible at and it's actually designed not to be you're not really supposed to be able to introspect and figure out why the machine is figuring out the relationships it did.
Starting point is 00:09:10 Just it's more about teaching it something and then it can predict based on that. But we did find different ways. And the full presentation and video will have like the details on how the neural network. And then we actually ended up using a random force the first time around, uh, because we could get the data back out. Um, and the controversial parts are, um, it turns out, so I'm a huge fan of render and user facing metrics. Uh, that's a lot of what webpage chess has pushed for, um, start render being my main metric or the main easy to measure metric. This is the first time the user sees anything. If they're not seeing anything, you might as well not be serving a page, right?
Starting point is 00:09:56 It's this huge metric in the web performance space. And at the end of the day, what the machines told us were, yeah, that metric doesn't matter at all. It's way out here at the end of the day, what the machines told us were, yeah, that metric doesn't matter at all. It's way out here at the tail, almost no impact on the results. And the big drivers were the DOM content ready, which is effectively the same as DOM content loaded, but it's their sort of polyfill version that also works on browsers that don't support navigation timing, and the page load metric. And so those two were the main metrics from a performance perspective that drove bounce and conversion. And in the case of bounce, those were the two metrics that drove basically all of bounce.
Starting point is 00:10:44 Once you got past those two metrics, almost nothing else mattered. In the case of conversion, it actually ended up being that there were a whole lot of sort of page structure features that mattered more than performance. Things like number of images on a page,
Starting point is 00:11:00 number of scripts, number of DOM elements. But I'll get to why I don't think that's necessarily accurate in a second. But since the initial findings and, you know, what we talked about in Santa Clara, we've done a whole, well, I've done a whole lot more of sort of the machine learning and figuring out. And it ends up being, you really need to know your data set really well. And a lot of browsers don't support measuring start render.
Starting point is 00:11:35 So a lot of the beacons didn't have a render time included in them, which is why it ended up not being important for bounce. So it ended up being, I think, something like 30% of the beacons actually had a start measure time because you can't get it start render because you can't get it from Safari and Firefox. And so it ended up not being useful figuring out if the session would bounce or not because it wasn't there a large percentage of the time. And when I re-ran the data filtering it out to just sessions that did include start render time,
Starting point is 00:12:11 it ended up being almost as important as DOM content ready, but it still wasn't more important than DOM content ready. So the findings in the initial results are still, as far as what matters, are still important. As far as the controversial, well, render doesn't matter at all, I'm glad I actually understand why now in the data set. And it's not the case. So render still matters. The world is still good. I remember you were pretty upset when you were making that announcement. I think, and we'll probably be getting into this, so I don't want to jump the gun, but it almost seems like a lot of it might also pertain to the verticals that this is being run in, right?
Starting point is 00:12:53 So this test was your initial test, and I would assume possibly your follow-up reanalysis was all based on that commerce system. And you even mentioned in the talk, it'd be interesting to see how this would look for a content-only type system or different kinds of applications if any of these things would change. Because some of the conversion rates, like number of scripts and all that, seem to maybe apply more to something
Starting point is 00:13:20 where it is commerce. And yes, there might be more scripts towards the end when someone's checking out. Right. And so the data set itself was actually a fairly good mix of e-commerce and publishing, um, which is good, but it's also broad and varied. Um, the conversion results are definitely for e-commerce only because those were the only sets of data that we had conversion for. But that's sort of another reason I tend to like the bounce side of the data better because it applies both to published content. You generally want people to consume more of your content. Just don't put up the stupid slideshows where you need to click every page to force it. But I think the main question for me is, this is how it looked for a broad set of sites for your specific site. You know, the typical, your mileage may vary.
Starting point is 00:14:28 There may be certain aspects to your user base or to your site that are kind of washed away a little bit in the large aggregation. So I do highly recommend you take your own beacon data and rerun the same kind of analysis on your specific data set so that you can see if the same things hold true for you. I'd be surprised, at least on the bounce side of things, if that wasn't the case just because effectively what we ended up measuring was the DOM content ready was the most important proxy or most important impact on bounce. And that's sort of the closest
Starting point is 00:15:10 proxy for user experience you have that works across all of the browsers. It includes sort of both usually most of the initial page structure is rendered by then and the page is functional at that point. And then fully loaded is sort of the tail, the typical, hey, this is my page load time. And both of those ended up being really critical. And we'll have some graphs that show up in the New York version of the presentation where we can actually, with the deep learning neural net, we can plot out the probability of bounce as either one of those varies. And it's interesting how quickly it ramps up. Like in the DOM content loaded event, from 100 milliseconds to one second,
Starting point is 00:16:00 I think it's like a really steep ramp in the probability to bounce. And once you get past one second out to about six seconds, it tails off a little bit, but it's still a very linear relationship where the probability to bounce almost doubles going from one second to six seconds. And then after that, once you're past six seconds for DOM content loaded, it almost doesn't matter. You're so slow that they're more likely than not going to bounce anyway. Or these are just like I remember I had a conversation with the guys from Nordstrom a couple of weeks ago.
Starting point is 00:16:40 And they were actually saying, which is now also very controversial to your findings. They said, you know, they have total different numbers because they have a total different user base. Their users are typically very loyal and they know what they want. They shop, quote unquote, regardless of performance and render time. I mean, they saw impact, obviously, but they say they saw different patterns than to what is out there in the industry, you know, published by folks like you and the Amazons and the Facebooks of the world. Because as you said, you have to look at your data because you are not like everybody else. You have something unique, something unique, and you might be a high value brand. And people that go to your site, they probably go to your site because they want to shop with you. And then they have a different user behavior than the regular user that wants to buy something on a low-budget e-commerce site where he has 50 other options.
Starting point is 00:17:36 Yeah, and I mean I could see definitely where that's the case. They're going to convert whether or not it's slow. I mean to a certain point, I expect. But there's probably even in those cases, there's probably situations where like the number of card items or the number of pages that they go through looking through stuff or randomly exploring other features of the site before they go to the checkout, where they'll see more engagement the faster it gets. Even though at the end of the day, it's a dedicated user who absolutely is going to buy it from Nordstrom's, how much they engage with the Nordstrom site as they're doing that is probably, I expect they're going to find a very direct relationship. So that's actually an interesting point, which is, well, did you also look at like the number of steps it takes to go from A, from start to conversion and how that
Starting point is 00:18:33 translates into higher conversions across the different pages? No, we haven't done that kind of analysis yet. You know, a lot of this, hopefully, we open sourced, and I mean, to say we open sourced it is, there's not a lot of code involved in doing machine learning with this kind of data. I'm hoping we get a lot more people looking sort of at a lot of different aspects of their metrics and providing information on what they're seeing in their environment. And for some people, that kind of business metric is something that they track, the number of steps, the number of cart items, the shopping cart size, that kind of stuff. None of that is sort of at least available to us in the broad aggregate set of data that we had that we were looking at.
Starting point is 00:19:22 I think some of our customers, so we also have a RAM solution. We call it Dynatrace UEM, use experience monitoring. And what we do now, and we encourage everyone that uses our solution to work with your model and try to figure out the correlation between the metrics. We also see a lot of our customers using the data, feeding it into Elasticsearch and then making sense out of it. And because we have the context of – so we see every single user and every single click along the the website, have this type of conversion where another type of conversion. So there's a lot of interesting conclusions that we can draw from the data, obviously. But as you said, it's always hard to get the right answers out of the system, even if you have a machine learning system.
Starting point is 00:20:19 You need to ask the right questions. And you have to be really careful about how you ask them um so like in the case of the the latest run even on the bounce data um what i ended up doing is i'm brute forcing uh the importance where we'll since there's so few metrics that we're throwing the neural nets for this kind of analysis are really easy and light. It's not like language processing or anything like that. So we can actually try independently train every separate metric and then see, okay, which one predicted best. And from there, pick the next metric and see how they predicted best. And once, if you're watching it as it's going, you start learning more about why it's
Starting point is 00:21:07 discovering relationships and you go, okay, I'm going to explicitly deny you access to this metric. And for me, the most clear case of that is we have in the data set, we had median, max, and average, I think for each one of the metrics. And so the median DOM content ready time was the most important metric. And then if you take that and also look at all of the other metrics together, it turns out that the max DOM content loaded time was the second most important metric. And combined with those two, it could 99% accurately predict if a session bounced or not.
Starting point is 00:21:49 And it becomes pretty obvious when you look at it and go, oh, you know what? If those two numbers are different, there were obviously more than one page load in the session, so it clearly wasn't a bounce. So you can just do a simple,
Starting point is 00:22:02 hey, is the median and the max different? If so, it bounced. If not, it didn't. So you can just do a simple, hey, is the median and the max different? If so, it bounced. If not, it didn't. So you have to be careful kind of throwing all of the raw data at it because it might find relationships that are more sort of along the lines of what you were measuring as the output rather than user behavior. And so the perfect case for that, and this is why I'm much more comfortable with the bounce data than I am with the conversion data, is I am not entirely convinced that the machine learning model didn't discover and learn what checkout pages look like and go, okay, well, if the max number of scripts on this session was 200 or 300 or some fixed number that matches all of the checkout pages and all of the data set, it could go, hey, there was a conversion page
Starting point is 00:23:03 somewhere in this session. The conversion page always has more script than all of the data set. It could go, hey, there was a conversion page somewhere in this session. The conversion page always has more script than all of the others. This session obviously converted. So I haven't had a chance to do it yet. One of the things I want to do is I want to filter and instead of getting looking at sort of aggregates across the entire session, I want to look at just the arrival page. When someone first landed on the site, look at all of the metrics for that page only, and then the end result of the session if they bounced or converted, so that there's sort of no influence of this is what a converted page looks like or a checkout page looks like on the data set. But it's definitely illuminating as you sort of work through it and look as it's picking out individual features. You have to have sort of a mental filter
Starting point is 00:23:53 that you're applying and you have to go, okay, no, you're not allowed to look at those because of what you picked the first go around. And I think SSL was another one that you had to eliminate from conversions, correct? Yes. That's an easy predictor, right? Yeah. So that was one that came out really obviously as if there was SSL anywhere in the session, well, checkout pages are more likely to be SSL. They didn't convert because it was SSL. It was detectable as a conversion because there was SSL. And both of these, the tree and the machine learning, those are on GitHub, I believe, right? So we'll put those on.
Starting point is 00:24:31 It kind of sounds like, you know, I'd say early but still current days of genome sequencing, right? As you mentioned, there's not much code involved, but figuring out the right way to look at it, the right way to process the right inputs, that's where all the magic and all the work comes into it. So that kind of sounds like a lot more of the challenging part of this. And I think for anybody who's looking to try this out, it's, you know, don't jump up and down after your first run. It sounds like it's going to have to be, you know, a lot of really analyzing what you're feeding in and thinking about why some of those you might have had some of those outputs and outcomes that you're seeing. And be Yeah, be very skeptical about your results and
Starting point is 00:25:15 and try to tear them apart yourself first. Because you're gonna discover a lot as you're doing that. But yeah, I totally expect this is the very early days. And I really expect anyone who has any idea what they're doing with stats and analysis is just going to look at me and laugh because I'm sure they have all sorts of better ways for doing a lot of this. For me, the machine learning is working really, really well, and I don't have to know all of the rest of the ways to do it. I think it also starts that conversation too. You know, if someone does see a better way or ways to build upon it, well, at least you're putting this out there and getting this rolling. Cause I think it's a, it's a brand new way of, of analyzing. And I think a lot of it's going to come down to, as you were saying, it's going to be individual to your organization, to your
Starting point is 00:26:01 application. They'll probably will be, especially on on on the bounce side some some commonalities that can start coming out as industry best practices but i think there's going to you know we'll likely find little tweaks and individualities and all of them so it's going to be interesting seeing seeing what comes of all this andy what were you going to say there sorry yeah i want i want to say kind of as a summary and also like like a a list for for people that would like to try this so what are the what are the minimum metrics that you suggest people need to capture in their beacons so that they can actually start digging into this i know we will post the links to the video and the github repository but just so people know hey am i can i actually do this because i I have the metrics or I don't have them?
Starting point is 00:26:46 And if I don't have them, what do I need to do to get them? So kind of the minimum set of measures of metrics that you believe they have to same metrics apply to your site, really all you need are DOM content loaded time see how accurate it is at predicting bounce or not to give you an idea, is this really the set of metrics that are needed for my site? In the case of the deep learning model we trained between those two metrics, it could predict a bounce to roughly a 90% accuracy. So if you're in that ballpark, then yes, those are the two metrics that matter for you. And then you can plot out the probability distribution for bounce from the model. So you can see, for me, the really exciting graphs are to see, okay, how much do we need to improve performance by and how much should our bounce rate improve by that?
Starting point is 00:28:08 So basically that then allows teams to say, hey, if we move the needle by 500 milliseconds to the left, meaning 500 milliseconds faster, that based on the model should increase conversion rate or drop bounce rate by 15%. And are we willing to invest in 500 milliseconds more that kind of value yeah that's perfect cool yeah that's awesome because people are and obviously you always have to have the business justification as well we all know we want to make the websites faster but if you actually can give numbers and say, this is the predicted benefit we have on our end users. And then. Right.
Starting point is 00:28:48 And then you're competing in the same game as the marketing teams that are trying to add tag managers and everything else, right? Where you have, hey, here's the revenue impact of this work or even the revenue cost. If we add this tag manager, it's going to slow this metric down by 200 milliseconds. It's going to cost you X in bounce or conversions. easy communicate to the business side and say, well, we need to invest this much money in order to make it faster because that gives you in the end so much more money. So are you willing to make that technical investment? I think that's also great.
Starting point is 00:29:35 Right. And yeah, the competing for me really is the competing for those technical investment dollars, where is you've got the product teams, the marketing teams, everyone's got their features that they're throwing in the bucket. And it's always about, well, we only have so many engineering resources, where do we focus the time and performance usually ends up getting dropped, because, hey, we have these shiny new buttons that we need. And they've communicated that these shiny buttons are going to increase the conversions or whatever. So now you have sort of a stake in the game where you can say, hey, focusing on the infrastructure to improve metrics by this will result in these dollars.
Starting point is 00:30:16 And you're sort of on the same same playing table as that. I love the the the mindset of, hey, this this shiny new slow feature is going to increase conversions. That's always kind of from the marketing side sort of thing. Yeah, but on the other side, I mean, I want to counter you here, even though that may open up a total new topic on its own. If you are delivering something fast that is not relevant for the end user and appealing, then it doesn't help you either. So, I mean, the whole thing we miss out here is obviously the relevancy of the content, of the products we sell on that page, right?
Starting point is 00:30:51 And it's obviously something we can't measure. But the reason why I want to bring it up, I was at the DevOps Days in Boston last week and there was one guy, I don't remember his name, he said a very interesting, he had a very interesting example where he said from his engineering perspective, he had to implement
Starting point is 00:31:05 a change that did not make at all sense to him as an engineer, colors changing and something like that. But he was proven wrong because it automatically jumped in, the conversion rate jumped by 10%. And so he said, the business sometimes doesn't understand us, what we want to do, but sometimes we also don't understand the business. So there is obviously – if we can prove with our numbers that we can move the needle, that's great. But I think there are some things like content relevancy that sometimes we, as we are too technical, too deep in the weeds, we always say sometimes also don't understand why it impacts the end user because we are not always our end user don't know how they how they react does this make sense yep
Starting point is 00:31:51 oh yeah no absolutely um yeah if you don't have the content for them to consume or the stuff that they want to buy there's no reason so the question is if we could have an additional metric that we could add to the mix would be saying like relevancy for the end user. But that metric can only be, I guess, captured by asking the end user. But yeah, nothing we can do automated. It just kind of brings into the idea. I think a lot of people, when we talk about DevOps kind of things, look at how does the technical team open the eyes of the business? And kind of what you're, it sort of seems,
Starting point is 00:32:25 sounds like a, a one-way street, like the technical team is going to reach out and be, you know, help out all the other teams, but it's never thought of until kind of what you're saying there. How is the business team going to reach out and educate the technical team? Exactly. Interesting. Anyway, that's a whole different topic. Um, if, uh, if any other thoughts on, on this thoughts on this topic before we wrap up on this here? No, I mean, I'm just looking forward to seeing what other people can do with it and certainly challenge even all of the stuff that sort of I've been putting out there, we've been putting out there, because that only makes the results better, right? As we're seeing with the second iteration on the data and understanding, like, why render didn't show up the first time around. All right, well, thank you. This will wrap up part one of our conversation with Pat Meenan. Tune back in, we're going to have part two. It should
Starting point is 00:33:22 be published at the same time, I believe, but we will be discussing some more performance optimization with Pat Meenan coming up very soon.
