The Data Stack Show - 79: All About Experimentation with Che Sharma of Eppo

Episode Date: March 16, 2022

Highlights from this week's conversation include:

- Che's background and career journey (4:23)
- Coherence between hemispheres in the human brain (6:58)
- Raising Airbnb above primitive A/B testing technology (8:54)
- Economic thinking in Airbnb's data science practice (14:24)
- Dealing with multiple pipelines (16:48)
- Eppo's role in recognizing statistically significant data (20:01)
- Defining "experiment" (23:25)
- Types of experiments (25:57)
- The workflow journey (27:18)
- Dealing with metric silos (34:21)
- Why we still need to innovate today (37:03)
- Where experimentation can be used (39:36)
- How big a sample size should be (43:29)
- How to self-educate to get the maximum value (45:39)
- Bridging the gap between data engineers and data scientists (48:14)

The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we'll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.

Transcript
Starting point is 00:00:00 Welcome to the Data Stack Show. Each week we explore the world of data by talking to the people shaping its future. You'll learn about new data technology and trends and how data teams and processes are run at top companies. The Data Stack Show is brought to you by RudderStack, one platform for all your customer data pipelines. Learn more at rudderstack.com. And don't forget, we're hiring for all sorts of roles. You have the chance to meet Kostas and I live in person coming up soon in Austin, Texas. We're both going to be at Data Council Austin.
Starting point is 00:00:38 The event is the 23rd and 24th of March, but both of our companies are hosting a happy hour on the 22nd, the night before the event. So you can come out and have a drink with Kostas and I. Kostas, why should our visitors join us live in Austin? For tequila, of course. That could make things very interesting. I mean, yeah, it's a happy hour. People should come.
Starting point is 00:01:05 It's before the main event. So without getting tired from the event or anything, come over there, meet in person, something that we all miss because of all this mess with COVID. Have some fun, talk about what we are doing and yeah, relax and have fun. It's going to be a great time. Learn more at datastackshow.com. There'll be a banner there you can click on to register for the happy hour
Starting point is 00:01:31 and we will see you in Austin in March. Welcome to the Data Stack Show. Today, we are going to talk with Che from Eppo, which is an experimentation platform that actually runs on your warehouse, which is super interesting. Can't wait to hear about that. But he has done some unbelievable things. Most notably, he really built out a lot of the experimentation framework and technology at Airbnb pretty early on. So he was there for four or five years. Kostas, I'm really interested to know,
Starting point is 00:02:06 I want to ask him when he started at Airbnb, you know, the experimentation frameworks or experimentation products, as we know them now, we're nowhere near at the level of sophistication, right? So they probably had to build a bunch of stuff inside of Airbnb. And I think one of the side consequences of that is you think about testing is that you're sort of creating an additional data source that ultimately needs to be layered in with, you know, say product data or marketing data or whatever. And that's a pretty complex technical challenge. So I'm going to ask him how he sorted through that at Airbnb. How about you? Yeah, I think I'd like to get a little bit more into the fundamentals of experimentation,
Starting point is 00:02:58 mainly because I think it's many people, you know, especially like in product management, they consider, they take like experimentation very lightly. They think that's just like a tool that's like a kind of foracle that just tells you what you should be doing, right? Which obviously is not the case. And especially if you are in a company that doesn't have anyone with background in statistics or a data scientist or even an analyst, I think it can be very hard for someone to interpret correctly and make sure that the experiments that are running, they're doing that right. So I'd love to learn more about that. What is an experiment? How we should run them?
Starting point is 00:03:40 What's the theory behind them? Why we should trust them? What's the theory behind them? You know, like why we should trust them, you know, like all that stuff. And yeah, also see like if we can get some resources from Che about how we can learn more. I think for product managers, especially it's important to get educated before we start using these tools. I agree. Well, let's jump in and see what he has to say. Che, welcome to the Data Sack Show. We're so excited to chat with you. Yeah, I'm excited to talk with all three of you. Okay. Well, so much to cover today. You've done some amazing things at companies like Airbnb and Webflow, but just give us your brief background and what you're doing today. Yeah, absolutely. So my name is Che. I'm the founder and CEO of Epo,
Starting point is 00:04:25 which is a next-gen AV experimentation platform, one that's specifically focused on the analytics and reporting side of the workflow. A lot of it comes from my experience building these systems at, like you said, Airbnb, Webflow, a bunch of other places. My background is in data science. I was the fourth data scientist at Airbnb. I stayed there for about five years of 2012, 2017, which has really guided a lot of the way I think around the data ecosystem and seeing that how it played out at Webflow. In other words, it's sort of validated a lot of those things. So some concrete things to know about Airbnb, you know, it's founded by designers. And so that kind of leads into the way that the company likes to do
Starting point is 00:05:06 work. It comes much more from a Steve Jobs, like incubate something, focus on UX, and then release it with grand announcements more than a sort of iterative data-driven measurement metrics approach like a Zuckerberg or a Bezos type of thing. And what that really means as a data team is that in addition to needing to build all of these capabilities, you also had to win over the work, you know, into a certain way of thinking to believing metrics matter. So in addition to building infrastructure, we had to essentially win a culture war. And I, it was really interesting to see how that all played out. To me, the biggest piece of solving that problem was experimentation because experimentation as a function just unlocked so much in addition to this concrete
Starting point is 00:05:55 ROI of measuring things, it also fundamentally changes the way you do work where suddenly people have this really intimate connection with metrics, where you understand exactly what you drove, what you didn't drive, how your work plays into it. And it also unlocks this culture of entrepreneurialism, where people can take risks, try stuff out, validate ideas without winning political battles. So this combination of just incredible business ROI, plus changing your corporate culture to one that's kind of a little bit more fun was really
Starting point is 00:06:25 what led me to start up very cool super interesting okay and i have to ask one thing and i apologize for the surprise here because we didn't talk about this in show prep but you did um some research on the human brain uh-huh and i read that it was on coherence between hemispheres. And because we are all like super nerds on this show, can you just tell us a brief bit about that? Like what were you studying between hemispheres? That's so interesting. Oh yeah. Yeah. It was fun. So, you know, as, as a bit of backgrounds, my university, I studied electrical engineering and the focus on signal processing.
Starting point is 00:07:02 And then I studied statistics. And I really thought it was a cool way of understanding information and being able to make statements about it. I came across this researcher in the Stanford psychology department, who was trying to see if there was a different way of understanding the brain where instead of just looking at some MRI and seeing what lights up and just seeing where the blood flows if instead you said maybe the way the brain works does not increase blood flow it's from synchronizing things so while there's like two parts that are just kind of like going in different pieces whatever when they are focused they just lock and suddenly their community makes way and it was this kind of I didn't know anything about neuro, right?
Starting point is 00:07:45 I was an electronic theory student. So I, like, one of the great things about being a statistician is you get to play in everyone's backyard and understand their fields. And so this is my way of, like, learning a little bit about brain research. So it was really tough. Statistical methods to say, how do you make a hypothesis test around synchronization of neurons? But yeah, it was very cool. Like, you know, we, I was only working on it for about six months, so I can't quite tell how the research evolved over time, but it was cool to learn about the field. Fascinating. Well, I'll have to dig into that more. Okay. Let's, let's dig into data stuff. So we have tons of questions
Starting point is 00:08:23 about the technical side, but one thing I want to ask to start out. So let's go back to Airbnb. And when you joined, I'm just trying to go back in my mental history here, dusting off some pages. But back then, the A-B testing technology was pretty primitive compared to what it was today. So how did you begin to think about that? Like, did you evaluate vendors or like how did you just start out building it yourself? Yeah, at Airbnb, you know, we always had a bias to build over buy. I think you can kind of see the number of open source projects out of there. You know, one of my colleagues at Ethlo used to work at Snowflake and, you know, Snowflake
Starting point is 00:09:09 has forever been like, why are you like spending so much time and energy and stuff on rolling your own infrastructure at this stage? So in any case, Airbnb has always had its own biases. We kind of knew from the start, we're going to build our own experimentation framework. One engineer built a feature finding thing, like fairly quickly. That wasn't too much time. But then this one data scientist and an engineer decided they wanted to build the end-to-end workflow and UI. So Airbnb, the first team to run experiments
Starting point is 00:09:40 was the search engine. This was back in, I think, late 2012, 2013. They were, and this is pretty typical. Most companies, I think the first team to run an experiment is either a machine learning team or a growth team. For me, it was a machine learning team. Every time you iterate on a model, you want to see drive revenue.
Starting point is 00:10:00 You know, not just did you drive like a click, but did you drive revenue? And crucially, if you're a machine learning team, you need that evidence if you're going to like hire four more engineers, like, you know, no, and you did. So that is kind of the earliest place of investment. Once that team started showing more success, then other teams started adopting it, the growth team, the rest of the marketplaces team. And we started quickly seeing how the teams that adopted experimentation, like wholesale, like really deeply started showing clear success. One of the really formative experiences for me was this search team basically re-inflected
Starting point is 00:10:39 Airbnb's growth. Most companies, they start on this crazy rocket ship thing, Airbnb's going 3X, 3X, 3X, and then it was like 2.7X, 2.5X, whatever. This team was able start on this crazy rocket ship thing. Your music is going 3x, 3x, 3x. Then it was like 2.7x, 2.5x, whatever. This team was able to re-accelerate. Interesting. They broke the logarithmic plane. Exactly. It was clear it was this team because they ran experiments and proved it.
Starting point is 00:10:58 Interesting. It's a really amazing coalition building moment. I always say, if you're going to spread experimentation culture start with some kind of teams that are going to adopt it really readily and don't try to push it on everyone else until you've shown success interesting are there any like is there anything do you remember any of the or like one example of a test that that team ran that was kind of like a oh yeah you know this is you know huge absolutely there were that's a bunch of them okay let me let me talk through a
Starting point is 00:11:31 few one of them which i think was just a great example of how you draw that right how you imagine this all is playing out so this data scientist he he looked at the data and saw there were all these airbnb hosts who basically never accepted anyone. You know, this was back in the day before Instant Book was a really large percent of Airbnb traffic. Sure. And instead, you had to request to every host, am I allowed to stay on this date? I'm bringing my dog with me. I'm showing up late or whatever.
Starting point is 00:12:01 Is that okay? And people would say yes or no. And there were some hosts who just literally never said yes. And so this person noticed that these hosts were essentially dragging down the marketplace because there's all these people who would spend all this effort vetting and Airbnb listing, message these hosts, get a no, and be like, oh, we have a lot of folks, friends.
Starting point is 00:12:18 And so from there, ran an experiment that took the host denial rate and steadily demoted people and eventually took them off the system if they were too strong naysayers. And that was like a huge success. It moved metrics. And that was one of the earliest examples. Like this was back in 2013 or something
Starting point is 00:12:37 before a lot of other experiments had come out. And so that was one of those early examples of like, oh, I think we have something here because this sort of strategic analysis bent work, it can be hard to win over every level of hierarchy to get it done. But if you can run this experiment and show it's powerful, that'll get reinvestment. So there are examples like that. But then every company has these examples, these little changes that seem so cosmetic. There's like no way they could matter that much. And then they just blow the metrics out.
Starting point is 00:13:15 In the case of Airbnbs, this engineer ran this experiment where when you click on the Airbnb listing, it opens it in a new tab. So that was the entire change. When you click on the Airbnb listing, it opens in a new tab. And that thing was the largest metric improvement of any experiment over five years. Because it turns out, it's like very obvious about it, right? Like, yeah, you do that. You don't lose your search context. Sure.
Starting point is 00:13:37 No. Yeah. Because you want to click on it. Like, of course, we all do that, right? It's like, I want to do that, right? But man, I think boosted bookings by like two or three percent it was like a very very long and just one little change and it's exactly the sort of thing absent experimentation like there's no way people would have noticed that that was a
Starting point is 00:13:55 big deal the design team probably been like ah it looks kind of ugly i don't know like you know hesitation yeah just every company has these stories, which I always think is fascinating because it's not just random chance. There's important lessons here. Like not losing your search context matters a lot in Airbnb system. Yeah, totally. One quick question. I know two more questions. I'm monopolizing here and I know Costas has a bunch of questions. First one, it's interesting to think about the economics, right? So when you talked about, you know, hosts who never replied, like that's almost like, you know, calling a store and saying, do you have this?
Starting point is 00:14:35 And then you go to the store and it's not there, right? And so over time, it's like artificial supply, which is an economic problem. Did you have to apply a lot of economic thinking in the data science practice at Airbnb? 100%. The person who did that analysis wasn't a PhD economist. Oh, okay. 100%. So I actually think that like, you know, data science, there's a lot of skills that go into it. There's like a straight engineering piece of just how do you make reliable, robust data systems. But when you talk about the, the, the ultimate goal of data science, and this is something I always try to like kind of confirm and validate for
Starting point is 00:15:12 people is that the whole reason you start a data team is not to like have a data warehouse or the modern data stack or whatever. The whole point is to make better decisions. So you need to understand what data, what analyses, what can I do that's going to lead to better decisions? And economists, they have a lot of background sort of thing. It makes a lot of sense. Yeah, absolutely. Okay. This is going to be kind of a detailed question, but hopefully it sets the stage for Kostas to ask a bunch of technical questions as well. But one thing I'm interested in, and I'm
Starting point is 00:15:45 just thinking about our listeners who maybe are dealing with like a vendor A-B testing tool, or maybe have like built something themselves, or even just trying to think about how to process this. So you said, you know, someone built a simple feature flagging mechanism at Airbnb. So one of my questions is, and this is sort of a problem that every company faces, or at least, you know, my purview, which is limited. So maybe not every company, but okay. So you have feature flagging in the context of like testing and data science, but then you have this problem of, you kind of want that feature flag to be available in multiple places, but generally you're also
Starting point is 00:16:26 running like a separate product analytics, you know, sort of infrastructure instead of pipelines. You have growth, you have customer success, you know, there are all those components. How did you deal with that from a technical standpoint? Right. Because, you know, you hear about building your own feature flagging thing and it's like, does that actually make it harder to deal with all these other pipelines as well? it's a great call out so it touches on what i would call the modern experimentation stack is that to run experiments you basically have these pieces right you have one way which is feature flagging or animization so that's the the start of the technical stack which is arrive, they got to be put in groups. And that's actually it. That's where it ends.
Starting point is 00:17:30 And so you'll see tools like your optimizably launch darkly, which pretend that data warehouses don't exist. They just let you do feature flagging and then, okay, let the data scientists sort it out. And that's kind of what the gap we're trying to fill. So our product today actually does not have feature flagging at all, although we'll probably be building it pretty soon. Instead, what we rely on is this basic separation of where feature flagging needs analytics, which is the data warehouse. So all of these feature flagging tools, even if they don't directly give you data, it's very easy to build your own instrumentation
Starting point is 00:17:59 and get that data into your Snowflake or BigQuery or Redshift or whatever. And as tools like Redytostack show, there's this amazing new ability to get everything into the warehouse. And so it's a nice central point to operate off of for applications like ourselves. So, you know, modeling, experiment stack,
Starting point is 00:18:18 you got feature flagging, you have metrics, which inevitably are operating out of their warehouse. You have a bunch of pipelines to kind of intermingle these things, calculate your quantities, run your diagnostics, do your investigations and then reporting, which is this very public kind of cross-functionally consumed interface, which is answering, how are my experiments are doing? Yeah, totally.
Starting point is 00:18:41 All right. I'm going to be so rude to Costas and actually just continue to ask you questions because I can't stop. I can't help myself. Yes. Okay. So, yeah. So what, so this is, this is so interesting to me. So the, and I'm coming at this just so you're aware as, you know, someone who's worked in marketing and done a lot of data stuff and use a lot of AB testing tools. So it seems like the package solutions for AB testing, like their value comes a lot from basically sort of handling the statistical analysis as a service, right? Like they suck all the, like, okay, you, you, whatever you do your test, it says, okay, this is like variation one. And then it tells you like, okay, this is going to improve your conversion rate or not. Right. But the challenge has been that they keep all the data trapped inside of their like particular system, which I think inherently limits the
Starting point is 00:19:43 value of that data, because ultimately you want to actually see that data in the context of all the other data you have. Is that kind of the idea behind EPO is that you're, you know, you're sort of not, you know, creating obfuscation around the data itself and just providing like a, yes, this is statistically significant or not. There's a few pieces that go into what makes epo epo i think specifically with regards to the data where it lives what it comes from sort of thing i think our standpoint which is what you will see as a principle at airbnb or netflix or google whatever is that there should be a single definition of like what is revenue right the data teams are singularly focused on defining
Starting point is 00:20:25 that thing. What is revenue? What is a purchase? What is this subscription upgrade or whatever? And the natural home of those things is your data warehouse. So the real, you know, there's two points of failure with most of the existing systems. One is that they create their own parallel data warehouse. So suddenly they got their own idea of what revenue is, right? And it's hard to really sync it up with your own. And in addition to revenue itself, you want to split revenue by a bunch of other things, right? By marketing channel or by persona or whatever.
Starting point is 00:20:58 So that's one thing is that having an incomplete and parallel version of what data is, drives data teams insane. It's like I spent so much time trying to define this thing. And, you know, here's the system over there telling a PM that they increase revenue when the revenue does not even include this other source. Like that is inaccessible by the system. So that's one piece of it. And, you know, that probably gets exacerbated by different business models. If you have multiple points of sale, then like trying to instrument each one separately doesn't really make sense.
Starting point is 00:21:29 You have to centralize it in a data warehouse. If you use Stripe, Stripe is not a set of data that is accessible by those systems. A bunch of things like that. That's definitely a core piece of it. But the other piece is almost like a design principle around organizationally how should experiments be done because one of the things that i run into all the time is you see a lot of companies that will they'll match their organization to their tools instead of matching their tools to their organization and it's really unfortunate because it means it puts a lot of stress to higher high expertise statistician types economists and the like to actually be able to run experiments in the way that you're supposed to, to follow
Starting point is 00:22:09 statistical protocols, to have good procedures. Whereas a tool should just enable those sort of things. So the way we operate is that a lot of companies might have one, two, three, what I've called experiment specialists who have opinions around what metrics matter. Here's how they're defined. Here's what statistical regime we're going to use. And we want EPPO to let them scale out that knowledge where they can do a one-time definition, say here are the rules of engagement. And then going forward, some junior PM fresh out of college, never done this before, can just operate within the system, turn the crank and being like, look, I just operate within the system turn the crank and being like look i just by using
Starting point is 00:22:45 the system i am following all the guidelines that i'm supposed to say i have to i mean i know that i'm probably going to disappoint eric a little bit because he's expecting for me to ask something very technical probably but i want to start with something very basic and i want to ask you what is an experiment because we keep keep talking about, you know, like experimentation platforms and like all these things. But let's start from the basics. Like what is an experiment? What defines that? I love it because it's one of those things where the basic questions are the most technical, actually.
Starting point is 00:23:20 I'm going to give you my simple answer and then I'm going to give you my galaxy brain answer. So the simple answer is an experiment. It's a methodology that you probably learned about in grade school, where if you have a theory of what is driving change, you take a group of people or a group of something, you flip coins a bunch of times, flip them into multiple groups, and then you measure who did better you get one of those groups like you know i think that showing people making people do a morning walk every day will lead to lower diabetes whatever like you have one group tell them to do a walk every day and then you measure how much diabetes you got so that that's the basic methodology is you know irrespective of
Starting point is 00:24:06 what type of AV experiment, it's basically that you need to have some random way of dividing people into groups. We need some way to measure success, which are metrics. And then you can try out different ideas for what drives success. Now, my, my galaxy brain approach to this, which is, you know, is that an experiment is anything that has a kind of before after comparison group that says, like, did this group do better than that group? And what's interesting in the world today is that if you look at Nervy or Netflix, there are a bunch of products that you ship that don't lend themselves well to av experiments that let you kind of divide the world into like think of like a pricing experiment are you going to give half the people one price half for another price for a very kind of well-known product or if you're netflix and you watch stranger things like you know that's actually the most important decision netflix will make in a year is you know to get a we get an ROI on Stranger Things? And so there's a
Starting point is 00:25:05 kind of rich suite of things of causal inference methods that try to figure out, like, you know, did metrics move once you control for a bunch of other factors? And the Galaxy Brain answer is that, well, that's also kind of decision science, which fix right under. That's super interesting. And I mean, as long as I remember hearing like the term experiments, I can take at least, we are always talking about like A-B testing, right? Which as you said, as you described right now, like it's about splitting like your population into two
Starting point is 00:25:43 and run the experiment there. Is this the only way that we can do experiments? So it gets back to the simple word experiment and the galaxy brain version. To do a basic kind of A-B test, you do need some random or uncorrelated with the metric way of dividing people into groups. So the nice thing about these online platforms is that dividing people into groups randomly is actually a well-solved problem. It's actually probably the easiest part of the work flow. So if you have the ability to randomly split people into groups, it's kind of the best way to do it. Now, there's a kind of depth to this topic.
Starting point is 00:26:29 Like what happens when there are some users in the group who have a way outlier disproportionate effect on everything? You know, you can try to randomize them, but they are just going to overpower everything. How do you deal with that? There's a set of methods called stratified sampling, variance reduction techniques. There's a bunch of ways to do it, but there are ways in which the random
Starting point is 00:26:51 sampling thing can break. And again, it gets, it falls back on tools like Epo to try to make you aware of them. Okay. And you mentioned like, but this is maybe like the easiest part of like the workflow. So what is this workflow? Let's take us on a journey of working with the product itself.
Starting point is 00:27:12 How we start and what do we have to do until we get a result? Yeah, absolutely. So let me walk you through what I would call the experiment lifecycle. And then from there, I'm happy to dive into how Apple touches on all of it. The start of an experiment is to have a basic alignment. I'm like, what are we trying to do here? Like, are we trying to increase purchases? Are we trying to reduce customer service tickets? You know, what is our overall goal? Just to have some idea of like, this is where our goal is to be. And, you know the the corollary to that is that you need a metric for it right you need in some ways like what is a customer service ticket what is a purchase from there the second stage is you need to come up with hypotheses of how am i
Starting point is 00:27:56 going to drive that metric you know is it that we want to reduce complexity and reduce friction do you want to increase urgency, increase desire? Are there social proof things? You just come up with a big list of stuff, right? Of saying like, here's all my ideas of how I think we can improve things. And from there, there's a kind of basic product approach, you know, what is expected impacts for each one, what is expected complexity, et cetera. So you come up with hypotheses and you have some way of deciding which ones you want to do. From there, you have to design an experiment, and designing an experiment is both the product side
Starting point is 00:28:32 of UI, UX type of thing, but also there's a statistical piece, which is called a power analysis. So basically, to actually get signal out of this change, how many people do you need, how long do you need to wait to actually get signal out of this change, how many people do you need, how long do you need to wait to actually get it? So, you know, you need to have enough sample size,
Starting point is 00:28:51 you need to be able to get that sort of signal. So that's, I'll call, I'll call part of experiment design. From there, you have to implement it and implementing the experiment is where you touch on the feature flagging side. It's also where there's straight product buildings, like you hopefully implement it without a bug. You know, hopefully it's a community experiment. You didn't break it on iOS or something or push some important, you know, design asset below the fold. So there's implementation details there. From there, the experiment runs for a period of time and you want to observe it and make sure it's healthy. Now, this is one of those tricky things where, you know, experimentation has this central issue where from a statistical standpoint, you shouldn't peak too much.
Starting point is 00:29:32 You shouldn't stop experiment early. You shouldn't really examine it too closely until it's done. But that's just not a reality for most organizations. You know, the real political capital is being spent on this thing. You can't afford to let an unhealthy or unproductive experiment take up weeks of product time. And so figuring out how to navigate that is also something that, you know, at FO we have some opinions on the way to do it. From there, you reach an end of the experiment, you have to make a decision.
Starting point is 00:29:59 And a lot of times that decision might be simple. It's just like, did the metric go up or down? But sometimes it gets complicated. Like what happens if one metric goes up and one metric goes down? Like what happens if revenue goes up and customer support tickets goes up? What do you do there?
Starting point is 00:30:15 So kind of navigating that decision, that metric hierarchy is kind of a big thing here. And then from there, you have to make a decision, right? You know, actually go forward and say, I'm going to launch launch it i'm not going to launch it sort of thing and record that for posteriorly so there's a bunch of stuff involved here and i always think it's just very telling that the commercial tools touch on like such a limited number of it so from uh like about this experiment lifecycle, which sounds like something quite complicated, to be honest,
Starting point is 00:30:49 there are like quite a few steps there and many different steps where something can go wrong. So how much of this lifecycle you can cover right now with the product that you have? Yeah, we, right now we are, we do a lot of the post-implementation details. So you like, actually that's not true. So the things we do do are first, we let you build up this large corpus of metrics.
Starting point is 00:31:18 So that includes your important stuff like revenue purchases, et cetera. It also includes your more of a spoke stuff like widget clicks or whatever. And so all that becomes addressable in my system. And if you want to see like for revenue, like how did all my experiments do in history? Like what were the biggest revenue drivers, least revenue drivers, whatever you have those views from. We also, from there,
Starting point is 00:31:41 we help you kind of after implementation to say like, how long should this experiment be run, this power analysis thing. So we automate that, make that self-serve for you. And we automate all these diagnostics so that if there's some issue with randomization, it's not actually random, then we will alert you on that, make you aware of it. If there's a precipitous metric drop, we'll make you aware of it. If there's a precipitous metric drop, you make you aware of it. And then at the end for kind of guiding your decision, we have a sort of opinionated design interface that is meant to say like, again, if you have some junior PM fresh out of college and they need to make a decision here, how can you lead them to the right direction? You know, allow people to incorporate
Starting point is 00:32:23 their opinions and what metrics matter and what metrics are sort of explor direction. You know, allow people to incorporate their opinions and what metrics matter and what metrics are sort of exploratory. From there, all the experiments get compounded into this like knowledge base. So, you know, here's where you can look through all the past experiments, see has anyone tried this before, understand the picture of like
Starting point is 00:32:39 what have been the big drivers historically. So that's where our system touches today. I think as time goes on, we're going to be reaching further and further back into the planning cycle. So today we do power analysis once it started. Pretty soon we're going to do power analysis before it starts. So you can actually do a kind of scoring of the complexity of these experiments going in and then just kind of group experiments according to like what i've got ethics right like here's a bunch of experiments that are all related to driving search rankings
Starting point is 00:33:12 personalization policy or something so you you mentioned metrics a lot like in our conversation so far and it seems like having a good understanding and good definition that is served among all the stakeholders of what the metric is quite important for the... I mean, they're breaking the results of whatever experiment you are doing. And I can't stop thinking of all the different places that we keep metrics in organization. We keep talking a lot about data silos, but there are also like metric silos, right? We don't really talk about that, but it's really easy to create the same metric, to recreate the same metric in many different places with even like slightly different semantics, but this might make like a big difference. So how do you deal with this problem and
Starting point is 00:34:08 how do you relate with this whole movement that we hear a lot lately about metric layers, metric repositories, and how your product works with that? Yeah, absolutely. So I have been very heartened to see that all these metric layers and metric groups have been taken off, right? Because I'm obviously a big believer in it. And experimentation systems get a lot more powerful when companies have a clear definition of metrics. So, you know, I see our system integrating well with those metric layers. You know, we are one of many downstream processes that should operate off the single definition source. So, you know, there's a little bit of let's see which ones catch on and, you know, what the right
Starting point is 00:34:51 integrations to do are, but it's definitely in our interest to play well with them. In terms of how we deal with them, I think there's two things. So one is experimentation as a practice gets more powerful by the diversity of your metrics. And to give an example of that, suppose I'll tell you a story of an Airbnb. There was this, there were these two teams. One team was focused on driving this instant book feature. So I've mentioned to you before, it's very annoying to have folks have to approve you. So it's great to, if a host just says, look, I'm just going to accept everyone. And that became a strategic thing that we're trying to improve. So it started out, there was like what, like two or 3% of posts who had it.
Starting point is 00:35:34 And today it's like 80, 85%, like a really, really huge change. Yup. And so there was one team that's just running a bunch of experiments against that. Simultaneously, there was another team which was trying to make it so that when you use Airbnb you sign up much earlier on than you currently do so Airbnb is an app where you can get all the way to checkout page and never create a user account right and so they were experimenting with various ways of incentivizing people to create user account early on and different parts of the building you know teams that might have
Starting point is 00:36:04 hung out socially but not really sharing road maps and stuff like that too much it turns out that experiments that drive sign up rates will actually have a crazy effect on driving up instant book rates because this instant book feature was gated behind sign up so it's the exact example of where things, like it's a limited surface area. These people's metrics have interactions between them. So if you have some ability to say, like, I am the business travel team,
Starting point is 00:36:37 all I care about is Airbnb business travel. I just want to see how every experiment affects it. Like these sort of views become super important. So from our standpoint, we're happy that there's been a standardization of metrics. Philosophically, we are 100% in the direction of saying companies should have a single source of truth around them. We're aiming to build off those systems. Experimentation is a very metrics-hungry.
Starting point is 00:37:00 Makes a lot of sense. And OK, experimentation platforms have been around for quite a while. It's not something necessarily new. Why do we need to innovate today and bring something new in the market? What's the reason behind that? Yeah, so I think there's a few answers here. So one is that experimentation has existed in this feature-centric world for much longer than what you might see at an Airbnb or Microsoft back in the day.
Starting point is 00:37:34 So I think that the scope of what experimentation includes has widened, where you now include core business metrics. It's no longer a CRO consultant who's trying to drive signups off your marketing page. It's now, no, you're trying to drive core OKRs at the company by an experimentation strategy. But part of the reason that that has been enabled is the rise of cloud infrastructure, where suddenly a lot more people are working off these kind of common set of tools that make it very easy to do
Starting point is 00:38:05 these complex workflows you know i think of optimizely as a company who like they might have even been a little bit ahead of its time you know 2008 when they started like when we i could not build epo in 2008 because instead of integrating with snowflake redshift and BigQuery or whatever, I would have to integrate with like, but Teradata and hive clusters and, you know, pig clusters, whatever, all sorts of like SAS or something, you know, it would have been a much tougher argument to say, how do we integrate with these databases? So the entire analytics side was just really hard before in a way that is now become possible to deal with. Also, now most companies are operating off of AWS or GCP or something like that. And so to have a sort of cloud infra place where you can kind of quickly turn experiments on
Starting point is 00:39:02 and off, put them up and down, have this very iterative process, this continuous integration environment is now just much more common than they used to be. Makes sense. And additionally, the first thing that comes in mind when someone has heard of words with experimentation platforms before is product. it's a tool that is heavily used by product teams. Where else can we use experimentation? Or do you think that it's like a tool that is reserved only for product managers? Yeah, it's,
Starting point is 00:39:36 I think experimentation today probably has two big homes. One is on product development, which is, and I think that's a much more expansive definition than just growth and ML teams. I'm talking about literally changes through code, you know, as an easy way to do it. The other big place I've seen a lot of experimentation is marketing. You know, if you think of ad campaigns and some of the management of growth marketing, that's another place that's very experiment heavy. So those are probably the two biggest buckets. I think, you know, where else to grow from there?
Starting point is 00:40:13 Experimenting on kind of operational teams is something you're starting to see more companies dabble with, such as like a sales team or a customer service team. Like if you think of like UPS, you know, they can experiment on their fleet of drivers or whatever so that's kind of an emerging area from my standpoint the the product side of things and the growth marketing side of things are just these like hugely growing industries now that we have product like growth you know we have more bottoms up self-serve motions etc that make it just really attractive yeah and like as like as a person who has worked in product,
Starting point is 00:40:48 I always thought of, and by the way, my background is mainly B2B products, right? So when the first time that I tried like to use in a B2B product, like experimentation platform, like I felt miserable, to be honest. And like I developed this kind of way of thinking that experimentation platforms are mainly for B2C companies because you need to have volume there
Starting point is 00:41:13 that will drive these experiments. Is this true? You know, the thing with sample size and experimentation is that it's really around what sort of effect size they're trying to get. So if you are running experiments when you don't have too much sample size, like there's still value in just preventing horrendous bugs, right? There's sort of a, look, I just want to make sure my, I know if a metric dropped like 10%, you know, if there's some like very major issue. So when you look at B2B enterprise companies, you see experimentation play out much more as hygiene, as like we just need to make sure that everything is healthy or that if someone has outlier success, we know about it.
Starting point is 00:41:56 But I would say that the business models that are much more levered on experimentation comprehensively beyond that are these consumer, pro-consumer companies. Basically, you arrive at a website and just sign up and purchase. So, you know, today, you know, every startup has to have a kind of focus to start with. We like to focus a lot on these consumer prosumer companies. Yeah.
Starting point is 00:42:17 Yeah. Makes a little sense. And let's say I'm a founder and like I've started like building my company right and my product and I'm looking like for product market fit okay where should I introduce an experimentation platform is this something that I should be using before product market fit after is it something that can help me like find product market markets faster? You're also a founder. So what's your opinion on that? Yeah, I mean, my opinion is that basically once you have the sample size,
Starting point is 00:42:55 the ROI in experimentation is so clear that you should really be doing it. It's basically saying, do you want to measure product changes or not? Because that's the basic answer. And the answer is clearly yes. So now it happens to be the case that to have sample size that let you run experiments really well probably means you have some amount of product market fit but i i think the main thing is just sample size it's like can you actually run an experiment at all because once you can it's just really clear you should so can you give us like a like i mean some kind of sense around what this sample size should be?
Starting point is 00:43:27 Yeah, what you should think about is what is the most common behavior that you care a lot about? So maybe you don't care too much about signups, right? But maybe you do if you're saying Webflow, maybe you care about people publishing sites or something like that you know that's not exactly a subscription it's not your north star revenue-based metric but it is something that through all of webflow's history they have noticed driving publishes is is a powerful thing so what you might want to do is to say okay i have this many users on webflow i have this many signups every day of those people here's how many are publishing from there i can plug it into these you know
Starting point is 00:44:11 their online calculators for power analysis we will be building our own there are different ways to conceive a bit and then from there it's just saying like you know what is your comfort level with running an experiment for three months or two months or one month or two weeks or whatever? And once you have an answer to that, that feels comfortable, which is that you're not going to lose a lot of product time, product speed, by waiting and being blocked on this experiment for this amount of time, then you should do it. You should absolutely do it. Yeah. One last question, and then I'll let Eric ask any questions he has.
Starting point is 00:44:45 I think it's clear that having a team or a person in the company that knows about statistics is very useful when you operate these tools. But that's not always the case, right? Especially when we are talking about younger companies. I can't imagine many companies that when they start, they have like a data scientist there, unless the product is very related to data science. What you would recommend to a founder or to a product manager that doesn't have access to any resources,
Starting point is 00:45:17 statistical resources, how they should educate themselves in order to maximize the value that they can get from these tools. For example, you are talking about power analysis, right? What can I learn about power analysis? How I can do this thing? Yeah, absolutely. It's funny you mentioned that because one of the things we're going to be working on is a modern experimentation guide. One of the things that's sort of tough is that there's a lot of content on the web that's all fragmented everywhere right yeah so it would be nice to kind of compile that the i think there are let's say two or three resources i would recommend if you're a product manager and you're trying to get educated in experimentation so the first i think the reforge program is great a little bit so it's basically a product
Starting point is 00:46:03 management mba for product managers is i think how they self-describe themselves. It naturally tends to lean a lot more on quantitative methods and experimentation than many other places. So I think that's a really great thing. I think it's great use of your learning budget. Another great use of your corporate learning budget, if you have it, is Lenny's newsletter. So Lenny Richesky, the Army alum of ours and also an investor. He has that Slack group that related to the newsletter is really, really informative. You know, I've personally learned a lot from it and there's a lot of experimentation talk. I like to contribute there. So that'd be another great resource. And then the third is probably the
Starting point is 00:46:41 closest thing you can call to an experimentation Bible is the book by Ronnie Kouhavi, online controlled experiments. It was Ronnie Kouhavi, for those who don't know, was a, was one of a pioneering team of experiment scientists at Microsoft. So Microsoft, especially back in the day, really pushed the edge, pushed the frontier on what was possible with these online platforms. Ronnie Kohavi is probably the one who did the most evangelizing of that in his book and his talks and stuff like that. And so that's a great resource to read through. Very readable.
Starting point is 00:47:17 And I believe now he actually even has an online course. So that might also be a great venue. And of course, for any of your listeners, I love chatting with people. So if you want to just email me or DM me on Twitter, I'd love to chat through whatever experimentation topics you have on your mind. Cool. Super helpful. Okay.
Starting point is 00:47:36 One more question, Che. And hopefully this doesn't push us over time, but we've learned so much about sort of testing frameworks and methodologies. You've seen both sides, the infrastructure side and the testing side. And so I'm thinking about our listeners who, you know, are maybe a data engineer who isn't as close to the data science team, you know, or machine learning side. Could you just help us understand how should data engineers who aren't as close to that work with data scientists? Like, what are your thoughts there? What do you have that could be helpful for
Starting point is 00:48:10 those listeners? Yeah, absolutely. And so I think the starting place to establish is to just say that part of the reason I started an experimentation company is that experimentation just unlocks so many of the things that a data team wants to do. You know, data teams, again, they exist to drive good decision-making with a value system that tends to be more metric-based in database than other teams. And experimentation is just an incredible cultural export of those values to the organization. It really just helps people engage with data in a much more meaningful way that does not require nearly as much cognitive load to build it into your
Starting point is 00:48:52 decision-making process. So I think a starting point is to just say, like, if you're a data engineer, your work will be a lot more impactful in the organization if you have an experimentation program. So that's a starting point. In terms of how do you enable it, I think if you have an experimentation program. So that's the starting point. In terms of how do you enable it, I think if you are a data engineer, it's very similar to what you might see in most other data engineering practices. The things you can do is provide great definitions of metrics for things that drive the business.
Starting point is 00:49:24 Candidates for things where if this goes up, we're all in great shape as a business. And then to create great APIs to them. So is it very easy for data scientists or PMs to utilize those metrics? Easy to plug them into tools like Epo? Do you have great ways for people to build an affinity and understand what drives them?
Starting point is 00:49:43 I think that those same topics that probably apply to pretty much everything else a data engineer does, you know, apply heavily to experimentation. Okay, super helpful. I also think that the accessibility piece for data engineers is really helpful, right? Because it's hard actually just to collect the data a lot of times, right. And, and get everything in one place. Do you have a story that sticks out to you about, you know, a context of a data team where, you know, maybe they were sort of behind the scenes and then experimentation or something happened where all of a sudden it was like, wow, you know, like you're our best friends now. Oh yeah, absolutely. Experimentation tends to really, you know, create new relationships in the org. In terms of where the work just became a lot more visible, you know, at Webflow, we
Starting point is 00:50:32 spent a lot of time trying to quantify when has someone onboarded, when has someone activated to the system. And if anyone has used the product, it's quite complex, right? It looks a lot like Photoshop. It's got buttons everywhere. It's brimming with power, but you might have to take a Coursera class or something to learn how to use it. So the learning curve is like this big issue. And we have significant data resources going towards trying to find levers to improve it. Now, the thing is that you can have a lot of different theories around how to improve
Starting point is 00:51:08 activation, but it really helps if you can just show that your theories are correct because you built an experiment against it and you drove the method. So there was this growth team that, you know, spent a lot of time trying to craft activation metrics and levers, but once they're running experiments, you know, the whole product org was aware of you know the experiments that were being run and how they were trending we started seeing more product teams want to run experiments themselves as a result which is great i always say one of the cool the blessings and the curse of experimentation is that it's a it's a very public process it draws a lot of eyeballs it's like it has very much of of that man in the arena quality
Starting point is 00:51:47 where this team is going to go and they're going to expose themselves to whether they were successful or not in a way that most product teams don't expose themselves. So it's great. The right type of product team lives off that stuff. And then if you're a data engineer or data practice, you've got to feed good data into that process. You know, your tables have to be
Starting point is 00:52:08 pristine. You have to arrive on time, you know, meeting an SLA on a data engineering pipeline becomes much more critical for an experiment. And, you know, that I think it naturally leads to more resources in that area. Absolutely. Okay. One last thing. If someone's interested in looking at Epo, where should they go? Just go to getepo.com, www.getepo.com. That has details about a product and also has a link to reach out. I'm also, like I said, I love chatting with people, whether on Twitter, Slack, LinkedIn, or whatever.
Starting point is 00:52:39 So you can reach out to me, Chetan Sharma, on any of those mediums. I'd love to get in touch. I love talking to anyone who's interested in experimentation, no matter what the maturity stage or readiness for a product like ours is. So I would love to chat with whoever. Awesome. Well, Chet, thank you so much for the time. We learned so much and just really appreciate you taking the time to chat with us. Absolutely.
Starting point is 00:53:01 It's been a pleasure. You know, I'm just constantly struck. I think every single show, I just am amazed by how smart the people that we get to talk to are and what they've done. And Che, of course, is no different. Studying synchronization between brain hemispheres, you know, and then building a statistics practice inside of Airbnb. Pretty amazing. I think, you know, here's my takeaway and this isn't, this isn't super intellectual, but it's enjoyable, hopefully. I really appreciate that, even though it's clear that Che is bringing a huge amount of sort of knowledge and experience into building this technology that does some pretty complex
Starting point is 00:53:46 things, especially on the statistics side. He acknowledged that, you know what, it's like the small, dumb things that can make the biggest difference in this world, right? Like opening a link in a new tab. And it was funny just hearing him talk about that and seeing him smile because, you know, that's like as simple as it gets. But it was the winningest experiment, you know, over a five year period at Airbnb. So I just really appreciate that. Like we can throw as much, you know, math and technology at this as we want. And sometimes it's just a really simple idea that wins the day.
Starting point is 00:54:21 Yeah, yeah, 100%. I think one of the misconceptions around experimentation is that the experimentation process is going to tell you what to do, which is not the case. You have to come up with a hypothesis. You have to come up with what matters and why. The experimentation platform and methodology is there to support you in the decision that you are going to make and that's what we have to keep in mind and that's how i think we should be using these tools like as another tool that can help us make the right decision at the end and one of the things that he said that I think is super important is that these platforms
Starting point is 00:55:06 and these methodologies also provide the credibility that is needed in order to communicate more effectively whatever decisions you propose to the rest of the stakeholders. So that's what I keep from the conversation today. I think it was a way to, let's say, give a very realistic description of what an experimentation platform is, what we can achieve with that, and what to expect from them. I agree. Well, thank you for joining us and learning about A-B testing today. Lots of great shows coming up, and we'll catch you on the next one. We hope you enjoyed this episode of the Datastack Show.
Starting point is 00:55:45 Be sure to subscribe on your favorite podcast app to get notified about new episodes every week. We'd also love your feedback. You can email me, Eric Dodds, at eric at datastackshow.com. That's E-R-I-C at datastackshow.com. The show is brought to you by rudder stack, the CDP for developers. Learn how to build a CDP on your data warehouse at rudder stack.com.
