Microsoft Research Podcast - 122 - Econ2: Causal machine learning, data interpretability, and online platform markets featuring Hunt Allcott and Greg Lewis

Episode Date: June 2, 2021

In the world of economics, researchers at Microsoft are examining a range of complex systems—from those that impact the technologies we use to those that inform the laws and policies we create—through the lens of a social science that goes beyond the numbers to better understand people and society. In this episode, Senior Principal Researcher Dr. Hunt Allcott speaks with Microsoft Research New England office mate and Senior Principal Researcher Dr. Greg Lewis. Together, they cover the connection between causal machine learning and economics research, the motivations of buyers and sellers on e-commerce platforms, and how ad targeting and data practices could evolve to foster a more symbiotic relationship between customers and businesses. They also discuss EconML, a Python package for estimating heterogeneous treatment effects that Lewis has worked on as part of the ALICE (Automated Learning and Intelligence for Causation and Economics) project at Microsoft Research. https://www.microsoft.com/research

Transcript
Starting point is 00:00:01 I like causal ML software because it gives researchers fewer and fewer choices. I think one of the downfalls of academic research is that researchers have to make a lot of individual decisions. And sometimes, for many reasons, sort of poor training or p-hacking, or just that the problem is very hard, they just make bad choices. Part of what's appealing about machine learning as a general idea is that it's an algorithm. It's these are inputs, these are outputs. There are no choices in between.
Starting point is 00:00:29 It just does a set of pre-programmed instructions. And it's an end-to-end process. In the case of what economists often do in practice, sometimes it's just ordinary least squares and then it's an algorithm. But even then it's often I'll run it once and then I'll add these variables, delete these variables,
Starting point is 00:00:43 then I'll run it again, then I'll run it again. and I'll run it again. Now it looks like it's significant, so maybe that's the table I want to go with in practice. And I think that there's something very nice about having tools that say, yeah, I decided to let this machine go and figure out what the right model was, and I cross-validated, and that's how I picked my high parameters, and we're done. Welcome to the Microsoft Research Podcast, where you get a front row seat to
Starting point is 00:01:05 cutting edge conversations. I'm Hunt Alcott, an economist at Microsoft Research in Cambridge, Massachusetts. And I'll be your host today as we speak with Greg Lewis, my colleague at Microsoft Research. Greg is an economist and an expert on two important trends in the online economy. First, with the big data revolution, there's now much more opportunity for businesses and governments and other groups to target interventions at people who would benefit the most. For example, advertisers targeting ads, health systems targeting medical treatments, schools offering tutoring, etc.
Starting point is 00:01:46 The second key trend is the rise of online platforms, such as Amazon and Uber, that facilitate transactions between independent buyers and sellers. In this new business model, there are now important questions about how different online market designs and data sharing approaches benefit or harm sellers and consumers. In addition to being an expert on these two trends, Greg is my next door office neighbor in our offices in Cambridge, Massachusetts. Greg Lewis, welcome to the Microsoft Research Podcast. Why, thank you, Hunt. I'm so glad we're not recording this in the office,
Starting point is 00:02:25 otherwise we'd be disturbing each other. Indeed, indeed. So before we get to economics or the economics that you study in your current research, I want to start with your background. You grew up in South Africa and you majored in economics and statistics. And what made you majored in economics and statistics.
Starting point is 00:02:47 And what made you want to study economics? Well, it's actually a pretty circuitous route. I started off studying to become an actuary. In South Africa, smart kids do one of two things. They either go to medical school or they become actuaries. So I started down that road and it was quite boring, ultimately. A lot of it is about death, about when people are going to die, and how much you should charge them for insurance. No, it's not a lot of fun. And so at some point of my, I guess, my junior year of college, I decided that although I'd been taking economics along the way as part of my degree, I
Starting point is 00:03:22 should, in fact, change that to be my major. And I kept statistics because that was part of actuarial science. So yeah, I became an economist because I was turned off by death. That's great. And of course, ironic because they call us the dismal science. It is indeed. So you're coming out of your undergrad with these skills and trying to avoid death. And you could have worked in a lot of different areas. You could have worked for a bank or a consulting company or done any number of things. So how did you end up in a PhD program? I think a couple of things. One was peer pressure. A lot of my friends decided to come do PhDs in the States, computer science and physics mainly, but that was one thing. Is that what you wrote on your application essay?
Starting point is 00:04:03 No, no, no. Nobody writes that on their application essay. Actually, I wrote on my application essay that I wanted to work with your advisor, one of your advisors with Sandal. Anyway, but yeah, no, I think what got me excited, I was always excited about modeling. Even when I was a kid, I used to design casino games to use statistics to cheat my brother out of money. And so I was always that kind of like math kid who was like just interested in modeling. When I discovered that you could connect the real world and large economic systems with modeling, I thought, oh, this is the field for me.
Starting point is 00:04:34 This is great. And I got excited about studying that at a higher level than I had in undergrad. So you end up at the University of Michigan. How did you end up specifically doing what we call industrial organization? Yeah. The more you make me think about it, it all feels so very accidental. So I started off as a theorist. I was jazzed on theory and I started working on a paper, which eventually got published like a decade later. But in my second year of grad school,
Starting point is 00:05:02 I worked on a paper on college admissions and a game theoretic model of college admissions. And my advisor at the time, Melona Smith, who's a well-known game theorist, asked me to prove a theorem and I just couldn't. And at some point I got really frustrated and I thought maybe I should do something with data instead. And the adjacent field to theory with data is industrial organization because it uses a lot of theory. And then sometime later I worked at the theorem I was asked to prove was false, but that was like three years later. And then we proved it was false and that was advantageous and eventually got published. So that was part of it. And I had really good advisors. I got to work with Jim Levinson, who's a giant industrial organization, with Pat Byrie, who's
Starting point is 00:05:40 similarly a giant. And so that was part of why I stuck with IOs. I had really good opportunities working with those folks. So then you graduate and you had to be an assistant professor at Harvard. And that's where your two current interests really start to solidify. And I want to start off by talking about causal machine learning. So what is causal machine learning? So part of it is machine learning, which is, I'd say, the use of sort of standardized algorithms to make sense of data, and in particular, to mostly do prediction. So to figure out, is this a cat or is this a dog, typically on the internet, and more generally to do many, many things that are very important. So machine translation, for example, figuring out which word comes next in a sentence. Then there's a separate sort of topic, which we're very concerned with in economics,
Starting point is 00:06:32 which is causality. If I think about a job training program, does that really end up helping workers or is it actually not a very cost-effective way of training our labor force? Causal machine learning is bringing those two ideas together, is to say, can we use ideas from machine learning in terms of algorithms and making sense of data, especially large-scale data, along with ideas from causality to start automating processes by which we can learn about causal relationships and data.
Starting point is 00:06:59 And so another way that I've often thought about causal machine learning is just that it is a more flexible way of estimating heterogeneous treatment effects. So when I say heterogeneous treatment effects, of course, what I mean is in your job training example, job training programs might be especially useful for men, or for young men, or maybe for old women, or maybe for people with two kids, but not people with three kids. And you could come up with any number of other examples of heterogeneity in your favorite causal inference application. And so there are these traditional set of techniques
Starting point is 00:07:41 where you might say, well, let's just try to estimate the treatment effects separately for men versus women and see what the differences are. But that requires the researcher to, of course, come in with some prior that men and women might be different and then kind of manually do those separate estimates. And so I guess another way that one might think about causal ML is that it provides a set of tools that allows us to more flexibly understand the richness in the differences or heterogeneity in the treatment effects of a job training program or better schools or a medical intervention across a wide variety of people. Is that another way of saying it in your view? I think that's a super narrow way of saying it. So I think, yeah, I think that that comes from a very particular kind of economic tradition of thinking about what we can do with machine learning. I think folks in the machine learning community would say, like, let's think about,
Starting point is 00:08:38 you know, Bayesian causal networks and start asking, like, which relationships do I think I can discover in the data automatically? I think econ is very focused on policy. When you come from econ, as we both did, you often immediately gravitate towards these questions like, will this work differently for different people? But I think that the statistical richness of machine learning is much larger, and therefore the things you can do in the causal space are much larger. And even that's true in my own work.
Starting point is 00:09:03 So I work on some heterogeneous treatment effect estimation, but I also work on sort of flexible methods for instrumental variables. And there are policy optimization questions that are robust optimization questions. There are all these questions that are kind of in this general field. What are the main applications of causal machine learning? I wonder if in light of our conversation here, if you can break those down into the traditional, what I had called heterogeneous treatment effect estimation, looking at a policy intervention and looking at impacts on different people. Maybe you could start with one of those applications and then move into something else that you think illustrates these broader applications you're thinking of?
Starting point is 00:09:45 Great. So one kind of major set of applications is exactly this sort of context-dependent treatment or heterogeneous treatment effects. And the easiest example to think of there is not one I personally worked on, but one that's sort of a natural application is in drugs. You can imagine that as we get more and more genetic data, we're going to want to tailor the drugs we administer for different medical conditions to the genetic makeup of the individual person. So you could run experiments to go ahead and figure out which drugs work on average, and already we do that. That's sort of the gold standard for causal inference. But those techniques typically won't tell you very much about which drugs work well in which circumstances, especially when you're trying to figure it out on the basis of somebody's genetic code, which think of it as just this giant
Starting point is 00:10:34 string of variables, right? And so you have no sense. I mean, obviously, experts in the bio community will have some priors about which parts of the genome matter, right? But it's not super easy to figure that out. And so you'd like the computer to do the work for you. And so one sort of major set of applications is around trying to figure out which modifiers are statistically, you know, you can confirm are statistically modifying the treatment effect. This drug works for these people who have this set of markers, but otherwise it doesn't work. You'll see this even in the popular press talking about COVID. I don't know this research personally, but every self, you'll read about this and you'll go, yeah, this seems to work for most people, but there are this rare group of people for which it
Starting point is 00:11:16 just doesn't seem to be working very well. And maybe even here's why. This is what we can point to in the data that shows us that this is the marker of somebody who's not going to respond well to this. So those are some applications that resonate within the definition I set up of estimating differences in treatment effects across people where you have different characteristics of the person. But then you were also alluding to more highbrow or novel applications. And can you give an example of one of those? Yeah, sure.
Starting point is 00:11:44 So the ones that I've been thinking of are nonlinearities. So you're trying to find causal effects, but the effects are not just do I turn something on or off, which is what we think of in the treatment effect paradigm often. But what if I give you 10 versus 20 versus 30? So you could think about dosage would be an example of this. And you might think about nonlinear effects in dosage, and you want to understand what those effects look like, right? So that would be a second kind of application is trying to figure
Starting point is 00:12:12 out, okay, what's the dosage curve look like? A third kind of application, which is more highbrow and is not my field of expertise, but I think is important to think about, is actually trying to figure out what the causal structure of a complex system is. So you now have many variables moving around, and you're trying to understand which ones are related to which other ones, and in what order. Does X cause Y, or does Y cause X, and how's that related to Z? And so now, instead of what we typically think about in economics, which is this paradigm in which there is an outcome variable, the thing I care about, do they recover from the disease or not? There's a treatment variable, that's the drug. And then there are these control or modifier variables like your genetic history or family background or your age or gender or whatever. Now let's just think about putting
Starting point is 00:12:58 those all into a pot and saying, actually, I'm not really going to tell you in advance what I care about and how these things are related. I'm just interested in all the relationships between these things. And that's a much, much harder problem for obvious reasons. But you can imagine there are many biological systems in which you'd like to be able to figure that out if you could, because it's not obvious where to start or what the right levers to pull are. And so now I feel like it's quite clear that this is not the same as we had a job training program and we want to know the average effect of that job training program on the average person. So this is really exciting stuff.
Starting point is 00:13:33 Can you tell us about a couple of applications that you're working on right now? Sure. So one of the recent things we've been working on has been thinking a little bit about internal investment decisions being made by a large company. And in that case, it really becomes a matching problem that we were looking at. So you can imagine throughout the sales process that you have many different kinds of ways you could interact with your customers to make the experience better. So for example, you might assign additional salespeople, you might assign technical support,
Starting point is 00:14:08 or you may make direct investments to your customers in various ways. And for some customers, different kinds of investments may be more or less appropriate. And so you're trying to figure out what's the right matching. And in economics, the way we think about that is, okay, well, which of these investments is going to give me the highest ROI on which kind of customer? If I assign technical support to a company, are they is going to give me the highest ROI on which kind of customer? If I assign technical support to a company, are they subsequently going to be able to grow the technical component of their business and be able to come back to us and buy more stuff from us in the future? And you can imagine that the same solution isn't right for everybody. People in different industries, companies of different sizes or parts of their company
Starting point is 00:14:43 lifecycle may want different kinds of tools. They may have different needs. And so trying to figure that out is very much a causal inference problem and one that uses a sort of causal machine learning. I'm curious to hear a little bit more about the mechanics of what you're doing. I can imagine a data structure where you have data on a bunch of customers. And then you saw that some of the customers got tech support, and some of them got additional sales effort, and some of them got a third thing. And you could correlate these different actions that the company took with how profitable the account was, whether the client stayed with your large company, etc.
Starting point is 00:15:26 But of course, that correlation isn't what you want. So can you tell us a little bit more about how you turn this into a causal inference problem? Yeah, right. So that's a great question. And of course, the best answer would be to say, I ran an experiment, but I didn't. And the reason I didn't is the reason most businesses don't. Nobody goes ahead and starts thinking, well, we'll randomize some large investments. And so you're in the imperfect world of machine learning or causal machine learning without this nice experiment. And so what's the best you can do? The best you can do is try and find a control group that's sort of relevant, right? So if I have some group of people that are getting investment A, some group of people are getting investment B, some group of people are getting investment C, and some group of people are getting nothing, can I find people in group D, the control group who's getting nothing, that look like the people in A and B and C?
Starting point is 00:16:17 And that's a procedure that's been around for a long time. You're super familiar with it, the sort of matching estimators that we see in econometrics. But what if I know a lot about my customers? What if I have many, many, many variables, right? That gets us into some of the way the machine learning approaches work well, and in particular, the so-called double machine learning approaches. What do they look like? You're trying to find something that looks like an experiment. So what you can do is you can take the set of people who get each of these different kinds of things, you can say, okay, what part of that is unpredictable?
Starting point is 00:16:49 Let me find people, companies that were surprisingly likely or surprisingly unlikely, given everything we know about them, to get a particular intervention. So company A got a lot of technical support, but they're surprising. It's surprising they got a lot of technical support, but they're surprising. It's surprising they got a lot of technical support. And company B, who I might have thought got technical support, didn't. Okay. So now I have these companies that look like I'm basically doing a coin flip. Beforehand, I really had very little idea.
Starting point is 00:17:21 They looked to me like, you know, it's hard to tell whether it's A or B who's going to get the tech support. And then one of them does get the tech support. That kind of looks like an experiment. I didn't literally flip a coin, but as far as I can tell from everything I knew about these companies, it looks to me like we pretty much flipped a coin. And then I can use that as if it were an experiment. When you work in academia, that feels like a terrible approach for causal inference, because you think to yourself, well, if it looks like a coin flip, but actually they decided to pick something,
Starting point is 00:17:50 there was a reason they picked something and that's going to mess up all your estimates. But once you're inside a company, you realize a lot more things look a whole lot more like coin flips than you thought they did. And so it's maybe not as worrying as you thought. Yeah, I see. That's very interesting. So can you talk about the impact that your work has had? The way that this sort of work plays out in large companies is in many ways. One is it's informative a little bit as to this thing that we started off studying, which is which kinds of customers should get which kinds of investments. And I think that's informative to people. It's not the only
Starting point is 00:18:28 source of information they have. And so they weighed it against a bunch of other factors in thinking about how to make future sales investments. And also, just the raw numbers, this particular kind of investment does very well with this kind of customer might change the way you budget. So you might say, okay, I want to spend more money on this kind of investment in the future. But then it also has much more sort of what I'd say micro implications in a large organization, which is that there are teams that are not making budgeting decisions or big strategic decisions that are just thinking, okay, what do I want to give my customer next? What's the next thing I want to do with my customer?
Starting point is 00:19:04 There seems to be the study that's saying that often when I assign tech support to a customer like that, it goes pretty well. And so maybe if my customer has recently gone from small to medium, they've grown their business a lot, suddenly, according to the study, it looks like, oh, now the next logical sequence of actions for me to take is to change my kinds of interactions with that customer to a different kind of interaction. And this sort of customer lifecycle journey is very much a kind of application for these sorts of tools. It's trying to figure out what the next logical action is for each customer. I see. This is super interesting. So I feel like I'm seeing a lot of applications these days of this type of flexible
Starting point is 00:19:48 approach to estimating heterogeneous treatment effects or causal machine learning that in practice aren't delivering much value. I'll give you a couple of reasons in the applications I'm seeing. So one is that at least in many academic studies, you don't have big data, you have small data. So I've only got 2000 observations in my experiment. And so I can be flexible with my estimation of treatment effects, but I just don't have enough data to know reliably what heterogeneity there is.
Starting point is 00:20:21 The second is bad potential moderators. So it could be that if we knew a lot about people, we could predict how they would respond differently to some drug or a job training program or whatever. But in practice, we just don't know very much about people or we may know a lot, but the stuff we know may not be relevant to predicting differences in treatment effects. And the third reason I feel like I'm seeing these approaches not be relevant to predicting differences in treatment effects. And the third reason I feel like I'm seeing these approaches not live up to the excitement that I had is that the world is often pretty smooth or pretty linear. And so doing fancy non-parametric stuff where I can learn about, you know, the effects on women from Indiana who are age 34 just isn't as useful as just knowing
Starting point is 00:21:07 that the effects are different for women or the effects tend to be different for older people. So do you share that big picture assessment? And are there specific examples where you think there's been really high value added that you can point to? Yeah. So the reasoning you offer is pretty sound. What I'd say is that in many applications, certainly corporate applications, applications with large data sets, you often can find heterogeneity because you just have such large feature sets, histories of customer engagements. I'm thinking of Bing, for example, where we have a lot of data and we can think about what kinds of ads there
Starting point is 00:21:45 might be value seeing. And because there's a very large customer base, you can cut it many ways. And also in these corporate applications, you're less interested in interpretability. You are interested in statistical significance. So you don't want to incorrectly infer that women in Indiana really want to see a particular kind of ad and then show them this ad when in fact that's not true at all. So really want to see a particular kind of ad, and then show them this ad when in fact that's not true at all. So you want to get that right. Statistical significance matters, but interpretability doesn't in some ways.
Starting point is 00:22:19 Whereas in academic applications, you often really want to be able to tell a story. Some of the business of academia is narrative construction, right? And so you want a compelling narrative. In automated applications, actually what you care about, say ads, is lift. Do I manage to lift the probability of somebody clicking? And so in fact, I don't actually care why this happened. I'm never going to investigate why this happened. All I want is a system that can reliably deliver higher lift. And if I have enough scale in terms of data, I'll be able to figure out how to do that with causal machine learning. And if I'm running large enough experiments of data, I'll be able to figure out how to do that with causal machine learning. And if I'm running large enough experiments.
Starting point is 00:22:47 So I think there are cases in industry where it's pretty compelling. In academia, my sense is that often the returns are low, but I also think the cost is being very low. What I'm hoping for as this group that I co-lead at Microsoft, this ALICE group that works in software for estimating heterogeneous treatment effects, EconML, I'm hoping that you'll just be able to deploy that, and it'll be easy, and it'll give all the results that academics want. And if the result is, man, it's pretty much linear, that's fine. As long as the times when it's nonlinear, or the times where it does happen to be an interesting heterogeneous effect, that's statistically well isolated,
Starting point is 00:23:22 you can say, oh, in this particular application, actually, we can show you something different. And it didn't cost us very much more time to do that. Yeah. And I certainly, as someone who is a natural user of the software you're developing, I really appreciate all the work you all are doing on the hall here at Microsoft Research New England. I actually want to follow up on something you said, which is, I'd like to ask about interpretability. So you mentioned that part of academia is storytelling. And I think that's totally right. I would have said it just, we're doing work that we want to make generally interesting and generally useful. And so nobody cares about 32 year old women in Indiana and how they responded to my being ads treatment. But they might care to have some insight in general about gender differences or age differences or whatever.
Starting point is 00:24:17 And so I think that's how I think about the storytelling and the value of storytelling. So what are the improvements that you're seeing in our ability to make machine learning results more interpretable? So I think you were very early there. And I actually think that's sort of one of the weaknesses of this area. One of the things that we've worked on is tree-based interpreters. By that I mean, let's take this complicated set of machine learning results, which actually may say very specific things. They may say that this particular user in Indiana, we think that they're really going to like these ads.
Starting point is 00:25:00 Let's reduce that to a much simpler story by saying that all I really want to do is divide my data into four groups. People who respond a lot, people who respond a little, people who don't respond, people who actually respond negatively. What is a very compact way of expressing that? And it may be as simple as saying, oh, it's people who are young and male who are in category one. And category two is a different destroyed set. That's something that we've worked on a little bit and other people have in category one. And category two is a different destroyed set, right? That's something that we've worked on a little bit and other people have proposed as well. But I think there's a lot more to do here because, as you say, nobody cares about the specific thing.
Starting point is 00:25:37 People care about general patterns. And that's really, in some ways, another machine learning problem. This is kind of a bit of a tangent, but it's kind of interesting. I once had a research assistant at Harvard. He was an undergrad. And he was an undergrad who was previously an undergrad in biology, and he switched to economics. And I asked him to study something. And he came back with 100 pages of regression results, which was incredibly impressive.
Starting point is 00:26:02 I was like, wow, this guy has already thought about this. And he had cut the data like a thousand different ways. And his name is Chris Sullivan, and he's now a professor of economics at Wisconsin. So it worked out well for him. But then I said, I need a regression to summarize these. I don't have, I need some other technique to make sense of all this data you've just thrown at me.
Starting point is 00:26:18 And it's the same thing with these machine learning techniques often. They give you this super fine grain data about what you think is going to happen for individual people. And then you say, okay, but now tell me a story with this. And there are different machine learning tools you could stick on top of that to get some interpretability, but I think we're still at the beginning of developing those. That is a great story. So another direction that I think is interesting in this space is that in practice, some companies and analysts I talked to are using
Starting point is 00:26:48 prediction approaches when ideally they want to be using causal inference. So for example, we just had a big election. And so an important thing that people talk about in election years is targeting campaign ads on the internet. And so a get out the vote organization, what they really want to know is the causal impact of serving an ad to somebody on Facebook on the chance that they'll donate or the chance that they will vote for their candidate instead of the other candidate.
Starting point is 00:27:21 But this is really hard for any number of reasons. And so instead, you might do some prediction thing, we'd say, well, I'd like to using some characteristics, predict who's going to click on an ad or predict who is a moderate, and then take those predictions as some signal of what the treatment effect of the ad would be on some outcome that I care about. In clinical applications, right, people might use the predictions of who's sickest to target some medical intervention on the grounds that the sickest people might benefit the most. I'm curious if you have any examples from your work or talking with other folks of when that type of heuristic approach tends to work well versus when it can give misleading results. I mean, I think in those applications, it's sort of like you're targeting at the people where you have good reason to believe it's valuable.
Starting point is 00:28:20 So I can imagine, you don't actually know what happens in the election example, but I can imagine you targeted the pivotal voters, right? You'd think, okay, these people are the people who might actually switch their minds. So they're the people that should get the ads. Or maybe I need to play to my base. I don't know. It depends on your preferences. But you can imagine that those actually could be quite wrong, right? So you may be misinformed. And so one of the examples that came up recently was a study by Susan Athey and some co-authors where they were looking at targeting help with filling out financial aid applications to different kinds of students. And their prior was that the people who were really bad at doing this by themselves, the people who are least likely to apply for financial aid by themselves, were exactly the people
Starting point is 00:28:57 you wanted to help with financial aid applications. And in fact, they didn't actually do that policy. They randomized and then studied the results afterwards. And when they studied the results, they found that really it was the people who were already most likely to do it who really just needed a push over the edge. And so that's an example where you could imagine that you have a strong belief that it's the people who are the most vulnerable who need the most support. But in fact, you can imagine easy stories in which they're actually so far in the
Starting point is 00:29:23 hole that it's not going to help. And the places where you can do the most are the people who are much closer to the margin, closer to daylight. And so I think we need to study these things. That's not to say that I don't think this is super valuable. So often we encourage folks who are thinking about using some of these causal machine learning tools to first develop a predictive model, which says something like, this person is the sickest, and then ask, can I look for heterogeneity by that prediction?
Starting point is 00:29:50 These people I think are the sickest, and then let's look for treatment effects among the sickest people, the middle people, and the most healthy people, and let's see what those look like. And you could see very quickly whether your prior is right, and then maybe you'll even run with that in the future. Maybe you'll say, I've done this in a few cases, and it always turns out to be the sickest people who benefit the most. So maybe I don't need to do this whole causal machine learning thing anymore. But I think it's worth going through the process of checking if you can.
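That check can be sketched in a few lines. Everything below is synthetic and the numbers are invented (the `simulate` function and the risk score are illustrative stand-ins, not anything from the episode): bin people by a predicted risk score, then compare treated and control outcomes within each bin.

```python
import random
from statistics import mean

random.seed(0)

# Synthetic randomized trial: 'risk' is a baseline severity score, and by
# construction the treatment helps high-risk units more (a stand-in for
# "the sickest benefit the most").
def simulate(n=6000):
    rows = []
    for _ in range(n):
        risk = random.random()                # baseline severity in [0, 1)
        treated = random.random() < 0.5       # randomized assignment
        effect = 2.0 * risk                   # true effect grows with risk
        outcome = risk + (effect if treated else 0.0) + random.gauss(0, 0.5)
        rows.append((risk, treated, outcome))
    return rows

data = simulate()

# The "predictive model" here is just the risk score itself; in practice you
# would fit a prediction of the outcome from baseline covariates.
def tercile(risk):
    return min(int(risk * 3), 2)              # 0 = healthiest, 2 = sickest

# Difference in mean outcomes (treated minus control) within each risk bin.
for b, label in enumerate(["low risk", "mid risk", "high risk"]):
    treated_y = [y for r, t, y in data if tercile(r) == b and t]
    control_y = [y for r, t, y in data if tercile(r) == b and not t]
    print(label, round(mean(treated_y) - mean(control_y), 2))
```

Because the simulated effect rises with risk, the high-risk bin shows the largest difference; if your prior said the sickest benefit most and a split like this disagrees, that is exactly the cue to reach for the heavier causal machinery.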
Starting point is 00:30:20 So as part of your work at Microsoft Research, you've developed this set of tools, computer code, basically, for implementing causal machine learning approaches. Tell us about that software and how that helps researchers. Thanks for asking. So that's the EconML package, which is available on GitHub. A group of folks have worked on it, and I've actually written very little code, as you'll notice if you ever look at our contributions to the repo. So it's definitely a group effort. That's code for estimating heterogeneous treatment effects. And it's very agnostic as to what the right way to do that is. So some of the techniques that myself and some of my colleagues have worked on are in that repo, but many other techniques from leading researchers are represented there.
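As a rough illustration of the kind of estimator a package like this automates, here is a toy version of the "residual-on-residual" idea behind double machine learning, in plain numpy. This is not EconML's actual API, the data and coefficients are made up, and real implementations use flexible ML models plus cross-fitting where this sketch uses simple least squares.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Synthetic data with a confounder: W drives both treatment and outcome.
W = rng.normal(size=n)
T = 0.8 * W + rng.normal(size=n)      # treatment depends on the confounder
theta = 1.5                            # true (constant) treatment effect
Y = theta * T + 2.0 * W + rng.normal(size=n)

# Naive regression of Y on T is biased upward by the confounder.
naive = np.cov(Y, T)[0, 1] / np.var(T)

# Partialling out: predict T and Y from W, then regress the Y-residuals on
# the T-residuals. The confounder's influence cancels out of both residuals.
b_T = np.cov(T, W)[0, 1] / np.var(W)
b_Y = np.cov(Y, W)[0, 1] / np.var(W)
T_res = T - b_T * W
Y_res = Y - b_Y * W
dml = np.cov(Y_res, T_res)[0, 1] / np.var(T_res)

print("naive:", round(naive, 2), "residual-on-residual:", round(dml, 2), "truth:", theta)
```

The naive coefficient lands well above 1.5 because W moves both variables, while the residualized estimate recovers the truth. Packaged estimators make the same move with arbitrary ML models for the two predictions, which is where the subtle statistical issues Greg mentions live.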
Starting point is 00:31:07 We want people to have access to the best available statistical technology for estimating heterogeneous treatment effects. One of the things I like about this package, as I've worked on it and discussed the conceptual issues that arise, is that we've really thought carefully about the statistics here. And we've realized along the way that there are many, many difficult statistical issues that arise. And so I want people to use automated packages because I think that if you don't use automated packages, it's easy to think you're doing the right thing and somehow make mistakes
Starting point is 00:31:40 because these mistakes can often be subtle. And there are varying kinds of mistakes, right? So one of the things that's appealing for policy evaluation of some of these causal machine learning tools is that they automate model selection. And what do I mean by that? Well, if I'm trying to figure something out about the world, about whether X causes Y, there are many different ways I could approach that problem. I could just look at the raw correlation between X and Y. I could run an experiment. I could look at the correlation between X and Y after controlling for, say, age and demographics and maybe family history and medical history, lots of other things.
Starting point is 00:32:18 And I have a lot of choices to make along the way. And what machine learning tools do is, in the way that most machine learning tries to figure out the best forecast or prediction of something, try to figure out what the best model is. And so that sort of ties the researcher's hands a lot as to what they get to show as the set of results. And I think this is good for a number of reasons. One, we might find the best model. But two, also, it sort of limits the ability of researchers to find significant results when there aren't any by just trying and trying and trying
Starting point is 00:32:50 again until they find the thing they want to find, right? It's better in many ways if we just don't give the researcher that degree of freedom. So you're like the Steve Jobs of econometrics data analysis packages in the sense that you want to have everything bottled up and not give the user a lot of choice in what to use or how to plug it in? I mean, you know, there's a fine line there. Discretion is useful, right? But I do think at the very least, we want to have the options available to people to say, look, I did this completely standardized thing. So you should believe me when I give you these results as opposed to, no, no, I didn't do the standardized thing because blah, blah, blah. And then you should expect a little bit more scrutiny because you decided to do something sort of very bespoke.
Starting point is 00:33:34 Absolutely. Well, if you could make as much money as Steve Jobs with this software package, and if you have any to give away, you know where my office is. Well, this is tremendous. I want to shift gears to talk about another area where you're also an expert, which is the economics of platform markets. And I want to just start very broad. What makes e-commerce different from older ways of selling stuff? Well, so I think there are a bunch of differences. The most obvious is that you're buying something online. And so this is something I worked on actually in my PhD thesis when I was looking at people buying cars on eBay. So it's 2007. People were buying a lot of cars on eBay.
Starting point is 00:34:24 And everybody thought it was very surprising because cars of all things are the case where you kind of want to go kick the tires, literally, right? That's what you want to do. And yet people were comfortable with it. So I was curious, and that got me really interested in this topic of what makes e-commerce work. And one thing is that products are often standardized. And so you don't have to worry about kicking the tires in person. Another thing is that you have reviews. So you have many other users telling you whether this is a good product or not a good product. The third thing is that the platform itself, and actually let me back up a bit, platform, Amazon, Etsy, these are places where we go to buy things. That's kind of a new word in
Starting point is 00:35:04 many ways. It's been around for a while now, maybe 20 years. But before that, we didn't have platforms, right? We had shops. Now we have these platforms, and they do lots of things that shops don't do. They aggregate these reviews. They recommend things to you by showing you what's available. They make it easy to find things because online, you don't have to go and actually wander through aisles. You just type things into a keyword box and they magically appear for you. So this platform is like a shop, but it's like a shop with a lot of other features. And those other features have made shopping online a pretty good experience for many people. And so many people are buying there.
Starting point is 00:35:37 Platforms are, of course, not new to the last 20 years. There's a sense in which I could think of a singles bar as a platform, or the credit card networks as a platform. But certainly the idea of online platforms is new to the internet era. Yeah. I mean, you're thinking of it in the classic sort of economic sense of a platform market, which is, you know, a two-sided marketplace, right? I'm now thinking of it much more in this sort of e-commerce-y sense of, oh yeah, this is where I go to buy things on the internet. And you're right. The economics of the one category, the two-sided marketplace, are old and super interesting and apply to, gosh, so many different industries. But this sort of modern online phenomenon that I'm thinking of is, yeah, it's much newer.
Starting point is 00:36:18 So I want to talk specifically about the role of the platform in providing recommendations or product rankings to people. What are platforms doing there? And what does that mean for consumers and the benefits that consumers receive? So let me actually start off by saying it's not obvious what they're doing, right? This is one of the sort of opaque things about parts of the internet. Back in the day, when you looked at Yelp and you asked, what did I get shown when I typed in a search query for restaurant, you got shown the highest rated by Yelp restaurants. That is no longer the case unless you work pretty hard, right? That's not the default.
Starting point is 00:36:59 So what are they doing? One of them is they're determining relevance for you. They're trying to figure out, given what you typed in, what you want. And that is pretty solidly aligned with your interests as a consumer. If you're looking for an Indian restaurant, you don't want them to show you Polish restaurants. The second thing they're trying to do is they're trying to figure out the best of the options that are relevant. And then the question of what is best becomes a little bit complicated. Because best for whom, right? On the one hand, they want to serve their customers. So they want to give customers recommendations which are good. But of course, their customer base is not one homogenous block. It's many different kinds of
Starting point is 00:37:36 people. And so they have to think a little bit about the diversity of the things that they recommend. And on the other hand, they're taking in fees from merchants from the other side of the market, from restaurants in the case of Yelp or from manufacturers in the case of Amazon. And so they have to think a little bit about what they want to promote among the sellers, like what they can do in terms of ranking recommendations to make sellers better sellers, and also how much money they're getting from each seller. And so let me talk first about the virtuous side and then talk a little bit about the less virtuous side. So the virtuous side of this is that platforms have this incredible opportunity to discipline sellers by saying, I will not make your product very prominent
Starting point is 00:38:18 unless you are an attractive opportunity for my customer base. Both Amazon and eBay have thought really hard about free shipping. Amazon does it through Prime, but on eBay, sellers could offer you free shipping or not. They could charge you $50 for shipping and price their product at $2, or they could charge you $2 for shipping and price it at $50, and eBay didn't care which one it was. But then what eBay started to do is they said, look, we're actually going to say that if you have free shipping, you get to float right to the top of the search results. And so suddenly everybody was like,
Starting point is 00:38:47 oh, free shipping. That's something customers want. That's something we will offer. Amazon has a tool called the Buy Box. You want to be at the top of their rankings. You better be one of the cheapest instantiations of that kind of product. If you're very expensive
Starting point is 00:38:59 for what you're selling and it's kind of a commodity product, you won't be near the top of their rankings. So again, that creates incredible competition among sellers to drop their margins, which means that Amazon customers get very good deals. So that's the sort of the good part of the platform building or platform ecosystem. The sort of the less talked about part is that it is also true that as a platform, if I have a better contractual deal with some of the seller
Starting point is 00:39:24 side of the market, those are the people I want to put at the top of the rankings, because those are the people who are giving me the most money on every transaction. And so that's also obviously being taken into account. Most platforms say they don't do very much of this, but it's actually quite hard to study because it's hard to see their contractual obligations. So I feel like another issue which is really apparent in many searches on Amazon is just the proliferation of listings for goods that may be very similar, or where there may be lots of differences across the sellers in terms of quality. And so do you have a sense of why that's such a hard problem to fix? Yeah, I don't. Maybe I can offer you some conjectures, but it's interesting to think about the intermediation business, right?
Starting point is 00:40:13 Because there are a limited set of manufacturers and given any one manufactured product, it can be marketed in many different ways, which is, I think, what you see in Amazon. It's the same product being marketed by many different people. And then what is the differentiation mechanism? Well, that's often whatever the search algorithm ranks as being important. For a brief while, eBay actually made this visible to sellers. They gave them an opportunity to type in the title of their listing, and eBay would tell them how good a title they had according to their relevance algorithm. So as you can imagine,
Starting point is 00:40:49 every seller spent a lot of time trying to game that. And it turned out that their algorithm was kind of a machine learning algorithm that wasn't very thoughtful. So it turned out that "free, free, free, free shipping" was better than just one "free" before "shipping." And so you've got these insane titles. And so that's crazy. But if you think about the whole search engine optimization business, the same thing applies in retail.
Starting point is 00:41:08 If I can be the guy who cracks the Amazon code and figures out how to get my particular version of the exact same product stacked at the top of the rankings, I make a ton of money. So you can have many different businesses in the business of figuring out how to get to the top and they can leapfrog each other. And one goes up and makes some money and then they kind of don't do so well for a while. Then another goes up and they make some money. And how do you get rid of that? I don't know. You have to basically figure out if you're Amazon, that this is an exact match of the product. And I don't know if their incentives for that are that strong, because maybe they think that the customer seeing five versions of the same product doesn't really make them that unhappy. I don't know. So you talked about the good aspects and then the bad aspects of the platform's ability to rank different sellers. And I'm curious, given those opportunities for
Starting point is 00:41:54 contractual relationships that the customer may not know about, or incentives in terms of how else they might exercise market power, does any of this lead in a direction where you would advocate for regulation from the government? Or do you think this is mostly a set of issues where the platforms will get close to what's best for society by just acting in their own best interest? I think that's a pretty difficult question. That's why you're on this podcast, Greg, to answer difficult questions. Difficult questions. Where I get worried is about something that approaches close to monopoly power, right? And there are many platforms like this in different domains. And so ultimately, if you think of these companies as being the final stop in any
Starting point is 00:42:43 journey from a product through to a customer, right, then if I'm the bottleneck at the end of some process, I can demand a lot of the value of that entire process. Now, many of these companies have said up until now, at least in the way they talk about it publicly, and I sort of believe them actually, that their incentives at the moment are just to grow in scale. They just want as much of the market as possible. And so what they're going to do is make the deal the best for the customer they can. So from an antitrust point of view, they're in fact like antitrust heroes, right? They're out there to make customers happy. And you can argue about whether that's the right definition of what we want for the economy,
Starting point is 00:43:20 but it's certainly good for customers. The problem is that once you have enough scale, then you might want to exploit that scale. And I think this is what everybody in regulation is thinking about right now. What's to stop a company from starting to demand more and more of the pie? Antitrust authorities can make all these contracts visible, right? So they can launch an investigation and see entire chains. So they can ask, is it the case that suddenly a large share of the vertical value between sort of the cost of production and the final transaction price is going to one single entity?
Starting point is 00:43:51 That's where the interesting debate lies, is that if we think that the economic principle is that you should not be able to demand monopoly-sized profits from any vertical chain, what's the regulation that will allow us to actually get to the point where we can enforce that? I want to shift gears a little bit. One of the things we know about the modern internet economy is that companies are collecting lots of data and selling that data. What concerns does that raise for you as an economist? Yeah, that's a good question. I think that one of my big concerns around privacy is that customers or users don't really know how much loss of privacy they're signing up for.
Starting point is 00:44:41 When a company can see my data, that feels like one thing, but perhaps I didn't realize that often these companies are then selling that data on to other sellers. And so now my mortgage company might, for example, be thinking about me based on data that tracks my behavior on the internet. And so your concern here sounds almost like a consumer protection type concern, like it's my right to know about my data. Is that right? And or do you have any concerns about market efficiency?
Starting point is 00:45:18 Yeah, I mean, so on the one hand, I do think that there's a pure like creepiness kind of element, which, you know, a lot of our colleagues in other fields will think about. I do think people have a right to know how the data is used, although I think that often, back to our interpretability question, I can give you a data dump and then it has to be interpretable. And I think actually when you look at companies like Google, they've made it much easier to
Starting point is 00:45:41 see how your data has been used. The interpretability of your browsing history is sort of quite useful. But then there's also an economic question. And the economic question is, am I going to get targeted offers based on my browsing history? So in an economic context, am I going to get an offer to shop at Starbucks, for example, or to get a better rate on my mortgage? And when I get these targeted offers, is that going to be for my benefit or for the benefit of the companies making those offers? And economic theory tells us that in general, targeted offers need not be a bad thing. Because for example, if it's knowable that somebody really can't afford to pay $2
Starting point is 00:46:27 for a cup of coffee at Starbucks, but they'd be willing to pay a dollar, maybe they'll get an offer to buy it at 75 cents rather than $1.50, and that's good. On the other hand, they might get targeted and offered at exactly a dollar, which means Starbucks got to sell another cup of coffee that they wouldn't otherwise have sold, and the customer is basically no better off than before. And so there's this question of both, does it improve the efficiency of the market? That is to say, do we sell more cups of coffee, roughly? But then also, who's benefiting from that? Is that Starbucks or is that the customer? And in which case is it the one or the other? So in your example, the way that a customer's data is being used is to give her a different
Starting point is 00:47:07 price on the cup of coffee. But my sense is that the primary use case of data is for product discovery or reminder ads that have a fixed price, but just say, hey, remember that you were looking at Gucci shoes the other day and you should buy these, or, you know, have you thought about going to Starbucks today or whatever? So it's not price discrimination. It's advertising something that has a fixed price. Are the efficiency implications of data sharing different in that case? Yeah. And let me say, I actually think they're kind of the same sort of topic,
Starting point is 00:47:43 although I gave one of them as the example, I think they both fall into the general category of marketing. And the one kind of marketing is sort of a discounted targeted price offer. But the other kind of marketing is just, hey, look at my product again, right? Like you say. And I think that the efficiency implications of the latter, under what I guess you'd think of as standard economic models, are pretty positive, right? Because I know a lot about you, I can remind you of things that are actually really useful to you. You really had meant to buy that pair of shoes, right? It kind of floated out of your head for a little while. I put it back in front of you. You go, oh yeah, that's the pair of shoes. I really did want that pair of shoes. Great. I'm going to go ahead and
Starting point is 00:48:21 buy it, right? And so that's potentially a positive thing. As we all know, as people who've been served these ads, oftentimes you buy the pair of shoes and then you keep getting those ads for the shoes. So suddenly this doesn't look quite so useful and it starts to look annoying. And so that feels like a failure of ad technology, but there's a matching process. The best case scenario is it's just improving matching. The more complicated issues arise when we're now getting differential pricing. I see. Do you agree with my assessment that most of targeted advertising or sorry, most of the use of customer data is for targeted advertising at a fixed price
Starting point is 00:48:56 as opposed to price discrimination when charging a different price across different customers? Yes, I think so. I think that's by far the overwhelming use of the data. So that's the overall optimistic view of the value of data sharing across companies for targeted ads? I think it is, but maybe with some sort of caveats, which is that I think the cases that we think of as being potentially most problematic are high value cases. So like mortgage applications, insurance applications, job applications, right? Where you might imagine that the way in which this data is used is to differentiate between applicants in ways that we may think are not fair potentially, or may lead to very differential offers that we wouldn't be entirely
Starting point is 00:49:47 comfortable with. I get the sense that the pricing angle as opposed to the matching angle is not that prevalent, but maybe is more likely to be prevalent in exactly the cases where the stakes are high. I see. So I guess the credit scoring example used for mortgage pricing, credit card pricing seems to be a leading case study there. We see lenders in the US and around the world using more and more data to assess whether somebody is a good credit risk, and then using those data to offer different prices, or just to offer credit or not offer credit to different people. So can you talk us through how you think the use
Starting point is 00:50:31 of increased customer data impacts the market, especially in light of the fact that you also have these, what you'd call adverse selection issues that are at play in these markets that are not at play in the buying shoes example we talked about before. Right. So yes, exactly. So this is definitely a more complicated set of issues. So I would think about this sort of modulo adverse selection, without thinking through
Starting point is 00:50:55 who's going to subsequently be able to repay a mortgage or not be able to repay a mortgage. I'm now going to have targeted offers where I'm going to figure out what is, based on the data, my guess as to the maximum willingness to accept. So what rate can I charge you so that I'm still going to win your business and not lose to the competitors? I'm going to try and make that as high as possible. And so for many customers, potentially, this is going to be very different than the rate I would have charged you when I had to win every customer. If I figured out that you're the customer who doesn't care that much about their mortgage repayment, right? And again, we might hope the competition is going to get rid of this, but suppose I'm some company that's got a very specific handle on you.
Starting point is 00:51:40 I figured you out better than my competition has. Maybe I'm able to give you a slightly different offer, or maybe I'll change the terms of the offer in such a way that they look more appealing to you, but actually profitable for me. And now that customer is going to be paying a little bit more than they would have than in the case where I had to make an offer that appealed not only to the Gucci customers, but to everybody else as well. So there's now this potential inframarginal loss. On the other hand, because I have much more data about who's going to repay, I've got way more information, I'm also not going to be making as many loans that are simply complete losers, where I make the loan,
Starting point is 00:52:16 this person gets the credit and then can't do anything with it, and subsequently defaults. That again is good for the firm. They're avoiding defaults; banks aren't making as many of these losing loans. It might also in that case be good for society, because those same dollars could potentially be reallocated to somebody who's less of a credit risk, who might be able to put the same amount of dollars to productive use. So I think in those cases, the economics are a little bit more complicated. And you might think that competition is going to do a lot of the work for you here in making sure that no particular customer group gets exploited, unless there's sort of differential access to information so that some, say, banks have a preferential
Starting point is 00:52:57 relationship with data providers that would allow them to sort of exploit customers more effectively. Why, Greg, are ads so irrelevant despite all the data that tech companies have about me? I am getting ads for products I would never think of buying. And you mentioned the retargeting example where I am sometimes getting relevant stuff, but it's only relevant because I just bought it yesterday and now it's not relevant anymore. In a world that is awash in data, and awash in concerns like the ones you're talking about, about how companies might be misusing our data and micro-targeting in ways that are somehow bad for consumers, why is it that, as far as I can tell, the relevance of the ads I'm being served with my data is not very high? I would point to two things. The first is that, you know, you and the rest of us are complex, multidimensional people.
Starting point is 00:53:58 And so what you look like on one day is not the same as what you look like on the next day. And so, you know, in some sense, there's sort of like a consistency issue, right? That in fact, the things that you find interesting today or care about today are not the same things as you want tomorrow. That's one thing. But I think the second thing is just that the strategies are dumb, right? There's the "I'm awash with data" part, and then there's the "how do I figure out what to do with that data to give you exactly what would be relevant to you" part. And not only that, I have to give you that diversity. I can't figure out the one thing that you might care about and then just show it to you a thousand times. It's going to be diversity. And it's got to be the case that
Starting point is 00:54:37 that has to somehow be achieved across many different platforms. So there's no coordinating mechanism, right? There are all these people who are trying to send you messages, and they're not coordinated with respect to the set of messages. So that, of course, also results in sort of a loss relative to what we might imagine would be the right sort of ads to show you. And then there's a grouping and privacy question. So we've talked a lot about privacy, but it is often the case that in reality, most groupings of ads are done at the level of a consumer segment. So they've tagged you with something like five keywords, and if I tagged you with hiking, then they're like, everything hiking, all the time hiking, as if the only thing I ever like to do is hiking.
Starting point is 00:55:20 But it turns out, you know, five keywords is already quite a lot of keywords. I've already put you in a pretty fine group, you know, a segment group, right? And there's no sense in which the computer models have a model of you. They have a model of people like you. And it turns out you are not like all the people in your peer group. You're like a lot of other things too. And they can't figure that out. Oh, I appreciate your recognition that I am a multi-dimensional person.
Starting point is 00:55:41 Although I resent you suggesting that I'm a flip-flopper whose desires and opinions change from day to day. I'm sure you're more consistent than the average bear. And notwithstanding that, the ads I get are not relevant. Well, this is tremendous, Greg. Is there anything else that you'd like to touch on today? I'd like to mention briefly this idea that maybe customers would be better off if they could control their own data.
Starting point is 00:56:18 Yeah. And so this is something I have worked on recently with my colleagues, Nageeb Ali, who's at Penn State, and Shoshana Vasserman, who's at Stanford. And we've been thinking about what would happen if customers could control what companies knew about them. And could that make them better off? And there's some really interesting economics here. Because imagine that you get to say to the world,
Starting point is 00:56:46 look, I'm Hunt Allcott. I like these things and these things and these things, and not these things. Okay. Let me say something that's not in our paper, but I think is interesting, and reminds me of something that our colleague Nancy Baym has said before, which is: would that be nice? Because then you could decide what it was that companies put in front of you. You would have suddenly this instantaneous control over what you wanted the world to think of you, and maybe you'd get more relevant ads. Why don't we make this ad targeting not so much a data acquisition and deployment kind of thing and more of a conversation? Tell me what you'd like to see today, Hunt. We'll figure that out. In the paper, we say, look, suppose you can control your data. You might decide which groups of people to join, because it may be the case that certain groups of people are going to get certain deals from companies, right? They've identified
Starting point is 00:57:35 with certain kind of interest groups, and that tends to give them certain kind of targeting opportunities. And it may be to your advantage then to have an opportunity to self-declare which kind of group you'd like to be in, right? Part of what's going on in our paper is a little bit more complicated than this, because what we're interested in the paper is the game theoretic implications of me saying I belong to some group and therefore the platform making deductions from the fact that I've declared for a group or not declared for a group. If I say I'm not in some group, what should they then think about me? What kinds of offers should they then make to me since I've decided not to declare for some group? Another part of that research is around the packaging of groups. If I was a benevolent company, like a credit card
Starting point is 00:58:21 company, for example, that actually controls a lot of consumer data, could they package up the consumer's data in a way that would be favorable to the consumers? So suppose I'm Visa. I have something very important. I have data that's actually data. It's not like you declaring that you like something, which you might actually fake. You might decide you like hiking because you want a discount from Octerix. So they actually have like past purchase data. This person has in fact bought a lot of actual goods. So it's verifiable. That's key.
Starting point is 00:58:50 And a second part of it is that they have data from a large, large group of people. And so one of the pitches that they could make as a reason to have a Visa credit card is we're going to get you better deals from merchants than anybody else. And so then instead of the customers having to self-declare and then the packaging happening, you could imagine a credit card just deciding to do that by themselves. They're going to package up their customer data with the express intent of extracting the best possible deals from their customers from upstream markets. And that's a very different model. And it's a model that only works if that
Starting point is 00:59:25 credit card company is one of the leading repositories of information about customers. Because if there are 20 other places that Starbucks and Octerix can get data from, then why do they need to go through the credit card company? Why do they need to give their credit card company people these very good offers? Because they already know a lot about you from other sources. So they can just say, I declined to participate in any of this stuff. And I'll give you an offer from all that information sources. Well, we've now identified, I think, multiple business opportunities for Greg Lewis. So again, when you become a billionaire, you know where my office is.
Starting point is 01:00:03 Greg Lewis, thank you for being a part of the Microsoft Research Podcast. For more information on Greg's research, people can head to gregmlewis.com. And thanks to everybody for listening to the Microsoft Research Podcast. For more info on Microsoft Research, check out microsoft.com slash research.
