Behind The Tech with Kevin Scott - Percy Liang: Stanford University Professor, technologist, and researcher in AI

Episode Date: March 19, 2020

Kevin talks with Stanford University Professor Percy Liang. They discuss the challenges of conversational AI and the latest leading-edge efforts to enable people to speak naturally with computers. Visit our site for more info: https://www.microsoft.com/en-us/behind-the-tech Listen and subscribe to other Microsoft podcasts at aka.ms/microsoft/podcasts

Transcript
Starting point is 00:00:00 At the end of the day, we're computer scientists building systems for the world. And I think humans make mistakes. They have fallacies. They have biases. They're not super transparent sometimes. And why inherit all these when maybe you can design a better system? And I think computers already clearly have many other advantages that humans don't have. Hi, everyone. Welcome to Behind the Tech. I'm your host, Kevin Scott, Chief Technology Officer for Microsoft.
Starting point is 00:00:43 In this podcast, we're going to get behind the tech. We'll talk with some of the people who made our modern tech world possible and understand what motivated them to create what they did. So join me to maybe learn a little bit about the history of computing and get a few behind the scenes insights into what's happening today. Stick around. Hello, and welcome to Behind the Tech. I'm Christina Warren, Senior Cloud Advocate at Microsoft. And I'm Kevin Scott. Today, our guest is Percy Liang. Percy is an Associate Professor of Computer Science at Stanford University and one of the great minds in AI, specifically in machine learning and natural language processing. Yeah, and Percy talks about the need for AI to be, quote, safely deployed. And he says that given society's increasing reliance on machine learning, it's critical to build tools that make machine learning more reliable in the wild. Yeah, I completely agree with Percy's point of view. And honestly, with like a bunch of his other very interesting
Starting point is 00:01:46 ideas about how machine learning and natural language processing are unfolding over the next few years. So I'm super interested in having this conversation. So let's find out what Percy's up to. Our guest today is Percy Lang. Percy is an associate professor of computer science at Stanford University. He's also one of the top technologists at Semantic Machines. His two research goals are to make machine learning more robust, fair, and interpretable, and to make it easier to communicate with computers through natural language. He's a graduate of MIT and received his PhD from UC Berkeley. Hey, Percy, welcome to the show. Thanks for having me. So, we always start these shows with me asking how you first got
Starting point is 00:02:37 interested in technology. Were you a little kid when you realized that you were interested in this stuff? Yeah, I think it was around maybe end of elementary school or middle school. My dad always had a computer, so it was around, but he didn't let me play with it. And what did your dad do? He was a mechanical engineer. Gotcha. Yeah. And I remember maybe my first memories are in after school, in middle school, there was a computer lab and there was a HyperCard, which is a multimedia program for the Macintosh back then. And I got really fascinated in building these relatively simple applications, but they had a scripting language so you could start to code a little bit and there's animation and all that. So it was kind of fun to get into that. I remember HyperCard as well. I believe one of the
Starting point is 00:03:27 first programs I wrote, I may be a little bit older than you are, but I do remember at one point writing a HyperCard program that was like a multimedia thing that animated a laser disc. Like you remember laser discs, like the big gigantic precursors to DVDs. Yeah, it was really such a great tool. Yeah. At that time, I also tried to learn C, but that was kind of a disaster. What are pointers and all this stuff?
Starting point is 00:03:54 C is sort of a formidable first language to attempt to learn. I mean, like one of the things, like given that you're a computer science educator, I'd be curious to hear how you think about that evolution of entry into computer science. Like on some levels now, it seems like it's a lot easier to get started than when we were kids maybe. But in other ways, it's actually more challenging because so much of the computing environment, like the low-level details are just abstracted away and like the layering is very high and it's a lot to get through. Yeah. So somehow computer science thrives on abstraction, right? From low-level machine code to C and we have Python and programming languages. And at some level, you just have graphical interfaces. So picking the right entry point into that for someone is, I think there are multiple ways you can go.
Starting point is 00:04:52 I probably wouldn't start with C if I were teaching an intro programming class, but more at kind of a conceptual level of here are the kind of computations that you want to perform. And then separately, I think a different class would talk to you about how this is actually realized, because I think there is some value for a computer scientist to understand how it goes all the way down to machine code, but not all at once. Yeah, I'm still convinced that one of the most useful things that I had to learn as a programmer who learned to program in the 80s was fairly quickly, I had to learn assembly language.
Starting point is 00:05:39 And you just sort of had to know what the low-level details were of the machine. Now, granted, the machines were vastly less complicated back then than they are now. But, like, just sort of at that atomic level, knowing how the actual machine works, just made everything else that came after it less intimidating. Yeah, it's kind of satisfying. It's kind of you're grounded. It's like playing with blocks almost.
Starting point is 00:06:02 So, you started with HyperCard. And, like, where did things go from there? Yeah, so for a while I was, I think I also learned BASIC. I was just kind of tinkering around. There wasn't, like today, as many resources as you can imagine for just kids interested in programming.
Starting point is 00:06:22 So a lot of it was kind of on my own. I think maybe a turning point happened at the beginning of high school where I started participating in this USA Computing Olympiad, which is a programming contest. You could think of it as a programming contest, but I really think of it as a kind of algorithmic problem-solving contest. So the problems that they give you are, it's kind of like a puzzle. And you have to write a program to solve it.
Starting point is 00:06:50 But much of the work is actually kind of coming up with the insight of how to, what algorithm to do it kind of efficiently. So an example might be, how many ways are there to make change for $2 using a certain set of coins. And it would be kind of this eureka moment when you found, aha, that's how you can do it. And then you have to, you know, code it up. So I think that competition really got me to kind of value this type of kind of rigor and attention to detail, but also kind of the creative aspect of computing, because you have to come up with new types of solutions.
Starting point is 00:07:31 That's awesome. And so what was the most interesting problem you had to solve in one of these competitions? Oh, that's a really good question. I think it's been a while, so I don't remember all the problems. But one, I think, one memorable maybe class ofly, you can make something that would otherwise run in years or millennia in a matter of seconds. And I remember having to, it was always these problems and you had to really figure out what was the kind of recurrence relation to make it all work. And a lot of problems were kind of centered around. Yeah, one of the amazing things about the dynamic programming technique is it really does teach you, and it might be one of those foundational things when you're getting your head wrapped around how to think algorithmically about problem decomposition. Yeah. Because, like, it's one of those magical things where if you break the problem down in just the right way, all of a sudden a solution to the problem becomes possible when it was
Starting point is 00:08:46 like intractable before. Yeah. Yeah, I think I liked it because it wasn't that you had to memorize a bunch of things or you learn, if you learn these 10 algorithms, then you'll be set. But it was kind of a much more open-ended way to think about problem solving. Yeah, that's awesome. And so you go to MIT as an undergraduate student. How soon did you know exactly the thing inside of computer science that you wanted to do?
Starting point is 00:09:13 That, I think, took a little bit of evolution. So coming out of high school, I was much more interested in his algorithmic questions and got interested in computer science theory because that was kind of a natural segue. So it was, and I started doing research in this area. And it wasn't until kind of towards the end of my undergrad where I started transitioning into, you know, machine learning or AI. And when was this? What year? This was around 2004. Okay.
Starting point is 00:09:47 Yeah. So still, like, machine learning was... Yeah, people didn't use the word AI back then. Yeah, yeah. Yeah, I mean, I remember, like, right around that time was when I joined Google, and I had been a compiler guy when I was an academic. And so, like, I'd never done AI at all, and, like, I didn't know what machine learning was when I started. And yet, you know,
Starting point is 00:10:12 three months after I joined Google, I was tasked with doing a machine learning thing and, like, you know, reading this giant stack of papers and formidable textbooks trying to get myself grounded. But it was a very interesting time, like 2004, and you sort of picked a great time to get interested in machine learning. Yeah, I had no idea that it would be the field that it is today. And why was that interesting? So I can sort of get why the theory was interesting, like you love these problems and the challenge of them.
Starting point is 00:10:50 What was interesting about machine learning? I mean, I think there's definitely this background AI would be kind of mystical aspect of, you know, intelligence that I think I'm not unique and kind of being drawn to. So when there was an opportunity to connect the things that I was actually doing with a theory with some element of that, I took the opportunity to kind of get into that. And then I stayed at MIT for my master's, which was on machine learning and natural language processing. So then that kind of really cemented kind of the direction that I really started pursuing. And so, I'm sort of interesting because if you did your master's degree there, this was right before the deep learning boom. So, it wasn't the same flavor of machine learning, natural language processing that folks are very excited about right now.
Starting point is 00:11:40 That came quite a bit later. What was your thesis about, like in particular? Yeah, so my thesis actually at MIT was about semi-supervised natural language processing. So in some ways, there are spiritual connections to a lot of the things like BERT and these things that you see today. The idea that you can use a lot of unlabeled data, learn some sort of representations. Those were based on this idea called brown clustering. And that was used to then improve performance on a number of tasks. Of course, you know, with data sets and compute
Starting point is 00:12:11 and all the regimes were different, but some of how the central ideas have been around for a while. Yeah. And so what did you do your dissertation on? Well, so during my PhD at Berkeley, I did a bunch of different things, ranging from more theoretical machine learning to applying natural language processing. But towards the end of the PhD, I really kind of converged on semantics or semantic parsing as a problem. So how do you map a natural language utterances into some sort of executable program or meaning representation? So an example is if you have a database of U.S. geography, you can ask, what's the tallest mountain in Colorado? It would translate into a little program that perform a database query and deliver you the answer.
Starting point is 00:13:02 Right. Yeah. And the challenge there is, like, you might have a database that's got, you know, like, a whole bunch of geographical objects in them. And, like, you have a type, which might be mountain. And, like, the thing might have a height property. And, like, and it's all described in this very exact way. And, like, the human utterances are very inexact sometimes. Exactly, yeah.
Starting point is 00:13:28 So the main challenge behind all of natural language processing, no matter what task you take, is just the fluidity of human language. You can say something, the same thing in many different ways, and there's nuances. So I could ask, you know, what's the tallest mountain in Colorado? In Colorado, what's the highest mountain? And so on. So having to deal with that ambiguity is, I think, the value proposition of natural language, you know, processing. Interesting. And so, like, when you were finishing up your degree, did you know that you wanted to be a professor at that point? Yeah, because I think, you know, the exact research area, I think, was still a little bit up in there. I was having a lot of fun with the semantic parsing problem. Then I spent a year in Google actually working on a semantic parser for them that powers a lot of back then was Google now.
Starting point is 00:14:28 So they have a semantic parser with the funniest name in existence, this thing called Parse McParseface. That wasn't your thing, was it? That was later, yeah. Yeah. Which is a very silly name for a parser. It's very memorable. But you can well imagine how this technology might be super important in search, where the whole search problem is asking questions of a search engine, and the search engine needs to understand something about the question so that it can get reasonable answers.
Starting point is 00:14:58 Yeah, yeah. Search and assistance and all these cases where there's a human with some information need or some action that needs to be taken, the most natural way is to use, well, natural language. And how to get computers to understand that to the extent of being useful, delivering something useful to the user is kind of the central question. And so you got your PhD at Berkeley, and then what happened next? So I applied to jobs. I got a position at Stanford, which I was very happy about. Then I took a year off to, I mean, in quotes, a year off to do something different. And I knew I
Starting point is 00:15:43 was going to be a professor and write papers. So I wanted to see how I could take this technology and actually make it kind of real in some sense. So I did a postdoc at Google and was trying to figure out how to use semantic parsing for something. And at that time, so this was 2011, I think Siri had just come out for the first time. So I think there was a sense inside Google that we should do something big about this. And so other people and I formed a team and we built the semantic parser that then powered kind of relatively simple commands, but then increasingly over time got to powering questions and all sorts of other things.
Starting point is 00:16:40 So that was really exciting to see how the tech transfer happens from academic research to actual products. And explain to folks how it's different. Like building a product where it just sort of has to work all the time for all of the users is sometimes different from building a thing that's good enough to write a paper about. Yeah, definitely. I think there's quite a big gap between what counts as a product and what counts as a paper. And the desiderata are also different, right? I think in academia, the currency is kind of intellectual ideas. Do you have something interesting to say?
Starting point is 00:17:29 And a lot of the techniques actually are interesting, but they aren't really ready to be deployed because they don't work nearly well enough. And if you're launching a product, it has to work, like you said, you know, 99% of the time, at least. And it can't make embarrassing errors. And it has to be fast and usable. So I think there's a lot of pieces that have to go into making a product. Also, in academia, people work on data sets, but the data sets are insufficient to represent the diversity of things that you would see in the real world. So that's something
Starting point is 00:18:12 that needs to be solved as well. I think there's actually a lot of interesting research problems around the kind of ecosystem of product deployment, which are not so much the focus of some academic research, probably because it's actually hard to get an idea of that ecosystem. But it's super valuable. So did you ever, like either yourself or the teams that you work with, struggle with this split between like this sort of very intellectually interesting and challenging part of building a product versus the very like, you know, sort of mundane, grunty part of building a product? So at that time, I wasn't interested in writing a paper. I just wanted to kind of
Starting point is 00:19:01 execute. So I don't think there was so much of that attention. It was just do whatever it takes to get this. It's super interesting because I've had, I've managed teams of people doing machine learning work and who had PhDs in machine learning. And like the thing that attracted them to machine learning in the first place is they were interested in the core research, like the challenging problem, like how to make this very complicated thing, like one epsilon better than what preceded it. And who got frustrated very quickly with what production machine learning looks like, which is more like lab science than it is like theoretical computer science, for instance. And, you know, sometimes I've had people who, you know, like on paper look super qualified, you know, because they've written a dissertation on machine learning to work in an ML team where someone who has a degree in applied physics, for instance,
Starting point is 00:20:11 is much more excited working on the machine learning problem because they are more interested in this sort of iterative approach to, you're wrangling the data and doing experiments and whatnot. So it's great that you, like, you never felt that tension. Like, that's almost a superpower. Yeah, I mean, I think it's, I think at some level I'm interested in, you know, solving problems. And I think there's actually, in my head, there's sometimes even a deliberate dichotomy between what am I trying to do? Am I trying to build a system that works or am I trying to understand a fundamental question?
Starting point is 00:20:50 And sometimes research can get a little bit muddled where it's not clear what you're trying to do. I have some more theoretical work which has no direct implication on product, but it's just so intellectually stimulating that you pose this question and you try to answer it. And do you think that's one of the benefits of academic research, like doing what you do in a university versus a company where you've got the freedom to have this mix of these multiple things that you're pushing on? Yeah, definitely. I feel like the benefits of academia are, is, is the freedom. I feel, you know, pretty much full freedom to think about what are the ideas that I
Starting point is 00:21:32 think are interesting and, you know, and pursue them. I think also students come into the picture quite, quite heavily because they're the ones also contributing and thinking about the ideas collectively with me. So, yeah, I think it's really, you know, an exciting environment. So, like, back to your story. So, like, when did you decide to do semantic machines? Yeah, so I started at Stanford in 2012.
Starting point is 00:22:12 And for the first three years or so, I was just trying to learn how to be a professor, teach classes, advise students. So there's plenty of stuff to do. I wasn't looking to join a startup. But then around 2016, so Dan Klein, who was one of my advisors at Berkeley, came to me and he was working on smart machines, which I'd known about, and basically convinced me to join. And I think the reason for doing so is, if I think about my experiences at Google, where you take ideas and you really get them to work and practice. I think that it was a kind of a very, you know, compelling environment and the cement machines had a lot of great people, some of which I knew from grad school. And I think the kind of critical mass of, you know,
Starting point is 00:23:22 talent was, I think, one of the main draws because you have a lot of smart people working on this incredibly hard problem of conversational AI or dialogue systems. Yeah, so it was kind of irresistible, even though, you know, my sanity probably suffered a little bit from that. So what are some of the big challenges that we still have open in conversational AI? So like you're trying to build an agent that you can communicate with as you would another human being. So like some things are like really great, like speech recognition, like turning the utterances that come out of your mouth into some sort of structured representation, like, that's pretty good now. But, like, there's still some big open problems, right? Yeah, there are a ton of open problems. I'm not worried about losing my job anytime soon. I think maybe the way to think about it is that the history of NLP has always been kind of this tension between breadth and depth, right?
Starting point is 00:24:25 We have, in the 80s and 70s, very deep language understanding systems and domains, and you could ask all sorts of questions and would do a good job. But once you go out and leave the confines of that domain, then all bets are off. And on the other hand, we have things like search, which are unstructured, they're just broad. They don't claim to understand in any sense of word, understand anything, but they're incredibly useful just because they have that breadth.
Starting point is 00:24:59 I think there's still a huge gap between and the open challenge on how do you kind of really marry the two. And a lot of these kind of conversational systems where you actually have to do things in the world, not just kind of answer questions, do require some amount of structure. And how do you marry that with kind of open-endedness of something like search? Yeah. Well, and just to like, just the way that I think about those two ends of the spectrum, right, is you have these like structured dialogue systems where you have to ask the question
Starting point is 00:25:40 in exactly the way or like pretty close to exactly the way the system expects you to ask the question in order for it to be able to respond. And on search, you can get a broad range. You can ask the question like a bunch of different ways and like expect to get a response because the question has been answered in like a gazillion possible ways on the web. And like you're going to get, you know, maybe one of those answers returned to you. And like the hard part is like in between
Starting point is 00:26:10 is of like something really understanding the question that you're asking or the command that you're giving to the system and like understanding it enough so it can then go like connect to whatever knowledge repository or a set of APIs or whatever else that is going to do the thing that they want done. I mean, one thing that search, I think, did really well is the interface, right?
Starting point is 00:26:34 The interface promises nothing. It promises template links or maybe some summaries. And I think as opposed to a system where it's just framed as an AI who is trying to do the right thing for you and there's only disappointment when it doesn't. Whereas a search, how many times you search and you don't find what you want and it's like, okay, well, it's user error.
Starting point is 00:26:54 Let's try again. But that allows you to get so much more data and signal and a potential for improvement, which whereas if you have an assistant that just doesn't work, then you just give up. Yeah, there is this weird psychology thing, right, where with the interfaces, like, you almost feel embarrassed when you ask the, like, the software question verbally and it doesn't give you the right answer.
Starting point is 00:27:24 Like, you just sort of assume you've done something wrong. Whereas somehow or another with search, like we've, and like it reminds me a little bit of my mother. Like my mother, whenever she can't get her computer
Starting point is 00:27:37 to do what she wants it to do, she always assumes that it's her fault, which is a weird way to approach technology. So let's go back to the work that you do at Stanford. So you spend part of your time teaching students, like in particular, like you're teaching some of the AI curriculum at Stanford, and then you're doing research. So talk a little bit about the teaching. Like, how has teaching students machine learning changed over the past handful of years?
Starting point is 00:28:15 Yeah, so the main class I teach at Stanford is CS221, the main AI class. And I've been teaching this since 2012. When it started, there were less than 200 students in the class. And last year, there were 700 or so. So there's definitely the most salient thing that has happened is just a sheer number of students wanting to learn this subject matter. So that has presented a number of challenges. I think people are taking the class from a fairly heterogeneous population. There's undergrads who are learning computer science and trying to, you know, are excited about AI and want to learn about it. There's master's students who have a little bit more research experience, maybe. There's people from other departments who have actually quite advanced mathematical abilities
Starting point is 00:29:15 and are trying to learn about AI. There's people, professionals, who are working full-time and trying to learn about AI. So one of the challenges has just been how to accommodate all this diverse population. And how do you do that? It's challenging. There are certain things that we try to do, trying to have materials which are presented
Starting point is 00:29:42 from kind of slightly different perspectives and have review sessions on certain types of topics. But honestly, I don't have a, you know, great solution. We have a lot of TAs who can, you know, help. But it's, I think scaling education is one of those very hard problems. Like when I was teaching computer science, when I was working on my PhD, the thing that was always super challenging for me, like I taught CS 201 I think a couple of times, which was like at the University of Virginia. It was the first serious software engineering course that you took, or programming course. And like we had such a broad range of students taking the class that it was, and you would have people who came in who were,
Starting point is 00:30:35 like had years and years of experience, like by the time they got there programming, like they learned a code when they were 12. And, you know, you sort of risked every other thing that you were doing, boring these poor kids to death. And then you had folks who were, like, coming in because they were interested in computer science. And, like, they had almost no background whatsoever. They never programmed. And, like, they might not even have the, you know, sort of analytical, you know, background that is helpful when you're learning to code. And like, that was always a huge challenge for me. Like, I don't know whether I was ever any good at it or not.
Starting point is 00:31:11 Yeah, I think that if I had much more time, I would kind of sit down and really think about how to best structure this. I mean, I think the way to do it is trying to break things down into modules and making sure that people understand basic things before they move on to more advanced things. I think when you have these kind of banner courses like AI, people take it, but they don't really, they land somewhere in the middle,
Starting point is 00:31:39 and they're trying to figure out things, and it's much more of a kind of treading water kind of situation as opposed to like really kind of building up, you know, building blocks. So one of the interesting things that I think has really happened over my career doing machine learning things is in 2003, when you were doing machine learning stuff, like you were more or less starting from scratch whenever you're trying to build a system. And like now, if you want to do
Starting point is 00:32:12 something with machine learning, you've got PyTorch, you've got, you know, you've got like notebooks, like Jupyter notebooks, you've got all of this sort of incredible infrastructure that is available to you to, like, build things. Like, my favorite anecdote is, like, the thing that I did at Google, which is my first machine learning project that took, like, reading a bunch of, like, heavily technical stuff and, like, probably six months worth of very hard work, like a high school kid with sufficient motivation, like using a bunch of open source tools could do in a weekend, which is just incredible. But I'm guessing that also puts pressure on the curriculum,
Starting point is 00:32:58 like what you provide as programming exercises for kids where you just sort of just keeping pace with the overall field's got to be challenging, right? Yeah. So, it's certainly very incredible how far we've come in terms of tools. And again, this is the kind of the success story of abstractions and computer science where we don't, many people don't have to think about registers to program and get even close to kind of assembly level. People programming Python might not have to think about memory management. And now when you're working with something like PyTorch or TensorFlow, you can think about the modeling and focus on the modeling without thinking about how the training works. Of course, I think in order to get off the ground and have a kind of a hackathon project,
Starting point is 00:33:49 you can get by with not knowing very much. I think to get kind of really kind of serious, these abstraction barriers are also leaky, and I think someone would be well-served to understand what are gradients and how are those, you know, computed. So, I think in the class that I teach, we definitely expose students to the raw, kind of the bare metal, so to speak.
Starting point is 00:34:21 For example, in the first class, I show people how to do stochastic gradient descent. That's fantastic. And I code it up, and it's 10 lines of code, and not using PyTorch and TensorFlow. And I want people to understand that some of these ideas are actually pretty simple. But you have to kind of, but I wanted people to get exposed to the simplicity rather than being kind of scared off by, oh, that's underneath the PyTorch wrapper. Yep.
Starting point is 00:34:55 And because at some level, all of these pieces are actually quite, you know, understandable. Yep. And I think that's a great thing that you're doing for your students, because one of the things I do worry about a little bit is that we have these very powerful abstractions, but the abstractions make a bunch of assumptions that are not necessarily correct. That, for instance, stochastic gradient descent is the best numerical algorithm to fit the parameters of a deep neural network. It's a very good technique, but like we shouldn't assume that that is a solved problem. ordinary differential equations where they were, like, modeling the interior state of a DNN using
Starting point is 00:35:45 ordinary differential equations and using, I think, something like fourth-order Runge-Kutta or something to solve, which is very, very, very different from, you know,
Starting point is 00:35:55 stochastic gradient descent. Yeah. And, like, the fact that, like, that sort of exploration is great that it's still happening. Yeah. One thing I do in the AI class is be very kind of structured about the framing of a class in terms of modeling and algorithms. Right.
Starting point is 00:36:15 So you can think about, for a given problem, how do you construct the model? It could be a neural net architecture, but it could talk about some other topics like graphical models. It could be like what your Bayesian network looks like. And then separately, you think about how I'm going to perform inference or do learning in these type of models. And I think that decoupling is something that I find students often kind of find it hard to think about because your knee-jerk reaction to solve a problem is to go directly solve the problem.
Starting point is 00:36:49 But figuring out how to model the situation, which specify kind of what you want to do, and then the algorithms are how you want to do it, is really, I think, a powerful way to think about the world. So as an NLP person, what do you think about all of this stuff happening with self-supervised learning right now and natural language processing? So this is the BERTs, the GPT-2s, the Excel Nets. Like we even, you know, just a little while back, like Microsoft disclosed this new, like Turing NLG, which is a 17 billion parameter model that we've been working on for a little bit.
Starting point is 00:37:29 Yeah. No, it's super impressive. I mean, I would have said that, you know, four or five years ago, I wouldn't have predicted kind of the amount to the extent to which these things have been successful. And it's certainly not the idea of doing so. I mean, these are things I even explored in my master's thesis, but that's clearly a big difference between having an idea and actually showing that it actually works. So clearly these methods are being deployed everywhere, and I think people are getting quite a bit of mileage out of them. I think there's still problems that these methods are not sufficient to solve by themselves. I mean, I think they're going to be part of any probably NLP solution for until the end of time. But I think
Starting point is 00:38:21 kind of deeper language understanding beyond kind of these benchmarks that we have are going to possibly demand some other ideas. Yeah, and that certainly seems true, even though I'm very bullish about, like, the fact that we seem to be able to get performance, like improving performance by making the models bigger on the things that the models are good at. It still is unclear to me that they're going to be good enough at everything. Like this is not going to solve AGI in and of itself, I don't think. Yeah. I mean, it's interesting to ponder how far you can push these. Like if you train it on literally all the texts in the world, you know, what do you get? And you gave it as many neurons as a human brain.
Starting point is 00:39:09 Yeah. Yeah, we're likely to find out at some point. I think there is cases where, that, you know, we've been doing in our research group at Stanford where even the most kind of advanced models make the kind of the most, the dumbest mistakes, where you have a question answering system, you add an extra comma, or you replace a word with a synonym, and it goes from working to not working. Yeah.
Starting point is 00:39:39 So even with BERT and things like this. So it's kind of interesting to ponder the, you know, significance of that. So from a practical perspective, you know, it doesn't matter actually that much because you throw more data, you can kind of get things to work and on average things will be fine. But from a kind of intellectual perspective of do these models really understand language, the answer is a kind of a clear, at least for me, a clear no, because no human would make some of these mistakes. Yeah, although that's
Starting point is 00:40:08 an interesting thing that I've been thinking about. And I think I actually agree with you. But like one of the things that I've been pondering the past few months is just because these models, and it's not just the speech models,
Starting point is 00:40:22 like vision models also, you like stick a little bit of what looks like uncorrelated noise into them and all of a sudden, like, you know, you recognize my face is like my boss's face, right? Like they make mistakes in ways that are very idiosyncratic to the models and very, very much not like the mistakes that humans would make. But humans make mistakes as well. And I just sort of wonder whether, like for myself, that I am creating an unnecessary false equivalence between these AI systems and like biological intelligent systems where just because the software makes a mistake that a human wouldn't doesn't mean that it's not doing useful things.
Starting point is 00:41:11 And just because I can solve problems easily that it can't doesn't invalidate the thing that the machine learning system can do. Yeah, definitely. I think these examples merely illustrate the kind of the gap between these machine learned models and humans. And I think it's absolutely right to think of machine learning AI as not chasing human intelligence, but more, they're a different thing. And I've always thought about these things as, you know, tools that we build to help us. I think a lot of AI does come from this, you know, chasing human intelligence as inspiration, which has gotten, you know, quite a bit of mileage. But at the end of the day, you're computer scientists building systems for the world.
Starting point is 00:42:08 And I think humans make mistakes. They have fallacies. They have biases. They're not super transparent sometimes. And why inherit all these when maybe you can design a better system? And I think computers already clearly have many other advantages
Starting point is 00:42:26 that humans don't have. They don't need to sleep. They have memory and compute that vastly exceeds. They don't get bored. Yeah, so I think leveraging these, which we already have, but kind of further just thinking holistically about how do we build the most useful tools might be a good way forward. more about task intelligence than general intelligence and, you know, trying to derive inspiration from biology, but like not being, you know, not being fixated on it. Yeah, I mean, this kind of whole debate goes back to, you know, the 50s and with AI versus
Starting point is 00:43:16 IA, artificial intelligence versus intelligence augmentation, where intelligence augmentation kind of more, where it is this kind of spiritual ancestors of of spiritual ancestors of the field of HCI, human-computer interaction. And at some level, I think that I'm more kind of philosophically attached to that kind of way of thinking about how we build tools. But clearly, AI is providing this kind of massive set of assets that we should be able to use somehow. Yeah. So a couple more questions before I ask you something fun. How do you think academia and industry could be doing more together? Like one of the things that I'm a little bit worried about, like less so this year than last, is some of these machine learning workloads now just require exorbitant levels of resources to run. So, like, training one of these big self-supervised models, just, like, the dollar cost on the computations is getting to be just gigantic,
Starting point is 00:44:35 and it's going to grow. Like, it's been expanding at about, you know, 8 to 10x a year. And so I sort of worry about, like, with this cost escalating, like how can everybody participate in the development of these systems, like especially universities, like even well resourced ones like Stanford? Yeah, I think it's on a lot of people's mind, the compute required for being kind of relevant and kind of modern ML. I think there's a couple of things. Well, certainly, I know that companies have been providing, you know, cloud credits to academia.
Starting point is 00:45:17 And certainly, this has been helpful. Probably more of it would be more helpful. But I think that's maybe not, you know, in some sense, a kind of a panacea because however many cloud credits industry gives, industry is always going to have more that they can do in-house. But I think a lot of the way I've been thinking about research at Stanford without unlimited resources is a lot of times you can be kind of orthogonal to what's going on. So some of our recent work focuses on methods for understanding what's going on in these BERT pre-trained models or how to think about interpretability or fairness. And I think some of these questions are fairly kind of conceptual and the bottleneck there isn't just doing more compute,
Starting point is 00:46:19 but to actually even define the question and think about how to frame it and solve it. And I think that another thing which I alluded to earlier is that there are clearly a lot of real-world problems that industry is facing, not just in terms of scale, but the fact that there's real systems with real users,
Starting point is 00:46:45 there's feedback loops, there's biases and heterogeneity. And I think there's a lot of potential for surfacing these kind of questions that I think the academic community would be helpful in kind of answering at a kind of conceptual level. I think product teams are probably too busy to be pondering about what is the right way to solve these problems, but they have the problems. And if these can be somehow brought out,
Starting point is 00:47:18 I think we would probably be able to leverage all the kind of intellectual horsepower in academia to solve kind of real, really relevant problems. That sounds like a great idea. So two more questions. One in your role as an AI researcher. So what's the thing that excites you most about what you see on the horizon, like what's going to be really interesting over the next few years? Yeah, if only I could predict the future. I think one of the things that has been exciting to me is program synthesis. The idea that you can automatically write programs from either test cases, examples, or natural language.
Starting point is 00:48:08 And in some ways, this is kind of an extension of some of the work I did on, you know, semantic parsing. But if you think about it from at a high level, you know, we have users, and they have, you know, desires and things that they want to do. How can you best harness the ability of a computer to meet those needs? And currently, well, you have, you can either program if you know how to program
Starting point is 00:48:38 or you can use one of these existing interfaces. But I think those kind of two are very limiting. If you could have users that could kind of express their desire in some sort of more fluid way, even with examples or language, and have computers kind of synthesize these programs or computations, then you could really, I think, amp up the amount of leverage that ordinary people have.
Starting point is 00:49:07 And also to think about how even not kind of end users, but programmers could benefit a lot from having better tools. We have these enormous code bases, and programming is, at the end of the day, a lot of in the weeds work. And I think the use of machine learning and program synthesis could really open up the way towards maybe a completely different way of thinking about programming and code. And that's kind of, as a computer scientist, that is very fascinating.
Starting point is 00:49:44 Yeah, and I'm really glad to hear you say that you're excited about the prospect of that. Because one of the things that I do worry about is we, you know, we're now at the point where non-tech companies are hiring more software engineers than tech companies. It really is the case that every company has to deal with code and software and the value that they're going to create over the next several decades of their businesses is going to be in the IP and software artifacts that they're creating to run their businesses and solve their customers' problems. And, like, there just aren't enough programmers on the planet to go do all of this work. And, like, a lot of our customers, like, especially when you're talking about machine learning, like, they just can't hire.
Starting point is 00:50:36 Like, we have a hard enough time hiring all the people in the tech industry in Silicon Valley, right? And so this idea that, like, we could change the paradigm of computing to be like we all know how to teach our fellow human beings how to do things. Like if you could figure out how to teach computers how to do things on your behalf, like that then opens things up to like an unbelievable number of people to do an unbelievable number of things. Yeah. I want to blur the line between what a user and a programmer. Yeah.
Starting point is 00:51:10 And also it's a really hard problem. Yes. Like the best technologies that we have can maybe synthesize 20 lines of code. But if you think about the types of code bases that we're dealing with, there's millions and millions of lines. So I think, you know, as a researcher, I'm kind of drawn to these challenges where you might need kind of a different insight
Starting point is 00:51:31 to make progress. Super cool. So one last question. So just curious what you like to do outside of work. I understand that you are a classical pianist, which is very cool. Yeah, so piano has been something that's always been with me
Starting point is 00:51:51 since I was a young boy. And I think it's also been kind of a counterbalance to all the other kind of tech-heavy activities that I've been doing. What's your favorite bit of repertoire? I like many things, but late Beethoven
Starting point is 00:52:07 is something I really enjoy. I think this is where he becomes very reflective about, and his music has a kind of inner... It's very deep. I kind of enjoy that.
Starting point is 00:52:24 What particular piece is your favorite? So he has a Beethoven sonata. So I've played the last three Beethoven sonatas, Opus 109, 110, 111. They're wonderful pieces. Yeah. And one of the things that I actually, you know, one of the challenges
Starting point is 00:52:44 has been incredibly hard to make time for kind of a serious hobby. And actually in graduate school, I was very, there was a period of time when I was really trying to enter this competition and see how well I could do. Which competition? It was called the International Russian Music Piano Competition. It was in San Jose. I don't know why they had this name. But then,
Starting point is 00:53:14 I practiced a lot. There's some days I practice like eight hours a day. But at the end, I was just like, it's just too hard. I can't compete with all these people who are the of the professionals. And then I kind of, I was thinking about how, what is the bottleneck? Like often I have these musical ideas and I know what it should sound like, but you have to do the hard work of actually doing the practicing. And kind of thinking maybe wistfully,
Starting point is 00:53:47 maybe machine learning and AI could actually help me in this endeavor because I think it's kind of an analogous problem to the idea of having a desire and having a program being synthesized or an assistant doing something for you. I have a musical idea. How can computers be a useful tool to augment my inability to find time to practice? Yeah.
Starting point is 00:54:13 And I think we are going to have a world where computers and machine learning in particular are going to help with that human creativity. But one of the things i i find classical piano is like this very fascinating thing because on it's a one of those disciplines and like there are several of them where it's just blindingly obvious that um the difference between expertise and non-expertise. Like, no matter how much I understand,
Starting point is 00:54:49 and so, like, I'm not a classical pianist. Like, I'm just an enormous fan. Even though I understand the, I understand harmony, I understand music theory, I can read sheet music, I can understand all of these things, and I can appreciate Martha Argerich playing, you know, Liszt's Piano Concerto No. 2 at the proms. There's no way that I could sit down at the piano and, like, do what she does because she has put in an obscene amount of work training her neuromuscular system to be able to play and then to just have years and years and years of, like, thinking about how she turns notes on paper to something that communicates a feeling to her audience.
Starting point is 00:55:34 And it's like really just to me stunning because there's just no, there's no shortcutting it. Like you can't cheat. Yeah. It's kind of interesting because in computer science, there's sometimes an equivalence between the ability to generate and the ability to kind of discriminate and classify, right? If you can recognize something, whether something's good or bad, you can use that as objective function to hill climb. But it seems like in music, we're not at the stage where we have that equivalence. You know, I can recognize when something is good or bad, but I don't have the means of producing it. And some of that is physical, but I don't know. Maybe this is something that is in the back of my mind, in the back pocket, and I think it's something that maybe in a decade or so I'll revisit.
Starting point is 00:56:16 The other thing, too, that I really do wonder about with performance is there's just something about... Like, for me, it just happens to be classical music. I know other people like have these sorts of emotional reactions to rock or jazz or country music or whatever it is that they listen to. But I can listen to the right performance of like Chopin's G minor ballad. And like there are people who can play it and like I'm like, oh, this is very nice and like I can appreciate this. And there are some people who can play it
Starting point is 00:56:50 and it like every time I listen to it, 100% of the time I get goosebumps on my spine. Like it provokes a very intense emotional reaction. And I just wonder whether part of that is because I know that there's this person on the other end and they're in some sort of emotional state playing it that resonates with mine and whether or not like you'll ever have a computer be able to do that. Yeah, that's a, I mean, this gets kind of a philosophical question at some point.
Starting point is 00:57:20 If you didn't know it was a human or a computer, then what kind of effect would it have? Yeah, and I actually, you know, I had a philosophy professor in undergrad who, like, asked the question, like, would it make you any less appreciative of a Chopin composition knowing that he was being insincere when he was proposing it? Like, he was, you know, doing it for some flippant reason. And I was like, yeah, I don't know. Like it's... Well, one of my piano teachers used to say that you kind of have to, it's kind of like theater. You have to convey your emotions, but there has to be some, even when you go wild, there has to be some element of control on the back because you need to kind of continue the thread.
Starting point is 00:58:04 Yeah. Yeah, for sure. Yeah. But also, it is, for me also, just the act of playing is the pleasure of it. It's not just having a recording that sounds good to me. Yeah, I know I'm very jealous that you had the discipline and did all the hard work to put this power into your fingers. It's awesome. Well, thank you so much for taking the time to be with us today.
Starting point is 00:58:36 This was a fantastic conversation, and I feel like I've learned a lot. Yeah, thanks for having me. My pleasure. It's awesome. So that was Kevin's chat with Percy Lang from Stanford University. And, Kevin, you know what was really interesting was hearing both you and Percy reminisce about your experiences with HyperCard. And that was Percy's kind of introduction to computing, or I guess programming. That was actually my introduction to programming too in a weird way.
Starting point is 00:59:05 Oh, that's awesome. And yeah, before I built web pages, I was building HyperCard things. And what kind of struck me is as you were talking about how to teach the next generation and talking about different tooling, the idea of or the concept of like a HyperCard for AI, that's something that I think would be really, really beneficial.
Starting point is 00:59:26 What are your thoughts? Well, I think he was getting at that a little bit when he was talking about his ideas around program synthesis at the end of the interview. So it's really interesting. I find this to be the case with a lot of people that the inspiration, like the thing that first tugged you into computing and programming, oftentimes sticks with you your entire career. And so he started his computing experience thinking about HyperCart, which is this very natural, easy way to express computations. And still to this day, like the thing that he's most excited about is how you can use these very sophisticated AI and machine learning technologies to help people express their needs for compute in a more natural way so that the computer can go help people out. Like, I think that's so awesome. Yeah, I do too. And I thought the same thing when we was talking about the program synthesis. That has some people, I think, understandably maybe freaked out, right? Like the idea that, oh, these things can write themselves. But when you put it in that context of it might make things more accessible and less intimidating and
Starting point is 01:00:39 more available across a variety of different things, I think it becomes really exciting. Yeah, I've been saying this a lot lately. There's a way to look at a bunch of machine learning stuff and get really freaked out about it. And then there's a way to look at machine learning where you're like, oh, my goodness, this piece of technology is creating a bunch of abundance that didn't exist before, or it's creating opportunity and access that people didn't have before to more actively participating in the creation of technology. And that's the thing that really excites me about the state of machine learning in 2020. Yeah, I agree.
Starting point is 01:01:20 I think that there's massive potential for that. And kind of pivoting from that, one of the things the two of you talked about towards the end of your conversation was, I guess, the relationship between academia and industry when it comes to AI and ML. And you were talking about do you see as the opportunity for academia and industry to work together? And what do you think are the – what's maybe one of the areas where there's friction right now? Yeah, I think that Percy nailed it in his assessment. So there's certainly an opportunity for industry to help academia out more with just compute resources. Although, like, I think these compute resource constraints, in a sense, aren't the worst thing in the world. Like, the brutal reality is that even though it may seem that industry has an abundance
Starting point is 01:02:18 of compute relative to a university research lab, if you are inside of a big company doing these things, the appetite for compute for these big machine learning projects is so vast that you have scarcity even inside of big companies. And so I think that's a very interesting like constraint for both academia and industry to lean all the way into
Starting point is 01:02:44 and to try to figure out clever ways for solving these problems. And I'm super excited about that. But like the point that he made, which I found particularly interesting, is the fact that if we could do a little bit better job sharing our problems with one another, we could probably unlock a ton of creativity that we're not able to bring to bear solving these problems right now. And that's something that's one of the reasons I love doing these podcasts. So I'm going to go back and do my job as CTO of Microsoft and see if I can try to make that happen more. I love it. I appreciate you doing that, and I appreciate Percy's work as well. Well, that's just about it for us today.
Starting point is 01:03:28 But before we end, I just have to say that, Kevin, I have been excitedly anticipating the release of your book, which will be out on April 7th. It's called Reprogramming the American Dream. And I've actually had a tiny sneak peek, and it's really, really well written. It's really good. Oh, thank you. You are too kind. So I am looking forward to it being out as well. I got a box of books in the mail the other day. This is the first book that I've ever written, so I was like I had this pinch me moment when I opened this box, and there were this stack of hardcover books that had the words printed in them that I had written.
Starting point is 01:04:08 So that's sort of amazing. That's so cool. I love that so much. And I'm definitely going to be recommending it to my friends and my fellow tech nerds out there. Because what I really like about the book is that it really does break down a lot of the things we've been talking about in this conversation, like AI in an understandable way, in a way that is pragmatic and not, you know, scary. Yeah, that was a goal. I was hoping to take a bunch of material that can be relatively complex and presented in a way that hopefully it's accessible to a broad audience. So I think it's actually critically important, like one of the most important things in AI is to have all of us have a better grounding of what it is and what it isn't so that we can make smart decisions about
Starting point is 01:04:50 how we want to employ it and how we want to encourage other people to use these technologies on our behalf. I love it. I love it. All right. Well, that does it for us. As always, please reach out anytime at BehindTheTech at Microsoft.com. Tell us what's on your mind and be sure to tell everyone you know about the show. Thanks for listening. See you next time.
