Microsoft Research Podcast - 099 - Program synthesis and the art of programming by intent with Dr. Sumit Gulwani

Episode Date: November 20, 2019

Dr. Sumit Gulwani is a programmer’s programmer. Literally. A Partner Research Manager in the Program Synthesis, or PROSE, group at Microsoft Research, Dr. Gulwani is a leading researcher in program synthesis and the inventor of many intent-understanding, programming-by-example and programming-by-natural language technologies – aka, the automation of “what I meant to do and wanted to do, but my computer wouldn’t let me” tasks. Today, Dr. Gulwani gives us an overview of the exciting “now” and promising future of program synthesis; reveals some fascinating new applications and technical advances; tells us the story behind the creation of Excel’s popular Flash Fill feature (and how a Flash Fill Fail elicited a viral tweet that paved the way for new domain investments); and shares a heartwarming story of how human empathy facilitated an “ah-ha math moment” in the life of a child, and what that might mean to computer scientists, educators and even tech companies in the future. https://www.microsoft.com/research

Transcript
Starting point is 00:00:00 99% of people who use computers do not know programming, and they get stuck with repetitive, tedious tasks. But these are quite creative people, and it is just that programming as it exists today creates an artificial barrier to program computers. Program synthesis can allow these end users to express themselves naturally and program computers as easily as they would command a personal assistant. You're listening to the Microsoft Research Podcast, a show that brings you closer to the cutting edge of technology research and
Starting point is 00:00:36 the scientists behind it. I'm your host, Gretchen Huizinga. Dr. Sumit Gulwani is a programmer's programmer, literally. A partner research manager in the Program Synthesis, or PROSE, group at Microsoft Research, Dr. Gulwani is a leading researcher in program synthesis and the inventor of many intent understanding, programming by example, and programming by natural language technologies. Today, Dr. Gulwani gives us an overview of the exciting now and the promising future of program synthesis, reveals some fascinating new applications and technical advances, tells us the story behind the creation of Excel's popular Flash Fill feature and how a Flash Fill fail elicited a viral tweet
Starting point is 00:01:23 that paved the way for new domain investments, and shares a heartwarming story of how human empathy facilitated an aha math moment in the life of a child and what that might mean to computer scientists, educators, and even tech companies in the future. That and much more on this episode of the Microsoft Research Podcast. Sumit Gulwani, welcome to the podcast. Thanks, Gretchen. It's great to be here.
Starting point is 00:01:59 So I like to situate guests at the beginning, and you're a partner research manager in the Program Synthesis or PROSE group at Microsoft Research, and you describe yourself as a scientist seeking connections between ideas, between research and practice, and with people in varied roles. So in other words, you're a dot connector. In broad strokes, tell us what gets you up in the morning. What X are you and the other members of your group trying to solve for? It is simply the opportunity to work with the fantastic team that I am part of. We have rich experts in programming languages, formal methods, software engineering, machine learning, and even human-computer interaction. We have researchers and engineers working closely with program managers
Starting point is 00:02:46 and user experience designers. I'm also terribly excited about the charter of this team, which is to advance the state of the art in program synthesis and to deliver these innovations as magical experiences across a wide range of Microsoft products. Well, we have a lot of ground to cover today on the topic of program synthesis, so let's set the stage and operationalize the term. What is program synthesis, and why is it significant?
Starting point is 00:03:15 Program synthesis is any capability to automatically generate programs or code fragments from users' intent expressed in some natural form like input-output examples, demonstrations, partial programs or even keywords or natural language. This has the potential to facilitate two disruptions. First, creation of 100x more programmers because 99% of people who use computers do not know programming and they get stuck with repetitive tedious tasks. But these are quite creative people and it is just that programming as it exists today creates an artificial barrier to program computers. Program synthesis can allow these end users to express
Starting point is 00:04:05 themselves naturally and program computers as easily as they would command a personal assistant. The second disruption relates to improving the productivity of existing developers and data scientists by 10 to 100x in many task domains. Most of the code that these folks are writing is simply boilerplate code. And there is very little algorithmic creativity involved. Program synthesis can liberate programmers from having to focus on boring details and enable them to focus on creative aspects of programming. So let me drill in a little bit there on the claims of 100x and 10x to 100x increase in productivity and so on.
Starting point is 00:04:50 What data do you have to support the claims that this is going to make our productivity go way up? We did some user studies related to accomplishing a given task. So what took Python programmers around 30 minutes to accomplish was doable using our program synthesis technologies in less than a minute. And this was quite representative of the tasks in the space of so-called data wrangling or data cleaning. Right. So is this technology being used and you're seeing it in action and seeing these increases, or is it just on sort of the research study end of things right now? In fact, it is very real. The first mass market deployment of the program synthesis technology was in the form of this feature called Flash Fill in Excel,
Starting point is 00:05:40 which has been in Excel since 2013. And since then, we have released similar experiences based on example-based interaction across many different Microsoft products. Well, let's zoom in and talk about Flash Fill, which you just mentioned. And it synthesizes string transformation programs quickly with minimal input-output examples, sometimes even just one. Tell us about Flash Fill. What inspired the idea in the first place? And tell us technically how you went about building in those efficiencies. Ten years ago, I was flying from Frankfurt to Seattle,
Starting point is 00:06:16 and there was a lady sitting next to me in the airplane. She was impressed to know that I have a PhD in computer science and that I work for Microsoft Research. So I ought to help her with the task that she was struggling with. She opens up her laptop, fires up Excel, shows me a column of names in the format, first name space last name, and asks me how can she reformat it in the form last name comma first name. Now at that time, I had no idea about the programming model underneath Excel, so I had
Starting point is 00:06:45 to excuse myself out of the situation. After returning home, when I searched for a solution to that problem on Excel help forums, it is then that I realized that there were many, many people who struggled with simple repetitive tasks and would solicit the help of an expert on a help forum while communicating their intent using a few input-output examples. A typical interaction would take place over a course of several days. Now this inspired me to develop the Flash Fill system that can automate the role of the expert on the help forum and bring down the interaction time from a few days to a few seconds. The key technical aspect of Flash Fill is the search strategy that it uses. Instead of searching over a general purpose programming language,
Starting point is 00:07:32 we restrict the search to an appropriately designed domain specific language that includes operators for string transformations such as regular expressions, substring, concatenate, and limited form of conditionals to allow for dealing with data in different formats. Secondly, instead of blindly enumerating programs over this underlying DSL, we leverage logical properties of the operators in this DSL to allow for a goal-directed search.
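Editorial sketch: the following toy example illustrates the two ideas just described, restricting the search to a small DSL and working backwards from the desired output rather than enumerating programs blindly. It is not Flash Fill's actual DSL or algorithm. The atom encoding and the names `run`, `synthesize`, and `consistent` are hypothetical, and the DSL here only supports concatenating whole input words and constant strings.

```python
from typing import List, Optional, Tuple, Union

# Hypothetical toy DSL: a program is a list of atoms whose results are concatenated.
#   ("word", i)  -> the i-th whitespace-separated word of the input
#   ("const", s) -> the literal string s
Atom = Tuple[str, Union[int, str]]
Program = List[Atom]


def run(program: Program, text: str) -> Optional[str]:
    """Execute a program on an input string; return None if a word index is missing."""
    words = text.split()
    pieces = []
    for kind, arg in program:
        if kind == "word":
            if arg >= len(words):
                return None
            pieces.append(words[arg])
        else:
            pieces.append(arg)
    return "".join(pieces)


def synthesize(inp: str, out: str) -> List[Program]:
    """Goal-directed search: explain the output left to right using input words or constants."""
    words = inp.split()

    def explain(rest: str) -> List[Program]:
        if rest == "":
            return [[]]
        found: List[Program] = []
        # Option 1: the remaining output starts with one of the input words.
        for i, w in enumerate(words):
            if rest.startswith(w):
                found += [[("word", i)] + tail for tail in explain(rest[len(w):])]
        # Option 2: no word fits here, so emit constant text up to the next
        # position where some input word occurs (or to the end of the output).
        if not found:
            starts = [rest.find(w) for w in words if rest.find(w) > 0]
            cut = min(starts) if starts else len(rest)
            found += [[("const", rest[:cut])] + tail for tail in explain(rest[cut:])]
        return found

    return explain(out)


def consistent(programs: List[Program], examples: List[Tuple[str, str]]) -> List[Program]:
    """Keep only the programs that reproduce every input-output example."""
    return [p for p in programs if all(run(p, i) == o for i, o in examples)]


programs = synthesize("Sumit Gulwani", "Gulwani, Sumit")
print(programs[0])
print(run(programs[0], "Ada Lovelace"))
```

From the single example, the sketch recovers a "second word, comma, first word" program and applies it to a new name. A real system would need a far richer DSL (substrings, regular expressions, conditionals) and a ranking step to choose among many consistent programs.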
Starting point is 00:08:01 So essentially, we flow the input-output examples down the grammar of the DSL, and this identifies programs that are consistent with the examples. The third idea stems from the observation that we also use several heuristics to guide the search over many choices that remain even after logical reasoning has pruned most of the possibilities in the search tree. Authoring and maintaining these heuristics has been an expensive proposition. So recently, we have actually started using machine learning to learn these heuristics
Starting point is 00:08:33 from data that relates to past exploration of the search space over various benchmark tasks. Another defining aspect of Flash Fill, as you mentioned, is the fact that it can learn the user's intent from very few examples. Right. And it does this by ranking programs to pick a candidate from among the many programs that satisfy the user's examples. We prefer programs that are smaller, simpler, use fewer constants, use smaller sized constants. We also examine the output generated by these programs on the other test inputs that are available. We prefer those programs that generate outputs that are similar and
Starting point is 00:09:15 look uniform. For instance, suppose we have one program that generates all outputs that are valid dates, and another program that generates mostly dates, but also some gibberish stuff. Right. Then we have an additional reason to prefer the first program. Let's talk a bit about a scenario where Flash Fill didn't work. I'll call it a Flash Fill fail. Tell us that story and then tell us what you learned as a researcher from that experience.
Starting point is 00:09:43 Flash Fill can automate a wide variety of string transformations. However, there are many transformation tasks that it cannot do, such as number transformations or date transformations. But the user experience is so inviting that people invariably want to give it a try, even on tasks that it was never meant for, and then talk about it when it does not work. So there was this recent tweet in October 2018 that made fun of Flash Fill and was re-shared
Starting point is 00:10:07 more than 10,000 times. Twitter user Darren wrote, AI is going to take over the world, but look what Excel auto-populated for me today. Darren gave an example converting DEC to December and Flash Fill auto-completed JAN to Janember and OCT to Octember. My team found it funny enough to print this tweet on a t-shirt. And now this t-shirt is my dress shirt for my keynotes. The positive thing here is that this kind of feedback has been extremely useful for us to decide what new domains to invest into for program synthesis. While we're on the topic of Flash Fill, I don't need to remind anyone that Microsoft Excel is one of the planet's most widely used
Starting point is 00:10:54 software programs, and by extension, as Ben Zorn points out, the spreadsheet is among the most widely used programming languages. So we're firmly in product territory here with Excel, and you're firmly planted in research territory. How do you manage the balance between academic research and product engineering? So by the way, Ben Zorn has been a great mentor and collaborator and has shaped some of my thinking on this topic. The key is to not treat this as a conflict. The most important bit that facilitates this is to make thoughtful choices behind the problem definitions that we pick as researchers. Problems that will satisfy our intellectual curiosity, but will also justify our corporate funding.
Starting point is 00:11:35 Flash Fill was the most important turning point in my career. I went from searching for the hardest problem I can solve to searching for the simplest problem that will have the most impact. So I imagine program synthesis technology applies or will apply in many other cases besides Excel. So what other applications have you developed the capabilities of program synthesis for? So we have developed program synthesis capabilities for a variety of map transformations. I already talked about string transformations, number transformations, date transformations. We have also developed these capabilities
Starting point is 00:12:28 for lookup-based transformations and column-splitting transformations. Another area that we are heavily invested in is in the space of filter-based transformations and specifically for file ingestion. Normally, you may spend a few hours writing parsers to extract this information, but programming by example experiences allow you to simply specify one to two examples of various
Starting point is 00:12:54 fields in the output table, and the parsers can be automatically synthesized. Another related domain that we have looked at is that of reshaping tables in semi-structured spreadsheets. Turns out that 50% of Excel spreadsheets are semi-structured, meaning the data is logged into some ad hoc format. While this is easy to visualize, it becomes tricky to analyze it. We can enable reshaping of these semi-structured tables using an example-based experience where the user can simply specify a few example tuples in the intended output table. So all of these capabilities come under the broad umbrella of so-called data wrangling, which is the task of transforming data from one semi-structured format to another to facilitate further downstream processing. Data scientists apparently spend 80% of their time wrangling data, bringing it into a form
Starting point is 00:13:48 that they can then build ML models over. Program synthesis can facilitate easier and faster data wrangling. Another domain that we have developed some synthesis capabilities for is that of repetitive editing inside documents, such as Word, PowerPoint, or even code. We are also investigating development of program synthesis from natural language capabilities for several domains, including querying tables, visualizations, and even machine learning workflows. So the scope for program synthesis is quite broad, simply given that programming is so broad. Any task domain where you can describe your intent naturally is a potentially useful candidate for program synthesis.
Starting point is 00:14:30 Look into the future then. There are some exciting new developments in program synthesis that leverage recent advances in symbolic reasoning and machine learning. So tell us about these developments and where we are heading with these innovative threads of research and what you call predictive synthesis and modeless synthesis. The first prototype of Flash Fill that I developed would take three to four examples on average per scenario. The Excel team told me that they cannot ship Flash Fill until I made it work with one example on most simple cases. No pressure, though. Otherwise, users would lose trust in the system or would make fun, you know, like the tweet that I showed you. Recently, I was challenged to do even better. Now you might wonder how much better can we be
Starting point is 00:15:14 than one example? I can't even imagine. Well, how about zero examples? So initially when this was proposed to me, I thought this was a crazy idea. How can you read the mind of the user from zero examples? That's what I was going to say, is you're reading my mind now. Just go into my brain and decide what I'm thinking. But then when I thought more about it, it just started to make sense for some domains. Consider, for instance, the task of extracting tabular data from a custom text file or a web page. Sure enough, giving one to two examples of each field is a much better experience than having to write the parser
Starting point is 00:15:49 yourself. But if there are tens of fields, then giving one to two examples of each field can itself be a tedious task. But when I show such a semi-structured document to a human, they can often guess the underlying tabular structure. So I thought, why can't machine do that? So this is what we call predictive synthesis. The idea here is to synthesize an intended program from several inputs as opposed to few input-output examples. And we enable this by learning or synthesizing a structure within the inputs. We have developed predictive synthesis capabilities for a few domains, including splitting a column into multiple columns or extracting tables from text files, web pages,
Starting point is 00:16:31 and even PDF documents. Our predictive synthesis capability for extracting tables from text files ships as part of SQL Server Management Studio, where it is used to power the flat file import wizard. And apparently now this is the most used wizard across all of SSMS. We also recently shipped our predictive synthesis capability for extracting tables from PDF documents and from web pages as part of Power BI. Another new development is what we call modeless synthesis. There are many mundane repetitive tasks that users may not even naturally think of as something that can be programmed. For instance, converting all red text to green in a PowerPoint slide deck, or all dates in US format to European format in a Word document.
Starting point is 00:17:19 And this is where there's a need for a personalized agent that can non-intrusively watch, learn, and make suggestions. So recently we shipped an agent in preview mode in Visual Studio that looks out for any repetitive code edits. When it identifies a couple of code transformations that can be related, it generalizes them into a more generalized script and uses that to proactively suggest all other places where I might need to make that edit. We've talked about the inner workings of program synthesis, but I'm really interested to know what kinds of form factors might be useful for different applications of the technology.
Starting point is 00:17:55 So give us an example of where and how someone might make use of program synthesis and on what kind of device. Most of the use cases that I've talked about have been for end users to help them automate their tasks. One big leap that we are now investing in is to synthesize a readable, editable program in a programming language of choice.
Starting point is 00:18:18 This would be especially important when the synthesized program needs to be executed on big data or deployed for future executions. Hence, we have been investing in generating readable code in specific target languages like Python, R, PySpark, and involving use of specific libraries like Pandas. Besides facilitating transparency, this also provides a lot of educational value and most significantly would make it possible to incorporate the synthesized programs inside a developer
Starting point is 00:18:51 or data scientist's existing workflow in IDEs or notebooks. I'm terribly excited about notebooks in particular. Program synthesis is good at generating small fragments of code, which matches the granularity of what users write in notebook cells. Moreover, program synthesis can generate readable code in various target languages and using various libraries. And this shall address the challenge that notebook users will have around polyglot programming and discoverability issues around new libraries and SDKs.
Starting point is 00:19:24 And notebooks provide an ideal platform for that interactivity. So cross-disciplinary research is one of four big bets you lay out in a blog post from last year, Sumit. And your own rather prolific publication record in program synthesis spans a surprisingly diverse array of computer science conferences. And you've talked already a little bit about diversity and multidisciplinarity, but I want you to go a little further here and talk about why it's important in research, particularly as it relates to program synthesis. This is where the Indian adage, one plus one equals 11, that refers to the sum being more than its parts, comes true. Now, program synthesis
Starting point is 00:20:06 specifically is a highly interdisciplinary topic. We need serious cross-disciplinary innovation to build useful technologies and usable experiences over those technologies. This has led to several publications in programming languages and software engineering conferences like POPL, PLDI, OOPSLA, and ICSE. Then we use machine learning to learn various heuristics that are difficult to author by hand and maintain, and this has led to publications in conferences like ICML and ICLR. Since all of this work falls under the broad space of AI, we have several publications in AAAI and IJCAI as well. Some killer applications
Starting point is 00:20:46 have been in the space of data wrangling. And because of the domain relevance, we have published in data conferences like SIGMOD, VLDB, and KDD. And last but not the least, we have to pay attention to usability issues as well. And this has led to publications in HCI conferences like UIST and CHI. If you think about the graduate system that produces us, the emphasis is on creating an individualistic identity, preparing students to define hard problems of their own and to come up with solutions of their own, and they tend to become deep experts in a narrow area. So very siloed.
Starting point is 00:21:21 Exactly. And then the natural tendency is to continue exploration in that deep vertical. But I think it requires a different kind of thinking and leap to start appreciating the importance of other research areas and to figure out ways in which you can work together and hence achieve results that you would not have been able to produce by yourself. I think the goal of graduate education should be to make you expert in one narrow area, but also give you enough breadth so that you know what is the right tool and research area to be applied to solving a given problem or a sub-problem, so that you're not always looking at it with your own biased lens. Whenever we talk about the promises of innovative new technologies, we also have to talk about the perils. And this is the part of the podcast where I always ask what could possibly go wrong.
Starting point is 00:22:28 I do this because I want to know if there are things we should be aware of or thinking of or even concerned about when we contemplate building and using these highly complex systems you're talking about that users will have very little understanding of. So is there anything about your work, Sumit, that keeps you up at night? So now that program synthesis technologies are becoming mainstream, the increasing worry on my mind has been that of correctness. How does a user know that the program that has been synthesized is correct? Right.
Starting point is 00:23:00 Essentially, we need to start thinking about the debugging experiences in this new world of programming. And I think the key is going to be to regard program synthesis not as a one-shot process, but as an interactive conversation with the user. In fact, it turns out that when you do not commit to a program yourself, but you rather program by intent, we can actually enable some unique debugging experiences that are not going to be there in the standard programming world.
Starting point is 00:23:30 We can, for instance, synthesize multiple programs from a few examples. Each of those programs is consistent with these examples that the user provided. And run all these programs in parallel on the remaining test inputs. If they all produce the same result, it doesn't really matter which program you pick.
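Editorial sketch: a minimal illustration of the check just described, assuming candidate programs are plain Python callables. Every candidate is run on the remaining inputs, and the first input on which the candidates disagree is reported so it can be shown to the user. The function `find_distinguishing_input` and the two candidate programs are hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Optional


def find_distinguishing_input(
    programs: List[Callable[[str], str]], inputs: List[str]
) -> Optional[str]:
    """Return the first input on which the candidate programs disagree, else None."""
    for text in inputs:
        outputs = {program(text) for program in programs}
        if len(outputs) > 1:
            return text
    return None


# Two hypothetical candidates, both consistent with the example "Sumit Gulwani" -> "S".
def first_char(text: str) -> str:
    """Return the first character of the input."""
    return text[0]


def first_upper(text: str) -> str:
    """Return the first uppercase character of the input, or "" if there is none."""
    return next((c for c in text if c.isupper()), "")


candidates = [first_char, first_upper]
remaining = ["Ada Lovelace", "de Morgan"]
print(find_distinguishing_input(candidates, remaining))
```

Both candidates agree on "Ada Lovelace" but differ on "de Morgan", so that is the input worth asking the user about.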
Starting point is 00:23:46 But if these programs generate different results on some test input, it is a sign of ambiguity in the user's intent on that test input. And we can surface that test input to the user and ask them to provide the correct output on that input. This is one of my favorite ideas, and we call it distinguishing inputs. You can liken this to active learning in the machine learning terminology. We published this idea in
Starting point is 00:24:12 ICSE 2010, and just a few hours ago, I learned that this work has been selected for ICSE 2020 as the most influential paper for the ICSE 2010 edition. I love stories. Tell us yours, Sumit. What was your path to computer science and how did you end up doing what you're doing right now at Microsoft Research? I finished my schooling in India and then appeared for the joint entrance examination for IITs. I managed an all-India rank of somewhere between 100 to 200 out of around a million students who
Starting point is 00:24:46 take this examination. My rank was sufficient to get me into computer science but not in the city that was closer to where my parents lived and hence I had to make do with enrolling in the electrical engineering program at IIT Kanpur. After taking the introductory programming course in my first year there I developed a love for the subject. The course instructor teased us that 2x2 matrices can be multiplied with fewer than 8 multiplications, but wouldn't tell us how unless I was in an advanced class meant only for computer science students. So I retook the IIT entrance examination and I managed a two-digit rank in the 30s.
Starting point is 00:25:29 I argued with the university to promote me to the computer science department for the second year. Since the first year curriculum was same for students in all the departments, and I had managed to prove that I deserve to study computer science, but they wouldn't agree. So I decided to drop out and re-enter the university. Now actually there's a ruling that would not permit you to re-enter the
Starting point is 00:25:51 university like I did. So I had to repeat the first year and was forced to take the same classes again, but I got to study what I loved and that spurred me towards more excellence in the field and hence PhD was a natural choice. I was fortunate to get PhD admission offers from many top computer science universities. CMU called me and told me that the cost of living here is so low that I can even buy a house with my grad student salary. I wrote to my advisor George Necula at UC Berkeley and asked him about a counter offer. He said, yes, the cost of living here is high, but this is because everyone in the world wants to live here. So now you make your choice. So that's how I landed up in Berkeley. And then after my PhD at Berkeley,
Starting point is 00:26:36 I had several academic offers for faculty positions, but MSR was an easy choice for two reasons. The sheer amount of cross-disciplinary talent under one roof. It is like many university departments put together. And the opportunity to leverage Microsoft's broad reach to customers to create real practical impact. Almost every researcher I've had in the booth has a side quest, I like to call it, or a personal passion. And I know from talking to you and from reading some of your personal blog posts that one of yours is inculcating human empathy as an important cultural attribute in the next generation. So circling back to your secret identity as a dot connector, tell us how you might bring those two together.
Starting point is 00:27:19 There are a few principles that are being increasingly embraced around the tech industry. Customer obsession, diverse workplace, and work-life balance. These principles relate to the three kinds of relationships that we have in our lives. With our customers whom we serve, with our work colleagues, and with our personal contacts. Empathy holds the key to understanding and nurturing these relationships. Kids are highly attuned to being empathic. I will tell you one story involving my child Sumay when he was in preschool. I wanted to teach him some conceptual content and in particular a math theorem.
Starting point is 00:28:07 I chose a theorem that relates odd numbers. Odd plus odd equals even. Now while it's a simple theorem, the student was a preschooler who had many more fun things to do. And this theorem was his first. The first evening, I made all kinds of diagrams. But unfortunately it didn't work. The next evening, I tried using all the various number-related toys in the house. Still no luck.
Starting point is 00:28:38 Finally, I realized I had to meet Sumay where he was and not push to a level that he wasn't ready for. The following evening, I told him a story. An odd number is like a group of guys that are all paired up, except one. And he's the lonely guy. When two odd numbers come together, then the two lonely guys can pair up with each other and become friends. And that's an even number. Suddenly I could see by the look on his face that he understood and got it. And now I experienced this firsthand when Sumay was busy programming his robot on his Surface in the night. And then I tell him,
Starting point is 00:29:09 Sumay, it's time to go upstairs and sleep. And he's so engrossed, he doesn't respond. And then I raise my voice and ask, can you please go upstairs now? Again, no response. And then I tell Sumay in a sad voice, I'm going upstairs, but I will be lonely in the odd house up there.
Starting point is 00:29:27 Can you join me and make it even? And then he immediately comes to hold my hand and says, Yes papa, let's go up. That's bliss. Children are naturally terrified of being abandoned or cast out and hence are very tuned into feelings of either rejection or belonging. Unfortunately, our ability to empathize gathers dust as we grow up. Hence, we need to reinforce this in our school education and corporate trainings. And you know, Gretchen, at the end
Starting point is 00:29:57 of the day, the goal of technology should be to make us more humane. At the end of every podcast, I give my guests a chance to share some advice or wisdom with our listeners. And I think you kind of just did that. So I'm going to move over into the predict the future lane. What's on the horizon for the future of programming, Sumit? And what might be a call to action for our listeners? If you look at the history of programming, we went from punched cards and assembly language to high-level languages and beautiful code editors. The next evolution will take programming closer to human conversation,
Starting point is 00:30:36 involving use of examples and natural language to express intent. And interactive, wherein the computer will act as your pair programming agent. My team is invested in developing an SDK that can facilitate development of program synthesizers for new task domains. I hope that with frameworks like these, we can take the art and science of developing program synthesizers from Microsoft research laboratories to developers who are involved with writing libraries or programming tools. The grand question now is, what does this future mean for society?
Starting point is 00:31:13 Programming in the future would be much easier and accessible than it is today. Hence, there should be even higher incentive to incorporate so-called computational thinking in the school curriculum. And this can empower people to effectively leverage the computational devices to unleash their creativity and hence achieve more in their lives. Sumit Gulwani, thank you so much for joining us on the podcast today. It's been an absolute delight. Same here, Gretchen. Thanks for having me here.
