Microsoft Research Podcast - 099 - Program synthesis and the art of programming by intent with Dr. Sumit Gulwani
Episode Date: November 20, 2019
Dr. Sumit Gulwani is a programmer’s programmer. Literally. A Partner Research Manager in the Program Synthesis, or PROSE, group at Microsoft Research, Dr. Gulwani is a leading researcher in program synthesis and the inventor of many intent-understanding, programming-by-example and programming-by-natural language technologies – aka, the automation of “what I meant to do and wanted to do, but my computer wouldn’t let me” tasks. Today, Dr. Gulwani gives us an overview of the exciting “now” and promising future of program synthesis; reveals some fascinating new applications and technical advances; tells us the story behind the creation of Excel’s popular Flash Fill feature (and how a Flash Fill Fail elicited a viral tweet that paved the way for new domain investments); and shares a heartwarming story of how human empathy facilitated an “ah-ha math moment” in the life of a child, and what that might mean to computer scientists, educators and even tech companies in the future. https://www.microsoft.com/research
Transcript
99% of people who use computers do not know programming, and they get stuck with repetitive,
tedious tasks.
But these are quite creative people, and it is just that programming as it exists today
creates an artificial barrier to program computers.
Program synthesis can allow these end users to express themselves naturally and program
computers as easily as they would command a personal assistant.
You're listening to the Microsoft Research Podcast,
a show that brings you closer to the cutting edge of technology research and
the scientists behind it.
I'm your host, Gretchen Huizinga.
Dr. Sumit Gulwani is a programmer's programmer, literally.
A partner research manager in the Program Synthesis, or PROSE, group at Microsoft Research,
Dr. Gulwani is a leading researcher in program synthesis and the inventor of many intent understanding,
programming by example, and programming by natural language technologies. Today, Dr. Gulwani gives us an overview of the exciting now and the promising future of program
synthesis, reveals some fascinating new applications and technical advances, tells us the story behind
the creation of Excel's popular Flash Fill feature and how a Flash Fill fail elicited a viral tweet
that paved the way for new domain investments,
and shares a heartwarming story of how human empathy
facilitated an aha math moment in the life of a child
and what that might mean to computer scientists,
educators, and even tech companies in the future.
That and much more on this episode of the Microsoft Research Podcast.
Sumit Gulwani, welcome to the podcast.
Thanks, Gretchen. It's great to be here.
So I like to situate guests at the beginning, and you're a partner research manager in the
Program Synthesis or PROSE group at Microsoft Research, and you describe yourself as a scientist seeking
connections between ideas, between research and practice, and with people in varied roles. So in
other words, you're a dot connector. In broad strokes, tell us what gets you up in the morning.
What X are you and the other members of your group trying to solve for?
It is simply the opportunity to work with the fantastic team that I am part of.
We have rich experts in programming languages, formal methods, software engineering, machine learning, and even human-computer interaction.
We have researchers and engineers working closely with program managers
and user experience designers.
I'm also terribly excited about the charter of this team,
which is to advance the state of the art in program synthesis
and to deliver these innovations as magical experiences
across a wide range of Microsoft products.
Well, we have a lot of ground to cover today on the topic of program synthesis,
so let's set the stage and operationalize the term.
What is program synthesis, and why is it significant?
Program synthesis is any capability to automatically generate programs
or code fragments from users' intent expressed in some natural form like
input-output examples, demonstrations, partial programs or even keywords or
natural language. This has the potential to facilitate two disruptions. First,
creation of 100x more programmers because 99% of people who use computers do not know programming
and they get stuck with repetitive tedious tasks. But these are quite creative people
and it is just that programming as it exists today creates an artificial barrier to program
computers. Program synthesis can allow these end users to express
themselves naturally and program computers as easily as they would
command a personal assistant. The second disruption relates to improving the
productivity of existing developers and data scientists by 10 to 100x in many
task domains. Most of the code that these folks are writing is simply boilerplate code.
And there is very little algorithmic creativity involved.
Program synthesis can liberate programmers from having to focus on boring details
and enable them to focus on creative aspects of programming.
So let me drill in a little bit there on the claims of 100x and 10x to 100x increase in productivity and so on.
What data do you have to support the claims that this is going to make our productivity go way up?
We did some user studies related to accomplishing a given task. So what took Python programmers around 30 minutes to accomplish was doable
using our program synthesis technologies in less than a minute. And this was quite representative
of the tasks in the space of so-called data wrangling or data cleaning.
Right. So is this technology being used and you're seeing it in action and seeing these increases,
or is it just on sort of research study end of things right now?
In fact, it is very real. The first mass market deployment of the program synthesis technology was in the form of this feature called Flash Fill in Excel,
which has been in Excel since 2013. And since then, we have released similar experiences
based on example-based interaction across many different Microsoft products.
Well, let's zoom in and talk about Flash Fill, which you just mentioned.
And it synthesizes string transformation programs quickly
with minimal input-output examples, sometimes even just one.
Tell us about Flash Fill. What inspired the idea in the first place?
And tell us technically how you went about building in those efficiencies.
Ten years ago, I was flying from Frankfurt to Seattle,
and there was a lady sitting next to me in the airplane.
She was impressed to know that I have a PhD in computer science
and that I work for Microsoft Research.
So I ought to help her with the task that she was struggling with. She opens up her laptop,
fires up Excel, shows me a column of names in the format, first name space last name,
and asks me how can she reformat it in the form last name comma first name. Now at that time,
I had no idea about the programming model underneath Excel, so I had
to excuse myself out of the situation.
After returning home, when I searched for a solution to that problem on Excel help forums,
it is then that I realized that there were many, many people who struggled with simple
repetitive tasks and would solicit the help of an expert on a help forum while communicating their intent
using a few input-output examples. A typical interaction would take place over a course of
several days. Now this inspired me to develop the Flash Fill system that can automate the role of
the expert on the help forum and bring down the interaction time from a few days to a few seconds.
The key technical aspect of Flash Fill is the search strategy that it uses.
Instead of searching over a general purpose programming language,
we restrict the search to an appropriately designed domain specific language
that includes operators for string transformations such as regular expressions,
substring, concatenate, and limited form of conditionals
to allow for dealing with data in different formats.
Secondly, instead of blindly enumerating programs
over this underlying DSL,
we leverage logical properties of the operators in this DSL
to allow for a goal-directed search.
So essentially, we flow the input-output examples
down the grammar of the DSL,
and this identifies programs that are consistent with the examples. The third idea stems from the
observation that we also use several heuristics to guide the search over many choices that remain
even after logical reasoning has pruned most of the possibilities in the search tree.
Authoring and maintaining these heuristics has been an expensive proposition.
So recently, we have actually started using machine learning to learn these heuristics
from data that relates to past exploration of the search space over various benchmark
tasks.
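To make the search strategy described above more concrete, here is a minimal Python sketch. It is not the Flash Fill implementation: the tiny DSL (a concatenation of input words and constant strings) and the names `decompose`, `run`, and `synthesize` are assumptions made up for illustration. The idea it tries to show is the goal-directed part: instead of enumerating programs blindly, the output example is pushed down through the "concatenate" operator and split into pieces that an input word or a constant can explain.

```python
import re
from typing import List, Optional, Tuple, Union

# Hypothetical mini-DSL: a program is a concatenation of atoms.
# ("word", i) copies the i-th word of the input; ("const", s) emits the literal s.
Atom = Tuple[str, Union[int, str]]
Program = List[Atom]


def words_of(text: str) -> List[str]:
    return re.findall(r"\w+", text)


def run(program: Program, text: str) -> Optional[str]:
    """Execute a program on an input string; None if a word index is missing."""
    words = words_of(text)
    pieces = []
    for kind, arg in program:
        if kind == "word":
            if arg >= len(words):
                return None
            pieces.append(words[arg])
        else:
            pieces.append(arg)
    return "".join(pieces)


def decompose(words: List[str], target: str, allow_const: bool = True) -> List[Program]:
    """Goal-directed step: list every way to explain `target` as a
    concatenation of input words and constants (no two constants in a row)."""
    if target == "":
        return [[]]
    results: List[Program] = []
    for i, w in enumerate(words):
        if target.startswith(w):
            for rest in decompose(words, target[len(w):], True):
                results.append([("word", i)] + rest)
    if allow_const:
        for k in range(1, len(target) + 1):
            for rest in decompose(words, target[k:], False):
                results.append([("const", target[:k])] + rest)
    return results


def synthesize(examples: List[Tuple[str, str]]) -> List[Program]:
    """Propose programs from the first example, keep those consistent with all."""
    first_in, first_out = examples[0]
    candidates = decompose(words_of(first_in), first_out)
    return [p for p in candidates if all(run(p, i) == o for i, o in examples)]


programs = synthesize([("Jane Doe", "Doe, Jane"), ("Alan Turing", "Turing, Alan")])
print(programs)                          # [[('word', 1), ('const', ', '), ('word', 0)]]
print(run(programs[0], "Ada Lovelace"))  # Lovelace, Ada
```

With only the first example, this sketch leaves four consistent candidates (for instance, one that emits the constant "Doe, Jane"), which is why some way of choosing among them is still needed.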
Another defining aspect of Flash Fill, as you mentioned, is the fact that it can learn the user's intent from very few examples.
Right.
And it does this by ranking programs to pick a candidate from among the many programs that satisfy the user's examples.
We prefer programs that are smaller, simpler, use fewer constants, use smaller sized constants. We also examine the output generated by these programs
on the other test inputs that are available.
We prefer those programs that generate outputs that are similar and
look uniform.
For instance, suppose we have one program that generates all outputs that
are valid dates, and another program that generates mostly dates but
also some gibberish stuff,
then we have an additional reason to prefer the first program.
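The ranking idea can be illustrated with a small Python sketch. Everything in it is an assumption for illustration, not the ranking function used in Flash Fill: the `Candidate` tuple, the hand-assigned `size` values, and the `shape` abstraction (digits become `9`, letters become `a`) are invented here. It only shows the two preferences mentioned above: outputs that look uniform across the other inputs, and then smaller programs.

```python
from collections import Counter
from typing import Callable, List, NamedTuple


class Candidate(NamedTuple):
    name: str
    fn: Callable[[str], str]
    size: int  # hypothetical program size, assigned by hand for this sketch


def shape(text: str) -> str:
    """Abstract a string to its pattern: digits -> 9, letters -> a."""
    return "".join("9" if c.isdigit() else "a" if c.isalpha() else c for c in text)


def uniformity(outputs: List[str]) -> float:
    """Fraction of outputs that share the most common shape."""
    counts = Counter(shape(o) for o in outputs)
    return counts.most_common(1)[0][1] / len(outputs)


def rank(candidates: List[Candidate], other_inputs: List[str]) -> List[Candidate]:
    """Prefer uniform-looking outputs first, then smaller programs."""
    def key(c: Candidate):
        outputs = [c.fn(x) for x in other_inputs]
        return (-uniformity(outputs), c.size)
    return sorted(candidates, key=key)


# Both candidates agree with the single example "2019-11-20" -> "2019".
candidates = [
    Candidate("split_on_dash_take_first", lambda s: s.split("-")[0], 3),
    Candidate("first_four_chars", lambda s: s[:4], 2),
]
other_inputs = ["2018-10-05", "2017-01-31", "2013/01/15"]

best = rank(candidates, other_inputs)[0]
print(best.name)  # first_four_chars
```

On the third input the dash-splitting candidate returns the whole string `2013/01/15`, so its outputs no longer look alike, and the sketch ranks it below the candidate whose outputs are all four-digit strings.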
Let's talk a bit about a scenario where Flash Fill didn't work. I'll call it a Flash Fill fail.
Tell us that story and then tell us what you learned as a researcher from that experience.
Flash Fill can automate a wide variety of string transformations. However, there are many transformation tasks that it cannot do,
such as number transformations or date transformations.
But the user experience is so inviting
that people invariably want to give it a try,
even on tasks that it was never meant for,
and then talk about it when it does not work.
So there was this recent tweet in October 2018
that made fun of Flash Fill and was re-shared
more than 10,000 times.
Twitter user Darren wrote, AI is going to take over the world, but look what Excel auto-populated
for me today.
Darren gave an example converting DEC to December and Flash Fill auto-completed JAN to Janember and OCT to October.
My team found it funny enough to print this tweet on a t-shirt. And now this t-shirt is my dress
shirt for my keynotes. The positive thing here is that this kind of feedback has been extremely
useful for us to decide what new domains to invest into for program synthesis.
While we're on the topic of Flash Fill,
I don't need to remind anyone that Microsoft Excel is one of the planet's most widely used
software programs, and by extension, as Ben Zorn points out, the spreadsheet is among the most
widely used programming languages. So we're firmly in product territory here with Excel,
and you're firmly planted in
research territory. How do you manage the balance between academic research and product engineering?
So by the way, Ben Zorn has been a great mentor and collaborator and has shaped some of my thinking
on this topic. The key is to not treat this as a conflict. The most important bit that facilitates
this is to make thoughtful choices behind the problem definitions that we pick as researchers.
Problems that will satisfy our intellectual curiosity, but will also justify our corporate funding.
Flash Fill was the most important turning point in my career.
I went from searching for the hardest problem I can solve to searching for the
simplest problem that will have the most impact.
So I imagine program synthesis technology applies or will apply in many other cases
besides Excel.
So what other applications have you developed the capabilities of program synthesis for?
So we have developed program synthesis capabilities for a variety of map transformations.
I already talked about string transformations, number transformations, date transformations.
We have also developed these capabilities
for lookup-based transformations
and column-splitting transformations.
Another area that we are heavily invested in
is in the space of filter-based transformations
and specifically for file ingestion.
Normally, you may spend a few hours
writing parsers to extract this information,
but programming by example experiences allow you to simply specify one to two examples of various
fields in the output table, and the parsers can be automatically synthesized. Another related domain
that we have looked at is that of reshaping tables in semi-structured spreadsheets. Turns out
that 50% of Excel spreadsheets are semi-structured, meaning the data is logged into some ad hoc format.
While this is easy to visualize, it becomes tricky to analyze it. We can enable reshaping of these
semi-structured tables using an example-based experience where the user can simply specify a few example
tuples in the intended output table. So all of these capabilities come under the broad umbrella
of so-called data wrangling, which is the task of transforming data from one semi-structured format
to another to facilitate further downstream processing. Data scientists apparently spend 80% of their time wrangling data, bringing it into a form
that they can then build ML models over.
Program synthesis can facilitate easier and faster data wrangling.
Another domain that we have developed some synthesis capabilities for is that of repetitive
editing inside documents, such as Word, PowerPoint, or even code.
We are also investigating development of program synthesis from natural language capabilities for several domains, including querying tables, visualizations, and even machine learning workflows.
So the scope for program synthesis is quite broad, simply given that programming is so broad.
Any task domain where you can describe your intent
naturally is a potentially useful candidate for program synthesis.
Look into the future then. There are some exciting new developments in program synthesis
that leverage recent advances in symbolic reasoning and machine learning. So tell us
about these developments and where we are heading with these innovative threads of research and what you call predictive synthesis and modeless synthesis.
The first prototype of Flash Fill that I developed
would take three to four examples on average per scenario. The Excel team told me that they
cannot ship Flash Fill until I made it work with one example on most simple cases. No pressure,
though. Otherwise, users
would lose trust in the system or would make fun, you know, like the tweet that I showed you.
Recently, I was challenged to do even better. Now you might wonder how much better can we be
than one example? I can't even imagine. Well, how about zero examples? So initially when this was
proposed to me, I thought this was a crazy idea.
How can you read the mind of the user from zero examples?
That's what I was going to say, is you're reading my mind now.
Just go into my brain and decide what I'm thinking.
But then when I thought more about it, it just started to make sense for some domains.
Consider, for instance, the task of extracting tabular data from a custom text file or a web page.
Sure enough, giving one to two examples of each field is a much better experience than having to write the parser
yourself. But if there are tens of fields, then giving one to two examples of each field
can itself be a tedious task. But when I show such a semi-structured document to a human,
they can often guess the underlying tabular structure. So I thought, why can't machine
do that?
So this is what we call predictive synthesis. The idea here is to synthesize an intended program from several inputs as opposed to few input-output examples.
And we enable this by learning or synthesizing a structure within the inputs.
We have developed predictive synthesis capabilities for a few domains, including
splitting a column into multiple columns or extracting tables from text files, web pages,
and even PDF documents. Our predictive synthesis capability for extracting tables from text files
ships as part of SQL Server Management Studio, where it is used to power the flat file import wizard. And apparently now this is the most used wizard across all of SSMS.
We also recently shipped our predictive synthesis capability for extracting tables from PDF
documents and from web pages as part of Power BI.
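As a toy illustration of working from inputs alone, here is a minimal Python sketch that guesses how to split lines into columns without any output examples. It is an assumption-laden stand-in, not the technique that ships in SSMS or Power BI: the candidate delimiter list and the functions `infer_delimiter` and `split_columns` are hypothetical, and the only "structure" it learns is a delimiter that occurs the same number of times on every line.

```python
from typing import List, Optional

# Hypothetical set of delimiters to consider.
CANDIDATE_DELIMITERS = [",", "\t", ";", "|", " "]


def infer_delimiter(lines: List[str]) -> Optional[str]:
    """Pick a delimiter that appears equally often (and at least once) on
    every line; among those, prefer the one producing the most fields."""
    if not lines:
        return None
    best = None
    for d in CANDIDATE_DELIMITERS:
        counts = [line.count(d) for line in lines]
        if counts[0] > 0 and all(c == counts[0] for c in counts):
            if best is None or counts[0] > best[1]:
                best = (d, counts[0])
    return best[0] if best else None


def split_columns(lines: List[str]) -> List[List[str]]:
    """Split every line with the inferred delimiter, or leave lines whole."""
    d = infer_delimiter(lines)
    if d is None:
        return [[line] for line in lines]
    return [line.split(d) for line in lines]


lines = [
    "Jane Doe;42;Seattle",
    "Alan Turing;41;London",
    "Ada King Lovelace;36;London",
]
print(infer_delimiter(lines))   # ;
print(split_columns(lines)[2])  # ['Ada King Lovelace', '36', 'London']
```

The space character is rejected here because it appears once on the first two lines and twice on the third, so it does not yield a consistent table.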
Another new development is what we call modeless synthesis.
There are many mundane repetitive tasks that users may not even naturally think of as something that can be programmed.
For instance, converting all red text to green in a PowerPoint slide deck,
or all dates in US format to European format in a Word document.
And this is where there's a need for a personalized agent that can non-intrusively watch,
learn, and make suggestions. So recently we shipped an agent in preview mode in Visual Studio
that looks out for any repetitive code edits. When it identifies a couple of code transformations
that can be related, it generalizes them into a more generalized script and uses that to
proactively suggest all other places
where I might need to make that edit.
We've talked about the inner workings of program synthesis, but I'm really interested to know
what kinds of form factors might be useful for different applications of the technology.
So give us an example of where and how someone might make use of program synthesis and on
what kind of device.
Most of the use cases that I've talked about have been for end users
to help them automate their tasks.
One big leap that we are now investing in
is to synthesize a readable,
editable program
in a programming language of choice.
This would be especially important
when the synthesized program
needs to be executed on big data
or deployed for future executions.
Hence, we have been investing in generating readable code in specific target languages
like Python, R, PySpark, and involving use of specific libraries like Pandas.
Besides facilitating transparency, this also provides a lot of educational value and most significantly would make
it possible to incorporate the synthesized programs inside a developer
or data scientist's existing workflow in IDEs or notebooks. I'm terribly excited
about notebooks in particular. Program synthesis is good at generating small
fragments of code which matches the granularity of what users
write in notebook cells.
Moreover program synthesis can generate readable code in various target languages and using
various libraries.
And this shall address the challenge that notebook users will have around polyglot
programming and discoverability issues around new libraries and SDKs.
And notebooks provide an ideal platform for that interactivity.
So cross-disciplinary research is one of four big bets you lay out in a blog post from last year, Sumit.
And your own rather prolific publication record in program synthesis spans a surprisingly diverse array of computer science conferences.
And you've talked already a little
bit about diversity and multidisciplinarity, but I want you to go a little further here
and talk about why it's important in research, particularly as it relates to program synthesis.
This is where the Indian adage, one plus one equals 11, that refers to the sum being more
than its parts, comes true. Now, program synthesis
specifically is a highly interdisciplinary topic. We need serious cross-disciplinary innovation
to build useful technologies and usable experiences over those technologies.
This has led to several publications in programming languages and software engineering
conferences like POPL, PLDI, OOPSLA,
and ICSE. Then we use machine learning to learn various heuristics that are difficult to author
by hand and maintain, and this has led to publications in conferences like ICML and ICLR.
Since all of this work falls under the broad space of AI, we have several publications in
AAAI and IJCAI as well. Some killer applications
have been in the space of data wrangling. And because of the domain relevance, we have published
in data conferences like SIGMOD, VLDB, and KDD. And last but not the least, we have to pay attention
to usability issues as well. And this has led to publications in HCI conferences like UIST and CHI.
If you think about the graduate system that produces us,
the emphasis is on creating an individualistic identity,
preparing students to define hard problems of their own
and to come up with solutions of their own,
and they tend to become deep experts in a narrow area.
So very siloed.
Exactly.
And then the natural tendency is to continue exploration in that deep vertical.
But I think it requires a different kind of thinking and leap to start appreciating the importance of other research areas
and to figure out ways in which you can work together and hence achieve results that you would not have been able to produce by yourself. I think the goal of graduate education should be to make you
expert in one narrow area, but also give you enough breadth so that you know what is the
right tool and research area to be applied to solving a given problem or a sub-problem,
so that you're not always looking at it with your own biased lens.
Whenever we talk about the promises of innovative new technologies, we also have to talk about the perils. And this is the part of the podcast where I always ask what could possibly go wrong.
I do this because I want to know if there are things we should be aware of or thinking of or
even concerned about when we contemplate building and using these highly complex systems you're
talking about that users will have very little understanding of. So is there anything about
your work, Sumit, that keeps you up at night?
So now that program synthesis technologies are becoming mainstream,
the increasing worry on my mind has been that of correctness.
How does a user know that the program that has been synthesized is correct?
Right.
Essentially, we need to start thinking about the debugging experiences
in this new
world of programming. And I think the key is going to be to regard program synthesis
not as a one-shot process, but as an interactive conversation with the user.
In fact, it turns out that when you do not commit to a program yourself, but you rather program by
intent, we can actually enable some unique debugging experiences
that are not going to be there in the standard programming
world.
We can, for instance, synthesize multiple programs
from a few examples.
Each of those programs is consistent with these examples
that the user provided.
And run all these programs in parallel on the remaining test
inputs.
If they all produce the same result,
it doesn't really matter which program you pick.
But if these programs generate different results
on some test input, it is a sign of ambiguity
in the user's intent on that test input.
And we can surface that test input to the user
and ask them to provide the correct output on that input.
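The check just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the published algorithm: the candidate programs are plain Python functions written by hand, and `find_distinguishing_input` is a hypothetical helper that simply runs every candidate on every remaining input and reports the first input on which they disagree.

```python
from typing import Callable, List, Optional, Tuple


def find_distinguishing_input(
    programs: List[Callable[[str], str]], inputs: List[str]
) -> Optional[Tuple[str, List[str]]]:
    """Return the first input on which the candidates disagree, together
    with the distinct outputs they produce, or None if they always agree."""
    for x in inputs:
        outputs = sorted({p(x) for p in programs})
        if len(outputs) > 1:
            return x, outputs
    return None


# Two hand-written candidates, both consistent with "Jane Doe" -> "Doe".
def last_word(s: str) -> str:
    return s.split()[-1]


def second_word(s: str) -> str:
    return s.split()[1]


remaining = ["Alan Turing", "Ada King Lovelace"]
print(find_distinguishing_input([last_word, second_word], remaining))
# ('Ada King Lovelace', ['King', 'Lovelace'])
```

In an interactive setting, the returned input is the one worth showing to the user, since their answer rules out at least one of the candidates.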
This is one of my favorite ideas,
and we call it distinguishing inputs. You can
liken this to active learning in the machine learning terminology. We published this idea in
ICSE 2010, and just a few hours ago, I learned that this work has been selected for ICSE 2020
as the most influential paper for the ICSE 2010 edition.
I love stories.
Tell us yours, Sumit.
What was your path to computer science and how did you end up doing what you're doing
right now at Microsoft Research?
I finished my schooling in India and then appeared for the joint entrance examination
for IITs.
I managed an all-India rank of somewhere between 100 to 200 out of around a million students who
take this examination. My rank was sufficient to get me into computer science but not in the city
that was closer to where my parents lived and hence I had to make do with enrolling in the
electrical engineering program at IIT Kanpur. After taking the introductory programming course
in my first year there I developed a love for the subject.
The course instructor teased us that 2x2 matrices can be multiplied with less than 8 multiplications,
but wouldn't tell us how unless I was in an advanced class meant only for computer science students.
So I retook the IIT entrance examination
and I managed a two-digit rank in the 30s.
I argued with the university to promote me
to the computer science department for the second year.
Since the first year curriculum was same for students
in all the departments, and I had managed to prove
that I deserve to study computer science,
but they wouldn't agree.
So I decided to drop out
and re-enter the university. Now actually there's a ruling that would not permit you to re-enter the
university like I did. So I had to repeat the first year and was forced to take the same classes again,
but I got to study what I loved and that spurred me towards more excellence in the field and hence
PhD was a natural choice.
I was fortunate to get PhD admission offers from many top computer science universities.
CMU called me and told me that the cost of living here is so low that I can even buy a house with my grad student salary.
I wrote to my advisor George Necula at UC Berkeley and asked him about
a counter offer. He said, yes, the cost
of living here is high, but this is because everyone in the world wants to live here. So now
you make your choice. So that's how I landed up in Berkeley. And then after my PhD at Berkeley,
I had several academic offers for faculty positions, but MSR was an easy choice for two
reasons. The sheer amount of cross-disciplinary talent under one roof.
It is like many university departments put together. And the opportunity to leverage
Microsoft's broad reach to customers to create real practical impact.
Almost every researcher I've had in the booth has a side quest, I like to call it,
or a personal passion. And I know from talking to you and from reading some of your personal blog posts that one of yours is inculcating human empathy
as an important cultural attribute in the next generation. So circling back to your
secret identity as a dot connector, tell us how you might bring those two together.
There are a few principles that are being increasingly embraced around the tech industry.
Customer obsession, diverse workplace, and work-life balance.
These principles relate to the three kinds of relationships that we have in our lives.
With our customers whom we serve, with our work colleagues, and with our personal contacts.
Empathy holds the key to understanding
and nurturing these relationships. Kids are highly attuned to being empathic. I will tell
you one story involving my child Sumay when he was in preschool. I wanted to teach him some
conceptual content and in particular a math theorem.
I chose a theorem that relates odd numbers.
Odd plus odd equals even.
Now while it's a simple theorem, the student was a preschooler who had many more fun things to do.
And this theorem was his first.
The first evening, I made all kinds of diagrams.
But unfortunately it didn't work.
The next evening, I tried using all the various number-related toys in the house.
Still no luck.
Finally, I realized I had to meet Sumay where he was and not push to a level that he wasn't ready for.
The following evening, I told him a story.
An odd number is like a group of guys that are all paired up,
except one. And he's the lonely guy. When two odd numbers come together,
then the two lonely guys can pair up with each other and become friends. And that's an even number. Suddenly I could see by the look on his face that he understood and got it.
And now I experienced this firsthand when Sumay was busy programming his robot
on his Surface in the night.
And then I tell him,
Sumay, it's time to go upstairs and sleep.
And he's so engrossed, he doesn't respond.
And then I raise my voice and ask,
can you please go upstairs now?
Again, no response.
And then I tell Sumay in a sad voice,
I'm going upstairs,
but I will be lonely in the odd house up there.
Can you join me and make it even?
And then he immediately comes to hold my hand and says,
Yes papa, let's go up.
That's bliss.
Children are naturally terrified of being abandoned or cast out
and hence are very tuned into feelings of either rejection or belonging.
Unfortunately, our ability to empathize gathers dust as we grow up. Hence, we need to reinforce
this in our school education and corporate trainings. And you know, Gretchen, at the end
of the day, the goal of technology should be to make us more humane.
At the end of every podcast,
I give my guests a chance to share some advice or wisdom
with our listeners. And I think you kind of just did that. So I'm going to move over into the
predict the future lane. What's on the horizon for the future of programming, Sumit? And what
might be a call to action for our listeners?
If you look at the history of programming,
we went from punched cards and assembly language to high-level languages and beautiful code editors.
The next evolution will take programming
closer to human conversation,
involving use of examples and natural language
to express intent.
And interactive, wherein the computer will act
as your pair programming agent.
My team is invested in developing an SDK that can facilitate development of program synthesizers
for new task domains. I hope that with frameworks like these, we can take the art and science of
developing program synthesizers from Microsoft Research laboratories to developers who are involved with writing
libraries or programming tools. The grand question now is, what does this future mean for society?
Programming in the future would be much easier and accessible than it is today.
Hence, there should be even higher incentive to incorporate so-called computational thinking
in the school curriculum. And this can empower people to effectively leverage the computational devices
to unleash their creativity and hence achieve more in their lives.
Sumit Gulwani, thank you so much for joining us on the podcast today.
It's been an absolute delight.
Same here, Gretchen. Thanks for having me here.