Microsoft Research Podcast - 120 - Advancing Excel as a programming language with Andy Gordon and Simon Peyton Jones

Starting point is 00:00:00 I don't think I ever dreamt that we could offer something as exotic as Lambda. You could really write literally any program in Excel now. Suddenly it becomes computationally much more powerful. Welcome to the Microsoft Research Podcast. My name is Andy Gordon. The new Lambda function has been announced in Excel and I'm here to tell you about that with my colleague Simon Peyton-Jones. Simon, would you like to introduce yourself?

Starting point is 00:00:32 Yes, so I've been here at Microsoft Research for 22 years since 1998. I did my undergraduate degree at Cambridge, worked for a few years in a small company and then I worked at University College London and Glasgow University as a professor before moving to Microsoft. And my research interest has always been in functional programming, purely functional programming, as a radical and elegant attack on the entire enterprise of writing programs. But what about you, Andy? How did you get into functional programming and indeed Excel? Well, I was doing a PhD on lazy functional programming back in the late 80s, in fact, when you guys were starting Haskell. And my PhD was on input-output for Haskell.

Starting point is 00:01:15 And I was actually delighted. You personally invited me to sit on the Haskell committee to help standardise input-output using monads. And I joined Microsoft actually in 97. So I was one of the first employees at microsoft research um cambridge yeah that's right you narrowly predated me not just that i interviewed you that was a tough decision well you made a good call i think mostly. Yeah, so about my research.

Starting point is 00:01:49 Yeah, I started out in functional programming and I've done a bunch of other things in like security. And then about 10 years, I got into probabilistic programming for machine learning, which funnily enough led me to spreadsheets because we built a system to support writing probabilistic programs of data within spreadsheets. And at that point, I realized that to take things much further,

Starting point is 00:02:08 it would be good to join forces with your work with the Excel team. And in fact, you were involved with the Excel team pretty early on in your time at Microsoft. Do you want to tell us about that? Yeah, pretty early, because when I first joined Microsoft, I thought to myself, what can I do that would advance the course of functional programming within Microsoft? And then I thought, well, Excel is the world's most widely

Starting point is 00:02:31 used functional programming language. It's not a very powerful one, perhaps, but when you write a formula in a spreadsheet, you are writing in a purely functional language. So no side effects, you don't say, you know, equals print three plus seven, wouldn't make any sense. And moreover, it's extremely widely used. So there must be 100 times as many users of formulae in Excel as there are professional programmers in the entire planet. And so I was thinking, maybe we should think about what would it take to start from Excel, but to grow it using the North Star of mainstream functional programming languages, and try to see how much more powerful we could make it. Then I started talking to research colleagues like Alan Blackwell here at Cambridge and Margaret Burnett in Oregon. And we started to come up with some

Starting point is 00:03:14 ideas about defining functions as Excel. We had two principal things. The first was, Excel provides 600 odd built-in functions, but it doesn't provide you with a way to make a new function out of existing ones. So in every other programming language, you can define a procedure perhaps by writing some code, wrapping it up, giving it a name and some parameters, and then you can call it repeatedly. Not so in Excel. If you want to do that, you have to write your procedure in VBA or JavaScript or something. And that's kind of like crazy. We'd like you to be able to define new functions using the existing formula language. Every other programming language lets you do that. Why not Excel?

Starting point is 00:03:52 So we came up with a design for doing that. And we turned it into a research paper. It was published in ICFP in about 2002 and 2003. It's called User-Defined Functions in Excel. So then we started to realize that it wasn't just enough to have user-defined functions. They needed to be able to take structured data. It wasn't enough for them to take scalar data. And this is a part which you've been much more involved in since, the need for rich data in Excel, which sort of complements the need for user-defined functions.

Starting point is 00:04:19 So maybe you could tell us a bit about that. Yeah, I mean, you're right, Simon. I mean, until pretty recently, the only kind of data that you could have in Excel was text or numbers. That's all you could have sitting in cells. And so if you're trying to turn the formula language into a proper functional programming language, you really need a lot more structures like that. There's only so many functions you can write that just take two scalars and then produce another one as an output. There's not many of those. So for example, you need arrays. You mean as arguments and results of functions? Yes. You want to be able to process big pieces of data in one goal. And it's also important to be able to store them in cells. Within a sheet-defined function, you might want to have

Starting point is 00:04:57 a computation and have it sort of spread out within the grid and then return a whole array, say, as the result of a sheet defined function. So generally, we need first class arrays in the language. Yeah, we should pause a bit just to explain about what is a sheet defined function. The idea that we put in this paper for defining new functions in Excel was to say, imagine you could take a worksheet and nominate certain cells as the input and one cell as the output and give it a name. And then when you call a function with that name, when you call that function, it is as if you had created a fresh copy of that worksheet, filled in the input values, the arguments of the function in those input cells, calc the worksheet and taken the

Starting point is 00:05:34 result out. That was our model for defining a new function. Now, if you want to define a function that works over arrays, for example, sort this array or pick the smallest element of it, then you can't just take an array of a fixed size, like a three element array or a 30 element array. We'd like to have an array of arbitrary size, which presumably would then land in a single cell in the worksheet that defines the function. This is a great vision, and you wrote a paper about that. So how do things develop, Simon? Oh, well then, so having got the idea, I then would travel to Redmond, the Microsoft mothership, on a fairly regular basis, you know, two or three times a year. So I started to become

Starting point is 00:06:11 very familiar with the Excel team. And I discovered that they were very warm towards the idea. We had lots of interesting technical conversations about how Excel might work. But Excel is a very big product and it doesn't move quickly. It's like a sort of super tanker. So for a long time, I felt a bit like a fly bashing against the bow of the super tanker. And there was always a good reason why now is not a good time to do it. And no criticism for that. There were other priorities and there still are. But what actually happened in the end was that I got a bit discouraged and I went away for about 10 years. I forget what provoked me too, maybe it was our lab director, Andrew Blake, who said, you should really have another go at this.

Starting point is 00:06:49 But it turned out it matched up with their priorities. So we then started a much more active partnership between Microsoft Research Cambridge and Excel on this whole idea of sheet defined functions and data. But things came out in a very different order than I anticipated. I'd started on the sheet-defined function idea, but in fact, the first thing that happened was actually to do with data types and rich structured data rather than functions. In particular, the first thing that came out as a change to the product was called dynamic arrays, and you were quite involved in that, weren't you?

Starting point is 00:07:19 Tell us a bit about it. So dynamic arrays are really interesting because if you go way back, you discover that the formula language in Excel could in fact construct arrays. You could refer to bits of grid, a range, and then pull that into a formula and compute with it. But what you couldn't do was return the whole array in one go. You could do things like add two arrays together or add one to every element in an array. You could treat arrays as first class values within a formula. Right and so you could lift formulas over whole arrays so like plus one and so on or you could lift over the whole array and

Starting point is 00:07:56 you could get the result but you could only have a scalar in a cell and so it was very awkward to return the whole array. There was a notion of a thing called array formulas. And if you were on Windows, the way you entered them was to do control shift enter. I think after you'd, you would type the formula in and then you had to remember to go control shift enter. It was very complicated. You would select the range, you know, multiple cells that were to hold the array. You need to know in advance how big it was. And then you would type in the formula and end it with control shift enter. And then that one formula would define the whole array.

Starting point is 00:08:29 So it was really cumbersome. It's sort of hilarious, really, when you think about it. Curly braces would appear around the formula. So it was this advanced feature. And there were a few people who knew how to use it. In fact, there was a person who actually wrote a book called Control Shift Enter, which was a sort of guidebook to how you could sort of use this feature. But it's very difficult to use. But now, I think since about 2018, we've had dynamic arrays. And this is an amazing feature.

Starting point is 00:08:55 You don't have a separate world of array formulas. Ordinary formulas can return arrays, and you don't have to decide in advance how big the array is going to be. Instead you just write the formula and if it computes an array then we say it spills out from the cell. Maybe you write a formula that you know is a three by three array and it will just spill out from you write the formula in the top left corner and then the rest of the array will fill out and then if you want to have another calculation somewhere else if the original array was in cell A1, then if you say A1 sharp, then that expression, that formula A1 sharp refers to the whole array. And then you can compute to it.

Starting point is 00:09:36 You can add that whole array to another one or take the sum of it or take the average of it. So there you don't need to know how big the array in A1 sharp is. And indeed, in different calc cycles, it might be of different sizes. Right. That's really important. The chain of formulae will work regardless of the sizes of the arrays involved, which may vary with the data. When dynamic arrays came out, we didn't have lambdas. So I think to everyone in the team, we knew that lambdas were the thing,

Starting point is 00:09:58 and combinators together with lambdas like map and reduce and fold and scan, these kinds of combinators that are familiar from functional languages like Haskell and others. We knew that those would be really important for programming with arrays, but when they first came out, we didn't have those. So we've been struggling a little bit to really get the full power of dynamic arrays. There's various tricks you can use

Starting point is 00:10:18 to construct dynamic arrays, but certain kind of common patterns are still quite hard to use, which changes with lambdas coming in. Dynamic arrays still produced a lot of common patterns are still quite hard to use, which changes with lambdas coming in. Dynamic arrays still produced a lot of excitement at the time because they really unlocked a huge amount of functionality. It was only previously available through this very arcane control shift enter mechanism. And then only if you knew the size of the array. So they're dynamic. They really work over variable size. It wasn't just arrays either. So arrays are a very

Starting point is 00:10:43 natural data type for Excel. But we also talked to our colleagues in the product team a lot about records. And that came out at a similar kind of time as well, didn't it? Yes. So these are known as entities in Excel, and you can get them from the data tab in the ribbon. And the funny thing is that they are records, but there isn't actually a function to create a record from scratch. Instead, the way it's been introduced is that Excel has got this intelligence feature that can recognise certain names of entities that occur in the workbook. So if you refer to the name of a company or if you refer to a name, say Cambridge, Excel can recognise Cambridge as a geographic entity,

Starting point is 00:11:24 like the city of Cambridge in the UK, where Simon and I are at the moment. And what Excel will do is go out to Bing's database, fetch facts about the city of Cambridge, wrap them up in a record, and then drop that into the cell. And if it's the name of a company, then it would also grab facts like the current stock price or the name of the CEO and drop that into the grid. You have to do something to make it do that, don't you? It doesn't just spot any time you say Cambridge, it doesn't go, yeah, there's some gesture. You go into the data ribbon. And actually, one of the really cool things about those entities is they're known as linked entities. It's not just a record, but they also have this link to where they came from, a sort of pointer to the web, if you like. So it means that if you open up a stock data type in a workbook, Excel will actually automatically

Starting point is 00:12:09 refresh properties like the price, and then it would recalc anything that depends on that. So these have become really popular as ways of building financial models or getting properties of geographies. But for a programming point of view, the interesting thing about this was the value in the cell was no longer a scalar. It really is a record. And you could programmatically say a1.price to extract the field. And that was the first time that kind of rich structured data of nested records was available. Yeah, I think that's the first time Excel had dot, actually, as an operator. So we now have like dot, like in many other programming languages with IntelliSense. So it kind of gives you an option for different fields that you might want actually as an operator so we now have like dot like in in many other programming languages with

Starting point is 00:12:45 intellisense so it kind of gives you an option for different fields that you might want to pull out of the record it was a funny dynamic about that for a programming languages person you'd say well we need an introduction form which lets you create records and an elimination form that lets you take them apart that's the dot but the way it came out in the product is we have the limb form without the intro form i think we will get an intro form but it was just a it was just a an interesting way in which working with a product group makes things come out in a in a different way probably a way that's more useful for users in the first instance that we would have done as academics i thought that dynamic was interesting yeah actually i mean stepping back slamming i think this is one of the really big things that i've learned from working with the Excel team.

Starting point is 00:13:25 I mean, we've been working really intensely on this the last five years, which is that the kind of people who use Excel, I mean, it's a really broad population. It's a super popular application. But the kind of people who use it are generally not programmers, people who really care about writing code for its own sake. Like most people who are sort of functional programmers, say Haskell programmers, are really passionate about programming. Whereas people who are using Excel, not always, but generally, they are really keen to get some other job done.

Starting point is 00:13:53 Like they're an accountant, they're someone working in finance, they're maybe a teacher tracking statistics about how their students are doing. And they're what are known as end user programmers. They are writing code, they're writing programs, but really for their own purposes. So when the Excel team thinks about how best to serve end user programmers, they often do things a little differently than if you were trying to produce a feature for a very sophisticated

Starting point is 00:14:16 programmer. Entities are a good example of that, where the Excel team realized that end users who maybe cared about stocks would just want a very convenient way to go from the name of a company to an entity that would automatically update the stock price, rather than having to have maybe a function called record that they would call and have to put in arguments to determine which entity to fetch. And I think that's been quite successful, that end users really being able to use these features. And they don't really know what a record is, but they can understand that particular entities matter for what they're trying to do. We also laid a lot of emphasis on compositionality. I think for the first time we had not just

Starting point is 00:14:52 data that wasn't a scalar, but data that was arbitrarily nested. So a record for a country might contain a field that is the states of that country. And that's an array, an array of states. So it's a record that contains an array, and that array contain states of that country. And that's an array, an array of states. So it's a record that contains an array, and that array contain entities themselves that are described as records about the state. So you get the sort of compositional, deeply nested form of structures that we're familiar with in programming languages, but which is really not in Excel at all until this point. Yeah, it's really rich. For example, Microsoft has done this amazing deal with Wolfram, the company that does the Wolfram Alpha knowledge base.

Starting point is 00:15:27 And so Microsoft is giving access to a huge range of data types from Wolfram, like chemical compounds and stars and different food items with their calories and so forth. So there's a great repertoire of data and a lot of it is nested in the way that you're talking about. We started to go further than that even. After we got arrays and records, those are like two data types, you might think, well, what if we want more? So we started to think, what would it mean to make Excel extensible with respect to its data types? Maybe you could walk up to it as a third party developer, now not as an end user, but as a real program, and plug in something that adds another data type to Excel, let's say images, right? So you might have an abstract data type of images with operations you provide that can overlay them or blur them or rotate them. And then you have a little functional programming language whose data types are images. And Excel serves as a sort of

Starting point is 00:16:19 orchestration language that lets you write formally that connect it. But the new data type is provided as a third-party plugin. That's not available yet. It's part of our, you know, your and my vision. And I think the team in Redmond take it pretty seriously. And we've designed the implementation to accommodate that. But we're starting with some fixed data types. But the idea that ultimately we could turn Excel

Starting point is 00:16:39 into an extensible platform for arbitrary data apps that do really quite big, remarkable things, you know, financial contracts, say, is quite an exciting one, I think. Yeah, that's going to be really cool, because we've seen that customers often have used things like previous extensibility features like VBA to, in a way, sort of hack up some of these features for representing things like financial contracts or financial instruments, and sort of simulate things like arrays and records using strings and various other tricks like that. But once this feature comes out, it'll be possible for third-party developers to sort of really enrich the product with new data types. It'll be extremely cool.

Starting point is 00:17:29 So we talked quite a bit about the way in which our partnership has influenced the product, but the reverse has also happened. In fact, we've used the credibility with the lab of this very effective product group partnership to justify growing our research program into a group that we call the Calc Intelligence Group, which you run now. Maybe you could just tell us a few examples of the broader research-based things we've been doing on the back of all of this. Yeah, we've done a range of different things, and we've got papers out on different topics. So one example is what we call elastic SDFs.

Starting point is 00:17:56 So Simon, you'd explain the idea of sheet-defined functions, that we can describe a calculation by a piece of grid with some of the cells being inputs and some of them being outputs and i think one thing in your original paper that you didn't cover was what would happen if one of the inputs was actually a range was like an array of items and this is one of the cool things that we've worked on together is like what if the example that you gave was maybe a row of like three items but you it to generalize to a sheet defined function that wouldn't work just on a row of three items,

Starting point is 00:18:28 but maybe would work on a row of five items or any sort of array input size. And very often we can do a sort of analysis of the formulas. And if they are processing those three inputs in a uniform way, typically it would be exactly the same formulas applied to each of them, a sort of parallel computation on the three inputs, and maybe with aggregates at the end

Starting point is 00:18:49 where maybe you do some calculations and then you sum the results, then that calculation can be sort of stretched, it can be elasticised is what we say, to cover maybe an input of size five. So we wrote this paper that I think is really cool that figured out a theory for when it is possible to generalise an example calculation of the grid that has fixed sizes to arbitrary sizes. And then even better, we worked with HCI experts who are in the Calc Intel team to do a user study to ask, well, how do people actually find this in practice? And we did a study and found that they

Starting point is 00:19:26 loved it, that it was a very natural way to program sheet-defined functions. So that's a paper I'm really proud of. It was a funny paper, though, wasn't it? Because it straddles everything from user studies right the way through to a proof of most, you know, principal generalization. But we found it quite difficult to publish. We did. We shouldn't be bitter. Let's not be bitter, Simon. We're not bitter at all. We're not bitter. But in our field, the kind of three major programming language at conferences, ICFP, POPL and PLDI. And on this one, we got the clean sweep. We were rejected by ICFP, rejected by POPL

Starting point is 00:20:01 and rejected by PLDI. Bravo! That's right. It was a great paper. It is now published in the Journal of Functional Programming. We're laughing now. It's a funny story, but there's a thing behind it, which is that it is interdisciplinary work, right? It covers the stretch from programming languages theory through to HCI. That means it doesn't actually fit in any of the existing buckets very neatly. So that can be a challenge when you're

Starting point is 00:20:28 doing interdisciplinary research. I think it is a shame that these programming language conferences have a hard time, you know, accepting papers that use HCI methods. I'd love it if we could start to have maybe tracks of HCI research at these conferences or co-locate with HCI conferences. Yeah, but after all, what's the purpose of a programming language? It is the user interface for the computer, isn't it? The measure of whether a programming language is any good is whether it helps people get programs to work correctly more quickly. And so it's an HCI problem, programming.

Starting point is 00:21:02 It totally is. And having worked a bit with HCI people, I'm really humbled, actually. I mean, there's some great HCI folks in our team. And I learned such a lot from talking to them. And the thing is, they really start from the users talking to people and asking them, what are the sort of problems you're having that might need some new features? And then they try and build features and then compare them. In the SDF paper that I've just been talking about, we had two different ways of processing arrays. And then the study was actually a comparison of the two. And I've really come to respect HCI people for the kind of methods they do to actually get rigorous results when working with people.

Starting point is 00:21:39 I think it'd be really great if we could have more co-location of HCI research conferences with more kind of technical PL conferences because I think people would learn from each other in ways it just doesn't happen at the moment when they're separated out at different conferences. So Calcview is another example of such a project isn't it about sort of HEI aspects of programming you want to just tell us a bit about Calcview what is it it? CalcView, yeah, we actually published this at an HCI conference called VLHCC. So it's a sort of radically different view of the grid. We have a spreadsheet grid, but then we have this second representation of it, a bit like a programming language.

Starting point is 00:22:16 So we use a textual notation, and we would literally have the text like maybe A1 equals 2, and then maybe A2 equals a1 plus 1 you'd actually see formulas like that equations like that sort of visibly as if it was like a program a textual program like a textual program kind of thing exactly and so the the idea of comp view is that you have uh you have the grid and then to the right of it you have this textual view and certain things are really easy to do in the textual view. Like, for example, we have this nice notation for a formula that has been copied many times. You know, so if the same formula is copied from A1 to A100, you could literally say A2 colon A100 equals A1 plus 1.

Starting point is 00:23:01 And so that's a formula that would basically increment a counter. But textually, it's a very short program, a bit like a loop. And if you want to sort of change it, you can just change the one copy of that formula, and then every copy in the grid gets updated. And so we did a user study. And again, we found that people were more effective at certain tasks using CalcView. But what I thought was interesting about it was that we're so used to the in programming we're so used to the textual view of a program and we execute it later and in Excel we're so used to just seeing the data and having it execute sort of online all the time you barely see the formulae and with CalcView we're trying to do both at the same time and show

Starting point is 00:23:39 you the data view continuously calced on one side of the screen and the textual view on the other side of the screen but and you can edit in either and it affects the other I thought that was a really interesting project. So Simon when you started working with the Excel team it wasn't really clear was it what the formula language was there wasn't really a sort of programming language style description of the Excel formulas. Yeah that's right when I first started I sort of looked at the formula language and thought hey how hard can this can this be? You know, it's just some formula. It's pretty simple syntax. And then the more I dug into what Excel actually does, it's really hard to find good descriptions vector of the values in a1 to a3, you're passing a reference to the range a1 to a3 as a first class value to a function that can do sort of introspective things like say, what is the row number of the last element? I didn't really

Starting point is 00:24:34 realize that at all. So it turns out there are quite a lot of wrinkles like this. So I started to write down a paper called the semantics of Excel that was meant to be a document that says, nevermind the implementation, this is the semantics of a formula. And it was split into two in the end, the semantics of a single formula, and then the semantics of calc, which is how is a whole spreadsheet full of formulas calced? And how do we make sure that it's always kept up to date and calced in the right order with the right cells being made dirty? That led to all sorts of interesting conversations, sometimes with the right cells being made dirty. That led to all sorts of interesting conversations, sometimes with the Excel engineers who implemented particular features, because

Starting point is 00:25:09 really only they knew the honest to goodness ground truth about this. Again, it was an interesting dialogue between somebody who's coming at it completely from a semantics of programming languages point of view and the engineers who'd actually built it. But there was no independent standalone description in that kind of way. And that semantics then led to, we then, oh, let's implement it. Let's implement a reference semantics for Excel

Starting point is 00:25:32 as I think then as an F sharp program. So that was meant to be, as it were, an Excel formula evaluator lookalike, a standalone completely separate from the Excel code base that implemented the reference model. Because after all that could be a useful standard loan entity. We thought that in an abstract way, and then you had the idea it wasn't just abstractly useful we might be able to concretely use it um that way that turned into calc.ts

Starting point is 00:25:53 didn't it yes so we did a hackathon about three years ago mid 2017 so we had this f-sharp implementation that was actually getting pretty complete for at least the formula language. We didn't have all the worksheet functions, but we could evaluate a lot of formulas. And we were starting to wonder, maybe we could take Excel formulas into other products. And so Excel and Word have got an add-in model based on JavaScript. So basically, we did this hackathon where we added Excel formulas to Word. It was a sort of grand hack. We had this code in F sharp and we wanted to turn it into JavaScript. So we used, oh, what was it called?

Starting point is 00:26:30 Is it WebSharper? Yes, WebSharper is this sort of transpiler that compiles F sharp code to JavaScript. And so we took this hunk of code and got it running in the hackathon. And then we made this silly video. And you were actually on the judging panel, but we won the MSR Cambridge Hackathon that summer. It was great fun. Basically, we made a silly video that made people laugh.

Starting point is 00:26:53 So that was great. But I mean, I think people liked the idea that we could take the formulas on a journey outside Excel. And then a really interesting thing happened that this vice president of engineering, a gentleman called Alesh Holacek, was visiting Cambridge, was looking at our work. And he has a view about what's happening right across the different parts of Excel. And there is a web version of Excel that is in the web browser. This is Excel online, right? Excel online. Yeah. Every time you go to

Starting point is 00:27:23 a web browser and wake up Excel, that's what you get. Yeah, and the way it works, or at least the way it worked until then, was that it actually opened the Excel workbook in Azure inside a virtual machine. And then the client, the browser, was sort of like a dumb client, if you like. It didn't do any calculation there. It did the UI. You could enter formulas and you could enter data, but everything was calculated in Azure. So what this... But if you entered a number, it would have to send a message back to a machine halfway across

Starting point is 00:27:52 the planet to add three and four and send the result back to the web browser. Exactly. So it felt a bit sluggish and there was a big push to make the product more usable. And so they decided they wanted to evaluate formulas in the browser. But unfortunately, the C++ code base that goes back to, I guess, 1985, when Excel started, it just wasn't practical to run that code base in the browser. And so Alesh said to us, you guys can help out. You guys need to go to Israel because the WebExcel team is based in Herzliya in Israel and they need a calc engine. So we had this amazing work trip out to Israel, met the guys and started collaborating. And we realized pretty early on that we couldn't ship the F-sharp code. So we did a big rewrite and we rewrote it in TypeScript and then started a big push to write lots of the worksheet functions in TypeScript.

Starting point is 00:28:47 And WebExcel, because it's a website, can move really fast. So the amazing thing was that like six months after the hackathon, the initial version of Calc.js went live and it was starting to speed up people's calculations because it was doing the calculations in the browser. We had to do a bit of research to really make this viable because not all the data in the cloud workbook ends up in the browser. So we had to have a calc engine that worked in the presence of partial data. And then we hit some precision problems because the original C++ code doesn't use IEEE arithmetic, believe it or not, because it was originally written before the IEEE. Predated IEEE arithmetic, believe it or not, because it was originally written before the IEEE. Predated IEEE arithmetic. This is like code that has been around since forever.

Starting point is 00:29:31 That's right. But although it wasn't possible to do the following for all of the C++ code, we could use WebAssembly to take the core of the mathematics part of the calc engine that did floating point arithmetic and it was in cbus plus and we were able to compile that to javascript using webassembly or using that tool chain first to webassembly then back to javascript and so now we've got great performance numbers so accuracy is like 99.95 we can evaluate more than 99 of formulas and a slightly goofy statistic is that there are so many customers of excel for the web that cartes saves them a total of seven years every single day because it speeds up the response you get without needing to wait for the server also it must be helping global warming

Starting point is 00:30:17 right if you're going to add three and four then that should be one machine instruction if you've got to send it across the planet to a data center that must must be like bazillions of machine instructions, you know, that's just being wasted. Going back to Excel as a product, that brings us sort of more or less up to date to the release of the Lambda function. Maybe you could say a little bit about what that is. I think we're all really super thrilled about having lambdas in Excel. So lambda is, it's the same lambdas that you have in programming languages. You know, it's an anonymous function. You can say double lambda x. Right now you'd write in Excel as lambda x comma x plus x. And then that formula returns a function that if you give it an x, doubles it. Same as in any functional programming language.

Starting point is 00:31:14 And this is going to be really great for users of Excel because it's going to make formulas more readable. Excel formulas are notorious because they tend to be really complicated if you're trying to do something clever. These notorious mega formulas that are really big things that are really hard to read, but are sort of squished down to a single cell. And Lambda will let you give names to parts of those.

Starting point is 00:31:37 And we also have Let that lets you give names to sub-expressions within a formula. So overall, it going to make expressions much more readable and also much more reusable. You can define a computation once and then you can call it from many places. And just like in other programming languages you get the great benefit that if there is something wrong with your initial definition or it was a preliminary definition for testing and you want to improve it and fix a bug, you just need to fix it in the one place, and then that will propagate to all the uses. So this is another funny thing, right? So when Margaret and Alan and I were initially thinking

Starting point is 00:32:14 about user-defined functions, we thought, well, we have to define it as a worksheet. I don't think I ever dreamt that we could offer something as exotic as Lambda, which sounded very geeky and programming language-y, and yet it serves the same function. You can define named abstractions. But because it doesn't require much UI, it's just another form of formula. It's been much easier to introduce as the first form of the product. I still hope that we'll get sheet-defined functions. But it turns out that Lambda, and this really is the full-on lexically scoped Lambda. You can define church numerals. You can do the whole thing. You can define the Y combinator. This is real lambda, is now, you know, part of Excel. And that's amazing, that makes the language too incomplete, because we can write the Y combinator, you could really write literally

Starting point is 00:32:53 any program in Excel now. But that's a qualitative shift, right? So suddenly, it becomes computationally much more powerful. And because we can name these lambdas, you can have functions that call functions that call functions, you can have functions that call functions that call functions. You can even write recursive functions. These named lambdas can call each other. Yes, I mean, an example I love, it's a very simple one,

Starting point is 00:33:13 but just reversing a string. Believe it or not, although Excel has had lots of functions for processing strings, you can't write a function before lambda anyway within the formula language that can reverse a string. But with lambdas, you could just write a nice little recursive function that does that. There's real demand for a bit of functionality like that, but it wasn't possible to do within

Starting point is 00:33:34 the formula language. I love the way this is sort of empowering users, right? Now users can write these new functions themselves, and in fact can write functions specific to their particular domain or area of expertise or work group, and then call those functions just as easily as the built-in ones. I think that's a huge, huge change. It's going to be really interesting to see how the kind of things people do

Starting point is 00:33:57 and how we can share them. I'm hoping there'll be a sort of open source community build up around Lambdas. Yeah, I really hope that there'll be a sort of third-party library, people who say, here's my library for doing operations on, I don't know, tensors or something. And we're going to need to provide better support for libraries and richer data types. But I think that'll come. And then people can produce libraries themselves. Yeah, it's really great. I mean, because we've been using Lambdas internally for a few years now,

Starting point is 00:34:22 and it's just a different kind of experience. It's live programming. You've got a grid, you've got data just right in front of you. It's really highly interpretive. You know, you can change one lambda one place and instantly all the changes percolate. You're saying that even as a programming language person, doing functional programming in Excel with lambdas and dynamic arrays feels quite different and more engaging than doing programming in, I don't know, Haskell or F sharp. Is that what you mean? Yeah, exactly. That actually leads me to just sort of, I guess, drawing towards the end, I was kind of reflecting on what have we learned from all this. One of the things that for me would be a very exciting outcome, if we can really make Excel into a language that doesn't have this sort of glass ceiling that prevents you getting beyond

Starting point is 00:35:03 a certain point that lets you write functions that can call functions that can call functions that can have data that has data that has data inside it then we could maybe imagine introducing programming to children for the first time through the medium of excel so the child's first programming language maybe alongside something like scratch which is wildly popular could be a spreadsheet initially very, you know, direct visceral array through just entering data and perhaps visualizing it with a chart, but then some simple data transformations using functions, and then the ability to wrap those functions up into functions of your own. It would bring the pieces of my life together, my functional programming life, my Microsoft life, and my education life. You could bring that all together into, we teach children programming using Excel for the first time.

Starting point is 00:35:48 I think that's amazing. I think that's a fantastic, exciting vision, Simon. Those are things you've worked on. I mean, it's really cool that those things coming together. But I actually really believe that the spreadsheet environment is a great place to learn because it's so live. You know, you make a little change and you can instantly see the results. It's a really sort of evocative way to program and to learn. Anything else that you feel you've learned about from this experience? There's two things. I mean, I've touched on HCI and programming languages. So I won't say so much about that, except that I see a lot of excitement now in that interplay. There's a few, you know, professors, I'm thinking people like Eleanor Glassman

Starting point is 00:36:25 or Amy Koh, professors who are on the intersection of programming languages and HCI and are doing really cool stuff. And there's summer schools on the topic. So I think that's really growing as a research area and I'm really excited to be part of that direction. And the other thing is, like I said at the very, or near the start, that before getting into spreadsheets, I'd spent quite a while looking at probabilistic programming, which is a way of expressing sort of Bayesian decision-making using code. And I sort of put that aside for a bit because I really, you know, I thought the way to achieve that, to sort of empower people to do Bayesian reasoning, to sort of use probabilistic reasoning to make good decisions.

Starting point is 00:37:05 I think the way to get there is to get it into spreadsheets. And so I think the kind of things that we've been talking about are a step in that direction. When I was doing probabilistic programming, that was really aimed at quite sophisticated users, people who are like data scientists. But I think we are seeing in society generally a really big need for the general public to understand probabilities and to understand uncertainty. I'm thinking about things like election forecasts. I'm thinking about the various uncertainties around COVID. I think to be sort of numerate about these uncertainties, you need to be numerate to deal with them properly, to make good decisions. And to be numerate, you need to take the uncertainty into account, which is usually

Starting point is 00:37:47 using probabilities. So I think that the kind of features that we're adding to Excel will make easier being able to visualize probabilities, to have charts that take probabilities into account, to sort of package up ways of visualising probability distributions inside lambdas so we can give textual descriptions of probabilities. People find, you know, a number like, you know, probability of 0.7 a little bit hard to understand. There's a big idea called natural frequencies where you basically remove the points and you can say what 0.7 means. And studies have shown that people respond much better to those kinds of natural frequencies, 70 out of 100, than probabilities like 0.7. So I think that the programming features we're adding to Excel

Starting point is 00:38:36 will allow people to build libraries that make it easier to construct, to say, represent distributions using natural frequencies and eventually to make charts that represent probabilities nicely. And then a bit further out, this is quite a long-term dream, the idea of actually using Excel to build probabilistic model of situations will then be possible given the kind of work we're doing on adding Lambda to Excel. So that's my long-term dream of where we're going to go with Landers and these other features in Excel. More generally, I'm quite excited about the possibilities of taking end user programming seriously, by which I mean that because Excel starts from being an end user tool,

Starting point is 00:39:19 that's what it has always been by end users, people who are trying to get some other job done, that if we can bring that constituency of end users on a journey with us to empower them, to give them the tools to express more and do more, you'll be quite excited to see how far we can go. And I think that functional programming using Excel is, you know, there's a really high ceiling there that could go a long way. But I think just as we were talking about with CalcView and your sense that the programming using this just felt different, I think it may take us on a journey that isn't altogether where we expect. We've already seen that in the dynamics that we've discussed between us and the product group. So I think the future is quite exciting there.

Starting point is 00:39:57 Thanks, Simon. It's been great talking with you this afternoon. We're so excited about this work and hope everyone listening is too. Do check out the Microsoft Research website for more information. Thanks for joining us on the podcast.

Microsoft Research Podcast - 120 - Advancing Excel as a programming language with Andy Gordon and Simon Peyton Jones

There aren't comments yet for this episode. Click on any sentence in the transcript to leave a comment.