CoRecursive: Coding Stories - Tech Talk: Idris, Proofs and Haskell with Edwin Brady
Episode Date: January 29, 2018. Tech Talks are in-depth technical discussions. Edwin Brady is the creator of the Idris programming language, author of the book Type-Driven Development with Idris, and a computer science lecturer... The book, the language, and Edwin himself all seem to be chock full of ideas for improving the way computer programming is done, by applying ideas from programming language theory. In this interview, we discuss dependent types, type holes, interactive and type-driven development, theorem provers, the Curry–Howard correspondence, dependent Haskell, total functional programming, British vs American spelling, and much more. Links: The Book, Idris Lectures at OPLSS, Idris Language Site
Transcript
Hello, and welcome to CoRecursive, where we share the stories and the people behind the code.
Today's episode is an FP interview. It focuses on functional programming,
and I'm talking to one of the most interesting people in that space, Edwin Brady.
Edwin is the creator of Idris, and he's just really great at explaining himself,
and he has some super interesting ideas about where programming could go.
If you've seen some of the tooling he's built around Idris, it's pretty magical. It seems like weird alien technology from
the future, but it's here today. So let's get to the interview. I hope you enjoy it.
Edwin Brady is the creator of the Idris programming language and the author of Type-Driven Development
with Idris. Edwin, welcome to the show.
Good afternoon. Thanks for having me.
So I have your book here in front of me. I really enjoy it. I have not gotten all the way through
it, but I've definitely learned a lot. I thought before we really dig in, maybe we need to lay down
some terminology. So what is a type? Ah, no, this is a surprisingly hard question to answer.
So I think if you ask a hundred computer scientists, you'll get at least 100 answers.
So I would say that a type is a way of classifying data.
So it's a way of putting some kind of meaning to the data that you're working with.
But I think depending on the kind of context you're in, type could mean something different. So, you know, if you're the computer, then the type is a way of describing how data is laid out.
And if you're a programming language, then a type is a way of making sure the data is interpreted consistently.
But as far as a programmer is concerned, which is what we're all interested in, a type is a way of describing the data we have.
What then is a dependent type?
So dependent type is, these days I'm preferring to use the phrase first class types.
The reason being that if you, so when we talk about functional programming
and what distinguishes it from, say, imperative object-oriented programming
or what additional features it gives us. We like to say that functions are a first-class
part of the language. That is, you can assign functions to variables, you can pass functions
to other functions, you can return functions, and so on. So when I say dependent types or
first-class types, I mean the same kind of idea,
but for types. That is, you can assign types to variables. You can write functions which compute types. You can pass types as arguments to functions. And so what makes this dependent
is the fact that you can then use types or use values to more precisely describe the data that you're working with. So types can
be computed from or depend on other bits of data. So the classic introductory example of this is,
imagine you have a list, and then you can make that more precise by saying, okay, I have a list where all of the elements of the list are the same type. And then I can make that even more precise by saying I have a list where all the elements are of the same type, and it has some specific length.
So a list with its length is the example of calculating the type from a value, the number being the length.
But that's really the essence of it. The fact that types become first class,
and you can compute types from values, and to some extent, vice versa.
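To make that concrete, here is a minimal Idris sketch in the spirit of the book's early examples (the names are illustrative): a list whose length is part of its type, and a type computed by an ordinary function, so the result type depends on a value.

    import Data.Vect

    -- Vect 3 Int is the type of lists of exactly three Ints:
    -- the length is part of the type.
    threeInts : Vect 3 Int
    threeInts = [1, 2, 3]

    -- A type computed by an ordinary function, so the result type of
    -- getValue depends on the value of its argument.
    StringOrInt : Bool -> Type
    StringOrInt True  = String
    StringOrInt False = Int

    getValue : (b : Bool) -> StringOrInt b
    getValue True  = "ninety four"
    getValue False = 94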
So what is Idris?
So Idris is a programming language that I've been working on, really, as a part of my
research in programming language design and software correctness. So it's a purely functional programming language.
It takes a lot of its ideas from Haskell, largely because I quite like Haskell.
And it's something I've been using to put into practice a lot of the research ideas
that have been coming from all sorts of people
over the last, I don't know, 20, 30 years in using programming languages to better express
correctness of software.
So there's all these wonderful ideas that have come out of various places, such as the Coq theorem prover that comes out of France, the Agda programming language that's come from a group at Chalmers in Sweden, and research dating back to Per Martin-Löf in the 70s.
And I think all of this stuff is wonderful, and it allows us to be very precise about
the programs we want to write and very precise about the data we're working with.
But it's really not something yet that mainstream programmers can pick up and run with.
And one thing I would really like to do with Idris, and most likely with successor systems, is make these techniques available to practitioners and software developers.
Because, you know, it's really cool and I want people to use it.
What is a theorem prover?
Aha.
So when we talk...
Where to start?
So theorem proving and programming
are in some very real sense the same thing.
There's something... when you write a program, let's say, we'll stick with functional programming,
so you write a program that takes an integer as an input and gives you an integer as an output, just something as simple as that.
Then if you can write that program, then that is –
that program constitutes a proof that it's possible to take an int and give you back an int.
So you can read it as the theorem, int implies int.
Now, that as a theorem is not particularly exciting.
Nevertheless, it is a theorem.
It's a fairly trivial theorem. But as you start adding a little bit more precision into your type system
and you start looking at the relationships between what is a type
and what is a theorem, you start getting rather more interesting properties
of your program.
So if I think of a nice simple example, say a simple theorem such as if A is true and B is true,
then A is true.
Okay?
So A and B implies A.
So, in terms of a program, this is just some simple logical proposition.
In terms of a program, what is that?
Well, it's a program that takes a pair of an A and a B and returns an A.
So this would be taking the first projection of a pair, for example.
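As a tiny Idris sketch (the name is made up for illustration), that proposition and its proof look like this:

    -- "A and B implies A", read as a program: given a pair of an a and
    -- a b, return the a. The first projection is both the program and
    -- the proof.
    andImpliesA : (a, b) -> a
    andImpliesA (x, y) = x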
So as you start adding more interesting features into your types,
so you start adding even more interesting features into your theorems, such as if you
have a type that represents a sorted list and you have another type that represents
an unsorted list. So if you can add that level of precision, then a program that goes from unsorted list to sorted list, with certain extra assumptions and properties added, is guaranteed to be a function that sorts a list.
So the question is, is that a proof or is it a program?
Is it a proof that this algorithm sorts a list or is it a program that actually does sort a list?
And the answer is, it's both.
So, I mean, I started out this from, you know, I came into academia from having been a C++ programmer in industry for a couple of years.
And I thought, okay, yeah.
I basically went to a talk on theorem proving and decided this would be more fun.
And so I went into the other direction and started doing the theorem proving
and generating implementations of sorting algorithms from specifications.
And started thinking, well, what is the relationship between this C++ programming that I've been doing and this very abstract theorem proving that these people have been teaching me?
So I think, well, there's a very close relationship.
When people talk about this idea, you'll hear it referred to as the Curry-Howard correspondence.
So it's a pretty well-known idea.
It's been known for decades now.
But it's not something that people typically think about when they're programming, but
you very much are using this concept that programs are proofs.
And when you say that Idris is Pac-Man complete, I think this is, is this a reference to the fact
that these other dependently typed languages
don't actually run or?
Not so much that.
So people ask questions about programming languages
like, is it Turing complete?
And, you know, people are worried about this sort of thing because Turing completeness means that you can solve all of the interesting computable problems.
But I just don't think Turing-completeness itself is a very interesting property of a practical programming language.
For example, C++ templates are Turing complete, but you probably wouldn't write Pac-Man using only C++ templates.
Makefiles are probably Turing complete. Whitespace, the programming language, is Turing complete.
I mean, all these things are Turing complete,
but you couldn't write Pac-Man in them.
Whereas it's not beyond the realms of possibility
that you could take a language
that isn't quite Turing complete.
That is, you can always guarantee that the program terminates in a certain finite time.
And you could still write Pac-Man in it if you had the graphics libraries and the operating system interface libraries, provided that you had a suitable upper bound on the amount of time the program is going to run. So I think it's a much more interesting property of a practical programming language
that you could write Pac-Man in it
than that you could solve any computable problem in it.
Because what you, I mean, this is opening quite a can of worms,
so I'll sort of gloss over some of the details.
But theorem provers like Coq or Agda, generally the ones we talk about in the dependently typed programming world,
they require programs to be total,
which basically means that they need to cover
all well-typed inputs and programs or functions need to produce a non-empty finite prefix
of their output in finite time.
That is to say, they need to do some useful work in finite time.
And unless you do a few little tricks, this can mean that your language isn't Turing
complete. Because in a Turing complete programming language, you can't solve the halting problem in
general because, well, you can't write the program that says whether any program,
any given program with any given input is going to terminate.
But in a theorem prover, everything has to terminate.
So that kind of suggests that it can't possibly be Turing complete.
But the question is, what kind of useful things can you do?
Or rather, what useful things can't you do?
And I haven't yet found a useful
thing I can't do by disallowing programs looping forever and not producing any output on the way.
I'm not entirely convinced these programs exist. Something that merely makes my machine get hotter
while it's running is probably not something that's going to be useful to me. Now, it does
obviously make some programs a little bit harder to write because I have to produce some guarantees that this thing is going
to produce some output eventually. On the other hand, the payoff is that I know for certain
that this program is going to give me a result at some point. So, I mean, if nothing else,
it's a useful technique for ruling out silly mistakes.
Anyway, all of that is why I think Pac-Man completeness is more interesting.
It's because I can still write Pac-Man within these constraints.
But if you've picked some arbitrary Turing complete thing, an amazing number of things are accidentally Turing complete, but you can't write Pac-Man in them.
It's quite hard to be accidentally Pac-Man complete.
So I think, by implication, Idris is a total language. Is it always total, or...?
It's not always total. So this is one of the decisions that I made quite early on. Checking that something is total is important, because it's a check that allows you to be certain of various properties of your program.
So remember early on, I was talking about programs and proofs being the same thing.
And while this is true, in a Turing complete programming language, your proofs come under the assumption that your program is
also total, or your program produces a result. So all your proofs, my int arrow int example,
it's really only a proof that an int implies int if this program terminates.
If your language is total, then it really is a proof that int arrow int will return an int.
So in Idris, it will always check whether something is total, because if you know something is total, if the machine tells you that it's convinced this is total, you can place a lot more trust in it than if it's not total.
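As a small illustration (the function names here are made up), this is the kind of thing the totality checker accepts and the kind of thing it flags:

    -- Accepted by the totality checker: it covers all inputs, and each
    -- recursive call is on a structurally smaller argument.
    total
    countdown : Nat -> List Nat
    countdown Z     = [Z]
    countdown (S k) = S k :: countdown k

    -- This still compiles, but the checker flags it as possibly not
    -- total: it loops forever without producing any output, and marking
    -- it 'total' would turn that into an error.
    spin : Nat -> Nat
    spin n = spin n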
However,
there is this goal
of being accessible
to the more mainstream software developers and bringing it closer to what people do in practice.
And I think you have to take things a little bit slowly if you want people to adopt your ideas.
And there's a very nice concept I came across only quite recently.
I think it's due to Steve Klabnik, who's very heavily involved in Rust.
He calls it the language strangeness budget.
So if you make a new programming language, you have a target audience.
So in my case, it was kind of Haskell, ML, OCaml developers,
to a lesser extent, Scala developers.
So you pick your target audience, and then you decide how many things you're going to do
that is away from what they're used to. So how many odd concepts you're going to introduce.
And those odd concepts have to have some, at least, well, I would say some clear value,
maybe not some obvious value, but over time, those concepts become clear. You can only have
so many of them. And my feeling
at the time was, although I didn't know it under this name, my feeling at the time was that if I
were to require totality as a default, rather than non-totality as a default, then I would have
blown my language strangeness budget many times over. I was sort of feeling that first-class
types is already spending that strangeness budget, and that's enough for people to deal with as a first step.
So I think future versions or future languages that, you know, hopefully there will be future languages that are inspired by Idris.
Hopefully they will take this idea of totality much further as people get used to the idea. So I think once the tools are better for detecting where the programs are terminating, it's going to be just like these days you have the arguments between dynamic typing fans and static typing fans.
Maybe in the future we'll have arguments between total language fans and non-total language fans.
Who knows?
I like this concept.
I think it's a lovely concept. It explains
so much about language design choices.
Although there's
a certain amount of, I don't know,
disappointingness,
if that's even a word, because languages
seem to take decades to reach
popularity, and if it can only differ two steps in one direction...
I think we're talking here about mainstream languages.
If you look in the academic
research community, the kinds of languages
that people are coming up with. If you go to
one of the academic conferences
like Principles of Programming Languages,
people tend to come up with
new languages to demonstrate a point.
They have all
the exciting features, linear types or session types or total programs. And if the very best
ideas come out of these languages and then get adopted by mainstream languages, then as academics,
I think we've won. And the same goes for Idris, really. I mean, I don't have the resources for Idris to be a mainstream language that's adopted by the world now.
But if people were to take the ideas, like even just the totality checking and reporting errors or warnings if things are not total, and people put them in mainstream languages like Scala or even, you know, Java, C++, then we're definitely winning.
So if you keep an eye on the academic community, there's all sorts of interesting ideas coming out.
So is it possible to take first-class types and bring them to an existing language?
So there, I don't see why not. In principle, there would be all kinds of interesting challenges to get them to interact with the type system as it stands. So, you know, I occasionally get tagged on discussions about extensions to the Rust type system, for example, which would be pretty cool, to get a bit more dependent types in there.
But the difficulties, I suppose, are to do with mutation.
So how do you make types which are known at compile time
interact nicely with mutable variables?
So something which is true about a value at the start of a function's execution
might become not true after things are modified.
So in Idris, we don't have a problem with that
because Idris is a purely functional language and things can't update.
So you might have to introduce some restrictions on what you can do.
But in principle, we've thought about side effects and dependent types.
We've thought about interaction with the outside world and dependent types.
So if we find the right restrictions for the type level language, then it's something that could be doable.
So Idris being purely functional means that it's all about evaluation.
So an Idris program, when we talk of what it means to be a first class type, we're only allowing things to evaluate.
We're not allowing things to execute.
So in order to get for a purely functional program to execute in the outside world,
you really need to think of it as an operating system executing the description of the actions
that have been computed by the functional program. So if you start putting, let's say,
hypothetically, you add first class types to Java and you start saying, okay, now everything that's a Java program can go in a type.
Well, at that point, you're able to execute arbitrary IO actions while you're compiling the program, which is probably not a good idea.
You get it wrong and, rather than a compile error, you actually delete your source code, that sort of thing.
So you need to be a little bit careful about the restrictions and you need to
think carefully about how it interacts with subtyping, for example. But I'm sure it's doable.
I really thought you were going to say the reverse, that it wasn't possible,
that we would need a new language.
Well, so maybe a new language is a better place to experiment with these ideas. So,
you know, as languages evolve, they do tend to get very big. I mean, if you look at C++ now, or even Haskell now, it's not clear
that everyone is using the same language when they say they're writing a Haskell program. Well,
they're definitely not using the same language when they say they're writing a Haskell program.
So, I think it would make sense to, you know, rationalize the ideas into a new system.
But then, of course, if you make a new language, people aren't going to use it because they're heavily invested in tools that already exist.
So I don't know.
I don't know how to solve that problem.
I'm just here to throw out ideas and prototype implementations and see what people run with.
Yeah, it's interesting. Like you mentioned Rust, which is quite a new language
and they've managed to kind of,
you know, coalesce a whole community
around themselves
and, you know,
seem to be in the process
of rewriting everything
a C++ developer would need using Rust.
I don't really know how they've done it.
No, I mean, Rust seems to be taking off
and I should get around to learning it, to be honest, because there's all sorts of interesting ideas in there that I think we could pick up and try out.
Sorry, go ahead.
Well, if you have an organization like Mozilla behind it, and you have this flagship system that you're developing, I think that's going to help to some extent.
So, you know, in Idris, we don't have those resources.
But maybe it helps. I don't really know.
You mentioned Haskell.
I saw a talk by, I think it's Stephanie Weirich, about dependent Haskell.
So she did a pretty cool demo with like a regex library.
It was really neat, but then she showed me,
like I didn't understand how it was written at all,
but using it was quite cool because you had the types of the regex
there kind of at compile time.
But my question is, so why do we need Idris?
I guess if people can do this.
Right.
I think it comes back to this point about languages evolving
and getting bigger and bigger over time.
And I think it's good to rationalize these extensions.
So is there one single extension or small number of extensions that could
capture all of the ideas that,
that are in the many,
many extensions that Haskell has.
Now I think it's fantastic that all of these things are coming into Haskell, because it's bringing these ideas to a wider mainstream audience.
It sounds funny to me to call Haskell a wider mainstream audience, but I guess from where I'm sitting, it is.
So the fact that these ideas that we talk about in the full dependent types world are being increasingly adopted by people, but not only in Haskell, but in Scala, thanks to the work of people like Miles Sabin.
People are starting to see what we're doing in a context that they can use in their day-to-day life.
So that's brilliant.
But the question of why do we need this, I think it's always good to experiment with new ideas without the constraints of the decades of design decisions, language decisions that have come before.
So if you're to go with Haskell, for example,
you are going to have to take lazy evaluation, for example.
So that may or may not be a good thing.
I think that's up to individuals to decide.
You're going to have to deal with the fact
that types and values are distinct things.
So you don't have truly first-class types.
So there may be plans to improve that.
But you sort of have to work with the tools you've already got.
Whereas if you take a new language, you're totally free of that.
You can make your own decisions. But I think just this idea of Idris is significantly simpler than Haskell.
When you take the fact that you've got the first class types,
you can do all of the fancy things that can happen in Haskell's type system,
and you only have to learn one notation to do it.
So I kind of feel that if I'm teaching Haskell,
which I do to undergraduates here in St. Andrews,
I'm not going to teach them dependent types.
I'm going to teach them basically more than Haskell 98,
but the modern Haskell without any of the extensions.
So I will teach them that, but I'm not going to teach them any of the really advanced features
just because they've got enough to deal with already while thinking about moving from Java,
which they've done pretty much exclusively so far, moving on to Haskell.
On the other hand, if I'm teaching Idris, I actually feel that I can move on to dependent
types really early.
I can write... I mean, you see this in the book: within the first chapter, I give an example of computing types, and, you know, by chapter three, we're onto, here's a thing that just allows you to compute the length of a list in the type. And I don't really feel like I'm adding anything,
any new language features to do that. It's just a natural thing that happens
if you're able to compute with types.
And that's really not the case in Haskell, in my opinion.
Yeah, I agree.
It's pedagogically kind of much simpler.
I don't understand all of these extensions
that were turned on to do the dependent Haskell.
And I have a feeling that she did.
Right.
Well, I mean, she's been heavily involved in creating them,
which probably helps.
And, you know, I've seen some of her talks,
and there's some fantastic stuff that they're doing there.
So, yeah, I mean, it's great to see.
But I think at some point there, I would say,
needs to be a new language, which is not necessarily Idris.
Well, probably not Idris, but some new language that takes these ideas, brings them all together,
and has enough people behind it that it can become robust and production-ready.
But we'll see.
So one of my favorite things from this book is the interactive editing.
So that's something new to me.
I wonder if you could describe it.
Right.
So this is... if you use IDEs, which many people do, particularly if you're using a statically typed language,
you're kind of used to the idea that writing a program is not just a
case of you type a load of stuff and then you feed it to the Oracle and it says, well, it basically
says no, doesn't it? It tells you why your program doesn't work. But if you're using an IDE, it's not
just writing a program and feeding it to the system. You've got this interaction with giving suggestions for completing method names
and so on. So if you have a precise type system, that gives the machine more to work with. The
machine should be able to help you write your program a lot more than if you just have a fairly
simple type system. So in fact, I think this is a much more important,
much more interesting thing about types than what people often say,
which is about correctness and refactoring.
So people always come up with this argument that if you have a statically typed language,
you can be more confident that you've done the right thing and there will be fewer bugs.
Well, I think in practice, we don't tend to see that because unit testing is a very useful thing. So far more interesting to me is this idea of interactive
editing. The fact that you can use what you've said about the program to help you end up at the
right program with the machine's help. So the idea is write the type first, ask the machine to give you a small candidate definition that
matches that type, and then just refine the program, get a few more details into the program
according to what the machine might be suggesting. So to pick an example, we always go back to
vectors. Let's say you're writing a program to... let's say you've got two vectors of the same length, two vectors of integers of the same length, and you want to add the corresponding values in each vector.
So what do you do?
You say, okay, well, if both of the vectors are empty, then the result is empty.
If both of the vectors have a head element and a tail, then I will add the corresponding head elements
and I will add together everything in the tail.
And there's no other cases.
So the types tell me that there can not possibly be any other cases.
So what I'll do is I'll write the type.
I'll say, you know, Vect n Int to Vect n Int to Vect n Int.
So both inputs are Vect n Int and the output is Vect n Int.
And then I will say to the machine,
tell me all of the possible values
of the first input.
And somehow this seems odd
to describe in words
rather than on a screen,
but we'll persist.
So I say to the machine,
tell me all of the things
that that first input can have,
and it will split the definition, give me the two possibilities.
Then I will say for one of those two possibilities, I will say, tell me all of the values that this could have.
So in the first case, where the first input is empty, I will say, tell me all of the possibilities for the second input,
and it will tell me only the empty input is possible.
So already I've got something
out of the interactive editing here, and I've got something out of the types. So I haven't had to
write any of this definition other than say to the machine, you know, I'll tell the machine the steps
we're going to take, and the machine will tell me what the program looks like. And then even then,
I can go a little bit further, and I'll say, you know, what is the output if both of the inputs are empty? And it will tell me, well, the output you're looking for must be a vector of length zero, and there is only one vector of length zero. So I say to the machine, you tell me what the output will be, and it will tell me. So you get more commands by having a bit more information this time, more commands for interacting with the machine. So not just things like, tell me what functions or what methods work with these input types,
but tell me exactly what values can this input be?
Which values are ruled out?
What values can the output be?
Just search for the output for me.
So it becomes much more of an interaction.
It becomes much more of a conversation with the machine
than programming typically is.
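For reference, the definition that this back-and-forth arrives at looks something like this in Idris (a minimal sketch; the name addVect is just illustrative):

    import Data.Vect

    -- Both inputs and the output have the same length n, so the machine
    -- can tell us these two cases are the only possibilities, and can
    -- search for the empty-vector case on its own.
    addVect : Vect n Int -> Vect n Int -> Vect n Int
    addVect []        []        = []
    addVect (x :: xs) (y :: ys) = x + y :: addVect xs ys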
And I think really in the end,
if we're going to write programs that work,
it needs to be an interactive process.
It needs to be a conversation.
It's almost, to some extent,
pair programming with the machine as your pair
rather than just writing the program on your own.
Now, that said,
we've only just scratched the surface
of what's possible here.
So as types get more complicated,
some engineering needs to be done.
So the Idris, as it stands,
just gets a bit too slow
once you have much larger programs.
Still useful, but you tend to be waiting a few seconds for the reply rather than getting the reply instantly.
So I think there's a lot of work we need to do here.
But if we really start thinking about how we can get machines to support type-driven development in ways that go way beyond what IDEs currently do, I think that's really an interesting topic that we should explore a lot more deeply.
So probably the key concept that I think all languages can adopt,
or at least all statically typed languages can adopt,
is this concept of holes.
So a hole is a part of a program that you haven't written yet,
but that is nevertheless syntactically valid.
So in Idris, you would mark that as a question mark something.
So it's a question mark, then a name.
So a question mark foo.
And then I can ask the machine, what is the type of foo?
And it will tell me what's available and what type needs to go there.
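Going back to the vector addition sketch, a hole looks something like this (?addVect_rhs is just a name chosen for the hole):

    -- A syntactically valid but unfinished program: the right-hand side
    -- is a hole. Asking the machine about ?addVect_rhs reports the local
    -- variables xs and ys and the type Vect n Int that has to go there.
    addVect : Vect n Int -> Vect n Int -> Vect n Int
    addVect xs ys = ?addVect_rhs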
So I might have some fairly complex program
with some tricky manipulation to
do. And I'm in the middle of a definition, and I just need to know the types of all the variables,
and I need to know the types that goes in that hole. And if the machine can tell me that,
then that really helps with development, because I very rarely have a program that doesn't compile
as a result of this. So instead of writing, coming up with some plausible but wrong
implementation, what I typically do is write as much of the implementation as I'm sure I understand and leave a hole for the rest.
It's basically just say to the machine, over to you, give me more information.
And it almost gets to a point where, you know, people say type errors are really difficult to understand if you have a fancy type system.
And, you know, it's not quite true, but you can almost say, what type errors? I don't get type errors because I write the part of the
program I understand, and I ask the machine to tell me what I need. So rather than giving the
program to the machine and the machine saying, you're wrong, it's say to the machine, tell me
what I need here to be right. So of course, it's not true to say what type errors in practice,
but you do tend to find that type errors aren't –
they don't come up so often when you –
or you don't spend so much time scratching your head over them
when you can just replace the wrong bit of a program with a hole
and see what the machine's expecting.
I think, yeah, it's odd to talk about just an audio form,
but yeah, you've really turned the development model inside out, right?
So rather than, you know, writing a program
and then trying to see, oh, well, what type should this return?
The method here is, you know, write out the type,
use the sort of built-in features to have this expanded.
And so you're sort of, I guess that's the title, right?
That's why it's type-driven development.
And I'm assuming that's also why it says with Idris
rather than using Idris, I'm guessing, right?
Well, yeah, I mean, it was originally intended as just a book on Idris. And then it was like, okay, if we're trying to sell this as a mainstream publisher, nobody's heard of Idris, so what do we do to sort of bring out the details for a wider audience? So the focus in the end became, I mean, even though it clearly is a book about Idris, there's a lot more of the process, how you arrive at the program, not just how you write the program.
So that's kind of why the book is full of all of these steps of type, define, refine, start with the program.
Here are all the intermediate steps.
It's kind of a tricky thing to do in writing rather than animation.
Nevertheless, that is the key idea.
You can follow the ideas in other languages, too, but the tools aren't there so much.
But I hope they will be.
And the concept of the hole,
if I can just describe it a little bit.
So I could write a Java program
and whenever I don't quite know what goes here,
I could write throw new NotImplementedException.
And the program will compile.
So in a way that's kind of like a hole, except the holes... So, sorry, go ahead.
The difference is that that's a runtime failure rather than compile-time information. So you wouldn't get the information about what type needs to go in that hole. You would be able to run a partially written program,
which is, of course, really useful.
I do it all the time.
I mean, I compile and run programs with holes all the time
and just accept that the hole is going to crash.
Fine.
But yeah, it's really about getting the information at compile time.
It's like the compiler knows this information.
Your Java compiler knows exactly what needs to go there
because it has to know that for this checking.
So it's like, why is it keeping the information to itself? It's like, you know, an obstructive co-worker that doesn't want you to get your program written. So, I mean, I haven't done much development with IDEs in Java lately, so for all I know this is something that IDEs do. If they don't, they should.
Yeah, it's certainly not as well as Idris does it, which is interesting. You know,
I feel like the tooling, just using the examples in the book and the tooling in Atom,
it's really quite a process. Like I found, you know, even like maybe the example doesn't have to be Java, but if I was writing Haskell and I just put in undefined,
the type I'll get out will say it's bottom or something.
It won't, at least the last time I used Haskell.
So Haskell, the latest versions of Haskell,
I forget exactly when this came in.
So it was a couple of years ago anyway.
So you can now put an underscore in a program
and it will tell you all of the information it has about that whole. A couple of years ago, anyway. So you can now put an underscore in a program,
and it will tell you all of the information it has about that hole.
So it'll give you a compile error, but it's a compile error that says,
okay, this is what you have, this is what you need.
So they call them typed holes, I think.
And that's pretty much the same concept.
It doesn't have the editing,
but it's basically telling you all the information it has.
So really useful there.
So this concept is kind of working its way out?
Sure, yeah.
Or?
Yeah, it's an idea that I have the impression it's an idea that people like.
And I mean, it's certainly not an idea I came up with.
It's an idea that's been around for quite some time,
particularly in the theorem-proving community.
So if you're working through, if you're doing a proof in Coq,
for example, or any of the earlier systems,
you're basically developing the proof interactively,
and you have holes in the proof that you're filling in just by learning more information about the program. So I guess this is one of the ideas that's coming out of this theorem-proving background, but it's really useful for programming too.
Makes sense. Yeah, to me, you know, it seems like magic, but it makes a lot of sense. So the expression search: you have an example in the book where I think you do insertion sort, and I don't think you actually write anything.
It almost seems like you write the type and then expand it.
Yeah, so insertion sort, you can't quite do it because you're not quite giving enough information about the program.
So if I remember rightly, the example in the book is insertion sort on vectors. So what you get out is it's definitely
the right length, but you have to put a little bit in yourself to get the things in the right order.
But there is one in the book which is transposing a matrix. So an n by m matrix to an m by n matrix.
And there's very little that you have to write there. So you kind of have to know
roughly overall what you need to do. But I mean, genuinely, when I was coming up with that example,
I was thinking, how on earth do you transpose a vector? This is a classic example of this.
How on earth do you do it? And in the end, I just thought, why don't I just ask the computer?
Why am I even thinking about this? Let's ask the machine to do it. And it pretty much at each step tells you what to do.
But actually, the Idris expression search is really embarrassingly simple.
And there are much more sophisticated systems.
So Agda is another dependent programming language.
So I think it's more aimed towards the proving side of programming,
but it's definitely still a programming language.
There's a tool as part of that called Agsy, which goes way further.
It tells you more about pattern matching as well as the expression search.
So there's a lot more we can do.
I mean, really what Idris does is it just does a brute force search
for the first thing it finds that matches the type.
And if you give enough information, it's almost embarrassingly good how well it does,
given how little effort has gone into it.
In the book, you go through an example of printf.
So does printf require dependent types to be done?
Yeah.
So printf is almost the first example of a dependently typed program that a lot of people see.
So, you know, you learn how to print something out in C.
OK, well, the types of the arguments to printf depend on the string that's the first argument to printf.
So they're computed at compile time.
A C compiler will tell you if you've got it wrong, which is interesting because printf is not, as far as I'm aware, part of the C language standard.
So the only way it could do that, C not being a dependently typed language, this is something that has to be hardcoded into the compiler.
So the reason I picked that example is just because it's a fairly meaty example of what it means to do a calculation of types at compile time.
And it's an example that's familiar to a lot of programmers,
at least people who have some kind of C background.
It's nice to see how you can extend it with different types, for example.
So it's really something that allows you to see what's going on in a concept you understand.
And the dependent part is that based on the format string, then the type of the next parameters
change, right?
Exactly, yes.
So you calculate the input type from the format string, and you calculate the output type
from basically the rest of the format string.
And as soon as the format string is empty, the output type is just string.
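The book works through this in full; a compressed sketch of the idea (simplified here so it only handles numbers and literal text) looks like this:

    -- A sketch in the spirit of the book's printf chapter. A format
    -- description is turned into a type, so the arguments printf expects
    -- are computed from the format at compile time.
    data Format = Number Format | Lit String Format | End

    PrintfType : Format -> Type
    PrintfType (Number fmt)  = Int -> PrintfType fmt
    PrintfType (Lit str fmt) = PrintfType fmt
    PrintfType End           = String

    -- For example, PrintfType (Lit "Answer: " (Number End))
    -- evaluates to Int -> String.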
So we go through this example in Idris in the book,
and the first thing that comes to my mind, of course,
because in the Idris example,
these secondary parameters are calculated
based on the first input, but at compilation time.
So how does this work if I'm feeding it a format string?
That's a great question.
Yes, so this is essentially what's happening here at compile time
is it's checking with the information it has.
And at runtime, it doesn't have that information.
So this is the basis of the problem and the question.
So basically what it means is if you want it to compile
based on a runtime format string,
you're going to have to
check that format string at runtime. And the only way it's going to compile is if the machine is
convinced that you have checked the format string to get the right input. So if you imagine you're
doing this in C, what would you do if you were doing the same thing in C? Well, the user types
in a format string, what do you expect the next arguments to printf to be in your program? I think you're going to struggle to get that right. So if you're allowing the user
to type in what the form of the data is, you're going to have to check that data before you make
any progress in the program. So it's almost going to be like if you maybe in C,
maybe you would do some check that parses the format string at runtime and says, okay, if I have percent s, percent d in that order and nothing else, then you're allowed to call this version of printf.
So, I mean, I don't know exactly how you do it,
but you'd have to do something similar in an Idris program,
but it would be checked at compile time
that you had done the appropriate runtime checks.
So there is actually a simpler version of this that's a bit easier to think about, but it would scale up to printf. It's a fairly contrived example, but back to vectors again.
So reading in, I'll take this vector of numbers
example again. Why not? We've already talked about that. Let's say we're reading in two vectors of
numbers from the user and we're feeding it to our carefully designed and guaranteed vectors of the
same length addition function. So we asked the user to enter one vector, then another. And do we assume that
those vectors are the same length? Well, we can't. Users are not noted for typing in the input that
we really need them to have. So the only way the machine is going to let me call the vector
addition function is if I've checked that the inputs are the same length. And if
they're not the same length, maybe I modify them somehow to become the same length.
So by having more precise types, we're not freeing the programmer from having to do
runtime checks, but we are freeing the programmer from having to do unnecessary runtime checks.
And we are telling the programmer exactly where those runtime checks have to be. So if I read in a vector, that's going to be a vector
of some length n. If I read in another vector, that's going to be a vector of some length m.
So the next thing I have to do is check that n equals m. So the compiler is going to complain
if I don't do that check. So I think this sort of thing has really important implications,
particularly if you're interested in security.
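In Idris, the check described above might look something like this (a minimal sketch: readVect is a hypothetical helper that reads numbers from the user, addVect is the vector addition sketched earlier, and exactLength is the length check from Data.Vect, which can also be written by hand with decEq):

    -- readVect returns a dependent pair of a length and a vector of
    -- that length. exactLength checks at runtime whether ys really has
    -- length n, so the machine only lets us call addVect after that
    -- check has succeeded.
    zipInputs : IO ()
    zipInputs = do (n ** xs) <- readVect
                   (m ** ys) <- readVect
                   case exactLength n ys of
                        Nothing  => putStrLn "Vectors are different lengths"
                        Just ys' => printLn (toList (addVect xs ys'))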
So if you're taking some arbitrary,
unchecked input from a user
and you have a reasonably precise type that says,
I know this input is safe,
provided I can get it to match this type.
So, you know,
I can't think of an example off the top of my head.
But, you know, so maybe input has to take a particular form,
and I've written a function that only accepts a type
that makes sure it's of that exact form.
But the input I've read is a string,
and there's no way I'm going to be able to run my function
unless I can convince myself that the string I've read in is legitimate.
So, of course, this is something that we should be doing anyway.
If you're writing any kind of security-sensitive program,
well, every program is security-sensitive, isn't it?
So if you're writing any kind of program, you need to be checking that string.
You need to be checking that it's the exact form that your function is going to take.
And you don't just assume that the things are going to work out.
But if your types aren't quite precise enough,
then it might be that, well, you don't make those assumptions,
but there's something you forgot about.
So, yeah, the printf example is kind of a tricky one.
You probably wouldn't be taking runtime strings.
But in the situations where you are taking runtime strings,
you basically need to have a check.
Actually, that would have been the short answer.
You need to check at runtime. That's all it is.
Well, I think that, yeah, my bad, bad example.
An interesting one would be, so, and I think it's in your book, is if you have a, just a vector of
four elements and you're like, the user provides a number and you return the value.
So one way of doing it is to say to the user, no, I'm going to make you provide exactly this number.
And another way is if you've got it over a network, then you've got no guarantee that that's going to be intact.
So another way of doing it is to say, well, we'll take whatever the user gives us, but we're going to make sure that it's right. So yeah, it was interesting to me to see, like, there's a finite type of integers, so you could have an integer type that's constrained to the possible values of a...
Yeah, so this is numbers finitely bounded by some upper limit. So the natural example for this would be an array lookup.
So you're getting compile-time checks on the bounds of the array. So again, this is something where a user might type in something that's out of bounds. You can only call the function with an index that's in bounds. So the machine is going to tell you, by the way, you need to do a check here.
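That bounded-number type in Idris is called Fin, and the book has an example very close to this sketch of a bounds-checked lookup:

    import Data.Vect
    import Data.Fin

    -- Fin n is the type of numbers strictly less than n, so Data.Vect's
    --   index : Fin n -> Vect n a -> a
    -- can never be given an out-of-bounds position. User input arrives
    -- as an Integer, and integerToFin performs the runtime bounds check.
    tryIndex : Integer -> Vect n a -> Maybe a
    tryIndex {n} i xs = case integerToFin i n of
                             Nothing  => Nothing
                             Just idx => Just (index idx xs)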
So I think this is one of the things that people get annoyed about with types and annoyed about with, you know, I have to do these checks, but I want to try the
program without doing these checks. I want to just do some exploratory programming. So I think when
it comes to the type-driven interactive editing, this is something that we have to think about.
Can we either make it so easy to put these checks in
that there's really no excuse for not doing it,
or can we make it easy for people to just sort of
incrementally arrive at a program that does the right thing,
just leaving a hole for where you haven't done the necessary checks?
So this is where we can go a lot better with the tooling. Can I help exploratory programming
by having better tooling for exploring?
Because the concept of the compiler will force the developer
to validate all of the inputs that come in from the user.
Like that is a great concept.
You're saying the problem is actually getting to that final program
because in the middle step, you don't want to have to already have written all the validations.
Right, exactly. And this is what I do. I write programs where I know the happy path or I've got
a rough idea what the happy path is. I'll code the happy path, but I'll just leave a load of holes
for the error checking. And I will test the program out with input that I know is okay, and it will be fine. And then I'll test it out with input that isn't okay, and it'll go into one of my holes and it'll crash, and it'll tell me which hole it went to, so I'll fill that hole in. So I find this quite a nice way of working, to be able to write incomplete programs and just have the compiler build them, and I can still test them even though they're not finished. But I have a machine to tell me which bits I need to do
in order to make my program robust.
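A tiny sketch of that way of working (the scenario and names here are made up for illustration): the happy path is written, the error case is left as a hole, and the program still compiles and runs.

    -- Happy path only: the missing-name case is left as a hole, so this
    -- compiles and runs, and simply crashes at the hole if it's reached.
    greet : List (String, String) -> String -> String
    greet db name = case lookup name db of
                         Just nickname => "Hello, " ++ nickname
                         Nothing       => ?handleMissingUser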
So maybe there could be improved tooling that allows us to hide bits.
It's almost like programming with text is the problem here.
The fact that programming languages are fundamentally still text-based things
means that the sort of things we might want to do
as perfectly reasonable things while developing, just hide bits of programs, are things we aren't able to do quite yet. But to be honest, that's the next thing I want to work on, making this process of interactive editing just nicer. Sort of my dream is to have maybe a tablet app or something like that
where I program essentially by waving hands around on the screen.
So I write the types and I fill in the program by waving my hands around,
pulling in bits of programs from different sources.
But anyway, that's some way off.
Well, that would be a hell of a demo.
Yeah, well, we'll see. I haven't... I don't even know where to start with tablet programming yet, so I think I'll need help on that one. But yes, that's a dream.
Yeah, I mean, I really like the keyboard. I don't know if text... I've heard this comment before, like, why do we just have files with text when programs are so much more structured than that?
But I think it's a good lowest common denominator.
I think it's always going to be text in the end, but ways of interacting with it,
well, as I say, it's always going to be text in the end.
It's probably not always going to be text in the end.
That's just the limitations of my own imagination.
But having better ways of interacting with it, even if it is still text, I think is important.
So it's one of the limitations of the text editors we have.
Actually, if you're using pretty much any modern text editor, it's amazing what you can do with them.
Just the editor mode for Idris is essentially talking to an Idris process in the background, so we can get it to do basically anything as long as we can explain it to Idris as part of just a text command.
So there's all sorts of scope for interesting new ways of editing programs.
So I have a couple of...
I was just going to say that if you start imagining new things like writing a monomorphic program,
but there is a polymorphic version of it maybe already in the library,
I just sort of imagine writing that program in the interactive type-driven mode,
and then the machine says,
ah, by the way, you've already written this program.
It's called map or it's called zip with.
Once you start imagining what you can do,
if the machine knows more about your program,
I think there's all sorts of cool stuff we could be doing there.
Yeah, I think like a linter kind of can do that sometimes.
Say like, hey, it seems like you're using map.
But yeah, it's not.
Yes.
Maybe it is interactive with the right extensions.
So linting tools can do this.
There's all sorts of tools that can detect this, but I
reckon it should be something that the machine can spot
itself. You shouldn't need to code up these
specific examples
of higher-order functions.
But again, that's
the dream world, but
would be nice to have.
Okay, so I have a bunch of yes or no questions that are maybe a little bit silly that I'm going to hit you with.
You can say yes or no, or you can refuse to answer or expand.
So does Idris compile to PHP?
Yes and no. In that you can generate PHP for core Idris programs,
but it only does the absolute basics.
So I actually did that backend for a bet, as you do.
In principle, yes, but really don't use it.
Haskell having two colons for type annotation is a mistake.
It was historically the right choice at the time.
That's a safe answer.
I don't think people necessarily know why it has two colons instead of one.
So, as far as I know, the reason is that it evolved in various ways from another language, Miranda, by David Turner, and that had type inference. And David Turner's idea was that he would be writing a lot more lists than he would be writing types.
Therefore, single colon for lists or single colon for types.
Well, you pick the shortest thing for the thing you do more.
So completely sensible to pick a single colon for lists.
Well, in Idris, I think I'm writing a lot more types than lists.
So it's the other way around.
But anyway, as far as I'm aware, that was the reason for single colon.
That was a completely sensible thing to do.
But that doesn't mean we should keep a decision because it was a sensible thing to do in 1980.
Having a programming language with lazy evaluation by default is a mistake.
No.
True or false?
False, yes.
There are pros.
I'm not doing at all well on yes or no here. So there are lots of arguments in favor of lazy evaluation. There are lots of arguments in favor of strict evaluation. I like both. I would really like a language that could read my mind as to what I wanted and when, but that's even more of a dream than the other things.
So, I mean, I picked strict for Idris just because a while ago I'd implemented a lazy
language and I wanted to implement a strict one this time.
That was the real reason.
There's actually a better reason, which has come up later, which is that I want a type
to precisely say what the data is at runtime.
So if the type is int,
I want the representation of that to be definitely an int.
I don't necessarily want it to be a bit pattern that represents an int.
So that's really why I argue for strict now. But really, they've both got merit.
What do you mean by a bit pattern that matches an int?
Yeah, so in C, for example, if you see the data type int,
you know that if you look at those bits and you interpret them,
you will get the number that is represented by that int. In Haskell, what you will get
is a pointer to a computation that will get you an int. It might be an int, or it might be a
function that will calculate an int. And I'm just thinking now that if I really do want the types to
be precise, then if it's an int, if the type says it's an int,
it should be an int. If it's a computation that calculates an int, the type should say it's a
computation that calculates an int. So that's why in Idris we call lazy int. So we've got this lazy
type. But that is merely true for Idris. That's not something that I believe to be universally
true. It is merely the decision that we've taken for Idris. So I don't think there is a universally
right answer to lazy or strict. They're both useful in their own settings. So yeah, I realize
it is going out on the internet and therefore I should be controversial, but I'm not going to.
It's more complicated.
All right, here's a controversial one that I saw from one of your talks: Americans are spelling colour wrong.
Oh, I just do that for cheap laughs. So it's very sad to me that one of the final things that happened with the book before going to print is that they changed it into the house style and they broke all my spelling.
But
I just
whatever.
It really doesn't matter, does it?
But in the source code
for Idris,
we spell colour with a u and we spell normalise with an s, and all of that.
However, there are some things just not worth arguing about.
Lazy versus strict is one of them.
Color with a U or without a U is also not one of them.
But, you know, it keeps us busy on the internet.
So there you go.
Well, I want to be conscious of your time, Edwin.
It's been great to talk to you about Idris and the book.
And thank you so much for taking the time to talk with me.
Well, thank you for having me.
So that was the interview.
Did it get you interested in Idris and dependent types?
For me, it totally did.
If you liked the episode, do me a huge favor and tell somebody else about it.
Maybe share it with some of your co-workers.
Until next time, thank you so much for listening.