Epicenter - Learn about Crypto, Blockchain, Ethereum, Bitcoin and Distributed Technologies - Grigore Rosu: The K Framework – A Framework to Formally Define All Programming Languages
Episode Date: June 12, 2018. In the past few years, we witnessed the development of multiple smart contract languages: Solidity, Vyper, Michelson, Scilla, etc. These languages need to enable developers to write correct, predictable smart contract code. Each language development effort therefore ends up spending resources on building formal verification toolsets, compilers, debuggers, and other developer tools. In this episode, we are joined by Grigore Rosu, professor of computer science at UIUC (University of Illinois at Urbana-Champaign), for a deep dive into the K framework. The K framework is a mathematical logic and language that enables language developers to formally define all programming languages, such as C, Solidity, and JavaScript. Once a language is formally specified in the K framework, the framework automatically outputs a range of formal verification toolsets, compilers, debuggers, and other developer tools for it. Updates to the language can be made directly in K. This technology has massive implications for smart contract programming language development and formal verification efforts in the blockchain space. We also cover his efforts to express the Ethereum virtual machine using the K framework, and to develop a new virtual machine technology, called IELE, specifically tailored to the blockchain space. Check out the episode to understand a game-changing technology in the formal verification and smart contract safety space.
Topics covered in this episode:
- Grigore's background with NASA and work on formally verified correct software
- Motivations to develop the K framework
- Basic principles behind the operation of the K framework
- How K deals with undefined behavior / ambiguities in a language definition
- The intersection of the K framework and smart contract technology
- Runtime Verification's collaboration with Cardano
- KEVM and IELE, smart contract virtual machines developed by Runtime Verification
- Broader implications of the K framework for the blockchain industry

Episode links:
- Defining the undefinedness of C - formalisation of C using the K framework
- IELE - a new virtual machine for the blockchain
- Runtime Verification - Grigore's company
- K Semantics of the Ethereum Virtual Machine
- Short video on Grigore's partnership with Cardano
- An overview of the K framework by Runtime Verification
- A detailed technical overview of the K semantic framework

This episode is hosted by Meher Roy and Sunny Aggarwal. Show notes and listening options: epicenter.tv/239
Transcript
This episode of Epicenter is brought to you by Shapeshift.io, the easiest, fastest, and most secure way to swap your digital assets.
Don't run the risk of leaving your funds on a centralized exchange.
Visit Shapeshift.io to get started.
Hi, welcome to Epicenter, a cryptocurrency podcast that covers upcoming technologies, companies, and networks in the blockchain and cryptocurrency space.
And my name is Sunny Aggarwal.
And today we have with us Grigore Rosu, a professor of computer science at the University
of Illinois Urbana-Champaign and the CEO of Runtime Verification.
Nice to have you on the podcast, Grigore.
Can you tell us a little bit about yourself?
Thank you for having me.
Hi.
So yeah, my name is Grigore Rosu.
And I came to the United States in '96 to do a PhD at the University of California.
And then I was always attracted by mathematical techniques and formal methods.
And I had the privilege to work at NASA for two years before joining the University of Illinois as a computer science professor in 2002.
And since then, I've been there all the time.
And I started a company, Runtime Verification, in order to take ideas that we initiated at NASA Ames.
Actually, that's where we coined the term runtime verification, at NASA Ames, with colleagues there.
And the plan with the company was to take all these ideas and technologies developed mostly by researchers and academics into real world products that can make a difference.
So that's how we started the company.
Awesome.
And so the sort of end result of that was the K framework, which is what we are discussing here today.
So could you tell us a little bit about what originally got you interested?
in programming languages and formal systems?
So I started off as a mathematician,
and when I joined NASA,
my job was to prove that programs were correct,
that certain flight software was safe.
And I started asking myself,
what does it even mean?
What does it mean for a program to be correct?
So that's what started me to...
think about formal methods, about how to mathematically capture the meaning of programs.
And then I was fascinated by programming languages, by all the variety of programming languages
and paradigms, and in the end, realizing that every single one of them had a very deep mathematical
meaning. And once you capture that meaning, then you have the language in your hands. You can do
everything you want with it. You can execute programs, you can prove programs correct, you can do
everything. So I found it just fascinating. And once I was trapped into this field, I just loved it.
I see. Cool. And so I also read that,
you know, K actually originally started off almost as a teaching tool, as a way to, like, teach students.
And then it ended up, uh, evolving out of that. Could you, uh, tell us, like, you know,
does it still have that goal of being a teaching tool, or has it grown much beyond that
original vision? Well, first of all, I still use it for teaching. I used it last semester,
and I would teach it, I would teach programming language semantics next semester also using K.
So yes, it started from teaching needs. When I joined the department of computer science,
I was asked to teach programming languages. Actually, I was asked to teach programming languages
even during the interview. They really needed somebody to teach programming languages.
And since I already had some ideas how to do that, fresh
from NASA, I was very glad to pursue this path in my career,
namely to teach programming languages, and at the same time to develop frameworks for it.
And I found actually that, so obviously initially I didn't think about K.
I just had to teach programming languages, and I knew that it had to be somehow mathematically rigorous.
And I looked at a variety of approaches that were available, and I tried every single
one of them. I think I tried eight or nine different approaches, like operational
semantics, small step, big step, denotational semantics, reduction
semantics with evaluation contexts, semantics with continuations, and in the end the chemical abstract machine,
which was really interesting. And then I realized that each of these had a very nice
idea under the hood, but at the same time some things were missing from each of
the approaches, which is kind of what motivated
the other approaches that were invented, right?
So people invented different approaches because the previous approaches were missing something.
And what I really tried to do with K initially was to just engineer a framework,
which takes the best from all the approaches that we had around,
and try at the same time to avoid their limitations.
I didn't think at all of the mathematics underlying it.
I knew that once it works, from a practical point of view,
we would be able to come up with the mathematics, because all these other frameworks were based on rigorous mathematics.
So the challenge was really how to make it easy, fun, and modular to define programming languages.
Because one of the big problems when you formalize a language is that you add a new feature to your language
and then you have to go back and change everything you've done in order to accommodate changes in the
configuration, the program configuration needed for the new feature. And that's very inconvenient
and very demotivating. Imagine that you have already 200 rules and you add one more feature and you have to go back and change all 200 rules.
You'll feel very demotivated to add new features. And I was lucky enough that I was
in the middle of a very nice group of students who were very enthusiastic to use the early prototype to define languages.
And they used the language and I encouraged them to complain. So whenever they had issues, I wanted them to complain.
And they would say, look, I don't like to duplicate code.
I don't like to duplicate rules.
I have to write the same rule again and again and again.
That's annoying.
All right, thank you, thank you.
And then we tried to engineer it in such a way so that doesn't happen, right?
So that you define the language in a nice incremental, constructive way
without having to go back and change things that you've already defined in order to accommodate new features.
So, yeah, so we engineered the framework and then the mathematics came.
It put everything together in a nice round thing.
So, of course, like, this episode is about the K framework.
And as I understand it, Professor, the K framework is an attempt to have mathematical structures
and rules and ways of expressing rules that allow people to write the rules of any programming
language. It is essentially a language with which you can write the rules of any programming language.
Is that right? It's almost like a one ring to rule them all, right? So we as programmers,
we use a language to write a program and like there's like hundreds of languages around the
world. And the K framework gives us tools and semantics to express all of these languages
in one unified mechanism. Yes, that was the goal.
and that's what it is.
And you can think of it either as a meta-programming language
in which you define programming languages,
or another way to think about it is like a domain-specific language
for programming languages.
And in fact, you can even define K in K,
which is what we are doing right now, actually.
We're implementing K in K, right?
Because K is a language in the end,
and it needs to be expressed in itself.
And many programming languages actually implement themselves,
you know, even Scala and other languages.
So, right, yes.
So the challenge was how to come up indeed with something very uniform
that allows you to define any programming language in any paradigm
using the same general principles.
And what are those general principles so that is possible?
And in my classes where I teach programming languages using K,
we actually go through various programming paradigms.
We start with imperative languages,
then we move to functional languages,
then we go to object-oriented languages,
and logic programming languages in the end.
And students have all kinds of projects
along the way where they combine features.
So yeah, and in the end, in the end,
it all boils down to a very simple principle,
which is that of a parametric transition.
So you have a program configuration,
which you can think of as some data structure
that holds a snapshot,
of your program state. And the same happens in all programming languages, no matter what paradigm
you use. So you have this program configuration and then you have rules that match certain
structure they care about in that program configuration, in particular the fragment of the program
that still needs to be executed, for example. So they match whatever part of the configuration
they care about and they apply a transformation. So match and apply. So basically,
Rewrite rules are pattern rewrites to pattern.
So you match the left-hand pattern and you replace it with the right-hand pattern, modulo
the substitution of the variables that results from the matching process.
That's it.
That's the only computational mechanism under the hood.
And this is Turing complete.
It has been shown Turing complete many, many years ago by many people in different settings.
So it's a very powerful computational mechanism.
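The match-and-apply mechanism described here can be sketched in a few lines of Python. This is a hypothetical toy rewriter for illustration, not how the K framework is implemented: terms are nested tuples, variables in patterns start with a question mark, and a rule is a left-hand pattern plus a right-hand pattern that gets instantiated with the matched bindings.

```python
# Toy term-rewriting engine (illustrative sketch, not the K implementation).
# Terms are nested tuples; pattern variables are strings starting with "?".

def match(pattern, term, bindings=None):
    """Try to match `pattern` against `term`; return variable bindings or None."""
    if bindings is None:
        bindings = {}
    if isinstance(pattern, str) and pattern.startswith("?"):
        if pattern in bindings:
            return bindings if bindings[pattern] == term else None
        bindings[pattern] = term
        return bindings
    if isinstance(pattern, tuple) and isinstance(term, tuple) and len(pattern) == len(term):
        for p, t in zip(pattern, term):
            bindings = match(p, t, bindings)
            if bindings is None:
                return None
        return bindings
    return bindings if pattern == term else None

def substitute(pattern, bindings):
    """Instantiate the right-hand side of a rule with the matched bindings."""
    if isinstance(pattern, str) and pattern.startswith("?"):
        return bindings[pattern]
    if isinstance(pattern, tuple):
        return tuple(substitute(p, bindings) for p in pattern)
    return pattern

def rewrite_once(rules, term):
    """Apply the first rule that matches anywhere in the term; None if none apply."""
    for lhs, rhs in rules:
        b = match(lhs, term)
        if b is not None:
            return substitute(rhs, b)
    if isinstance(term, tuple):
        for i, sub in enumerate(term):
            new = rewrite_once(rules, sub)
            if new is not None:
                return term[:i] + (new,) + term[i + 1:]
    return None

# One rule: swap the two children of a "pair" node -- match, then apply.
rules = [(("pair", "?x", "?y"), ("pair", "?y", "?x"))]
print(rewrite_once(rules, ("pair", 1, 2)))  # ('pair', 2, 1)
```

The single `rules` list here holds one rule; a language definition would be a larger list of such pattern pairs, all driven by the same match-and-apply loop.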
And what makes it interesting, philosophically speaking, from my point of view, is that it
allows you to tune the granularity of the computation.
For example, when you define a Turing machine or a lambda calculus, the granularity of the
computation is given by the construct that you use.
So with general-purpose rewriting, term rewriting like K employs, you can define that granularity
by means of structure.
You define your syntax and then rules match as much of the syntax as
you want to transform in one step.
You do not have to reduce it to lots of smaller steps
to perform a big computation.
You can just do it with a big matching and transformation.
So yes, yeah, so that was the big challenge.
And actually I spent quite a bit of time back in 2003, 2004,
when the whole thing started, thinking
about what would be the right mathematical formalism to employ in such a meta-language framework.
And I looked at existing approaches and term rewriting really was the most appealing to me,
exactly because of this capability to match as much structure as I want from the program configuration.
And then the fact that we were able to define languages across different paradigms quite easily and
naturally, that was like a bonus, like an additional incentive to convince me that it was the right
way to go.
Interesting.
So of course, like, later on in the show, we will get into some of these
details that you're pointing out, like the idea of rewriting logic and how that works.
But zooming out for a little bit, this seems like a language in which you can express any
computer language. Like, this seems like a very, very interesting idea when you
tell it to anyone. How old is that idea, and why did people think of
this idea in the first place? Well, so the idea is very, very old. Lambda calculus
itself is based on the same idea. Lambda calculus actually has only one
rule which is beta reduction. You take a lambda abstraction applied to a term
and that transforms into something else.
But it is just one particular rewrite over one particular syntax.
In K, we want to have any rule over any syntax.
And in particular, I can define Lambda calculus as a particular theory
or a particular language in K.
The same way you can define other formalisms, other imperative languages based on assignments,
based on...
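As a concrete instance, the single rule of lambda calculus, beta reduction, can itself be written as one such rewrite. Here is a minimal illustrative sketch in Python; it deliberately ignores variable capture, which a real implementation must handle.

```python
# Beta reduction as one rewrite rule:
#   ("app", ("lam", x, body), arg)  =>  body with x replaced by arg.
# Simplified sketch: variable capture is ignored.

def subst(term, var, value):
    """Substitute `value` for free occurrences of `var` in `term`."""
    if term == var:
        return value
    if isinstance(term, tuple):
        if term[0] == "lam" and term[1] == var:
            return term  # `var` is re-bound here, so stop substituting
        return tuple(subst(t, var, value) for t in term)
    return term

def beta_step(term):
    """Apply one beta reduction at the outermost redex, if any; else None."""
    if isinstance(term, tuple) and term[0] == "app" and \
            isinstance(term[1], tuple) and term[1][0] == "lam":
        _, var, body = term[1]
        return subst(body, var, term[2])
    return None

# (lambda x. x + 1) applied to 2  -->  2 + 1
add_one = ("lam", "x", ("plus", "x", 1))
print(beta_step(("app", add_one, 2)))  # ('plus', 2, 1)
```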
So rewriting as an idea, when it started, I don't know, but it is definitely a very old idea.
I think maybe with grammars, in the theory of context-free grammars, we had the first ideas around rewriting,
maybe even before, in the early 1900s, right, even before we had lambda calculus.
So many problems were not expressed specifically as term rewriting, but they were instances of the term rewriting idea.
So yeah, so the idea is very old.
I would say what is new perhaps is the particular methodology on how to use rewriting and how to organize your configuration.
And what is really important is how to find the place in the configuration where you want to apply a rule.
Because rules do not always apply at the top level in your configuration.
You have to find the right place where to apply the rules.
And this is where you may need
to specify contexts.
Contexts are actually critical for
efficient implementations and
effective implementations of rewriting.
So, yeah, I would say that rewriting itself
is not a new idea,
and even contexts, they are not
very new.
And the K framework itself was engineered
from many ideas that were floating
already in the programming language semantics
field.
But I think what K really did
well, of course I'm biased, but I believe that what it did well was to put all these ideas together
in a meaningful and practically useful way. Without worrying about the theory initially. And then when we
came up with the theory, I was very pleased to see that things fit so nicely mathematically as
well. So we developed a logic, a foundational logic underlying the entire K framework, which now
explains everything we do. So everything we do with the K framework can be explained as a search
for a proof, for a fact. So everything we do in the end is very rigorously backed by a mathematical
proof, starting with execution, verification, model checking, symbolic execution, everything is backed
by proof. I think that's where some of the innovations on the mathematical side of the K framework
happened.
So could you, like, maybe for some of our listeners who are less familiar with programming
languages, you know, could you explain to us a little bit about what the difference between
like the syntax and semantics are, and how you can use rewrite rules? You know, syntax
is usually about, like, explaining what the vocabulary is, or, well, syntax
is more about what the grammar is.
And it seems very easy to understand how you would use,
like, rewrite rules to explain syntax.
But how would you use these to explain semantics?
And can you, like, talk a little bit about this?
In fact, I teach programming languages to fresh students,
and I teach programming languages using K,
and syntax and semantics as defined in K.
And they grasp it right away,
even if they do not have a lot of experience.
So even if they only know one or two programming languages, they grasp the idea because it's so simple.
So in terms of syntax, yeah, so the difference between syntax and semantics.
So syntax is about really taking a string, right?
A program.
What is a program?
A program is just a string of characters, right?
You can think of it that way, at some, you know, level of abstraction.
And then you have to give that a meaning.
But before you are even able to give that a meaning, you have to realize that there are some constructs there.
Right?
There is an if-then-else construct.
There is a while construct.
There is a function definition.
So how do you go from a flat sequence of characters to a tree, what we call an abstract syntax tree,
basically a term that corresponds to the program?
How do you get from the syntax, from the strings to the program?
So that's what the parsing does.
So parsing is not new.
Parsing is a very old field.
There are lots of algorithms, some of them more efficient, others more general.
And in K we employ both efficient and general algorithms.
You can actually define your own parser outside of the K framework.
The K framework also gives you a general powerful algorithm,
which is not very efficient but very general.
And yeah, so basically the syntax allows you,
and K has this keyword syntax with which we define syntactic productions.
For example, we say that an expression is an expression,
then the symbol plus, then another expression.
And an expression can also be an integer.
for example, right?
And now if you have one plus two, the parser that is generated automatically from these syntactic
definitions will give you a tree, a nice abstract syntax tree, where you have
plus at the top and then one and two below.
So now you finally, you have some structure, right?
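In K, the productions just described would be declared roughly as `syntax Exp ::= Exp "+" Exp | Int` (approximate notation). To illustrate what the generated parser does, here is a hypothetical hand-written Python parser for that tiny grammar, turning the flat string into an abstract syntax tree:

```python
# A tiny hand-written parser for the grammar sketched in the conversation:
#   Exp ::= Exp "+" Exp | Int
# In K this comes for free from the syntax declaration; this is just illustrative.

import re

def tokenize(src):
    """Split e.g. '1+2' into tokens: ['1', '+', '2']."""
    return re.findall(r"\d+|\+", src)

def parse(src):
    """Parse into a left-associated tree, e.g. '1+2' -> ('+', 1, 2)."""
    tokens = tokenize(src)
    tree = int(tokens[0])
    i = 1
    while i < len(tokens):
        assert tokens[i] == "+", "expected '+'"
        tree = ("+", tree, int(tokens[i + 1]))
        i += 2
    return tree

print(parse("1+2"))    # ('+', 1, 2)
print(parse("1+2+3"))  # ('+', ('+', 1, 2), 3)
```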
So once you have some structure from your original program, now you can talk about semantics.
So what is the semantics?
The semantics takes these structures, these trees,
these abstract syntax trees, and gives them a meaning.
And now depending on which mathematical formalism we employ,
this meaning can be expressed in one way or another,
but in all approaches in the end,
what you get is some sort of mathematical object
that corresponds to that syntax that you started with.
And in our case in the case of K,
the way we add this meaning to syntax
is by means of rewrite rules,
where we iteratively transform
this tree into a value.
So suppose your program was 1 plus 2.
So the parser will give you a nice tree with plus at the top and 1 and 2 below,
and then there is a rewrite rule saying that if you have an integer plus another
integer, this rewrites to the sum of the two integers in the mathematical domain.
So with one rule you transform that tree that had three nodes, 1, plus, and 2, into
just one node, which has the value 3.
So the semantics is obtained by iterative transformations of the original program.
You start with the original program.
You put it in some configuration, which contains all the additional information in order to run the program,
which could be like a state, stacks, and all kinds of additional infrastructure that you need
in order for the program to make sense.
And then, step by step, applying rules one by one,
you transform that into a final configuration
which contains the result of the program that you're looking for.
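The rule described here, an integer plus an integer rewrites to their sum (roughly `rule I1 + I2 => I1 +Int I2` in K notation), can be played out mechanically. A hypothetical Python sketch of this semantics-by-iterated-rewriting on the same kind of tree:

```python
# Semantics by iterated rewriting: the one rule "Int + Int => their sum"
# applied anywhere in the tree, again and again, until only a value is left.
# Illustrative sketch of the idea, not the K framework itself.

def step(tree):
    """Apply the addition rule once, innermost subtrees first; None if stuck."""
    if isinstance(tree, tuple) and tree[0] == "+":
        _, left, right = tree
        for i, sub in ((1, left), (2, right)):
            reduced = step(sub)
            if reduced is not None:       # a child still had work to do
                parts = list(tree)
                parts[i] = reduced
                return tuple(parts)
        if isinstance(left, int) and isinstance(right, int):
            return left + right           # the rewrite: I1 + I2 => I1 +Int I2
    return None

def run(tree):
    """Rewrite step by step until a final result (an integer) is reached."""
    while not isinstance(tree, int):
        tree = step(tree)
    return tree

print(run(("+", ("+", 1, 2), 4)))  # 7
```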
So perhaps I'd like to restate this in my own way.
So we can think of, let's say we can think of human languages first.
So in any language, so for our listeners,
any of you that has learned a new language,
you would have learned like vocabulary and then you would have learned grammar.
So what is vocabulary?
Vocabulary is like, oh, let's say I want to learn German.
And vocabulary is like, so there's a spoon and then there's a word in German.
So let's say there's a fork and there's Gabel.
So Gabel means fork in German.
So it's like vocabulary is the association of a word, which is like Gabel, to this thing which is like a fork.
But then there is also grammar, which is like once you have a lot of words, how can these words be arranged together?
So in some sense, what I feel, Professor, is like that semantics is the computer science equivalent of vocabulary,
and syntax is the computer science equivalent of grammar, right?
So, for example, like, if I want to tell the computer that it has to compare two things
and if they are equal do something.
So this is like an operation, and semantics is like what word does the programmer use
to represent this operation?
And syntax is the way in which these operations can connect together in order to make a well-formed program.
Is that right?
It's almost.
There is a bit more going on in a computer program, right?
Because in natural language, you just say something as a sentence,
and your sentence doesn't have to compute into a result.
Your sentence is being parsed using the grammatical rules.
And then based on the individual meanings of the individual words, right, in the vocabulary,
you then build the meaning for the entire sentence, right, without having to execute, right, the sentence.
So in some sense, if you think about, okay, what we do, we take such a sentence, which would be the program,
and then we transform it step by step into another sentence, which is a bit closer to the final result.
and then into another sentence, until in the end it becomes, you know, the final result.
For example, here is a sentence.
Let x be three in the expression 1 plus x.
So what is the meaning of this sentence?
You may think that it is 4, right, because you say let x be 3 in 1 plus x.
But it is not immediately obvious unless you do some computation
going on there. So you can parse it and you realize, okay, I have the keyword let, which is like a
statement. Then I say, oh, the variable x, which is equal to 3. So I can use all the
grammatical rules and I can parse this sentence into something that now I can add semantics to.
And how we add semantics in this particular language would be that I just replace x. I transform
the sentence let x equals 3 in 1 plus x. I transform it into
1 plus 3. How? I just realized that I have a let construct, and let x equals 3 in some
expression means that I should go ahead and substitute 3 for x
in the remainder of the sentence, which is 1 plus x, and it now becomes 1 plus 3.
And then I apply another rule which says, oh, 1 plus 3, this goes to 4, right? So I go
one more step. So it's like keep talking, keep talking. So if you have an original sentence
which you parse and that gives you the program, but then you can keep talking about how that
program computes, right? And then you come up with the result in the end. And I would say that
the syntax of K is the part which allows you to parse that sentence so you can start transforming
it. And then the rules are the keep talking part, right, which allows you to
eventually get a result for the whole program.
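The let example can be traced mechanically with the two rules just described: first substitute the bound value, then reduce the addition. A hypothetical Python sketch (it ignores variable shadowing for simplicity):

```python
# "Keep talking": evaluate let x = 3 in 1 + x with the two rewrite rules
# described above: substitution, then integer addition. Illustrative sketch;
# variable shadowing inside nested lets is deliberately ignored.

def subst(tree, var, value):
    """Replace the variable `var` by `value` everywhere in the expression."""
    if tree == var:
        return value
    if isinstance(tree, tuple):
        return tuple(subst(t, var, value) for t in tree)
    return tree

def evaluate(tree):
    if isinstance(tree, int):
        return tree
    if tree[0] == "let":          # ("let", var, bound, body)
        _, var, bound, body = tree
        return evaluate(subst(body, var, evaluate(bound)))
    if tree[0] == "+":            # ("+", left, right)
        return evaluate(tree[1]) + evaluate(tree[2])
    raise ValueError(f"unknown construct: {tree!r}")

# let x = 3 in 1 + x  -->  1 + 3  -->  4
program = ("let", "x", 3, ("+", 1, "x"))
print(evaluate(program))  # 4
```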
For example, if you have a Java program, you have a for loop or a while loop, right?
So that computes.
You may execute it like a thousand times.
You may do a very complicated computation there.
How do you know what the result is?
You have to simply go through the loop, step by step by step by step by step.
It's like working through the program.
And that's what the rules of K allow you to do.
Right, so sometimes I know that the word rule may be confusing because you have rules for the grammar
and rules for the computation.
The rules for the grammar allow you to parse,
and the rules for the computation allow you to compute.
Okay, so I kind of understand this now.
What are the benefits of, like,
these formal semantics?
Because like we have like, you know,
hundreds of like common programming languages already out there,
and most of them usually aren't formally specified.
They usually, like, you know, oftentimes have reference implementations
or sometimes they have, like, guidebooks, but they don't really have formal definitions, and
oftentimes you find, like, undefined behavior and whatnot.
What are some of the benefits of having this like formal specification of every language?
That's actually a very good question which students often ask when they take the
programming languages class and we start talking about formal semantics.
So here's my short answer.
So what is Java?
If I ask you, what is Java?
Whatever Oracle tells me it is?
Well, maybe Java is clearer, but let's consider C, because you mentioned undefined behavior.
C is full of undefined behavior.
So what is C?
There is an 800 page standard, which is an ISO standard, right?
800 pages of text describing what C is.
More than half of those pages are about defining undefined behavior.
What does it mean for a C program not to be a correct program?
Some things are obvious, like division by zero, yes.
Or access out of bounds of an array.
You have an array, right, with elements indexed from 0 to 99,
and now you access element 101.
Is that correct or not?
So C will not check that for you.
You can access it and get some number or crash or whatever.
So here is the question.
So what is C?
And why do you care about what is C?
Suppose that you have a pacemaker, right?
The pacemaker will run a C program.
Okay, how do you know that that C program does what it's supposed to do?
Because otherwise you die.
Right?
So what if that C program does an array out-of-bounds access or a division by zero?
And then you have a problem, right?
You die.
So what you'd like ideally is to have a way to say
that that program does what it is supposed to do.
And this is where formal semantics comes into the picture.
Without a formal semantics, you do not know what the language is.
You need the definition in mathematics, right?
We only need definitions in order to prove facts, in order to define concepts.
Otherwise we cannot reason about those concepts.
So if you want to reason about programs, if you want to say anything about a program,
except, you know, waving hands, but if you want to say anything about a program, we need
a rigorous way to say what the programming
language is or what the program is, and this is where the formal semantics comes into the picture.
It's basically that link that is needed in order to do reasoning about programs.
This episode is brought to you by ShapeShift, the world's leading trustless digital
asset exchange quickly swap between dozens of leading cryptocurrencies including Bitcoin,
ether, Zcash, Gnosis, Monero, Golem, Augur, and so many more. When you go to
ShapeShift.io, you simply select your currency pair, give them your receiving address,
send the coins, and boom.
Shapeshift is not your traditional cryptocurrency exchange.
You don't need to create an account.
You don't need to give them your personal information, and they don't hold your coins,
so you are never at risk from a hacker or other malicious actor.
Shapeshift has competitive rates and is even integrated in some of your favorite wallet
apps like Jaxx, so you can swap your digital assets directly within your wallet,
just as easily as putting on your slippers.
Whenever you see that good-looking fox,
you know that's where Shapeshift is.
So to get started, visit Shapeshift.io and start trading,
and we'd like to thank ShapeShift for their support of Epicenter.
So one of the reasons that, like, you know, for example, in C,
oftentimes things are left undefined
is because, like, it allows for the,
some freedom for the compiler to do optimizations.
And then, you know, oftentimes you can use, like,
things like linters and stuff that give you warnings saying, like, hey, by the way,
what you're doing is unsafe, but, you know, let it happen because the compiler might need it
for some reason. So what are some of the tradeoffs that you make there by, like, you know,
restricting the freedom of the compilers?
So yes, yes, that's why C has undefined behavior, because they want to allow
compilers the freedom to do whatever they want, basically, more or less, with undefined programs.
And to optimize, so C, one of the main applications of C is in embedded systems where performance is critical.
And to achieve performance, you need to take advantage of the hardware to the last drop of speed.
And compiler developers really love that, how to translate C programs into something that is really very efficient on the target hardware.
So for example, let's take actually array out of bounds access, right?
So if you want to be absolutely safe, you would have to check that you do not access the array out of bounds.
But then you have to pay the price, the cost of that checking.
You have to read the value, evaluate the argument, and see that it is between bounds.
So that check costs more than it costs to actually access the array.
And that's unacceptable for these people.
So what they assume in C is that the programmer knows what she's doing,
and because of that we can assume that the program is well-defined,
and then compilers take advantage of that and try to make the program as fast as possible.
That is great. It's part of the spirit of C, the so-called spirit of C, that's what you want in C.
However, if your C program now runs in a car or in a medical device that your life depends upon,
or in some spacecraft or aircraft,
then you cannot afford to have unexpected behaviors of the program.
And those unexpected behaviors happen all the time in C when you change the hardware.
You have a C program that looks reasonable to you.
You run it on a particular desktop.
You test it.
It looks great.
But then you put it on a different device, on a different platform,
and it gives you a completely different result.
And you may die, literally, because of that.
So you cannot afford that.
If you cannot afford that,
and there are many, many embedded system applications
where correctness became paramount in C programs,
and that's where the formal semantics comes into the picture.
Because now what I can do, I can take that program,
and I can analyze it with the formal semantics of C,
which defines completely all the undefined behavior in the standard,
and now that tool, this formal semantic framework,
the K framework actually running with the C semantics as input,
will tell you whether your program is undefined or not.
And if it is undefined, you know,
You can still ignore it if you want to.
You can still compile it with your favorite compiler,
knowing that it has some errors,
but at least the framework tells you that,
hey, you have an undefined program here.
Be careful.
If you are not careful, it's your risk you are taking.
And there are some tools, like the lint tools that you mentioned,
which check for some very simple,
so-called lint-style errors,
but far from all the undefined behaviors.
Some undefined behaviors are very tricky in C.
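As a sketch of the kind of undefined behavior being discussed: modifying a variable twice without an intervening sequence point, as in `i = i++;`, is undefined in C even though it looks harmless, and it is the sort of thing a tool like kcc flags. The well-defined rewrite below reflects one possible intended meaning; the function name is made up:

```c
#include <assert.h>

/* Undefined: 'i' is modified twice between sequence points:
 *
 *     int i = 5;
 *     i = i++;   // a semantics-based tool reports this as undefined
 *
 * A well-defined version, assuming the programmer simply meant
 * "increment i": */
static int well_defined(void) {
    int i = 5;
    i = i + 1;   /* exactly one write to i, fully defined */
    return i;
}
```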
So basically what will happen is, like, you know, I have a C program, and I'm doing some sort of weird undefined behavior, let's say, doing two writes on the same line or something, right?
What K would do is basically, like, throw me an error saying, hey, by the way, this behavior is undefined.
It doesn't actually make, like, a choice on what to do.
Instead of doing what a compiler would usually do, like pick whatever it wants to do,
K instead will throw an error saying this is not allowed at all.
Yes, yes.
Okay.
Definitely.
Yes.
And then you can go further and actually prove the program correct.
You can prove that the program does what it's supposed to do.
In which case you go through all possible behaviors, no matter what the input to the program is.
And you show it exhaustively by what we call symbolic execution:
you go symbolically over all the paths that can happen in your program,
and you prove that there is no undefined behavior,
no matter how this program executes.
And this particular feature became extremely useful to companies like NASA and Boeing,
with whom we have several contracts,
who use the K framework with the C semantics for exactly this purpose.
So could you expand a little bit on this then?
What is exactly the relationship between, like, formal verification and formal semantics?
Because oftentimes, like, people think that formal
verification is strictly in the realm of functional programming languages.
People are like, oh, if you want to do formal verification, you need to use Haskell or OCaml.
But now you're saying that K provides us, like, a way we can do formal
verification of C programs and stuff.
Can you expand a little bit more about this?
So I love functional programming myself, right?
And I often found that the functional programming community is a bit too enthusiastic about functional
programming in terms of verification, in the sense that they try to encourage people to use
functional languages, with the argument being that it's easier to verify your programs.
I think it is easier to understand the programs, the functional programs. It is easier to reason
about them in your mind, but in terms of actual formal verification, it is also difficult
to verify functional programming languages, especially because many functional programming languages
are not as pure as the original Lambda calculus was. Once you have side effects, once you have
a heap, once you have references, things can be quite heavy. What we did in K actually was to
step back a bit and look at what is the actual foundation underlying all programming languages,
not only functional programming languages. And this is rewriting, the term rewriting capability.
Functional languages also do rewriting, imperative languages also do rewriting,
object-oriented languages also do rewriting, and even logic programming languages do rewriting.
So there is this basic foundation that goes beyond all the particular paradigms, including functional languages.
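As a toy illustration of "everything is rewriting" (this is not K itself, just a made-up sketch in C), Peano-style addition can be computed purely by applying two rewrite rules until no `add` remains:

```c
#include <assert.h>
#include <stdlib.h>

/* Terms for Peano naturals with addition, computed by rewriting:
 *   add(zero, y)    => y                  (rule 1)
 *   add(succ(x), y) => succ(add(x, y))    (rule 2)
 */
typedef enum { ZERO, SUCC, ADD } Tag;
typedef struct Term { Tag tag; struct Term *a, *b; } Term;

static Term *mk(Tag tag, Term *a, Term *b) {
    Term *t = malloc(sizeof *t);   /* never freed in this sketch */
    t->tag = tag; t->a = a; t->b = b;
    return t;
}

/* Apply the rules everywhere until a normal form (no ADD) remains. */
static Term *rewrite(Term *t) {
    if (t->tag == ZERO) return t;
    if (t->tag == SUCC) return mk(SUCC, rewrite(t->a), NULL);
    /* t->tag == ADD */
    Term *x = rewrite(t->a), *y = rewrite(t->b);
    if (x->tag == ZERO) return y;                      /* rule 1 */
    return mk(SUCC, rewrite(mk(ADD, x->a, y)), NULL);  /* rule 2 */
}

/* Read a normal-form numeral back as an int. */
static int value(const Term *t) {
    return t->tag == ZERO ? 0 : 1 + value(t->a);
}
```

The same rule-based style works whether the "program" is functional, imperative, or logical, which is the point being made: rewriting sits underneath all the paradigms.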
And yeah, so I wouldn't say that it is necessarily easier to verify functional programming languages.
I would say that a lot of the functional, of the program verification work so far was done by means of what we call Hoare logic or axiomatic semantics, where you have to essentially redefine your
programming language as a logic.
I know this may sound a bit heavy if you don't come from the field, but you literally define
a Hoare logic corresponding to Java.
You have proof rules, you have theorems that you prove, and all these are facts about programs.
And those Hoare logics or axiomatic semantics tend to be a lot more involved for object-oriented
languages or for imperative languages than for functional languages.
So if you follow the traditional formal verification approach, then yes, it may be easier
to define an axiomatic semantics for a functional language than for an imperative
language, which you can then use for formal verification.
But what we deliberately decided in K was to not have to define another logic for a programming
language in order to verify programs.
So we want to have only one formal semantics of the programming language and to derive program
verifiers from that semantics automatically essentially.
It is in some sense like defining a Hoare logic automatically, or deriving a Hoare logic
automatically, for your language from the formal semantics of the programming language.
This is not a traditional approach to program verification.
This is very important to clarify,
and I'm glad that you asked this question, because I think this is one of the critical
insights of K. So we take an operational semantics, an inherently executable semantics,
that gives you a reference model of your programming language that you can use to run programs,
run tests, same like compilers do or interpreters, but at the same time, it gives you,
from the exact same formal semantics, it gives you a program verifier that you can use to
verify programs, and that is sound and what we call relatively complete. So it's
sound and complete. So any property about any programming language can be proved with this
language-independent framework and logic underlying it. That was one of the biggest theoretical
achievements behind the K framework: this foundation, which allows us to detach ourselves from
having to define, you know, an axiomatic semantics for your programming language only for
program verification reasons.
You get the formal program verifier
for your language exactly
the same way you get the execution engine
or the parser, completely automatically
from the formal semantics of the programming language.
Okay, so, you know, the tagline of K
is that it's this rewrite-based,
executable, semantic framework.
So we've talked about the rewrite-based part,
we've talked about semantic and framework.
Let's talk about this executable part.
What does it mean for this framework to be executable?
And how is that, like, have other frameworks in the past been executable?
Like, what does this exactly mean?
Right.
Executable means that you can take a program, throw it to the framework.
So you have the framework and take a programming language semantics first, right?
And you instantiate the framework with the semantics.
And now you pass a program and the framework will execute the program the same way an implementation of your language would do.
So let's say you take a C program, right, and you compile it with GCC and run it.
Right.
So actually our tool that incorporates the C semantics in the K framework is framed as a replacement for GCC,
a drop-in replacement for GCC, which we call KCC.
So with our framework, you just say kcc yourprogram.c, and it gives you a binary, and you run it.
That's what I mean by executable.
It is as executable as an implementation of programming language.
And that's funny because that's how I educate my students, right?
That's what a formal semantics is.
And then they make this face, like, why would it be any other way?
Isn't that what a formal semantics should be?
You should be able to execute the program.
And then I have to refer them to axiomatic semantics, or Hoare logic,
where you define these logics with pre- and post-conditions and invariants and so on,
where you can prove programs correct, but you cannot execute them.
If I have a loop that executes 100 iterations,
I cannot just run the loop with the Hoare logic.
I can prove the loop correct if somebody is giving me an invariant,
but I cannot run it.
And that's very inconvenient, because imagine this.
Imagine that you have a formal semantics, using axiomatic semantics
or Hoare logic, for a programming language,
and you cannot execute it.
How do you know if it is correct or not?
In the case of C, the C semantics has, I think,
almost 5,000 semantic rules now, 5,000 rules in the semantics, okay, that people had to sit down
and write and test.
Test is a keyword.
How do you test them?
How do you know that those are right?
Because like anything big, a semantics can be wrong.
When you add thousands of rules, something may go wrong somewhere, right?
How do you test it?
It is extremely valuable to actually be able to take programs that compiler developers
or interpreter developers used to test their implementations and to use exactly the same programs
to test your semantics, and then to use exactly that semantics for program verification,
as opposed to catching errors only by proving wrong programs correct or not being able
to prove some programs that look correct, correct, right, which is how you catch errors in conventional
program verifiers.
So how do you verify the binary, like, whatever K is running? Is it compiled or interpreted?
Like, do we write the x86 architecture assembly in K, and that's how
K knows how to compile to that?
Or, like, how do we verify the actual execution of the K language?
Very good question. So there are different ways to do it.
So as a matter of fact, we do have a semantics of x86 that is in the process of being finalized,
which means that you can then use GCC, compile your program to x86 and then run the x86 with the K framework.
But that's not what we do with the KCC tool.
The alternative is to actually have a formal semantics of C itself without any translation to anything else.
And then you run the C programs directly with that semantics,
at that granularity, the granularity of the C language,
without any translation to another language.
Because typically when you translate from one language to another,
you lose something in the translation.
Or otherwise, you have to prove that you lose nothing,
and that's extremely difficult.
As a matter of fact, we found several program verifiers for C
that lose undefined behaviors in the translation,
because they translate from C to some intermediate language that their prover knows, and as part of the translation
they lose behaviors. For example, they may reorder the arguments of a function, or
they may reorder the arguments of a primitive operation, you
know, as they translate, and all those may eliminate undefined behaviors. So yeah,
so K doesn't have an actual binary at all. K doesn't have an intermediate language
at all. So K works directly with the syntax of your programming language. That's why
you define a syntax and then a semantics,
and your syntax gives you the actual structure on which you want to define your computation granularity.
And then the rules act at that computation granularity on your desired syntax.
Now if you want to implement a translator from that to another language,
you can implement that in K as well, also using rules.
As a matter of fact, we have done that for a language called Plutus, developed by IOHK.
And we implemented a translator from that language to IELE, where IELE is a low-level language that runs in a virtual machine.
Probably we'll talk about that shortly.
So, yeah, that translator itself is implemented in K.
So, in K you can implement translators, but also direct semantics of languages.
And again, K does not have, like, its own binary language.
It's not like X-86 or LLVM.
It's not like that.
It's not something to translate your language into;
it's something in which you define your language exactly the way your language is, without any translation.
So the future essentially with K is you define, like let's say we're talking about any language like JavaScript.
So the semantics of JavaScript are expressed in K.
And of course, like once you express the semantics of JavaScript in K,
you will get all of the tool sets that are required to actually execute JavaScript along with it.
So in this case, you'll get the interpreter.
You will get some kind of formal verification tool sets out of K.
And whoever is the person that is sort of updating the JavaScript language can just update the semantics of JavaScript in K.
And not touch the tools at all.
The tools will be changed automatically and correct by construction,
which again, I want to reiterate, is not how things are done in most other approaches.
In most other approaches, you develop a model checker for JavaScript,
or you develop a program verifier for JavaScript,
or you develop a symbolic execution engine for JavaScript.
We do not want that because we believe that is a huge amount of effort
and a waste of talent, literally; there are many PhD students at top universities who do just that for their PhD.
They define a symbolic execution engine for JavaScript or a deductive program verifier for JavaScript.
And so that has lots of disadvantages, not only duplication of work, because each tool for JavaScript or for Java or for C, let's say model checkers, implements the same algorithm over and over and over again, just for a slightly different language,
or for slightly different infrastructure, configuration, syntax, and so on.
So we believe firmly that this tool should be implemented once and for all.
If I implement a model checker, I want to implement a model checker once and for all,
and then to take the programming language as input and never touch the model checker.
Right?
So think, if you have N programming languages and K, or M, let's say, not K, K is taken,
so N programming languages and M,
let's say, tools that you may be interested in, like an interpreter, a compiler,
a symbolic execution engine, a model checker, a symbolic model checker, a deductive program
verifier, so N languages and M tools. You have N times M combinations, right? And people
implemented such combinations for different languages with different tools, right? And they publish
papers. It's easy to publish papers. Trust me, I learned that. It's very easy, it's a lot easier to publish a paper
than to develop something that lasts.
And that's why we want to see a complete disconnection between languages and tools.
We want the tools to be implemented once and for all, as efficiently as possible.
And by the way, everything we do is open source and we encourage the community to contribute.
So then everybody can contribute to the model checker, let's say,
to make it faster and faster and faster and better and better and better.
And then, completely separately, we define language semantics that are passed as input to all these tools.
And I can use exactly the same model checker tool with Java or with JavaScript or with EVM, the Ethereum virtual machine, or with IELE, or with any other language.
It literally should not matter at all.
And if it matters, then something needs to be fixed because things should be done completely language independently.
And that's the philosophy of the K framework.
And the tool implementation is done in this spirit, and the foundation, the logics, the mathematics, everything is done in this
spirit. It's a matter of engineering now to just make it work all the way through. And there is a lot of
interest now from the blockchain community, which is great because together we can make it happen.
So give us a sense of why the blockchain community is interested in K and why there's
starting to be some traction for this tool in the blockchain space. I think the blockchain
faces challenges that were not there before.
Like imagine, imagine that you have a C program in your car
running in one of the electronic control units in your car.
Even if that has a bug, nobody knows the code
and nobody knows how to attack it.
But on the blockchain, everybody sees the code.
And if the code can be exploited,
if there is a bug that can be attacked,
somebody will somewhere.
So there is a huge, huge incentive
on all sides now to get these programs right, correct, secure.
And what does it mean for a program to be secure or correct?
What does it mean for a string to be a program?
You see, we get back to the same questions.
You need a formal semantics of the programming language.
Once you have a formal semantics, you can say what a program is.
Once you know what a program is, you can start reasoning about the program,
and you can prove properties about it.
And then I can give you an actual mathematical proof that this program is secure.
Right? You can still try to attack it, but the likelihood to attack a program that has been proved secure
is much, much lower than a program that has been just written and hoped to be secure.
It doesn't mean that all verified programs are correct.
Sometimes they may have issues,
because the specifications may be wrong, the programming languages may have flaws,
but actually we have verified many smart contracts; probably you saw it on our
web page, runtimeverification.com.
We offer a smart contract verification service,
and we do verify a lot of smart contracts.
And there is not a single smart contract that we verified which was flawless;
we found little issues in every single one of them.
And that's simply because when we executed with the formal semantics,
there is no behavior missed.
You see everything with the formal semantics of the language,
provided you have a correct formal semantics of the language.
That's another question.
Is this the right semantics of the language?
And some languages are just too complex.
So, you know, sometimes you may wonder
whether you captured the right mathematical model of the language,
because the language is too complex.
Sometimes not even the inventors of the language know exactly what they meant,
you know, with certain constructs.
So essentially the future that the K framework enables is,
so today when I, let's say when I write a smart contract,
what I want out of my smart contract is a high degree of assurance
that the program will behave a certain way in all conditions.
Right?
Yeah.
So I want that form of assurance, right?
I want that form of assurance that that smart contract
is always going to give that money to my descendants and nobody else.
Right?
Now, there is like lots of places where things can go wrong
in getting that form of assurance.
The first is, the first is, of course, like human error.
So there's some developer that is developing the contract
and they can make an error in its implementation.
But then, of course, the second kind of error
could be that, let's say, the developer is developing
using Solidity.
And Solidity has versions, right?
There's 0.4.2 and there's 0.4.3 and stuff.
So the developer developed using
0.4.2, but actually when it was compiled down to bytecode, somebody ended up using the next
version.
Yes.
0.4.3. And that caused a bug of some kind.
Or simply the compiler has a bug in itself.
That's the third kind.
You may think that the program does what you think it does according to the Solidity
code, but when it gets down to the EVM, it will be something completely different,
you know, because EVM has all kinds of restrictions,
like the stack.
The stack has a finite, bounded size,
and that size is not visible in your Solidity program.
You may think that everything is alright, but actually it's not.
Or you may have an overflow,
or the compiler may change the order in which the arithmetic is being done,
and an overflow that was not there before occurs in the binary that it generated.
So there are all sorts of errors that can happen.
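The overflow scenario just described is also a good example of what an explicit guard looks like. In C, signed overflow is undefined behavior, so a careful implementation checks before adding; this is a minimal sketch, and `checked_add` is a hypothetical helper, not a standard function:

```c
#include <assert.h>
#include <limits.h>

/* Returns 1 and stores a+b in *out if the addition cannot overflow,
 * returns 0 otherwise. The check happens before the addition, so no
 * undefined behavior occurs on any input. */
static int checked_add(int a, int b, int *out) {
    if ((b > 0 && a > INT_MAX - b) ||
        (b < 0 && a < INT_MIN - b))
        return 0;          /* would overflow: refuse */
    *out = a + b;
    return 1;
}
```

A compiler that reorders arithmetic cannot introduce an overflow here, because the guard is evaluated before the addition is ever performed.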
So I think having a formal semantics will not prevent you
from making the human errors, right?
If you type in a wrong account number when you transfer your money, okay, that's a human error, right?
But it will at least avoid the stupid errors.
There are errors that should not happen, unexpected errors that are out of your control
as a human.
So those should not happen.
Right, I should not have a compilation error, you know, I should not have an error due to an overflow
because, you know, some shortcuts were taken somewhere in the implementation of the language or in the design of the whole language.
And a shortcut could be not having a semantics at all.
So some people, you know, just think, let's not waste time doing the semantics.
And then it is not clear what the language is supposed to do.
And you may have two different teams implementing the language differently; like, that happens a lot with C.
There are lots of C programs that you compile with GCC and you get one result, you compile them with Clang and you get another result.
You compile them with ICC, the Intel compiler, you get another result.
That's ridiculous, right, if you think about it.
Imagine that, imagine the same thing happening with your smart contract, right?
You write a smart contract and you have three different behaviors depending on which, you know, which compilers were used.
So do you use, like, an equivalence checker to, like, verify the compiler, essentially?
Where, like, you can check.
You have Solidity defined in K.
And then you have the EVM bytecode defined in K.
And is the idea that you can check to make sure that the Solidity code is equivalent to the compiled EVM bytecode?
So, guys, before I answer this, let me tell you that I'm very impressed with your knowledge.
Your questions are so deep and nice.
It's a pleasure, literally a pleasure, talking to you.
So, all right, so two approaches here.
One is to do the equivalence checking that you mentioned.
So you have a, let's say, a language, Solidity,
and you have a target, a low-level language,
let's say the Ethereum virtual machine, the EVM.
Right, so you can have a semantics of Solidity,
you have a semantics of the EVM.
As a matter of fact, we do have a semantics of the EVM,
and we are in the process of completing a semantics of Solidity.
Once you have the two formal semantics,
now you can take a program, translate it,
and you can prove that that program, the original program, is equivalent to the target program,
based on the mathematical semantics of the two languages.
So first of all, notice that this question itself, what does it mean for the original Solidity program to be equivalent to the EVM program?
This question cannot even be posed.
It doesn't even make sense without formal semantics of the two languages.
What does it mean for something to be a program?
What does it mean to be equivalent to another program?
You need the mathematical definitions of the languages.
This is approach number one, right,
where you can take the two semantics of the two languages,
a program, and you can prove the program equivalent,
the original program equivalent to the target program.
And this is called translation validation, actually, in formal methods.
Another approach, which is the one we are employing right now,
actually, in our smart contract verification business
at runtimeverification.com,
is to take the compiler completely out of the picture.
Right, so we don't care, literally, what you meant in your Solidity code; we
verify the generated EVM binary. I don't even care what compiler you
use, what compiler version you use, nothing. We just take the final binary. And the
trick here is to come up with the right specifications, right, because whenever you
verify a program, you verify it against some formal specification. So what we do,
we actually talk to developers.
We sit down with developers.
Actually, right now we have two developers
from DappHub here.
So it's always great to work with people
who actually develop this code.
So we meet with developers and we talk to them
and we sit down hours, days,
until we agree on what they meant
in order to come up with the right formal specifications.
We have helped them come up with the formal specifications
if needed.
And then once we have the formal specifications,
we verify the generated binary
against those formal specifications.
And if we find any discrepancy, any behavior that was not captured by the formal specifications,
we immediately report it to them.
And then we map those bugs back to the original Solidity code, because in the end, we have to give them error messages related to the Solidity code.
And as I said, in everything that we verify so far, we found some flaws in one way or another.
Most of them related to overflows or missing or implicit assumptions.
That's another thing.
People thought that the program does something, but it does that thing only under additional
assumptions that they didn't really capture in any way.
And then they added additional checks in the program to make sure
that those assumptions will indeed hold.
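To make "implicit assumptions plus explicit checks" concrete, here is a toy, C-flavored sketch of the core ERC20 transfer property: a transfer must fail on insufficient balance, must not overflow the recipient, and must leave balances unchanged on failure. The `Ledger` struct and `transfer` function are hypothetical, not from any real contract:

```c
#include <assert.h>
#include <stdint.h>

/* A two-account toy ledger standing in for ERC20 balances. */
typedef struct { uint64_t balance[2]; } Ledger;

/* Returns 1 on success, 0 on failure; on failure the ledger is
 * unchanged. The guards mirror the checks an ERC20-style
 * specification requires. */
static int transfer(Ledger *l, int from, int to, uint64_t amount) {
    if (l->balance[from] < amount)
        return 0;                               /* insufficient funds */
    if (l->balance[to] > UINT64_MAX - amount)
        return 0;                               /* recipient overflow */
    l->balance[from] -= amount;
    l->balance[to] += amount;
    return 1;
}
```

Verification then amounts to proving that these guards enforce the assumptions on every path, for every input, rather than just on the cases the developer happened to test.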
So, yes, so you can verify compilers, you can validate compilers, you can do translation
validation, or you can verify directly the generated binary, which is what we do right now.
And we can afford to do that because smart contracts are still
relatively small. It would be very hard to do that with the flight software at NASA or Boeing,
right, because those have already millions of lines of C. And then when you combine them to binary,
they are huge and it's very hard to reason about those. But for smart contract, it's relatively
doable to verify directly the binary. It's the safest way to go anyway.
So I wouldn't be able to, for example, like, you know, let's say we have two major
Ethereum client implementations, Geth and Parity, and we just saw, like, this week,
there was a bug in Parity that would have caused it to, like, fork from Geth, right?
Yeah.
But the issue is that these, you know, these client code bases are much, much larger
than smart contracts.
So it wouldn't be possible, right, for now at least, to use, like, equivalence checkers to check
to make sure client implementations are the same.
It should be possible.
It should be possible.
Of course, it's a matter of effort and resources.
So sometimes, so right now, for example, we are verifying the Casper
implementation for the Ethereum Foundation.
We have a security grant with them.
And the program is only several hundred lines of Viper
and it still takes almost two days to run all the proofs
for all the properties that we want to verify.
So let's say a thousand lines of code and it takes two days
to verify completely.
But it's all completely automatic.
Once we write the properties, it's all completely automatic.
So you can imagine if you have a program
that is 100,000 lines in a language like C or Java, that will take a lot of time,
unless, unless, you know, you help it.
You can provide lemmas, you can, you know, give hints and help it.
But then it's less automatic and it's more work for users.
So I would say, first of all, smart contracts should be verifiable, should be verified,
actually, these days.
I would say it's almost insane that, you know,
people bet their money on smart contracts that are not verified.
Because, you know, if any flaw is there, then, then, you know,
things can go completely wrong.
And it's not that hard to verify the smart contracts.
Yes, it's harder to verify larger programs, particularly, you know,
consensus algorithms and so on.
Those are very, very hard to get right and to verify.
It's very hard to even specify what it means for them to be secure and correct.
But the smart contracts themselves, they are very low-hanging fruit.
We can do it right now, and we should do it as a community.
So last week on Epicenter, we had Zilliqa come on and talk about their Scilla language.
And the Scilla language that they're building, they're designing it to be very,
they're designing it for compilation to Coq at some point,
and for now they're doing all of the translation to Coq by hand,
and in the end they want to get it to compile to Coq.
The idea with K is that instead of all of these different smart contract languages
figuring out their own compilation to Coq, they just have to specify in K,
and the idea is that K will then be able to compile to Coq at some point?
And is that possible already?
Is there already a Coq back end?
We have a Coq back end, and more interestingly, actually, you can take the Coq
proofs and translate them into proofs in the logic underlying K, matching logic.
So in some sense, you can take proofs in Coq into proofs in K, actually.
And it is not surprising because it is always the case that you can go between different
semantic frameworks.
Once you have something defined rigorously mathematically, then you can translate from another
rigorously mathematically defined thing to it and back and forth.
So that doesn't mean that one is more powerful than the other.
In the end, they are all powerful.
But I would say the difference is that we do not translate things to K.
So we do not take a language and translate it to K in order to reason about it.
We actually construct the language, define the language in K without a translation.
As we discussed, we define the syntax and that gives you the granularity and then you have the rules that work directly over the syntax of your language.
So you can basically step by step.
Actually, we do have a step.
command, right? You can push step, step, step, step, and you execute the program exactly in its
original syntax using the K-sementic rules. And our logic allows you to prove properties about
the programs directly at the K level without translating to any other formalism. Yes, I understand
it's very tempting to take a language and translate it to Coq, but then you have to start worrying
about your translation itself. Why is my translation correct? What people do also very frequently
is to define languages in Coq directly. The same way we define them in K, they define them in Coq.
And sometimes those semantics are executable, so they can get an interpreter out of them as well,
and then they can also do reasoning using Coq. So when they do that, the two approaches are very
similar. But keep in mind that in K we have a lot of other tools, not only a prover.
So Coq is an interactive prover.
You have to interact with it and prove things.
With K, we can also do symbolic execution.
We can do symbolic model checking,
bounded model checking.
There are lots of tools specialized for languages.
Coq is not really specialized for programming languages.
Coq is a proof assistant for any proofs in any mathematical theory.
It's a lot more general, and there is a price to pay for that.
I guess I'd like to get a sense of what the future looks like,
when a lot of smart contract language developers adopt the K framework to define their languages.
So I think, of course, like one of the differences is pretty obvious.
Today, Solidity is developed by Christian Reitwiessner and his team.
And in the future, the way their workflow changes is they define the
semantics of Solidity in K.
And whenever they want to issue an update, they just issue an update to
the semantics in K, and that's it.
K will take care of making a compiler for it, interpreter,
all of the formal analysis tools, et cetera.
Now if I'm a developer that's writing in Solidity,
I need to specify what version of Solidity I'm using,
and I'll get the K semantics of Solidity.
And when I write a program, I can use
the formal verification tools that come out of K
to prove things about my program.
Yes.
And now as a developer,
whenever like Solidity upgrades,
I can also sort of upgrade
to the new version of Solidity
and have the proofs of my software in...
You have the compiler and everything,
all the tools for the new version.
Yes.
So the way I envision this
is K2 be like universal language
for the blockchain, right?
In which you define your languages,
be that Solicians.
Solidity, various versions of solidarity, be that Viper, Pluto's, whatever language you want to write smart contracts in, high-level language, and even the lower-level languages.
Well, for lower-level languages, there is another story, which I'll talk about later.
But now let's talk about high-level languages.
All right, so now I would like to have these semantics of languages themselves, be somewhere in the blockchain.
Okay, because I want that to be fixed.
Nobody can touch a semantics.
It has a unique ID version number 3.2.7 and it's fixed.
I don't want that to be touched.
So then you can refer to it on the blockchain and then you can generate all the tools from it.
And then you can have your smart contract verified.
And I want to go one step further.
Not only verify the smart contract, but to have an actual proof of correctness of the smart contract.
And that proof itself or a hash of it, right, to be also stored on a blockchain.
So it's not only that I have my smart contract on the blockchain, but I also have a proof why that smart contract is correct.
And that proof is checkable by you or any third party in a way that you don't have to trust the K framework.
Because the K framework, like any other framework, can have bugs itself. Coq can have bugs, right?
People have proved false with Coq, right?
So bugs are unavoidable.
But what we'd like to do is to use these tools, these program
verifiers, like the K framework, for example, to search for a proof.
Once you have the proof, you should be able to generate a proof object that is completely
checkable, independently of K.
And I want to attach that proof object to my smart contract.
And this way, you can imagine having smart contracts on the blockchain together with certificates,
correctness certificates.
Hey, my smart contract is correct.
This is the proof.
So don't even think about attacking it, because it's been proved correct.
And this is the proof.
So if you believe that this is the specification, let's take an ERC20 contract, right?
You have a specification for ERC20 that is public, everybody agrees that yes, that is the
formal specification of ERC20, and now you prove that the smart contract implements ERC20,
end of story.
It's correct.
I have a proof that is checkable.
I don't care how that proof was generated, whether with K or something else, I don't care.
There is a fixed semantics of the language, there is a fixed proof, and those can be checked.
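The scheme described here, a fixed semantics with an immutable ID and an independently checkable proof object, amounts to content addressing. Below is a minimal Python sketch of that idea; the record layout and all the placeholder strings are invented for illustration, and this is not how K or any blockchain actually stores semantics and proofs.

```python
# Illustrative sketch only: content-addressing a fixed language
# semantics and a proof object so third parties can verify them
# without trusting the tool that produced the proof.
import hashlib

def content_id(data: bytes) -> str:
    """A content-addressed ID: anyone can recompute and compare it."""
    return hashlib.sha256(data).hexdigest()

# Placeholder stand-ins for the real artifacts (invented for the example).
semantics = b"K semantics of Solidity, version 3.2.7 (placeholder text)"
proof_object = b"machine-checkable proof that contract C meets spec S"

record = {
    "semantics_id": content_id(semantics),  # fixed: nobody can touch it
    "proof_id": content_id(proof_object),   # checkable independently of K
}

# A verifier re-hashes what it was handed and compares against the
# on-chain record; any tampering changes the hash and is detected.
assert record["semantics_id"] == content_id(semantics)
```

The point of the sketch is the trust model: the proof search may use K or anything else, but checking reduces to recomputing hashes and replaying a proof object.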
And you can even imagine checking these proofs as a service on the blockchain,
and you don't even have to worry about, as a user of a smart contract,
about K or anything like that.
You already have like, you know, okay, that is green.
The contract is green.
It's been proved correct.
But of course, in order for this to happen, I mean, there is a lot of work to be done,
you know, to generate proof objects.
We don't generate proof objects yet.
Also the compilers; we are still working on the compilers from the semantics.
We call that semantics-based compilation, and it's not yet there.
Some research to be done.
To improve performance.
So is this what this ERC20-K project is?
Because ERC20 is just like an interface, right?
It's basically just syntax.
It's saying this is what a function should look like.
And then so this ERC20-K project was adding semantics: not only is this the syntax,
but this is what it should do.
This is what a send or a transfer function looks like, or does.
So we were asked, we've been asked to verify some ERC20 tokens using our semantics
of the Ethereum virtual machine.
And we verified some properties, but we didn't know exactly what it means for something
to be ERC20 compliant.
Right? So yes, we looked over this syntax, over this API, and there were lots
of words describing what the API is supposed to do, but that's not rigorous enough in order
to get proofs done.
So then in ERC20-K we formalized the ERC20 intended protocol completely, to the last detail.
All the cases considered, there are 13 different semantic rules that were hidden in
all the text in the ERC20 informal description.
And now we have a specification.
We know what ERC20 is from, you know,
a mathematical point of view. And now if somebody gives me a token, an implementation, let's say
at the EVM level, and claims it's an ERC20, I can now take it and automatically check it
against the ERC20-K specification. And if there are any flaws, then those will be found and
reported. Given that, you know, many of our listeners are probably familiar with ERC20,
could you maybe share whether there were any hidden things you noticed in the ERC20, any gotchas
that may not have been noticed unless you went through this formal specification?
Yes.
So people typically don't think of transferring from yourself to yourself.
Or when you call transferFrom, you can transfer from A to B, and you can be the caller, C.
Sometimes A, B and C can be the same entity.
Right.
So what if you transfer from yourself to yourself and you give yourself the allowance to transfer?
Right. So sometimes it's interesting that the order of operations matters, right? Whether you first deduct the value from the account from which you transfer and then you add the value, you know, to the to account, versus if you add to the to account first and then you deduct the value from the from account.
So these are completely orthogonal operations provided that the two accounts, from and to, are different. But if
they are the same account, the order matters, because one of them will run you into an overflow,
the other one will not. So you may prevent an overflow by doing these operations in the right
order. And actually, depending on which order you do these operations, you can have two different
specifications of ERC20, and you'll have to comply with one or the other, which is a bit,
you know, inconvenient. But yeah, so this was the totally unexpected behavior: when you
transfer from yourself to yourself.
And we found lots of implementations where people sometimes prefer to add the value first
and then deduct it and others who first deduct the value from the balance of the from account
and then they add it to the other one.
And now you have this one case, because that's where typically errors are: in corner cases.
You have this corner case when you transfer from yourself to yourself.
And if you don't implement things properly, you may even be in a situation where you generate
new tokens: by transferring from yourself to yourself,
you generate new tokens, which is crazy.
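The self-transfer pitfall is easy to reproduce. Here is a hedged, illustrative Python sketch, not any real token's code: the buggy version reads both balances before writing either, so when the from and to accounts are the same, the later write clobbers the earlier one and tokens are minted out of thin air.

```python
# Toy illustration of the ERC20 self-transfer pitfall discussed above.
# This is invented example code, not any real token's implementation.

def buggy_transfer(balances, frm, to, value):
    # Bug: both balances are read before either write, so when
    # frm == to the credit overwrites the debit, minting tokens.
    assert balances[frm] >= value, "insufficient balance"
    new_frm = balances[frm] - value
    new_to = balances[to] + value
    balances[frm] = new_frm
    balances[to] = new_to  # clobbers new_frm when frm == to

def safe_transfer(balances, frm, to, value):
    # Updating in place keeps the total supply invariant even
    # when frm == to: the debit and credit cancel out.
    assert balances[frm] >= value, "insufficient balance"
    balances[frm] -= value
    balances[to] += value

ledger = {"alice": 100}
buggy_transfer(ledger, "alice", "alice", 40)
print(ledger["alice"])  # 140: forty tokens created from nothing

ledger = {"alice": 100}
safe_transfer(ledger, "alice", "alice", 40)
print(ledger["alice"])  # 100: supply preserved
```

The overflow variant mentioned in the conversation behaves analogously: with wrapping 256-bit arithmetic, crediting before debiting can trip an overflow that the opposite order avoids when the two accounts coincide, which is why the order of operations ends up visible in the specification.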
I think like we are running out of time,
but one of the last topics we would like to cover
is your partnership with IOHK,
your work for the Cardano project.
So there you have created something
which is called the KEVM, right?
So walk us through what KEVM is
and how it will be used in Cardano.
So KEVM is
a formal semantics of the EVM, in the same style as ERC20-K.
Right, so we formalize the entire Ethereum virtual machine
in the K framework and that gives us an interpreter,
basically an implementation of the virtual machine.
And that is fast enough, it's only like an order of magnitude
slower than the C++ reference implementation.
And IOHK actually deployed this virtual machine,
generated completely automatically
from the formal specification on a test net,
a couple of weeks ago.
And the plan is to do the same with IELE,
which is even the next generation of virtual machine
for the blockchain that we designed together with IOHK.
And, yeah, so first of all,
so it's a formal semantics of a low-level machine
and you can use it to execute smart contracts
the same way you use the EVM implementation in C++,
right, to run a smart contract in the blockchain.
However, you can also use it to verify smart contracts, to prove smart contracts correct, which is exactly what we used to verify all this ERC20 compliance that I was talking about.
So that's what KEVM is, and it is very tempting to deploy such a virtual machine on a testnet, because it is essentially correct by construction.
There is nothing to prove about it,
right? So you mentioned before, Sunny, that you have these complex implementations, and how do you know whether they are correct if they are large and so on.
So the actual EVM is large. That's a large piece of software, right? But if you generate it completely automatically from the formal specification, in a way that makes it correct by construction, now you have high confidence that you will not have bugs in that part of the system.
So you did mention that it's almost an order of magnitude slower than the C++ reference implementation.
So instead of using it as the base client, the client that everyone uses, or having a testnet or a network that's entirely KEVM, would it be possible to use it more as the reference client, like what, I think, Phil Daian has called for? I'm sure you've heard of him.
It definitely can be used.
And actually, if you go to jellopaper.org,
we have deployed
a paper meant to replace the Yellow Paper,
right, which is a reference specification of the EVM, basically.
You have a reference presentation generated from the semantics.
However,
we are working on a project now, also with IOHK,
where we want to generate a faster backend for K,
an LLVM backend.
So we are going to translate automatically
the semantics into LLVM
and we expect that to be an order of magnitude faster.
I see.
And once that happens,
we'll be able to actually generate
virtual machine implementations of comparable performance
from formal semantics.
And then you may wonder,
why should I even bother implementing a virtual machine
if I can generate it automatically
from the formal specification correct by construction
and it has comparable performance?
why should I implement?
Why should I take the risk to introduce bugs if I don't have to?
Essentially, Professor, is it correct to think like,
so today you see a lot of different projects in this space
where these projects set out to implement their own virtual machine
and a language that compiles down to their virtual machine
and then this language helps in formal verification
in a particular way.
Yes.
And so this is the story behind Tezos.
This is the story behind Scilla, which was the last project we covered.
And there's another one called Zen Cash or something like that.
So the Zen project, that's also doing something similar.
And essentially this K approach with Cardano could be like one approach
that overshadows and solves the problems that these
projects are trying to solve.
Essentially, in the K approach, what would happen?
I don't know if it solves all the problems,
but it solves one problem,
which is that you can write a smart contract
in any programming language,
for which you have a formal semantics.
Exactly.
So, once this is running on Cardano,
so let's say the programming language of
Tezos, Michelson, was interesting.
Developers wanted to use it.
Somebody could just express the formal semantics
of Michelson in K.
Yes.
And then developers can write a smart contract in Michelson
and execute it on KEVM.
KEVM or IELE.
Or IELE.
But in order for that to happen,
we still have to finalize this project
we're working on together with IOHK,
which is called semantics-based compilation,
which is the following.
You give me a language semantics,
and I give you a compiler for that language.
And the compiler is basically a translator
from your language to a low-level language,
which we plan to be IELE.
Okay?
So IELE will be this new virtual machine.
It's an EVM-like virtual machine, low-level language.
It's actually a combination of EVM and LLVM.
You may have heard of LLVM.
So it's an LLVM for the blockchain, right?
We designed this language together with IOHK.
And what we want to do is to take any language
for which you have a K semantics
and automatically generate a
compiler for that language to IELE.
Alright, so suppose that you like JavaScript, right?
You write your smart contract using JavaScript.
And as long as you have a semantics for JavaScript in K, you can push the button and translate automatically, correct by construction,
your JavaScript smart contract into IELE, and then run it on the automatically generated IELE VM.
So everything is correct, the entire
stack, right? The entire pipeline is correct. There is no code that needs to be verified
by hand anymore, because the entire thing is either generated automatically from the semantics or
is a formal semantics itself. So, with IELE, I notice you guys also have a K-WebAssembly
semantics as well. So what was some of the reasoning behind building IELE? Because I know
WebAssembly is also very similar and closely tied to LLVM.
Yes.
So what were some of the benefits, like the thought process of building IELE, when you were looking
at existing WebAssembly and EVM and stuff?
So you may know that LLVM has been created at the University of Illinois.
A professor in my department, Vikram Adve, was behind LLVM.
So we have a lot of legacy here at UIUC with LLVM, and we love LLVM.
And particularly one thing that we like about LLVM is all this optimization, the LLVM
optimization pipeline, right?
You take an LLVM program to another LLVM program which is better, right, from some points
of view or by some metrics, and then you take that to another LLVM program.
And so you keep transforming the program until you get the final program that you are happy with
and then you translate to native code.
So we really like this approach because in particular you can imagine that you can take any
high-level programming language, translate it quickly to IELE, let's say, right?
Because IELE is like LLVM.
An LLVM for the blockchain, think of it that way.
You translate from that high-level language, let's say JavaScript, to IELE, and then you go from
IELE to IELE to IELE to improve the gas metric, let's say, or to improve the size of the code,
or, you know, some other thing you want to improve.
And we can use many of the ideas that have already been
developed by the LLVM community.
And that's what we'd like to build up.
And we'd like to build up on all this legacy
and all the tooling that has been developed
around the LLVM project.
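The pass-based, program-to-program pipeline described here can be sketched concretely. The following Python toy, with an invented three-operand instruction format and a made-up "gas" metric, is only meant to illustrate the shape of such a pipeline, not IELE's or LLVM's actual design.

```python
# Toy illustration of a program-to-program optimization pipeline,
# in the spirit of LLVM/IELE passes. The tiny IR (op, dst, a, b),
# the passes, and the cost metric are all invented for this sketch.

def constant_fold(prog):
    # Replace ADD of two integer constants with a LOAD of the result.
    out = []
    for op, dst, a, b in prog:
        if op == "ADD" and isinstance(a, int) and isinstance(b, int):
            out.append(("LOAD", dst, a + b, None))
        else:
            out.append((op, dst, a, b))
    return out

def dead_code_elim(prog):
    # Drop instructions whose destination is never read; "ret" is
    # treated as the program's output register and always kept.
    used = {a for _, _, a, _ in prog if isinstance(a, str)}
    used |= {b for _, _, _, b in prog if isinstance(b, str)}
    return [ins for ins in prog if ins[1] in used or ins[1] == "ret"]

def cost(prog):
    return len(prog)  # toy "gas" metric: one unit per instruction

def optimize(prog, passes=(constant_fold, dead_code_elim)):
    # Keep piping the program through the passes until it stops
    # improving by the metric: program to program to program.
    while True:
        new = prog
        for p in passes:
            new = p(new)
        if cost(new) >= cost(prog):
            return new
        prog = new

program = [
    ("ADD", "t1", 2, 3),     # folds to LOAD t1, 5
    ("ADD", "t2", 1, 1),     # result never read: eliminated
    ("ADD", "ret", "t1", 4),
]
print(optimize(program))  # [('LOAD', 't1', 5, None), ('ADD', 'ret', 't1', 4)]
```

Each pass maps a program to an equivalent program that is better by some metric, which is exactly the property that lets passes be reordered, repeated, and retargeted at a different backend.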
Well, if we did the same for WebAssembly,
then we'd have to reinvent probably many of these techniques
for WebAssembly.
And why do it when we already know that this works for LLVM?
But yeah, so as you may know, actually,
we defined the semantics in K of all these three, right?
We defined KEVM, we are defining the semantics of WebAssembly,
and we are still working on it now.
So one of my students, Everett Hildenbrandt,
who is actually also working for the Ethereum Foundation,
is defining WebAssembly in K as well,
and then we have IELE that we defined together with IOHK.
So it's not like we favor one over the others.
Actually, to be honest, I favor IELE over EVM,
because we really designed IELE
with the explicit objective to avoid some of the problems that we learned while we
formalized the EVM. But between WebAssembly and IELE, I don't think there is any big
difference in terms of, you know, one being definitely better
than the other. And as a developer of the K framework, to be honest, I would like all the
flowers to bloom. So the K framework allows all of these to live at the
same time. And once you have semantics-based compilation, you can just change the target
language, and then you can go from any language to any VM, right? I can go from JavaScript
to IELE, or from JavaScript to EVM, or from JavaScript to Wasm. All of this should be
possible, because of the way the code is generated, it's not hard to change the backend
of the code generator. And that's for execution, but then there is always the verification aspect.
Right?
So having formal semantics for all of these, both for the languages and for the VMs,
allows you to have no code left that is not formally verified, which is my dream.
That's what I'd like to see in the blockchain space.
I'd like all the smart contracts to be formally verified.
Sounds great, Professor.
I mean, my sense is that the K framework is the most general solution to all of these small,
specific problems that many of these blockchain projects are trying to solve by inventing their
own languages and their own virtual machines and things like that. And I'm looking forward to
what change KEVM brings in the... So please don't interpret me wrong. I'm not saying that these
languages are not great. I think all of these languages are great and they must be done. We have to
come up with better languages for smart contracts, with better virtual machines. But what I would
like to encourage everybody to do is to have formal semantics for all of these, and then to
have these generic tools that allow you to easily change the language or the VM and still
have all the tooling around, as opposed to re-implementing, reinventing the wheel over and over
and over again, which is what we see a lot happening, at least in the academic community.
Cool. Okay, so that brings us to the end of the show. To our listeners, thank you so much for
joining us. We release new episodes of
Epicenter every Monday or Tuesday.
Thank you, Professor, for being a guest
on this podcast. We enjoyed the conversation a lot.
Thank you, guys. Great questions. I'm very impressed. You really understand
well the K framework. That means that we did our job well.
So really, it's amazing
how deep your questions are, right to the point. Your videos
and documentation were really good.
Like I'd watch through some of your tutorials and stuff.
Those were very helpful.
Excellent.
Thank you, guys.
So for our listeners, if you're tuning in for the first time,
you can subscribe to the show on iTunes, SoundCloud,
or your favorite podcast app for iOS and Android.
You can also watch a video version of this particular show
on YouTube at YouTube.com slash epicenter Bitcoin.
Recently, we have also started a Gitter community,
where you can chat with us, the hosts and other members of the Epicenter community.
The link is Epicenter.tv slash Gitter.
We look forward to catching up with you there.
We always welcome reviews of our show.
It helps.
It gives us feedback.
It allows us to improve.
It keeps us going.
And it helps other people discover Epicenter.
So please leave us a review on iTunes.
And we look forward to catching you next week.
