Algorithms + Data Structures = Programs - Episode 172: 🇺🇸 Sean Parent on Flash, Chains & Memory Safety

Starting point is 00:00:00 Yeah, I mean, I think the information coming out of Department of Defense and the White House is filled with a lot of hyperbole. But at the base of it, it's not wrong. Just, you know, in working on this chain library at one point, I accidentally wrote T instead of decay_t<T>. And the result was I captured a reference instead of a value. Even though the code looked like it was fine, the type was not. And that meant that I had a use after return on my stack, and it cost me half a day to debug the thing. Hear that, folks? The luminary, Sean Parent, debugging his problem for half a day. There's hope for all of us.
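The kind of slip described here can be sketched as follows. This is a minimal illustration, not code from the chains library: `holder`, `capture_as_written`, and `capture_decayed` are hypothetical names, and the sketch only assumes the usual forwarding-reference deduction rules, where an lvalue argument makes `T` deduce to a reference type.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical wrapper for illustration; this is not code from the chains library.
template <class T>
struct holder {
    T value;
};

// As (mis)written: for an lvalue argument T deduces to `int&`,
// so holder<T> silently stores a reference.
template <class T>
auto capture_as_written(T&& x) {
    return holder<T>{std::forward<T>(x)};
}

// With std::decay_t<T> the reference is stripped and a copy is stored.
template <class T>
auto capture_decayed(T&& x) {
    return holder<std::decay_t<T>>{std::forward<T>(x)};
}

// Usage: both calls look the same, but only one result owns its value.
inline void capture_example() {
    int local = 1;
    auto by_ref = capture_as_written(local);
    auto by_val = capture_decayed(local);
    static_assert(std::is_same_v<decltype(by_ref), holder<int&>>);
    static_assert(std::is_same_v<decltype(by_val), holder<int>>);
    local = 2;
    assert(by_ref.value == 2);  // aliases `local`; would dangle if returned from here
    assert(by_val.value == 1);  // independent copy
}
```

The two call sites are indistinguishable by eye, which is presumably why the bug was hard to spot: the difference exists only in the deduced type.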

Starting point is 00:00:50 Yeah, so it's too easy to make a mistake like that. Welcome to ADSP, the podcast episode 172, recorded on March 7th, 2024. My name is Connor, and today with my co-host Bryce, we interview Sean Parent from one of the New York Adobe offices and chat about Adobe Flash, his new library and idea about function composition called Chains, and his latest thoughts on memory-safe programming languages and C++. And we're live. Connor, why are we in a cab? Turn your gain down.

Starting point is 00:01:38 Testing, testing. We're in a cab because we went to the wrong Adobe offices. We are going to interview Sean. Connor, what was the one job that I gave you? Listen. I asked Connor, hey, Connor, can you get the address of where we're going? And then we show up at the wrong building. So now we're going to have like 40 to 50 minutes to interview Sean Parent.

Starting point is 00:02:01 We are currently at Broadway and 28th? 28th, 20th. Listen folks, I'd like to clear my name here. We were at the correct Adobe office for the meetup. That is true. It was unclear that

Starting point is 00:02:20 Sean was going to be at a different Adobe office. But Connor mentioned that, hey, Sean said that the office is really near his hotel. I did. And the office that we were going to is not near his hotel. And I said, well, maybe they have a second office in New York. I didn't think that there was no possibility of there being a second office in New York. But in fact, Adobe has two offices in New York.

Starting point is 00:02:42 It's New York, of course. I mean, literally when I worked at Amazon, they have like six offices in Vancouver, which is like a fraction of the size. Anyways, but... Should we bottom out on that reduced by key thing? Not yet, not yet. I'm clearing my name. But my point was is that I don't think it's unreasonable to assume that Sean would be at the Adobe two hours before the meetup happens at that building. You're not wrong.

Starting point is 00:03:06 And, yeah. Oops. My bad. And that's why we're recording in the back of the cab right now. A beautiful yellow New York City cab. Not an Uber, not a Lyft. It is nice. Because those take forever to show up.

Starting point is 00:03:19 It is nice. You just stick your hand out and it captures it. It's wonderful. And now, yeah. Now we're going to go to the correct Adobe office. All right, we've got to get out of the, we've got to get out of the, I was going to say elevator, but it's a taxi. It is a taxi now. I'm going to clink these microphones.

Starting point is 00:03:38 We've got two mics now. Thank you, sir. You have a wonderful day. Okay, well, same to you, brother. Try not to forget. Thank you, sir. First ever pod recorded in a cab. Yeah.

Starting point is 00:04:03 Here, take... There's a possibility that we've swapped mics, which would make your editing life so much harder. Not really, honestly. Okay, good. Then we should swap mics. The main office is on the fourth floor. They have 408. Awesome.

Starting point is 00:04:26 Thank you. Thank you so much, sir. All right, folks. We're going to the fourth floor. See the further south you go in Manhattan. Holy smokes, it's hot. The further south you go in Manhattan, the hotter the elevators get and the smaller the elevators get.

Starting point is 00:04:39 Damn. This one's got a little small weird door. I don't think they have a reception. I don't know. We are outside of the Adobe Union Square offices. Please check in at reception. Where is reception? I think we should just... Did we just message Sean?

Starting point is 00:05:05 Two NVIDIA employees try to break into Adobe offices. Adobe is a weapons-free workplace. This is ADSP, the podcast, episode 172, I think. Oh, there is Sean.

Starting point is 00:05:21 Oh, look at that. Like magic. Perfect timing. Sean just popped out of the elevator. This was a series of unfortunate events. We did not think that it was possible that there were two Adobe offices in New York. No, that's not true. We literally, I said, I didn't say that New York didn't have two.

Starting point is 00:05:37 I just said, why would Sean be at a different Adobe office than the meetup? And I was wrong about that fact. You definitely were. This was your fault, and I chose to say it. No, is it my fault? I'll take the blame. You know, I'm Canadian, and I apologize on behalf of

Starting point is 00:05:51 the two Americans here. Yeah. All right, let's, um, let's... Wait, are you not able to get into these offices either? No. Oh, I'm gonna say we're all, we're all locked out. All right. We got you on the leash here. All righty. This is where we steal the IP, folks. Ooh, these are nice offices. You know, I've never been in an Adobe office that does not have nice art. Well, I mean, it kind of is like the creative company, you know?

Starting point is 00:06:19 I know, exactly. We're upstairs. Okay. Ooh. We'll whisper like we're in a library. Very quiet. What has this podcast become, folks? Let's take the picture after 5.

Starting point is 00:06:49 Alright. Open concept, folks. Alright. Thank you to the lady that helped us get the directions to the room. I really hope that you keep this all in. Oh, yeah, definitely. Hear what we're saying? All right, we successfully snuck in to our second Adobe building.

Starting point is 00:07:14 This one has Sean Parent in it. And we'll make sure we got the gains right. Who knows about the audio quality of our little cab ride there? Good to have you back on the pod, Sean. Thank you. We were talking about room names, and you were explaining. Yeah, well, Bryce mentioned he was seeing a pattern with the room names on this floor. And I said, in San Jose, when they put in the West Tower, which is our first headquarters building,

Starting point is 00:07:49 they named all the conference rooms after fonts, alphabetical, which made it very easy to find the right conference room, alphabetical by floor. And then when they put in the East Tower, somebody said, oh, I see, some of the fonts in the West Tower are named after cities. So they continued that theme, since there's fonts like New York, and in our East Tower all the conference rooms are named after cities, which is very confusing because now if you have a meeting in New York, which building are you in? There's a font called New York? There is a font called New York.

Starting point is 00:08:22 So there are many fonts named after cities. I guess fonts is not the first thing that comes to mind when I think of Adobe, but I guess it is one of the things that you do. Yeah. So we have a significant font business, and there's even a Warnock font, whose design was influenced by our founder. So have you ever had to deal with any interesting programming problems related to fonts? Not at Adobe. I actually worked on

Starting point is 00:08:59 what became TrueType, which was Apple's and Microsoft's font system. Most of the development started at Apple, and it was part of the QuickDraw GX project. I worked on the QuickDraw GX project. So I worked on bits of that, largely trying to optimize how you render quadratic Béziers. So Apple's font system was based off of quadratic Bézier curves,

Starting point is 00:09:33 where Adobe, everything is cubic. I heard a story. I don't know if this is true, but I heard a story that on Windows, the little spinning logo is actually a font because their font system is so optimized. It's just the best way to do a little spinning logo is just a series

Starting point is 00:09:58 of glyphs in the font. I have no idea if that's true. You've got the topic. Yeah, I do have the topic. We don't even know if Sean has anything to say about this, but Bryce assured me that the next topic, because you have questions and he better have answers. Well, there is something else that Adobe is known for or was known for in the past that we've not talked about here. Usually, I think these days we think about Adobe, we think about Photoshop, we think about generative AI.

Starting point is 00:10:28 But it used to be that most people knew Adobe because of Flash. And I remember when I was back in the day, when I was a kid, all the cool websites used Flash. Connor, you said yesterday that you used to be an ActionScript wizard. I don't even know really what ActionScript is. ActionScript was the language that you program Flash in. And I wrote, well, a couple different applications, but one was like a pretty... it was terrible code, I didn't know what I was doing back then, but it was like a stock technical analysis program. And it was amazing because it was so fast. Like, I originally wrote it in Visual

Starting point is 00:11:06 Basic 6, but, like, drawing all those little rectangles, and every single time you hit a button and it had to display everything, it just crawled to a halt. But Flash, you can just basically, like, redraw everything and it's silky smooth. Like, doesn't matter how much stuff I put on screen. Anyway, so I put ActionScript 3.0 as one of my expert languages at one point. But anyways, yeah. Yeah, so did you have anything to do with Flash? Did you have any involvement with Flash? Very little.

Starting point is 00:11:39 So Flash came in when we acquired Macromedia. It was initially a Macromedia product, and it came in through an acquisition. And when I came back to Adobe from Google, our CTO was big on Flash and decided that we were going to write all of the mobile products in Flash. And we had just started this project called Adobe Revel, which was a consumer version of Lightroom and eventually kind of morphed into Lightroom Mobile. And we did not want to code a Revel in Flash. And so we had a plan because Revel was going to largely be a tablet-based product. So it was going to be centered around the tablet. And the idea was to then feed into a Lightroom on desktop. And then Adobe had a product called Photoshop Express, which is still there in the App Store,

Starting point is 00:12:51 but has very little to do with actual Photoshop. And so our thinking was to repurpose Photoshop Express since it had kind of the camera tech and stuff, and to tailor that as the way to feed the ecosystem and get pictures into this consumer product standpoint or consumer product line. And so we had just started working with a team that was doing Express to morph it into what we needed.

Starting point is 00:13:23 And when this mandate came down, we decided to sacrifice Express to get rewritten into Flash. And so that eventually got done, and then it was like a two-year effort, and then it was scrapped. What was it written in originally? It was written largely in C++, with a bit of Objective-C++ on the front-end side.

Starting point is 00:13:57 Wouldn't it be a pretty big performance hit to rewrite something from C++ to Flash? Yeah, but our CTO thought that that was the way of the future and that hardware would get faster and it wouldn't matter. And that was not the case. So Express got rewritten into Flash, and it never shipped because it took them two years to get to feature parity to what they had two years ago.

Starting point is 00:14:28 And it just was all around a worse experience. So that got scrapped. So we were very thankful that we didn't take Revel down that path. So I had some tentative involvement with Flash during those times. And then about the time Flash was almost canceled or about to be canceled, right? For people who don't know, the reason why Flash was canceled was it just had an unbelievable stream of security issues. It was written as an old Netscape browser plugin. And so it plugged into your browser, and that was a C API, and so it was running outside of the sandbox. And one of the big reasons why people liked Flash is because it could give your website access to your operating system and your operating system capabilities.

Starting point is 00:15:28 The problem with that is it gave websites access to your operating system and your operating system capabilities. So it became a big security problem, and there was a stream of exploits against it, and they got patched and patched and patched, and that kind of sucked in so many resources it was decided that it was untenable to maintain it. There were a couple efforts to sandbox it or to rewrite it. At the time it would have been into asm.js and PNaCl, which were precursors to Wasm. Both of those rewrites were internal, but we couldn't see a way to roll them out because, being sandboxed, they cut the capabilities off from the desktop. And so even though you would be able to run some sites in them, the main business that we had for Flash was not around games, which is how most people know

Starting point is 00:16:26 Flash content, but it was around a corporate business where corporations developed their in-house applications to be written that way. And all of those required access to the operating system. And there wasn't a way to provide that without opening up a bunch of security holes. So it got killed. This was right at the time when the first Software Technology Lab was in the process of being nuked, and there was a point there briefly where the team got redistributed, and I ended up reporting into the Flash organization. But now it was the Flash authoring tool, which was in the middle of a big rewrite on top of HTML5. And so I got assigned to that team. What's the name of that tool?

Starting point is 00:17:19 Is it a product these days? Yeah, I think it still is. I don't even know what the name of it is now. For a while it was just called Flash, which was very confusing because... I can't imagine why. Yeah, yeah, yeah. So I got assigned to that team for a little while. I spent like less than a month consulting with them, and this was when we were trying to figure out what was going to happen with Software Technology Lab. And then I had a sabbatical, and so the day before my sabbatical I got called into a meeting and they said, well, we wanted to have this straightened out before you left on sabbatical, and the decision is we're distributing the entire team all over the company,

Starting point is 00:18:06 and that's the way it is. My girlfriend, every time before vacation, her boss, like the day before vacation, schedules a meeting, and then it's always some bad news, and then it just, like, ruins the whole thing. Yeah, yeah. So my sabbatical was spent looking for another job. And so I came back and resigned. Was that when you went to Google? That was when I went to Google, yeah. Have we talked about Sean's time at Google? We have, yeah.

Starting point is 00:18:36 That's the famous Rotate story. Oh, that is. Yeah, that's where Rotate came from. So that's the little involvement I had with Flash. I got what I wanted out of that. My thinking is like, you know, it makes the Flash, renaming to Flash for a different product makes Microsoft Teams new teams not seem so bad because they prefixed at least a new. We work at NVIDIA. I don't think we can give anybody shit about product naming because ours is pretty bad. Tesla. We named Tesla first,

Starting point is 00:19:14 and then they, you know... Tesla at NVIDIA is the name of a business unit, a product, a microarchitecture, and, like, a particular GPU SKU. Well, I'm in my ivory tower and I do my research, and I wasn't aware of all that. So actually my business unit is called the Tesla business unit. That's what the HPC business unit was called back in the day. Tesla was the name of the first NVIDIA, like, real compute architecture back in the day. Well, we've got halfway through the episode, and we've still got 25 minutes until Sean needs to go to his next meeting. So maybe in the last 25 minutes, we might get to a couple more topics. You can give us, we are T-minus 25 minutes from Sean's next meeting, but T-minus one hour and 55 minutes from the New York C++ meetup being held at the other Adobe office,

Starting point is 00:20:08 which we already have badges to because we went there a little earlier, as you heard. You're going to be speaking. Do you want to tell us about your talk and yeah, what you're going to be talking about? Sure. I'm going to be talking about, well,

Starting point is 00:20:22 this is a kind of a draft of a talk about a draft of a library, about a half an idea. So we'll see how this goes. So I've been talking with mostly with Eric Niebler about senders and receivers. I looked it up. It's more than eight years we've been having conversations around this. And I built the STLAB concurrency library, which is mentioned in the sender-receiver proposal as prior work and some inspiration there. And I've had some issues with sender-receivers. And one conversation that I've had with Eric repeatedly is at the end of the day, I think senders and receivers are just a way of doing function composition. So basically all that they are, are you're doing, you're composing a bunch of functions, which then you're going to execute.

Starting point is 00:21:18 And those functions describe asynchronous or concurrent operations, and you're effectively building a program and then running the program. And it's a somewhat functional language that you're building them in, but the sender-receiver interface doesn't readily expose that. You've got you know some complicated

Starting point is 00:21:46 concepts that have multiple interface calls on them for setting values and setting exceptions and setting stopped and getting stop tokens and and and it's this very broad interface that doesn't look like under the hood, like just function composition. And so a couple weeks ago, literally a day less than two weeks ago, I had this thought about, well, if you just took a sequence of functions and stuck them in a tuple, that you could use a fold expression

Starting point is 00:22:22 to execute them flat in sequence where the result of one feeds to the argument of the next and could make that work. And so I wrote a little snippet of code to do that, and then I started to think, how do you go from there to building up the rest of the capabilities that you would have in sender-receivers. And got pretty far and had some insights along the way. The work certainly isn't complete at this point. But it's got some interesting aspects to it in that the structure of these things,

Starting point is 00:23:02 I call them chains, it's really just a two-dimensional tuple of tuples under the hood. And that's maintained and exposed. The structure of them is very simple, but when you want to actually do something with them, you can wrap an algorithm around them that transforms that structure into a new structure and then executes that new structure. And what that means is, like in senders or receivers, you would say then and hand it a lambda. And what that is in senders receivers is, what is it? It's a sender adapter closure, right? So it's taking a lambda and it's capturing it in a sender, so, and adapting it to the sender interface and giving you back a new function, function in quotes, which is a sender that accepts a sender and returns a sender.

Starting point is 00:24:07 So it's this complicated little thing. And in my model, then lambda expression is just take the lambda expression and append it to the end of the tuple. That's it. And then all the magic happens when you go to execute this, all the transformation to execute it. And I think there's some significant power in there. So I've shown that I can do the, you know, transfer the execution onto other threads or other execution contexts, and I

Starting point is 00:24:36 can inject in the transformation cancellation tokens and error handling. You know, these functions are free to throw exceptions if they want to throw exceptions. All of the handling of that is handled outside and injected into the system. So that's what my talk is about, and that's the idea. And I kind of build it up from first principles, and I start with the ideas that led to me building the STLab concurrency library, and what are the problems with the STLab concurrency library, and why senders and receivers

Starting point is 00:25:11 are one potential way to solve that problem. The punchline at the end is I'm not quite sure where to go from here. I've discovered some things that are broken in stdexec, and the realization that at the end of the day, this is just a, which we're building up as a functional language. We would like to prove that it's computationally complete. So I've nerd sniped the stdexec authors into trying to figure out how they do that inside of stdexec and trying to fix the issues that I've raised, which are mostly around the split operation in senders receivers, which

Starting point is 00:25:53 is a required operation if you want to prove you're computationally complete. For Connor, you have to be able to implement an S combinator, and an S combinator repeats a term, and that's a split. We love that. The starling bird. We love the starling. Yep. K is the kestrel. K is the kestrel.
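For reference, the S and K combinators named here are usually written S f g x = f x (g x) and K x y = x. The sketch below is only an illustration in uncurried C++ lambdas (the names `S` and `K` follow the combinator convention and are not taken from any library); it shows that S uses its argument `x` twice, which is the "repeats a term" property that corresponds to a split.

```cpp
#include <cassert>
#include <functional>

// S f g x = f(x, g(x)): the argument x is used twice, which is the "split".
// Written uncurried here for brevity.
inline constexpr auto S = [](auto f, auto g) {
    return [=](auto x) { return f(x, g(x)); };
};

// K x y = x: keep the first argument, drop the second.
inline constexpr auto K = [](auto x, auto) { return x; };

// Usage: x feeds both the squaring function and the addition.
inline int s_example(int x) {
    return S(std::plus<>{}, [](int v) { return v * v; })(x);  // x + x * x
}
```

In a pipeline setting, the value `x` flowing into both `f` and `g` is what forces a result to be shared between two downstream consumers.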

Starting point is 00:26:15 S is the starling. Yep. In this talk, it'll be – I think Adobe records them. I remember seeing – or was that when Sean Baxter was at Bloomberg? I mean, hopefully this will be recorded. It's supposed to be recorded. And it will be on YouTube at some point. I'm sure at some point Sean will probably give this talk at some other conferences too.

Starting point is 00:26:34 This is true. This is true. Yeah. But the eager and avid listener will want to see it as soon as possible. But, I mean, we were chatting about this. Was it two nights ago on Tuesday when we were at Sugarfish? And this time you explained it. Well, I think the same way, maybe slightly different details.

Starting point is 00:26:55 But it made me realize that I actually, in a first edition of a different tool that I was working on at work, I did something not exactly similar. But I was trying to store views in these tuples and like the C++ 20 and 23 views. Because at one point I was trying to like lazily build these up and then just like launch them at the end. But that ran into just like massive compile time problems because the larger those things grow without actually, you know, invoking them, the slower and slower the compile times get. And because I was like automating this stuff, I was doing something super terse and then it would explode into this massive thing. So then the alternative I thought was like, what if I can

Starting point is 00:27:38 just store each of these as like, you know, a slot in a tuple or whatever, of some arbitrary length. And it worked up to a certain point, but then I just ran into a mess. I'm not like a crazy template metaprogrammer, so I was probably doing some stuff wrong. But I remember like, you know, trying to store lambdas in tuples because like lambdas have unique signatures

Starting point is 00:27:58 and you had to stamp out a bunch of different, like... and then you said you're using fold expressions, which I was not doing. Is there like a bunch of tricks that you're using to do this? It's not much of a trick. I mean, it's not in my slides, but if you've got a tuple of functions, you can call std::apply to basically split those out into an argument pack. And so now you've got an argument pack of functions. And then you can take the arguments that you want to call it with, and you turn that into

Starting point is 00:28:39 a, you capture that itself into a tuple, whatever the arguments are, and wrap that into a little struct that's got a pipe operator. And it can then feed the next function and capture the result of it as a pipeable entity. And so now you can just say, my initial argument's pipe, my argument's pipe, dot, dot, dot, and you've got a fold expression. It will execute the complete sequence of tuples flat. I mean, it is a cute trick.
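The trick described above, together with the earlier point that "then" simply appends to a tuple, can be sketched roughly as follows. This is a guess at the shape of the idea, not the actual chains code: `piped`, `chain`, and `then` are hypothetical names, the structure is flattened to a single tuple rather than a tuple of tuples, and every function is assumed to return a value (no void results).

```cpp
#include <cassert>
#include <tuple>
#include <type_traits>
#include <utility>

// Arguments in flight, wrapped so that they can sit on the left of `|`.
template <class... Args>
struct piped {
    std::tuple<Args...> args;
};

// Feed the current arguments to the next function and wrap its result.
template <class... Args, class F>
auto operator|(piped<Args...> p, const F& f) {
    auto result = std::apply(f, std::move(p.args));
    return piped<decltype(result)>{std::tuple<decltype(result)>(std::move(result))};
}

template <class... Fs>
struct chain {
    std::tuple<Fs...> fns;

    // "then" only appends to the tuple; nothing is adapted or run here.
    template <class F>
    auto then(F f) const {
        return chain<Fs..., F>{std::tuple_cat(fns, std::tuple<F>(std::move(f)))};
    }

    // Execution: std::apply expands the tuple into a pack, and a left fold
    // over `|` threads each result into the next function.
    template <class... Args>
    auto operator()(Args&&... args) const {
        piped<std::decay_t<Args>...> first{
            std::tuple<std::decay_t<Args>...>(std::forward<Args>(args)...)};
        return std::apply(
            [&first](const auto&... f) {
                auto last = (std::move(first) | ... | f);
                return std::get<0>(std::move(last.args));
            },
            fns);
    }
};

// Usage: three stages, executed flat in sequence.
inline int chain_example() {
    auto c = chain<>{}
                 .then([](int a, int b) { return a + b; })
                 .then([](int x) { return x * 2; })
                 .then([](int x) { return x + 2; });
    return c(10, 10);  // ((10 + 10) * 2) + 2
}
```

Because the stored structure is just a tuple, anything that wants to inspect or rewrite the stages before running them can do so with ordinary tuple algorithms.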

Starting point is 00:29:10 I definitely think that would have been useful in my failed attempt to do that. Then I just switched to Python and gave up on C++ and, you know, dynamic interpreted language. Woo! That's a lot easier, folks. Yeah, stuff on the fly. Well, you know, I'll admit, when I was starting to build the code, I got lost in the type signatures pretty quickly, and I was like, okay, I'm just gonna start with making it work with an any, right? We collapse everything to an any, make it work through that, and then figure

Starting point is 00:29:46 Halfway to Python, basically. Halfway to Python, yep. Yeah, it's interesting, because one thing that I, and to some degree Connor, have been working on recently has been looking at how do we parallelize range pipelines. And one of the challenges with that is that if we have like a range pipeline that has a filter somewhere inside of there, we got to go find that filter and we got to replace it with some other operation that introduces some tombstone values and then later we remove them. And there's all sorts of other optimizations that we got to do where we have to be able to inspect the whole pipeline. And because of the way that range pipelines are built today, which is where they're recursively nested, it's a little bit

Starting point is 00:30:32 harder to do this decomposition and optimization. And, you know, if instead of being this nested structure, if they were represented as like a tuple of views or a tuple of things that were going to be applied together, it would be so much easier. Things to be composed instead of things already composed. Exactly, exactly right. Yeah, because I think maybe the realization that we didn't have was that, like, with ranges, the reason that they come to you already composed is because, you know, in the serial world there's really just sort of one way of composing them, or sort of naively there's just one way of composing them. But actually you might want to do optimizations or transformations or compose them in a different way, you know, apply some different thing to them.
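To make the "find the filter and replace it" step concrete, here is a small sketch over a flat tuple of stages. Everything in it is hypothetical: `filter_stage` and `tombstone_stage` are stand-in marker types rather than real range adaptors, and the sketch shows only the scanning and rewriting of the structure, not an actual parallel execution.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <utility>

// Hypothetical stage types; real range adaptors are not modelled here.
template <class Pred>
struct filter_stage { Pred pred; };

template <class Pred>
struct tombstone_stage { Pred pred; };  // stand-in for "mark rejected elements"

template <class T>
inline constexpr bool is_filter = false;
template <class Pred>
inline constexpr bool is_filter<filter_stage<Pred>> = true;

// Scan the flat tuple's types and count the filters.
template <class... Stages>
constexpr std::size_t count_filters(const std::tuple<Stages...>&) {
    return (std::size_t{0} + ... + (is_filter<Stages> ? 1 : 0));
}

// Rewrite rule: filters become tombstone stages, everything else is kept.
template <class Stage>
auto rewrite(Stage s) { return s; }
template <class Pred>
auto rewrite(filter_stage<Pred> f) { return tombstone_stage<Pred>{std::move(f.pred)}; }

template <class... Stages>
auto rewrite_filters(std::tuple<Stages...> stages) {
    return std::apply(
        [](auto... s) { return std::make_tuple(rewrite(std::move(s))...); },
        std::move(stages));
}

struct is_even {
    bool operator()(int x) const { return x % 2 == 0; }
};

// Usage: a three-stage pipeline with one filter in the middle.
inline auto example_pipeline() {
    return std::make_tuple([](int x) { return x + 1; },
                           filter_stage<is_even>{},
                           [](int x) { return x * 2; });
}
```

With a nested, already-composed pipeline the same rewrite would have to recurse through the nesting; on a flat tuple it is a single pack expansion.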

Starting point is 00:31:28 And so representing them in the uncomposed form, and that gives you a lot more flexibility. Yeah, and that's what I found is you can inject, like I said, all the structure you need. And so instead of in the sender-receiver model, where you're pulling through a message for what happens if there's an exception and messages for stops and things like that, that interface impacts every single level all the way through the pipeline because it all has to be composed together. If instead you say, just give me the structure, all of that information can be injected after by just transforming that structure. And when the structure is flat, it's pretty simple to do. And since these are all, you know, it's a tuple of tuple of functions, and they can be lambdas, but you could also have a type in there, which is your filter.
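One way to picture injecting cancellation and error handling by transforming the structure is sketched below. The design is an assumption for illustration only: the stop token is modelled as a plain `std::atomic<bool>`, and `canceled`, `guard`, `with_cancellation`, `run`, `outcome`, and `try_run` are hypothetical names, not the chains or sender/receiver API. The stages themselves know nothing about stopping or about who catches their exceptions.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <exception>
#include <optional>
#include <stdexcept>
#include <tuple>
#include <utility>

struct canceled {};  // hypothetical exception type signalling a requested stop

// Wrap one stage so that it checks the stop flag before running.
template <class F>
auto guard(F f, const std::atomic<bool>& stop) {
    return [f = std::move(f), &stop](auto&&... args) {
        if (stop.load()) throw canceled{};
        return f(std::forward<decltype(args)>(args)...);
    };
}

// Transformation: same flat structure, every stage replaced by its guarded form.
template <class... Fs>
auto with_cancellation(std::tuple<Fs...> fns, const std::atomic<bool>& stop) {
    return std::apply(
        [&stop](auto... f) { return std::make_tuple(guard(std::move(f), stop)...); },
        std::move(fns));
}

// Plain sequential execution: each result becomes the next argument.
template <std::size_t I = 0, class... Fs, class Arg>
auto run(const std::tuple<Fs...>& fns, Arg arg) {
    if constexpr (I == sizeof...(Fs)) return arg;
    else return run<I + 1>(fns, std::get<I>(fns)(std::move(arg)));
}

template <class T>
struct outcome {
    std::optional<T> value;
    std::exception_ptr error;
};

// Error handling lives here, outside the stages themselves.
template <class... Fs, class Arg>
auto try_run(const std::tuple<Fs...>& fns, Arg arg) {
    outcome<decltype(run(fns, arg))> out;
    try {
        out.value = run(fns, std::move(arg));
    } catch (...) {
        out.error = std::current_exception();
    }
    return out;
}

// Usage: ordinary functions that are free to throw.
inline auto example_stages() {
    return std::make_tuple(
        [](int x) {
            if (x < 0) throw std::invalid_argument("negative input");
            return x + 1;
        },
        [](int x) { return x * 2; });
}
```

The stages passed to `with_cancellation` are untouched ordinary callables; the stop check and the exception capture are both layered on afterwards.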

Starting point is 00:32:28 And so now you can scan through the tuple of tuple and pull out particular types. So if you have additional information you want encoded in there, you can encode it in the type system and pull that out as part of the transformation. I wonder too if it depends on how it was implemented, but like if you have an ability to avoid one of the worst parts of ranges, which is the errors at compile time, because like, even as a ranges expert, and you know,

Starting point is 00:32:58 I'm not Tristan or Eric level, but you know, I've spent quite a bit of time playing around with ranges. Every once in a while, I'm running into my first few errors. It's just like this wall. And it's not wrong. It is technically the type, and it's just this nested crazy thing. You might run into the same problem, but potentially there's an opportunity to avoid that. Yeah. It's potential at this point. The error message that I most

Starting point is 00:33:24 frequently get is tuple doesn't support the call operator. And that really means that some piece of my tuple in my fold expression is wrong, and it doesn't tell me. Since it's within the fold expression, it's like one of the 20 things you have in this pipeline is bad. Right. Yeah, and that's a pain. I've got some thoughts about how to... well, there's a couple things. Right now I'm building things where everything takes, you know, auto dot dot dot as the args, which means you're able to build these very generic computational patterns. But that also means when you actually call it with, you know, invoke it with 42 and something goes wrong because there's a type mismatch in the path, that you're completely lost.

Starting point is 00:34:19 And so one of the things I'm playing with now is what are the tradeoffs between having the developer declare the interface for the entire construct up front so that as you append each thing, you can say, okay, if I know what the interface is, then when I append the first thing, I can see, well, will it take those arguments? And then what does it return? And when I append the next thing, I can say, will it take the thing that's returned from the previous thing? And so on. So I can type check the entire path. Yeah, it's kind of – I think you make a – it's a very interesting observation there. And I think to do better, you have to not diagnose either when you're folding or once the thing is folded because the problem is that the folding operation, you end up with this recursive structure. And then if something bad happens down deep in the structure, then it has to roll up the stack. The way compiler diagnostics are going to work, it's going to tell you, well, this failed here, and you

Starting point is 00:35:25 know, and then this is the failure in the source, and then that was in this context, in this context, in this context, and it'll walk up this stack trace. But if you've got it as this linear chain of tuples, then you maybe have some other options where you can do some checking before you do the fold, where you just go through each pair and say, like, do these things, you know, does the first one connect to the second one, the second one connect to the third one. And that could, I think, give you better diagnostics and give you more of the flat diagnostics that you'd get from just if you were writing like straight imperative code. So yeah, I think that that has some real potential there to give you better diagnostics. I don't know whether it will help.
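The pairwise "does each one connect to the next" check, combined with declaring the input type up front, might look something like the following. It is a sketch under assumptions: `connects` is a hypothetical helper, each stage is assumed to take exactly one argument, and the check yields a plain boolean so that a single `static_assert` can report the problem before any fold is attempted.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Empty remainder: nothing left to connect.
template <class In>
constexpr bool connects() {
    return true;
}

// Check that F accepts In, then continue with F's result type.
template <class In, class F, class... Rest>
constexpr bool connects() {
    if constexpr (std::is_invocable_v<F, In>)
        return connects<std::invoke_result_t<F, In>, Rest...>();
    else
        return false;
}

// Usage: two example stages, checked in both orders.
inline constexpr auto half = [](int x) { return x / 2.0; };
inline constexpr auto describe = [](double d) { return std::to_string(d); };

static_assert(connects<int, decltype(half), decltype(describe)>(),
              "int -> double -> std::string connects");
static_assert(!connects<int, decltype(describe), decltype(half)>(),
              "half does not accept a std::string");
```

A failed check of this kind produces one message at the point of the `static_assert` instead of a chain of nested instantiation contexts; whether it helps compile times is a separate question.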

Starting point is 00:36:14 The other challenge that I think a lot of people have with ranges is around compilation time. But same for senders. I don't know that this helps you there per se because ultimately at the end of the day, you're still doing the same composition. But I think it may really help with the diagnostics. It's very interesting. I mean, I'll link in the show notes. This has nothing to do with C++. But I saw in a talk by José Valim, the creator of Elixir, as they have a big pipeline. It's not even just a library, it's the whole way the whole language works. And they showed this little

Starting point is 00:36:51 demo in an Elixir Livebook, I think is what they call it. It's like a Jupyter notebook, but they just rebranded it. And you build up your little pipe expressions, and then if you have a problem, in the notebook you can toggle off each one of the pipes, like if you've got a functional map or a reduction or whatever. And then there's a little hamburger bar and you can slide them around and stuff. So I was watching this and I was like, oh my god, when you've got a problem in C++ with ranges, it's just, you know, a wall of template mess. Whereas over in this dynamically typed language, they're like, well, check out this little demo of debug. And like you could, as the final thing, you could pipe it to like a debug print or something.

Starting point is 00:37:33 And then in the notebook, it shows you at every stage, like what the current, you know, value. Anyways, it was just amazing. So I'm not saying we're ever going to get to that. But there is hope out there that, you know, things can be better. And, you know, that's why we have you on the show, so we can give you a bunch of work, Sean. You're going to go write a C++ committee paper, of course, and give us full reference implementation. Sure. And, you know, first now that you've got me thinking about this idea, I'm going to go write a C++ interpreter just so we can do this.

Starting point is 00:38:06 That'll be a good weekend project, I'm sure. So I got a great thing for us to close on. All right. So the White House tells us no more C++. What do we think? So does Gemini 1.5. We were talking about this the other day, too. It's that C++20 concepts are too unsafe for anyone under the age of 18.

Starting point is 00:38:27 So we apologize to our younger listeners out there that are paying the $20 a month. If you're a younger listener, please hang up now. Yeah. But, yeah, what do we make of this? We talked about this last time I was on. It's no surprise that this is coming. For corporations, I think there's going to be some amount of risk management that has to go on, and Adobe's certainly in that boat.

Starting point is 00:38:57 My team's on the hook for developing the roadmap, the memory safety roadmap for Adobe and how we approach that. I had an absolutely terrifying meeting with our security team yesterday on this very topic. Yeah, I mean, I think the information coming out of the Department of Defense and the White House is filled with a lot of hyperbole, but at the base of it, it's not wrong. Just in working on this chain library at one point, I accidentally wrote T instead of decay_t<T>. And the result was I captured a reference instead of a value. Even though the code looked like it was fine, the type was not. And that meant that I had a use after return on my stack and it cost me half a day to debug the thing. Hear that, folks?

Starting point is 00:39:53 The luminary. Debugging his problem for half a day. There's hope for all of us. Yeah. So it's too easy to make a mistake like that. And a lot of times the wrong code looks better than the right code. And that's a problem. And this was one of those cases.
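The kind of slip described above, writing T where decay_t<T> was meant, can be illustrated with a small sketch. This is not the actual Chains code: `holds_T`, `holds_decayed`, `capture_as_written`, and `capture_decayed` are hypothetical names, and the sketch only assumes that the value arrived through a forwarding reference, where T deduces to an lvalue reference for lvalue arguments.

```cpp
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical wrapper written with plain T (the version that "looks fine").
template <class T>
struct holds_T {
    using stored_type = T;  // for an lvalue argument this is a reference type
    T value;
};

// Hypothetical wrapper written with std::decay_t<T>.
template <class T>
struct holds_decayed {
    using stored_type = std::decay_t<T>;  // always an object type
    std::decay_t<T> value;
};

// With a forwarding reference, T deduces to std::string& for an lvalue,
// so holds_T<T> stores a reference to the caller's object.
template <class T>
auto capture_as_written(T&& x) {
    return holds_T<T>{std::forward<T>(x)};
}

// Decaying T first means the wrapper owns a copy (or a moved-from value).
template <class T>
auto capture_decayed(T&& x) {
    return holds_decayed<T>{std::forward<T>(x)};
}

// The two call sites read almost the same, but the stored types differ.
static_assert(std::is_reference_v<holds_T<std::string&>::stored_type>,
              "plain T keeps the reference");
static_assert(std::is_same_v<holds_decayed<std::string&>::stored_type,
                             std::string>,
              "decay_t<T> stores a value");
```

If the result of `capture_as_written(local)` were returned from the function that owns `local`, the stored reference would dangle, which matches the use-after-return symptom described in the episode. The sketch deliberately stops at the type level and does not execute that undefined behavior.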

Starting point is 00:40:14 There are some people that are either trying to define a safe subset of C++ or tools or things to make C++ safe or who claim that C++ is safe. Do you buy the argument that C++ is fundamentally unsafe and that there's no hope for it? Or do you think that people in environments that care about safety can find a way to continue using languages like C++? It's just a matter of defining the rules and the guidelines to make it – to define a safe form of C++. So I think there's like Herb Sutter's Cpp2, which he states that one of his goals is not provable safety. But my team's looked at his proposals and we've given him some feedback. We actually think he's very close to being able to prove safety. He wouldn't have to compromise much to be able to do that. He doesn't seem interested in it at the moment, but maybe that will change with the current

Starting point is 00:41:24 legislation. I think the work that Sean Baxter has been doing is very promising, but of course that's one guy on a non-open source project. I hope something comes out of it, if nothing more than other people steal ideas from it. We talk a lot about memory safety in the abstract, both without a clear definition of what memory safety is and losing sight of what the goals of memory safety are. And safety properties in general are tools to help prove correctness of code. And, you know, to prove partial correctness of an application, you have to demonstrate that it satisfies a set of safety properties. And memory safety being one of those.

Starting point is 00:42:12 But the goal from a security standpoint is something called a non-interference property, which is both a safety and liveness property, and the memory safety is just in support of non-interference. And the basic idea with non-interference is that a defect in one piece of your code should not affect another piece of your code. It should limit the blast radius. So when you're really talking about non-interference properties, things like shared pointers become a problem. Any shared mutable state becomes a problem. Sharing indices to the same array, which is another way to share information, becomes a problem, right? So

Starting point is 00:42:57 even if you're outside of the pointer space. And C++ doesn't give you any tools to help you build code that satisfies a non-interference property. Even if you could go through with C++ and you could say, every place where it's undefined behavior, I'm going to insert a runtime check, and ta-da, C++ is completely safe. Your program will probably never terminate because it will run so slow, but it will be very safe. So at one end, that's completely doable. And then the next end is how much performance can you get back through optimization, knowing that some of these checks are unnecessary? And then how do you

Starting point is 00:43:38 structure your code to guarantee that those optimizations kick in and what are the guidelines for that. But then you're still left with a piece of code that looks like a piece of C++ code and probably needs a redesign. And when they're talking about memory safety, it's 70% of the exploitable CVEs, according to Microsoft and Google, that they see. Well, that leaves another 30% of the problem. So even if we fixed all the memory safety problems, we wouldn't have super bulletproof software.
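The two ends described above, a runtime check at every access that could otherwise be undefined behavior and then structuring code so that most of those checks become unnecessary, can be sketched for the simple case of array indexing. `checked_read` and `sum_prefix` are hypothetical names chosen for illustration, and this covers only bounds errors, not lifetime errors or the non-interference property itself.

```cpp
#include <array>
#include <cstddef>
#include <optional>

// One end: every access is checked. An out-of-range index becomes a
// reportable "no value" result instead of undefined behavior.
template <class T, std::size_t N>
constexpr std::optional<T> checked_read(const std::array<T, N>& a,
                                        std::size_t i) {
    if (i >= N) {
        return std::nullopt;  // the inserted runtime check
    }
    return a[i];              // in range, so the raw access is fine
}

// The other end: structure the code so one check up front covers the
// whole loop, and the per-element accesses need no further checks.
template <std::size_t N>
constexpr long sum_prefix(const std::array<int, N>& a, std::size_t n) {
    if (n > N) {
        n = N;                // single clamp replaces N separate checks
    }
    long total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        total += a[i];        // i < n <= N holds by construction
    }
    return total;
}

// Both helpers are usable at compile time, which makes them easy to verify.
static_assert(!checked_read(std::array<int, 3>{1, 2, 3}, 3).has_value(),
              "index 3 is out of range for three elements");
static_assert(sum_prefix(std::array<int, 3>{1, 2, 3}, 10) == 6,
              "an oversized count is clamped to the array length");
```

The second function is the "guideline" style of safety: the loop is written so that the range of `i` is evident from a single comparison, rather than relying on an optimizer to remove a check at each access.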

Starting point is 00:44:29 It's like your ship's going down, it's got three holes in it, and you're like, well, we got a way to patch two of them. Yeah. Right? Well, and also, if everybody's programming in memory-safe languages, then attackers will spur innovation and new ways of attacking software. There will always be attacks. Yeah, there will always be attacks. So I think eventually the changes have to be much deeper and much more radical than

Starting point is 00:44:56 just shoring up C++. That said, Photoshop alone has 30 million lines of C++ code. And it's not going away anytime soon. How long did you say that Flash rewrite took? Two years? How many lines was that probably? I don't know, maybe 500,000 tops. So that's like a factor of 30 million.

Starting point is 00:45:23 I think it's just two, so it's like four years is all you need. Good math there. We should probably let Sean go because he's got another meeting. This is true. We have made you late. I've been hearing my watch buzz. Be sure to check these show notes either in your podcast app or at ADSPthePodcast.com for links to anything we mentioned in today's episode,

Starting point is 00:45:43 as well as a link to a GitHub discussion where you can leave thoughts, comments and questions. Thanks for listening. We hope you enjoyed and have a great day. Low quality, high quality. That is the tagline of our podcast. It's not the tagline. Our tagline is chaos with sprinkles of information.
