Future of Coding - Beyond Efficiency by Dave Ackley

Episode Date: March 4, 2024

Dave Ackley's paper Beyond Efficiency is three pages long. With just these three pages, he mounts a compelling argument against the conventional way we engineer software. Instead of inflexibly insisting upon correctness, maybe allow a lil slop? Instead of chasing peak performance with cache and clever tricks, maybe measure many times before you cut. So in this episode, we're putting every CEO in the guillotine… (oh, that stands for "correctness and efficiency only", don't put us on a list)… and considering when, where, and how to do the robust thing.

Links

$ patreon.com/futureofcoding — The most recent bonus episode is a discussion with Stefan Lesser about new "laws of physics" we can invent inside the computer. Don't destroy the earth, then make sure your thing can't be destroyed, then don't destroy your data, and finally, do your damn job, AAAAAAAAAAAAAAAAAAAAAAAAAAA.

A Software Epiphany, and the accompanying HN discussion — giga viral, so sick.

PartyKit? Nice!

What started as a simple todo list turned into an ocean of tech boy milk and, ultimately, the AI apocalypse.

Jepsen is a rough, rugged, deeply thoughtful and fantastically cool approach to distributed systems testing, by Kyle Kingsbury. Also, we didn't talk about it, but his reversing / hexing / typing / rewriting / unifying technical interview series is essential reading.

Ivan's examples of robustness vs efficiency were RAID, the CAP theorem, Automerge, the engineering of FoundationDB, and Byzantine fault tolerance — all of which stake out interesting territory in the efficiency/robustness tradeoff spectrum, and all of which are about distributed systems.

Can programming be liberated from the von Neumann style?, a paper by John Backus.

We Really Don't Know How to Compute!, a talk by Gerald Sussman.

The Robust-First Computing Creed is rock solid.

The Wikipedia article on von Neumann architecture did not come through with the goods.

Ivan works with Alex Warth now, and thus may fairly speak in half-truths like "I've been working with constraints recently…"

The Demon Hoard Sort

Bogosort is never coming to Dreamberd

The Witness was made by Jonathan Blow, who has Aphantasia, but he also made a game called Braid, and Braid is good.

Datamosh is a creative misuse of the lack of robustness that comes from storing diffs instead of full state snapshots. Here's a lovely gallery of examples.

Abstraction by xkcd

Reverse Engineering the source code of the BioNTech/Pfizer SARS-CoV-2 Vaccine

Can't let Lu get through the above without derailing onto Fiverr, PCP, Fight Club, and the Dust Brothers.

Randy Newman was nearly quoted in Ackley's Indefinite Scalability for Living Computation — god help you if you read our show notes and don't listen to the episode.

"It is difficult", says Upton Sinclair when asked about Jimmy Miller being Jimmy Miller, and how we all ought to approach our own sense of Jimmy Miller.

Music featured in this episode:
Hawker News by user: spiralganglion
Corporate World by the Dust Brothers

No more jokes! Find us at these normal places:
Ivan: Mastodon • Website
Jimmy: Mastodon • Website
Lu: Mastodon • Website
Dave: Mastodon • Website

Send us email, share your ideas in our Slack, and support the show on Patreon. Yes, do all three please.

http://futureofcoding.org/episodes/70
Support us on Patreon: https://www.patreon.com/futureofcoding
See omnystudio.com/listener for privacy information.

Transcript
Starting point is 00:00:00 so we had a crazy week jimmy a like wild week like unprecedented amounts of buzz around what you and i have been doing we had somebody write a blog post about one of our podcast episodes and that made it to the front page of hn yeah i was so sad i didn't catch it when it happened wait no really yeah really yeah like we went gigaviral this week it's it's incredible no no no no hang on a second hang on no i'm actually deadly serious and we got we got 400 downloads from it 400 downloads from front paging hn what i don't even see this yeah no it's uh it was because it was a blog post titled i had an epiphany which is not something i'm ever gonna click on yeah just to be honest yeah right i mean i don't browse that file website
Starting point is 00:00:50 but it has a great search function yeah a great search function yeah if you search the word epiphany it was yesterday this blog post um do you want to do a dramatic reading of it lou a software epiphany yeah yeah john wiles oh my god episode 61 the programming as theory building one okay which is exciting for me i just want to put that out there suddenly i understand software even though i don't respect podcasts as an information delivery system guys you've just been called a delivery system i recently listened to a podcast that felt like having an epiphany the podcast in question was episode 61 of future of coding so i guess they're saying that that specific episode of a podcast was a podcast. Yeah, fair enough. It is essentially a read-through slash review of two papers with which I was unfamiliar, but which I believe are actually quite famous and influential.
Starting point is 00:01:58 That feels like a roast of these. All right. Why did listening to- You made it past the funny part. The funny part is the whole like yeah i don't respect podcasts but i listened to this podcast and it was good actually which is i think it's great to be fair i i think that is the correct opinion i agree that's my basic view of podcasts including this one well there's a reason you're on this one and not on you know lex friedman's
Starting point is 00:02:23 podcast or what have you right yeah though i did just unblock him today on twitter really yeah i unblocked him why i realized that so me and steve my boss we sort of 50 50 run the tldraw twitter account right and i'm realizing that a lot of the notifications that come into it are from people that I've blocked on my main account, my personal account. And it just made me feel a bit awkward. So I've just gone and unblocked everyone. So if I've blocked you in the past, this is your second chance.
Starting point is 00:02:55 You have one chance. I just mute people. I don't block them. Like, it's fine if they go look at my stuff. It's just like, I don't really care about that meme account. That's not what I'm on here for. I'm just going to mute it so that they stop getting... Or like this week, it's like,
Starting point is 00:03:13 you know, oh, SpaceX did something. Now all of Twitter is just SpaceX. Definitely just SpaceX. There was nothing else that happened on Twitter this week. Yeah, no. For you, it was like, oh yeah, programming doesn't exist anymore. There's only SpaceX.
Starting point is 00:03:33 But yes. Twitter's a vile website and nobody should use it. Yeah. The rest of this, I just want to put it, the rest of this blog post is really, really nice. It's a person going and listening to our episode on programming as theory building and i also threw in gilbert ryle uh as like background reading for that and they realized like oh yeah Naur had it right about programming as theory building
Starting point is 00:04:00 and it explains so much of what this person sees in real life software and i totally agree it's uh it was it was nice i i this is my favorite paper i think that more people should have this epiphany and it's it's so typical hacker news style where like most of the comments are like this isn't anything there's not really a big epiphany here i don't know how you don't realize these things and then they don't realize these things. And then they don't state the things that the paper actually states. It's great.
Starting point is 00:04:29 It's like A+. Title is a bit overblown for the actual epiphany. This is manifestly not the case. I don't see anything particularly unique to software in this thesis. Epilepsy warning. I also think this is way off base. Made me seasick.
Starting point is 00:04:52 You might understand software, but you clearly don't understand web design. The pop-ups are terrible. And I found the first sentence very off-putting. The value of computers is that you can write code once, and the computer can execute it repeatedly ad infinitum. As a programmer, your mental theory of the codebase has value to its owners, but it's not the product. The codebase is not the product. The product is whatever output is created and consumed by the relevant stakeholders.
Starting point is 00:05:22 Oh, bud. Not only is this like quintessentially hacker news, this is actually almost identical to Donald Knuth's reply to Naur, which is just hilarious. like this is it's just the way that everyone reads it they say first i agree and yet i don't agree with the conclusion which doesn't you that's not how that's not how logic works like if Naur is making his argument the way he thinks he's making it this is a logical conclusion that you should follow and i just it's just my favorite i love that like so many people yeah so here i have Donald Knuth saying my experience agrees well with Peter Naur's hypothesis that programming is theory building, and they strongly support his conclusion that programmers
Starting point is 00:06:27 should be accorded high professional status. But I do not share Nauer's unproved assertion that reestablishing the theory of a program merely from its documentation is strictly impossible. On the contrary, I believe that improved methods of documentation are able to communicate everything necessary for the maintenance and modification
Starting point is 00:06:43 of programs. So a slightly different focus here but amounts to the same thing that like the theory is not the main part that the software itself and the documentation are all we need uh and it's just i don't know i i wish Naur wrote that paper a little bit more clearly but i'm happy that uh that we could do the service of of talking about it too and other people could grab on to the concept you know it sounds trite when you just say the conclusions without actually diving into all the details so i'm not surprised that hacker news took away that it's just a trite conclusion my favorite is one person who said made with PartyKit if anybody wants to make their own
Starting point is 00:07:25 PartyKit yeah PartyKit nice share an office with those guys yeah yeah i was like oh that's nice this is my favorite hacker news comment this is a top level comment there's this seminal paper by Peter Naur called Programming as Theory Building that arrived at the exact same conclusion when it comes to programming oh so good that's how he knows he's summarized it well though right yeah yeah yeah it's just it was just so good it's like chinese whispers but there were no mistakes uh-huh you know yeah i like that one which is like it is true yeah um programming is theory building i mean if, to that person's, you know, not credit, but credit, like he doesn't link to the paper directly, but still, yeah. Yeah, it was honestly, this is, for me at least,
Starting point is 00:08:15 this was super satisfying to see. Like I'm excited that people are listening and enjoying the things that we're talking about, and especially for this paper. This was one of those papers that I worried we wouldn't do justice to and it's been a well received episode so i'm very excited and that that comment gets an upvote from me that's the stuff i come to hackernews for nice lou how was your week i'm not even joking i want to know what's it been like uh it's been crazy. I was also on Hacker News.
Starting point is 00:08:46 A little bit. The title was equally unhelpful. The title was, I think I need to go lie down. Yes. So on Monday, my boss, well, you know, at tldraw, we've just finished like working on some kind of, I would say, kind of boring stuff. Like we're just getting the library ready changing a few apis it's it's not necessarily visually flashy so you know we thought we'd earned a bit of fun so my for everyone everyone got some you know pick something fun to do my fun task was steve said to me hey Lou can you that's my my impression of steve he said hey Lou hey Lou can you get
Starting point is 00:09:26 like 200 likes on twitter like and that was literally my to-do list for the week i think i posted it up on mastodon yeah you know part one step one write a to-do list step two um get 200 likes on twitter step three speak to gitbook lovely guys um and and yeah i got 200 likes and then we got about i don't know 10 000 um and so yeah it's been a fun week so for for anybody who doesn't know because this podcast is probably coming out like maybe two to 18 months after these events do you want to tell them what this is about yeah so like um the gpt vision api came out so what you can do is just chuck the ai a uh a picture of of a little user interface of a little website, and it can turn that into a website. But the fun thing that we've been doing is then it places that website back on the canvas,
Starting point is 00:10:36 which you can then annotate again, like draw some new things on, and then give that back. And you get in this cycle where you're like having this back and forth visual conversation by drawing stuff. So like turning a picture into a website, that's not new. But the new thing is being able to then draw on whatever they give back to you, which feels really weird, feels really weird. And everyone's freaking out. Why does it feel weird like i tried
Starting point is 00:11:06 to use it and for whatever reason my api key was just rejected repeatedly probably because of you know servers falling over who knows what let me in bang bang bang i think you need to you need to put five dollars on your account oh i put i put i put not five hundred dollars but more than five dollars on my account oh well there you go maybe yeah maybe they just shut it down i think it's because they invented agi who fired sam and so you know they know that you might take down the system so they're rejecting you your comments against ai have you know made you get banned so that's the the like the domino stuff that happened is like hey lou can you get 200 likes that's like monday or whatever right yeah wednesday it's like oh my god we're on
Starting point is 00:11:53 the front page of every internet um you know people are using this to like make games and everybody's having this this giant fit and it's it's incredible and terrifying and this whole cluster of emotions all at once friday open ai implodes and i see these as like there's like a linear you know straight line path from steve saying hey lou can you get 200 likes to uh you know monday it's going to be like i don't know sam altman is dead or something like that right like this is going to be yeah this is like who knows by the time this episode goes out will we even have jobs i don't know it'll be like the dyson sphere has happened or something like that and we're all living is like an energy cluster orbiting somewhere in a spheroid around the sun yeah that's the that's the accelerationist view that these you
Starting point is 00:12:46 know silicon valley hacker news types want us to e/acc trying to reshape society into yeah yeah which is why i was on the front page of hacker news um but hey do you know what should have been on the front page of hacker news but wasn't. Lou, it's your first segue. It should have been Beyond Efficiency by David H. Ackley. Dave, that should have been on the front page of Hacker News. With any luck, somebody will find this episode a year after it comes out
Starting point is 00:13:24 and write a blog post titled i had an epiphany yeah a hardware epiphany maybe this time maybe and then people will finally discover this this very short highly readable super approachable i don't know what to call it because it's not a traditional sort of essay like you'd get in a journal even though it it was um uh submitted for we're reading a pre-print but it was um accepted to uh Communications of the ACM so it's in the it's in the acm but it's it's this like three column like three page and then some references the references were actually interesting basically
Starting point is 00:14:05 like uh not not i don't i'm trying to avoid saying a screed because it's not that angry but it has that like flavor of like like uh like demagoguery like it's like trying to whip up people's enthusiasm and energy and like get them to you know change how they approach their work in a way that i find very appealing so yeah i don't know what to call this but it's uh it's short and sweet i believe uh it's called a viewpoint essay okay which is like i guess that's like an opinion piece i don't know a letter to the editor yeah yeah a letter to the editor yeah uh yeah i I think of all of the papers we've read, this one I can definitively say did not need an editor.
Starting point is 00:14:52 This did not need anyone to shorten it. There were no bits that needed to be cut out. Everything in here is essential to the argument being made. If anything, personally, I would have liked a little bit more. I did go and watch a bunch of or listen to a bunch of David Ackley um uh youtube videos in the robust first playlist so that helped me kind of get this mindset i will say that uh i'm i'm not i haven't absorbed it enough i feel like i'm a little like I hope this conversation will help me kind of get more where this is.
Starting point is 00:15:29 Because it's a big shift from where we are. And I feel like I don't quite understand all that entails. And speaking of chain reactions from a bit earlier i would say that this paper or like this series of papers that Dave Ackley made um a while back is actually what has made me sit here today like i got into coding again like i learned it when i was younger and i picked it up again when i was learning about all this robust first this simulation style of computing and that's what got me back into coding and that's why i'm here and that's uh why sam altman got fired right and it's you know all that gpt4 needed was tldraw to become sentient and that's why we're gonna you know the heat death of the universe is upon us all all because we went
Starting point is 00:16:36 beyond efficiency exactly it's all connected yeah which almost sounds like beyond efficiency almost sounds like we're getting even more efficient than efficiency. That's like what my first thought when I saw beyond efficiency was, but it's actually kind of the opposite. This is a paper kind of arguing that we should throw out efficiency. That efficiency is not always good and that efficiency comes with some real trade-offs around what he calls robustness and i think on the face of it that is going to sound very unpalatable to a lot of the people in our general community i think of you know i don't know that of course the person that comes to mind
Starting point is 00:17:26 is somebody who gets named too much on this podcast, which is Jonathan Blow. I mean, you say that we shouldn't care about it. We should care about efficiency even less than we do right now to a lot of people. And they're going to feel like you're really out of touch with how little we care about efficiency but this is i guess like way beyond that kind of take to like we should get rid of cpus and ram yeah yeah like this is a it starts from a a very simple place of robustness
Starting point is 00:18:03 which is broadly defined as like resilience to error whether that error comes about as the result of you know a cosmic ray hitting your cpu or because there's a malicious hacker who broke in and sprinkled malware through your system or there's a coding error or any of the other reasons you know bad data or something like that any of the reasons that a computer program might fail to perform its intended purpose, those failures are, you know, robustness is our bulwark against that insanity, against those failures. And that in the drive to make computers more efficient, and specifically to make the execution of software more efficient
Starting point is 00:18:45 we have had to give up robustness and we had more robustness in the past and there's some things you know in the analog computer era say computers were more robust because they had to be you know analog and and very much more physical in nature but as we switch to digital computers and you could be more certain that uh that there was like perfect fidelity of data transmission and that sort of thing we have gradually bled the robustness out of our systems in the name of efficiency and so this is you know that's the the small seed of the idea is like recognize that robustness and efficiency have this relationship where you can trade one off to
Starting point is 00:19:26 gain the other. And now that you recognize that relationship, let's look at some of what we've lost and let's consider how we might recover it. I think that's a really provocative position to start from. Yeah. And I think that's what got me really interested in this idea in the first place you know I think there might be listeners listening surprisingly thinking no that's ridiculous what a crazy idea and I think that's the point you know like it's it's framed in a way and it's presented in a way that is intentionally surprising and provocative. It's this viewpoint essay. It's supposed to capture your attention. It's supposed to make you think, wait, what, really? So don't be put off by the fact that it sounds a bit ridiculous. I think that's actually the point.
Starting point is 00:20:18 It's putting out a bit of intrigue into your mind when it first hits you. So we should hit them with the bubble sort example because that's like we've been saying hey everybody you know this is going to be a little bit absurd and we haven't yet any we haven't given an example to show like absurd how like what do you mean oh it's all well and good to say abstractly yeah robustness and efficiency of this trade-off and it'd be nice nice to try and be conscious of that trade-off. But there's this example where bubble sort as opposed to merge sort and quick sort,
Starting point is 00:20:54 it's like significantly slower because bubble sort does just a ton more comparisons, right? Like it always takes a pair of adjacent elements in a list that it's trying to sort. And it compares that pair of adjacent elements. And if they're in the wrong order it swaps them and it just does that over and over and over again and the algorithm is like just super duper simple whereas quick sort and merge sort are a little bit more clever and through that cleverness they avoid doing a whole bunch of redundant checks right because bubble sort as the list gets more and more sorted, you're going
Starting point is 00:21:25 to be frequently comparing elements that are already in the right order. And merge sort and quick sort try not to do that, right? If your list is already mostly sorted, quick sort and merge sort are going to just blast through it. But bubble sort is so much more tolerant to faults that can occur in the hardware that's executing it or that can occur in a component that collaborates with it and as an example um ackley suggests like what if you had sorting happening where the function that is checking the elements to see whether they're like that checks a pair of elements to see whether they're sorted correctly, returned the wrong answer 10% of the time.
Starting point is 00:22:10 And if you run quick sort or merge sort with that kind of faulty comparator, you end up getting tons and tons and tons of errors. So if you have 52 items, like a deck of cards, and you sort them all, and then measure like how many errors do you get? This is total card positioning error so it's like how far off are these cards is the idea so like this is five positions off or this you know it should have been five up or down in an order quick sort gets like 250 something errors
Starting point is 00:22:41 merge sort gets like 180 something errors and bubble sort gets like 10 errors because that little bit of extra work that bubble sort does where it does all those extra comparisons just gives it more chances to recover from that erroneous result from that faulty comparator but even if you run quick sort and merge sort multiple times in a row, the error rates don't improve all that much. They improve maybe another 50% or something like that. So there's something about bubble sort that is just like inherently more robust to failure than quicksort and merge sort.
Starting point is 00:23:17 Yeah. And do you know what I like is earlier on, Ackley says about how you know when sorting a long list either one merge sort or quick sort blows the doors off bubble sort later on he shows this graph of bubble sort being way more robust than those two and follows with the line now whose doors have fallen off? It's like a very long, long setup. And he actually does say that if you repeat about six times with a quicksort or merge sort and take the most common answer, you do get competitive with bubble sort. But he calls that a cheap hack, an after-the-fact fix tailored to one situation. But I think this is interesting because in the worst case with quicksort, you get 250 positions off.
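(A rough sketch of the experiment being described, in Python. This is our illustration, not Ackley's code: the 10% faulty comparator and the total-positional-error measure come from the discussion above, and the numbers vary from run to run rather than matching the paper's figures.)

```python
# Our illustration (not Ackley's code) of the faulty-comparator experiment:
# sort a shuffled 52-card "deck" with a comparator that lies 10% of the
# time, then measure total positional error of the result.
import random

def faulty_less_equal(a, b, error_rate=0.10):
    """Compare a and b, but return the wrong answer error_rate of the time."""
    answer = a <= b
    return (not answer) if random.random() < error_rate else answer

def bubble_sort(items, cmp):
    xs = list(items)
    for _ in range(len(xs)):                       # lots of redundant passes...
        for i in range(len(xs) - 1):
            if not cmp(xs[i], xs[i + 1]):          # ...each one a chance to undo a bad swap
                xs[i], xs[i + 1] = xs[i + 1], xs[i]
    return xs

def merge_sort(items, cmp):
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left, right = merge_sort(items[:mid], cmp), merge_sort(items[mid:], cmp)
    out = []
    while left and right:                          # each pair is compared roughly once
        out.append(left.pop(0) if cmp(left[0], right[0]) else right.pop(0))
    return out + left + right

def total_positional_error(xs):
    """Sum over all cards of how far each one landed from where it belongs."""
    return sum(abs(i - x) for i, x in enumerate(xs))

deck = list(range(52))
random.shuffle(deck)
print("bubble sort error:", total_positional_error(bubble_sort(deck, faulty_less_equal)))
print("merge sort error: ", total_positional_error(merge_sort(deck, faulty_less_equal)))
```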
Starting point is 00:24:14 But when we're talking about comparing efficiency between quicksort and bubble sort, he tells us that if you try to sort a billion numbers and you try to do it with bubble sort it's just going to take forever whereas if you try to do it with quick sort he says it took under 10 minutes on my laptop I looked up I see people having like you can do in Java in like 30 seconds
Starting point is 00:24:38 on modern things this is a very very fast thing whereas if you still try to do bubble sort you're basically never gonna get an answer at a billion things so i think this is makes this even more stark to me which is like quick sort and merge sort yes they did they were less robust but they weren't like exponentially less robust in a of ways. Yes, they were for this small thing, but if we amplify numbers, it's not as bad. And yet, still the argument here is
Starting point is 00:25:11 their efficiency gains aren't worth it, even though they're quite a lot more efficient. We don't want that. We want robustness. And I think the thing that was a little confusing to me at this point is, like, why? And I'm curious if any of you all can articulate, like, why do we care that bubble sort can, you bit, is this viewpoint essay is introducing a new kind of way of measuring software. Ackley introduces this concept of CEO, and he states that currently, at the point of writing, the goal of software is CEO, which stands for correctness and efficiency only.
Starting point is 00:26:07 And, you know, the first example, comparing the efficiency of these different sorting algorithms, he's looking at efficiency. He's then presenting, we can measure these in another way that you may not have thought about, called robustness. And it's sort of just introducing this new idea that I guess I had never thought about. And in my experience, people often haven't thought about what if we have some error? I think in the video, he says, or he opens it with, why do computers crash? Is it something we just have to deal with? When is it okay to deal with it? When is it actually really important that it doesn't happen? And is there a way that we can rate different software in that scale? Not just speed, not just correctness, assuming nothing goes wrong.
Starting point is 00:26:59 Because in reality, things go wrong. In the myths and misconceptions episode we talked about um the fuzzy edges of software and how they have to interact with the real world and this robustness concept i think is trying to tackle that i think that's part of the why it is vague though he says it could be for any reason yeah i'm kind of reminded of you know the kind of erlang let it crash uh mentality but on steroids right like this is like erlang's like okay we have these little processes and they can crash and we bring them back up but this is saying like let's go even further let's assume that like we have software and like we hit it with x-rays and random bits just change. Let's assume that we have software and we have hardware that we can't actually depend on giving us a deterministic answer.
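(For anyone who hasn't met Erlang's "let it crash" idea: a supervisor's whole job is to notice a dead worker and restart it, rather than trying to prevent every possible failure. A rough Python-flavoured sketch of that flavor only, not Erlang's actual process and supervision semantics; real supervisors also cap restart rates, isolate state, and so on.)

```python
# A rough analogue of "let it crash" supervision: the worker may fail for
# any reason at all, and the supervisor just restarts it.
import time

def supervise(run_worker, restart_delay=0.1):
    while True:
        try:
            run_worker()                        # run the worker until it returns or dies
        except Exception as crash:             # any failure at all
            print(f"worker crashed ({crash!r}), restarting")
            time.sleep(restart_delay)
```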
Starting point is 00:27:54 We have any number of things could just break at any moment. How can our software deal with this? As I was looking more, and I don't think it's actually ever explicitly stated in this paper but the idea is like this is the only way uh that we're going to i guess it's stated a little bit but that we're going to like get beyond the scale of computing we're at right now right now everything we do even with our supercomputers, even with OpenAI's clusters, etc., is, according to David Ackley, like small-scale computing. We need to get beyond this to get these massive computations. And at that point, the centralized manner of these things
Starting point is 00:28:41 won't work. We cannot make decisions where we have to submit everything to CPU or we have to grab everything out of RAM. And anything we have right now is a hack on top of that. If you look at the Jepsen tests, look at all these databases and all the failure cases they have in these distributed systems, I think you could see this as the hack. We're trying to take these systems that are supposed to be deterministic
Starting point is 00:29:05 and somehow make them work in the face of all this indeterminism. This paper is arguing we should start from the other point. What if our fundamental primitives were unreliable from the beginning and we had to program to adapt to all of this? What systems would we get out at the end? One of the things I had a lot of fun doing as I was reading this was trying to think of some of my own examples
Starting point is 00:29:31 of places where this trade-off has come up and then trying to find like, okay, what are the common themes about those places I'm thinking up? Which is of course like, yeah, of course there's going to be themes because I'm the one thinking it up and I'm going to get onto a train of thought and that's where all the examples are going to come from so there's
Starting point is 00:29:47 probably great examples that i totally missed but some of the very first ones that i thought of were like raid so if you have a raid array of discs you have to choose depending on which raid mode you pick do you want the space to be used redundantly so that you can be resilient to failure of a disk right every every disk in your rate array gets mirrored so if one of them fails you have a backup copy versus do you want to stripe the data between your disks so that you can read that data out twice as fast because you can parallelize those reads between your disks so that you can read that data out twice as fast because you can parallelize those reads between your disks which is like you know um even in the era of the ssd that's still a huge performance win and so there's that trade-off right do you want it to go faster or do you want it to be more resilient to failure another one is cap theorem right if you have a distributed system
Starting point is 00:30:39 a bunch of nodes that need to communicate and there's some admission that like okay sending a message from one node to the other is going to take time and it's going to go over an unreliable connection and so there's a likelihood that that communication could fail there's different ways you can structure the communication so that you either have like if it's a web service right and there's requests coming in and this is you know these servers need to occasionally coordinate to be like oh somebody just added an item to their shopping cart i need to like update the the database and then the database nodes need to all communicate with each other and say oh yeah there's a new item in the shopping cart everybody catch up with that do you
Starting point is 00:31:19 want those distributed database nodes that have to like gossip changes to each other to be more available so that if you know there's a network failure or something like that nodes will still respond even if they have partial data right like the new item in the shopping cart hasn't been updated everywhere do you want availability so that those nodes will still respond with the shopping cart contents and some of them will say oh it doesn't have that item and other ones will say yes it does or do you want like consistency and correctness where you say hey we're not going to actually admit that the shopping cart has that item in it until all the nodes know about that item and we're going to wait until we've got confirmation that everybody knows it before we say yeah the shopping cart has that thing and that trade-off between availability
Starting point is 00:32:05 and and consistency you have to choose you can't have both that's the that's the cap theorem i have i have a confession yeah and you can't you can't tell dave this okay right so this paper, it's arguing against efficiency. You know, it's championing robustness. However, me reading and learning from this paper has actually helped me to write more efficient code. Oh! And, like, prioritise more efficiency sometimes, which I feel like I'm betraying the robust first cause.
Starting point is 00:32:48 But let me explain. Right. So I think before I discovered this whole new school of thought about thinking about this tradeoff, this other measure of robustness versus efficiency, I always thought, you know, to write faster code, code that runs fast, you just need to know, I don't know, the tricks, you need to be a better programmer, you know, if my code is slow, because I suck or something. And through discovering this other measure of robustness, and this trade off, it's just made it a lot easier for me to figure out how to make things more efficient. Because I know how to make things more robust, you add redundancy, you lean into the error, you lean into the non-determinism. So it's quite easy to just flip that. And when I want to make my code run faster now, I'm looking
Starting point is 00:33:37 for places to remove that and to make it less robust. And it feels wrong. you know i would never admit this to professor ackley but you know it's it's changed the way i think about the code that i write i think it's good whether we're trying to be efficient whether or robust or correct or whatever i think it's good to have this extra measure of robustness so that you know what you're sacrificing or you know what you're gaining yeah like you're conscious that there's a trade-off now as opposed to it just being invisible as it is to probably most programmers or at least unacknowledged and backwardsly automatically intuitively just doing the efficiency thing not even knowing that you could if you wanted to get even more efficiency by getting rid of more robustness go don't go looking for opportunities to make things faster go look for
Starting point is 00:34:30 opportunities to make things less robust yeah exactly yeah yeah but i just have to cross my fingers when i do it yeah i i think that this is i think that's a really interesting way of looking at these things and i do think it's super helpful to realize where we have redundancy, where we're copying things we don't need to. You know, these are often the causes of our slowdowns. But I think it's I think there's a really interesting idea here that I wish was more explicitly argued for in this paper, but I think there is this kind of argument for it that it is that sort of removing of redundancy that leads to these software errors, but that it's not a programmer's fault.
Starting point is 00:35:18 It's not that we write buggy software or that users are stupid and do the wrong things that cause these software problems. At the core is just the von Neumann model of computation, that this is the ultimate culprit that caused all this problem. And I know that there's a paper that's definitely on my list for later maybe to read, which I can't remember who wrote it, but Can Programming Be Rescued from the Von Neumann Model, I think is the name of the paper.
Starting point is 00:35:53 Can Programming Be Liberated from the Von Neumann Style? And that's a, I can't remember, who wrote that? Do you remember, Ivan? I did. Okay. You wrote it? Yes, yes. That's a robustness first answer yeah you can only you can only you know answer what you have local information
Starting point is 00:36:14 to quote ackley uh a mistaken person gives wrong directions even if you repeat the question so yeah i i wrote this um yeah so john backus wrote this in 1977 it's a acm turing award lecture um so luckily i have access to non-local information here um but yeah i think this is really interesting and this is really you could see this research here as kind of taking that question and saying yes but we don't quite know how yet like what we get here is not an ultimate answer to how can what's the alternative right what how could we get rid of the cpu how could we get rid of ram but we get some suggestive hints in this paper and I know you all have watched some of Ackley's stuff as well, so I'm going to rely on some of that. And it's this, you know, the things, the demos I saw are kind of this cellular automata basis, and we see that even von Neumann was interested in this being the model of computing going forward. And so what you have instead of a central processing unit where you have access to this
Starting point is 00:37:29 memory and there's these caches that are all local to you and you can guarantee, quote unquote, some certain deterministic output depending on all of these things. Instead, you get these tiny little computers that are all interacting with each other and can all scale and connect interchangeably together. And so the computer itself is small. It has very limited resources, and it can only see its little world, and it has to accomplish its goals not by taking control, not by becoming the leader,
Starting point is 00:38:03 but by cooperating with all of the bits around us. I'm reminded if you've seen the talk from Gerald Sussman on we don't know how to compute yet or something like that. No, I haven't. Okay, this is like 2011. And Sussman here says, it's quoted in Brett Victor's The Future of Programming. Which I also haven't read. It's quoted in Brett Victor's The Future of Programming. Which I also haven't read.
Starting point is 00:38:26 It's the one Brett Victor thing I refuse to read because it's bad. Really? Wait, is this another robust first answer? I don't know now. No, no. Well, okay. So I have read it. You really haven't watched?
Starting point is 00:38:41 Watched. Read. Watched. Wait, which one are you talking about? The talk that he gave Where he dresses up in a suit and has an overhead projector Oh, crap Yeah, I have seen that one
Starting point is 00:38:53 I was thinking about the essay Learnable Programming Yeah, yeah, yeah They both have programming in the name and I just snapped a grid Learnable Programming is bad Oh, wow I think it's there that he quotes No, maybe it's not that one. Anyways, maybe I'm wrong,
Starting point is 00:39:08 but someone quotes it and says, I got this, or was it a hickey talk? I don't remember. Now I'm all confused. Extraordinary claims without evidence may be rejected without evidence. Yeah, let's don't get on that tangent because I'll just...
Starting point is 00:39:21 So anyways, Sussman says that eventually we're going to be buying our compute by the bushel and that these computers will just be in the paint of the wall and they'll add compute to our wall by painting it on. Right, this is the idea. It was not that talk, but anyways.
Starting point is 00:39:38 Yeah, this is kind of Ackley's model is something like that. So I've got this idea that like robustness and the trade-off in particular come up frequently and prominently in cases where you're doing distributed computing. That was the thing I was building to when talking about like,
Starting point is 00:39:58 oh yeah, these different examples that I found, they all come up in cases of distributed computing, right? The other ones I had were like AutoMerge and FoundationDB and Byzantine fault tolerance, right? Which Ackley even talks a little bit about and references some work about that. That when you have lots of tiny computers and there's communication between them, just because physics, communication is fallible. If for no other reason than just like, like oh i lost my network connection in the middle of transmission or whatever or somebody unplugged an ethernet
Starting point is 00:40:31 cable and foundation db if anybody hasn't seen there'll be a link in the show notes to a talk from strange loop about how they engineered foundation db by building the most unreliable server rack possible and then only running their software on that rack as they were developing it where it's like you could unplug the network cable or the power or smash the hard drive or anything like that at any time and it had to be robust to that kind of failure and then they what they set out as a goal was like let's guarantee perfect robustness against that adversarial environment and then make it fast and then make it go as fast as possible given a certain minimum threshold of
Starting point is 00:41:11 robustness which is a really cool approach as opposed to let's make it fast and then try and bring robustness up a little bit which is like that's your yjs versus auto merge debate right there yeah and and this is the robust first creed it's it's um i i'm gonna try and google it right now okay but it begins with something like first be robust right i'm just gonna google like listen to that keyboard clacking no you carry on for a second. You're doing great. Oh, thanks. Yeah. Wow.
Starting point is 00:41:48 That feels really good to know that I'm actually doing a good job at podcasting. All right. I found it. Too bad. All right. The robust first computing creed. First be robust, then as correct as possible, then as efficient as necessary. It's not saying don't be efficient
Starting point is 00:42:06 it's just saying be robust first then you're allowed to be efficient and the example uh you just described just really reminds me of that so the thing that i also found i only thought of one or two examples of things that are kind of about this robustness, correctness, efficiency trade-off that aren't in the context of a distributed system. And one of them is constraint solver programming, and the other one is John Carmack's approach to optimizing game performance. I can talk a bit about what each of those are, but they're both cases where robustness is not the concern but correctness versus efficiency is and so i'm curious if either of you have any examples of cases where robustness is the concern but it's not a distributed system
Starting point is 00:42:56 because it's easy to think of like oh yeah distributed systems are fallible therefore we we need robustness and from robustness you get correctness and from correctness you know you get the desirable outcome of your program but like are there cases where we worry about failures of robustness that aren't about distributed systems i mean in some sense everything's a distributed system that's what makes this hard right except except for the like the abstract model of the von Neumann architecture, right? Well, even then, because the example I'm going to give is persisting to disk.
Starting point is 00:43:33 There's all sorts of things you have to do. You can find all sorts of things about how fsync is not enough to persist to disk properly. And you can look at all these tricks that databases that are production grade do to make sure, you know, you got the write-ahead log and you got like all of these things to ensure that like, yes, this really is on disk. And if something fails, I have it on disk.
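(A minimal sketch of the write-ahead-log idea Jimmy is gesturing at, assuming a toy JSON key-value store; the file names and layout here are made up for the example, and production databases add checksums, directory fsyncs, torn-write handling, and much more.)

```python
# Record the intent durably in a log before touching the main data file, so
# a crash mid-update can be recovered by replaying the log.
import json
import os

def wal_put(log_path, data_path, key, value):
    # 1. Append the intent to the log and force it to disk first.
    with open(log_path, "a") as log:
        log.write(json.dumps({"key": key, "value": value}) + "\n")
        log.flush()
        os.fsync(log.fileno())        # without this, the OS may still lose the write

    # 2. Only then update the main store; a crash here is recoverable from the log.
    store = {}
    if os.path.exists(data_path):
        with open(data_path) as f:
            store = json.load(f)
    store[key] = value
    tmp = data_path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(store, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, data_path)        # atomic rename: readers never see a half-written file
```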
Starting point is 00:43:56 And that's ignoring distributing it. It's ignoring like needing it on multiple machines. It's just a single node system. And yet you have to do all this because the storage is distributed in some sense yeah that's what i mean right yes like it's it's because of cache hierarchies and because of ram versus disk what have you and that is in the that all of that is the von neumann model i disagree because the von neumann model just assumes a single pool of storage and a single compute unit a la the Turing machine, right? Von Neumann architecture, it's just talking about you have memory and you have compute and they're separate.
Starting point is 00:44:38 It's about the hierarchy. That's the thing that von neumann so von neumann's architecture is like all right you want for this much hard drive space you want approximately this much ram and you need more hard drive space than the ram so that if you have to page out uh or swap and then it's like okay have this much in your l3 cache hierarchy and this much in your l2 and your l1 and then this much register space and then okay from from uh the wikipedia article the document describes a design architecture for the electronic computer with these components a cpu for stuff a central unit for instruction register and program counter memory for data
Starting point is 00:45:18 and instructions external mass storage and then input output devices all right so we're both wrong so like it already has this built-in well So it already has this built-in hierarchy of the close thing, which are the registers. You have the processor registers, and then you have the instruction registers, one level out. And then you have the memory, the RAM, and then you have the external mass storage. It has that idea.
Starting point is 00:45:42 That's the big thing about von Neumann, is the memory hierarchy because you have to have this all right slower moving and faster moving stuff cooperating together so let's let's ignore von neumann for a minute Turing and Church and Gödel and those other sort of abstract models of computation are not inherently distributed and it makes me wonder if there is a if there are any like concrete computers like any not abstract models but actual physical computers that aren't distributed because i'm really interested in this question of like is there a case where robustness, specifically robustness, not correctness, matters other than in the case of distribution?
Starting point is 00:46:32 And I'm trying to think of any examples of that. I think here's the thing. Like in the myths and misconceptions episode, you know, what is intended to be and what is theoretically supposed to happen with software it's often not what actually ends up happening you know we're humans we're we make mistakes and even if it's not a distributed system they're probably intended to be helpful to a person or another system in some way. You know, we're not just making these things to exist in a vacuum, right? So the number of times I have, you know, built, coded a new engine for a new creative coding thing or a little game, and I often start by trying to eliminate redundancy in some way, because I'm trying to be clever
Starting point is 00:47:26 and I'm trying to make it run smartly. You know, so like a classic example is, you know, I don't need to re-render everything every frame or I don't need to recalculate the position of these certain things every frame. I don't need to recalculate the order of, and this was a real one that happened to me. I had, I made a little
Starting point is 00:47:45 just simple, like, 2D game engine, it just drew sprites, and it needed to keep the order of these sprites, right, on the screen, and I came up with a clever way, I will just sort of keep track of the order, and just keep track of certain things, so I don't need to like sort them every frame you know because sorting it's really fast but you know it's even faster to not do it you know which is kind of stupid right but like you know I thought this is so clever this is so smart I'll sacrifice some robustness and redundancy and I'll just do it once instead of on every frame. And I did that for about a week. And then I realized that somewhere in the code, somewhere, I was making a mistake and I was putting them into the wrong order. And I spent like a day trying to find the place I was doing it wrong. And then I realized,
Starting point is 00:48:41 wait a second, I could just sort them every frame. It's completely redundant. It's because of a mistake I made somewhere else. But you know what? Just be robust, you know, just do it again and again and again. And the only person using that was me. It only works on one computer. I think this robustness, it can happen on a small scale. It can happen on a small scale. It can happen on a big scale. Like it can happen on a little one line,
Starting point is 00:49:07 like one functional helper you write for yourself to make an RSS feed, which I did in the last couple of weeks. Or it could be like, hey, we have these like data centers that we need to all be working and all be talking to each other. I think it's like it's a general concept.
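(A toy version of the trade-off Lou describes: rather than maintaining a cleverly updated draw order, where one missed bookkeeping call leaves it silently wrong forever, just re-derive the order from scratch every frame. The sprite objects with a z field and the draw callback are invented for the example.)

```python
# Redundant but robust: re-sort the sprites every single frame instead of
# trusting incremental bookkeeping. A mistake elsewhere in the code can only
# hurt you for one frame, because the order is recomputed from scratch.
def draw_frame(sprites, draw):
    for sprite in sorted(sprites, key=lambda s: s.z):   # re-sorted every frame
        draw(sprite)
```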
Starting point is 00:49:26 You know, those things can be efficient. They can, or not efficient. They can be robust or not robust. So I would say you can rate every program on robustness. And it's a question of priority, I think. Yeah, I'll put out one example that I think is fairly similar to what you were saying, Lou, on a robustness trade-off that's not distributed. And then I have another example that's very different that I think is also a good robustness example. But I wrote an editor, and I needed color, like syntax highlighting. And so what did I do? I wrote a parser to parse it out, and I just parsed on every single frame.
Starting point is 00:50:04 And when I parsed on every single frame, it was always right. In my new version of the editor, I have tried to have Rust Analyzer support. It's like this LSP, and it has semantic highlighting with tokens. And now I have something that when I edit it, I need to update those tokens because I don't yet have from Rust Analyzer the answer on what it's going to be. And I had to deal with so many errors that I didn't have to deal with
Starting point is 00:50:33 because I was just, yeah, just reparse the whole entire text every single time from scratch. And I can do that at 1,000 frames per second on a reasonable Rust file without writing anything smart from my parser. It was the dumbest parser. Yes, it was robust to any of those things.
Starting point is 00:50:51 Once I got the parser working, I never had to touch any of that code ever again, whereas this token code I'm still just messing with. In some ways, it's an efficiency trade-off because I'm not doing it every single time and i'm just getting deltas of tokens and that's the way that rust analyzer talks to me and but it also is like so much worse in so many ways uh the other one though that i think is arguably if we ignore like the fact that cpus and stuff are distributed blah blah is arguably a robustness without talking about
Starting point is 00:51:25 a distributed systems are like the control computers in uh spaceship space shuttles like the space shuttle and like apollo and stuff they had multiple computers all running the same program and they had to have a quorum before they would do this and the idea was because like oh actually having gamma rays and stuff hitting your computer actually can happen right and so this was kind of like an ad hoc mechanism afterwards instead of having something that would have been robust in the face of these kinds of things we throw in redundancy to try to get that robustness trade-off now i've seen research that suggests this doesn't... One of the ways people have talked about writing this
Starting point is 00:52:07 is not running the same program, but having four different groups of programmers write the same program. And this study claimed that that didn't work because the errors people make are almost always correlated around the hard parts. And so the parts that are wrong are more likely to be wrong in independent implementations.
Starting point is 00:52:29 That's fascinating. Yeah. So you think you're, and I don't know if I believe it or not, but because part of what they talked about is the programming language made it hard, so you could maybe try it in different programming languages and different parts are going to be hard, blah, blah, blah.
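(A sketch of the redundancy-plus-voting scheme from the shuttle example: run the same computation several times, ideally on separate hardware, and only act on a majority answer, so one corrupted run gets outvoted. Purely illustrative, not how any particular flight computer is built.)

```python
# Majority voting over redundant runs of the same computation. One corrupted
# run is outvoted; if no answer wins a majority, refuse to act at all.
from collections import Counter

def voted(compute, inputs, replicas=3):
    answers = [compute(inputs) for _ in range(replicas)]   # ideally on independent machines
    answer, votes = Counter(answers).most_common(1)[0]     # answers must be hashable
    if votes <= replicas // 2:
        raise RuntimeError("no majority agreement, refusing to act")
    return answer
```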
Starting point is 00:52:44 But it's interesting because here again this feels like this ad hoc approach where we're trying to build in redundancy into systems that don't have it baked into the core and I think this is what's so interesting about let's throw out the von Neumann model and let's have hardware that isn't reliable that's like one of the suggestions here yeah and it reminds me of foundation db that talk from strange loop where
Starting point is 00:53:12 they did that and it was amazing it reminds you of that jimmy it also reminds me of haskell and their choice to have laziness this was a choice not because they were like oh okay well um they wanted purity in a language and they wanted laziness uh well really they chose laziness to force them to make sure that the language was pure that's the idea that making a pure strict language would have been easier but once you you have this laziness, as soon as you have these potential I.O., you get all sorts of problems. So you have to have this purity.
Starting point is 00:53:52 I think of this same sort of thing. I butchered that explanation. But I think this is kind of this forcing thing here where if our hardware were not reliable from the beginning, how would we change the way we compute? I wonder if you could get that without having to go to hardware.
Starting point is 00:54:09 Like Lou approaches the design of their spatial programming experiments and various other things by starting with the virtual machine and then building the interface, which is the backwards way to do it. It works for them, I have been told it works great yeah agreed so could lou make an unreliable vm and that be the forcing function that that makes them build a robust interface on top of it now i want to try that yeah i i think that this is one of the things that's actually asymmetrical here that i find interesting is that's very easy to do right it is so easy given our deterministic correct ceo software to write something that isn't always going to
Starting point is 00:55:02 guarantee you those things it's very easy for me to throw in random faults. It's very easy for me to throw out random bits. It's very easy for me to corrupt random memory. On the other side, if my hardware by default does that, recovering these things is impossible. Now, when I say impossible, I mean guaranteeing it, making sure it's always the case. Now, what we can do is we can start approaching it asymptotically.
Starting point is 00:55:31 We can start getting better and better, and we can have different models that can make sure that most likely these things happen. I know in distributed systems with consensus, you can't gain common knowledge with a deterministic protocol but if you have an indeterministic one you can gain this knowledge with a probability of one things like that there's like results where you can like use the indeterminacy to help you gain something uh but yeah i find that actually like it makes me stop and say i guess is robust first the right answer because one is more powerful than the others no yeah yeah no i think i think the spaceship example although it's like a bit exotic it is um a good way of thinking about the kind of errors that you want to happen so like say you were on a spaceship or like even worse you're on a spaceship, or like, even worse, you're on a
Starting point is 00:56:26 spacewalk in a spacesuit. And you have a little computer that tells you how much oxygen you've got left. You know, you maybe you have 99% of your tank is full, or maybe you have 1% of your tank is full. That is something that you would want to be pretty accurate you wouldn't want that to be completely wrong you wouldn't want it to say you have 80 when you actually have two percent left you wouldn't you wouldn't you know and vice versa and i think in robustness the idea is that you're still gonna have mistakes there are still gonna be mistakes in code. Code goes wrong, things go wrong, shit happens. But considering that mistakes do happen, what kind of error would you want? Now, in CEO software, when something goes wrong, it goes completely wrong because we have certain
Starting point is 00:57:18 assumptions about everything being right all the time. If there's an error, it's not going to show you 99%, it's going to show you 1%, or it's not going to show you 1%, it's going to show you not a number percent, you know? Whereas with robust first computing, it's probably always going to be a bit wrong because of this non-determinism, but it's only going to be a tiny bit wrong. You know, so maybe, oh, so what? It says 99% when you're on 98%, but at least it doesn't say 99% when you're actually on 0%.
Starting point is 00:57:51 And I think that's the idea. It's: what kind of mistake do you want to happen? That's similar to what you get out of systems that use constraint solvers and gradient descent, which is something I've been doing a little bit of recently. Oh yeah? Well, okay, I haven't been doing it. I have been on a team with Alex Warth, and Alex Warth has been doing it, and doing a very good job of it, so I won't take credit. Except I will. But using gradient descent, the idea is you don't care about getting, or at least one of the things that you can do with gradient descent is say, I don't care about getting an exact,
Starting point is 00:58:33 like a precisely calculated answer, because there are all kinds of different math where you can't do that, right? At least not on a computer. You might be able to do it with symbolic algebra, you might be able to do it by manipulating terms in an abstract way. But if you want to actually do calculation, there's a whole class of things you want to do that aren't closed form and can't be, you know, straightforwardly calculated, but you can approximate them. And you can model them using gradient descent: you say, here's my function, and here's the curved shape that it has through space, and I want to, you know, slide
Starting point is 00:59:10 down a hill and find, you know, a local minimum of error, and that's good enough. That way of working gives you that property where you won't get exactly the quote-unquote correct answer, but you'll get an answer that's close enough. And if you build your system assuming that you're only ever going to have close-enough answers, it's very enabling. It enables you to do a whole bunch of things that you couldn't otherwise do if you were depending on correctness. To me this is interesting because it's not just a trade-off between, you know, the behavioral properties of your system. It's that there are some systems that are inexpressible if you require a certain notion of correctness that are expressible if you relax that correctness requirement.
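A minimal sketch of that slide-down-the-hill idea, with a made-up one-dimensional error function; the point is just that the loop lands near the answer without ever solving for it exactly:

```python
# Tiny, hypothetical example: minimize error(x) = (x - 3)^2 by gradient descent.
# We never solve for x in closed form; we just step downhill until "close enough".

def error(x):
    return (x - 3) ** 2

def gradient(x):
    return 2 * (x - 3)  # derivative of the error function

x = 0.0              # arbitrary starting guess
learning_rate = 0.1

for _ in range(100):
    x -= learning_rate * gradient(x)

print(x)  # ~2.9999999994, not exactly 3 -- close enough for most purposes
```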
Starting point is 00:59:57 I struggle with this kind of idea of robustness. Because when I hear things like, you know, you only want it to be off by a little bit, that makes perfect sense to me, right? I can completely imagine systems where I've said, okay, you know, I want to put in some safeguards here. So if I see something anomalous, I maybe ignore it, right? And it might be right, but I'm not going to... it could also be, like, a phase transition in the system, and I don't want to go into that. So I'm going to, like, you know, whatever. I can imagine where you can reduce error and make sure that it doesn't propagate through the whole system and cause all sorts of issues. But when I see things like the bubble sort example, it's slightly different than
Starting point is 01:00:46 that. So with the bubble sort example, we get one run here of doing a comparison with these 10% failed comparisons, and we see that in this one run, bubble sort did better than merge sort and quick sort. But that's not always the case. There is a case, if we kind of abuse big-O notation here for a second, there is a case where, like, in the best, well, I guess first, in the best case bubble sort does O(n) comparisons. This is when the list is already sorted. Well, we can imagine a list that's already sorted, and we get, yes, absolutely improbably unlucky, but it technically can happen. It's one possible run
Starting point is 01:01:31 where we hit that failure chance every single time. We have an already sorted list, and we do bubble sort, and we hit that failure case every single time. And now we went from a sorted list to a completely unsorted list. Yeah. And in fact, it will be even worse, because it's going to constantly do all these comparisons
Starting point is 01:01:50 and it actually might, like, does it actually, if we got the, you know, I know I'm saying 10%, and now I'm saying we happen to get that 10% a bunch of times in a row. But it wouldn't end. It wouldn't terminate if we always swapped it around, right? If we crank that failure rate up to 100%, bubble sort will never end, for sure. I guess none of them will, but anyway, it'll never end, and it will be inefficient doing it. But I guess you're not actually, you're minimizing the probability that you get a large error, but you're not actually saying it can't ever have a large error. There are runs of bubble sort where it gets a much larger error; it's just improbable that that happens.
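Something in the spirit of the experiment being discussed, sketched in Python under assumptions of my own — a comparator that lies 10% of the time, a 52-element list, and adjacent out-of-order pairs as a crude error measure. This is not necessarily the paper's exact setup, just the shape of it:

```python
import random

def flaky_less_than(a, b, p=0.10):
    """Comparator that lies with probability p (an assumed 10% fault rate)."""
    correct = a < b
    return (not correct) if random.random() < p else correct

def bubble_sort(xs, less_than):
    xs = list(xs)
    for i in range(len(xs)):
        for j in range(len(xs) - 1 - i):
            if less_than(xs[j + 1], xs[j]):       # "is this pair out of order?"
                xs[j], xs[j + 1] = xs[j + 1], xs[j]
    return xs

def merge_sort(xs, less_than):
    if len(xs) <= 1:
        return list(xs)
    mid = len(xs) // 2
    left, right = merge_sort(xs[:mid], less_than), merge_sort(xs[mid:], less_than)
    out = []
    while left and right:
        out.append(left.pop(0) if less_than(left[0], right[0]) else right.pop(0))
    return out + left + right

def inversions(xs):
    """Rough error measure: how many adjacent pairs end up out of order."""
    return sum(1 for a, b in zip(xs, xs[1:]) if a > b)

data = random.sample(range(100), 52)
print("bubble:", inversions(bubble_sort(data, flaky_less_than)))
print("merge: ", inversions(merge_sort(data, flaky_less_than)))
```

On a typical run, the bubble sort's redundant re-checking leaves the list only slightly scrambled, while the merge sort, which trusts each comparison exactly once, ends up further off — but, as the conversation goes on to point out, that is only the typical run, not a guarantee.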
Starting point is 01:02:31 And I guess that's the thing that kind of confuses me about this robust idea: there's no guarantee that we get an approximate answer. Hold on a sec, you're saying none of them will ever terminate? Well, that was a silly example, if we took it up to 100%. But no, no, no, there's something interesting here. So let's, instead of saying it
Starting point is 01:02:58 returns, you know, if I'm sorting a list of numbers, instead of returning, you know, "is a less than b", it returns "is a greater than b" every time, that's one kind of hundred-percent failure. But there's a different kind of hundred-percent failure, which is that the comparator always just returns false, right? Because if it returns "a greater than b" instead of "a less than b", all three of these will terminate. They'll just sort the list backwards. They'll sort it, you know, descending instead of ascending or whatever, right? But if it always returns false, I think merge sort and insertion sort will eventually stop.
Starting point is 01:03:32 Or not merge sort and insertion sort, quick sort and merge sort will eventually stop, because they work based on the assumption that after we've checked two things and we have our answer, we don't have to check them again. Whereas bubble sort is going to keep checking them again and again and again until it determines that it's true, right? Until it's checked every pair and gotten a true result. Which means merge sort and quick sort will terminate with a wrong answer, and bubble sort will never terminate, because it's still robust against giving you a wrong answer. Or another way of looking at that is that quick sort and merge sort are more robust in the face of a mess-up in our comparator code that replaces it with a constant false, because what they'll do is they'll give you something that's closer to an answer than what bubble sort will give you, which is no answer. I don't agree with that. Did you see that AI that they got to
Starting point is 01:04:35 play Tetris, right? And they got it to try and learn how to win. And, you know, you can't win Tetris, you can only lose. So you know what it did, right? It just kept playing forever? It just never played. No, it just paused. It just paused the game. That's what bubble sort does if it figures out that it's rubbish, you know, if there's no winning. Yeah.
Starting point is 01:04:58 Have you seen this Demon Horde Sort thing? Demon Horde Sort, I believe it's called. I have not seen Demon Horde Sort. Yeah, I did look at this cellular automata sorting. Almost. All right, now it's the sound of my keyboard clicking... demon... But hold on, I want a pigeonhole on this. Rope? No, I want a pigeonhole. I want a pigeonhole. No, no, we can do both. Let's assume, which is true in the real world, that a lot of our lists are approximately sorted, right? If we just always return false, quick and merge sort will return this original list,
Starting point is 01:05:34 and they'll be approximately sorted. Bubble sort would return nothing. Bubble sort wouldn't return. Yes, it would return nothing. It would never return. That's different than returning nothing, Hickey. Okay. It's not, Peyton Jones. It's not different than returning nothing.
Starting point is 01:05:53 It returns nothing because it doesn't return. But anyways, I think that you have to consider, and I think even Ackley in this paper would consider, in that scenario, quick sort and merge sort being more robust to handling that failure. We should ask him. Yeah, yeah, yeah. I actually don't know. I actually don't know. Yeah, we should ask him. So there's a thing that I wanted to... Did you finish your pigeonhole, Jimmy? Yeah. Okay. Yeah, that's fine. Okay, so there's a thing that I want to wormhole on, which is, I've got two examples, maybe 2.5 examples, of things that are not distributed systems where some of these trade-offs come up, that I want to meow meow. One
Starting point is 01:06:38 of them is John Carmack talking about how to do performance optimization. Oh my god, I completely forgot about this. I'm so sorry. This has gone somewhere else. You said this ages ago. You mean this podcast has a correct order that things are supposed to go in? Yeah. This is what always happens when the paper is either, one, not structured well, or, two, too short to just walk through it. Which, in this case, this is well structured, just too short to spend a whole episode just walking through the paper. Yeah, I have some things to respond to from over an hour ago, but I'm going to save them. We haven't even been recording for over an hour, I say, because I'm going to edit it down. We're being robust in terms of our conversation. We can always come back. Nothing gets lost. So carry on, the Carmack thing.
Starting point is 01:07:32 This podcast is never going to end. We're robust against a bad episode: if we never end the episode, it can't be a bad episode. That's true. That's true. That's true. So this Carmack thing: most people, when they are optimizing code, what they do is they try and eke out every little performance win that they can, right? Like, they run some micro-benchmark or whatever, or they run a flame graph and they look at, you know, a particularly typical frame or run, and they try and find, okay, where's the hot path through my code? How do I make that less hot? How do I find some case where I'm, like, you know, n-squared looping within a loop or whatever, and flatten it out, or what have you. And they do that process, and what tends to happen is that you get much better average-
Starting point is 01:08:28 case performance, but you will still have these spikes of worst-case performance. And Carmack's approach is that you should instead try and do the opposite: that you want as little variability as possible, not, you know, better average case. And if you take this mindset of preferring to minimize your variability, most of the typical kinds of performance optimizations that you do no longer work, because they'll lead you to basically the kind of case where it's like, oh, I'm going to do my GC in a way where I don't run GC on every frame, because that's slow. Instead, if I save up GC and only do it every once in a while, overall the performance of the system will be better, but I'll occasionally have these big pauses where I do GC, and that's bad if what you care about is very uniform execution of your program.
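To make that concrete, here's a toy simulation with invented numbers — 600 frames, a cleanup budget paid either in one batch every 60 frames or as a slightly more expensive slice every frame. None of this is Carmack's code; it's just the shape of the trade-off:

```python
# Toy numbers, invented for illustration: batching cleanup is cheaper on average
# but produces occasional huge spikes; spreading it out costs a bit more overall
# but keeps every frame the same.

def frame_costs(batched, frames=600, base=1.0, cleanup=30.0, overhead=1.2):
    costs = []
    for f in range(frames):
        cost = base
        if batched:
            if f % 60 == 59:
                cost += cleanup                   # one big pause every 60 frames
        else:
            cost += overhead * cleanup / 60       # small, slightly wasteful slice every frame
        costs.append(cost)
    return costs

for name, batched in [("batched", True), ("steady ", False)]:
    c = frame_costs(batched)
    print(name, "mean", round(sum(c) / len(c), 2), "worst", round(max(c), 2))
# batched mean 1.5 worst 31.0   <- better average, terrible hiccups
# steady  mean 1.6 worst 1.6    <- slightly worse average, perfectly uniform
```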
Starting point is 01:09:19 Which is an interesting thing that matters to Carmack, because in games, you know, it's better to have a stable frame rate than a fast frame rate that occasionally has big hiccups in it. Like, that just feels bad. But I think this is something that comes up in other cases also, and it's a lens that I'm fond of. And so I just wanted to suggest that that is a place where it's not trading off robustness versus efficiency, but it is trading off maybe correctness versus efficiency, or it's one of these places where you can trade off against efficiency, that seems relevant. The other one... Bubble sort will terminate. It'll just sort the list backwards.
Starting point is 01:10:06 Did you just implement that? I just googled a Python implementation of bubble sort and then I modified it. And we were just thinking of it having some sort of while loop, but it just does two nested fors, like a nested for loop setup. Doesn't that depend on how you implement bubble sort? I mean, this is, like, canonical bubble sort. Canonical? Head-canon bubble sort. Yeah. If it goes through,
Starting point is 01:10:30 all it does is, like, once it goes through some iteration, if it never swapped anything while it was walking down, then it can terminate early. Otherwise it knows that if it does all the possible, all those combinations, it will end up sorted. It knows. And so I just made it so it swaps every single time, and always, it knows that it's swapped, and then it just sorts the list backwards, so it gets the exact wrong answer.
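Presumably the modification looked something like this — a guess at the code, not the actual snippet from the episode — a canonical nested-for bubble sort with the comparison hard-wired to "yes, swap":

```python
# Hypothetical reconstruction of the modification described above: a canonical
# nested-for-loop bubble sort whose comparison has been replaced by a constant,
# so it "swaps every single time". It still terminates -- it just reverses the list.

def broken_bubble_sort(xs):
    xs = list(xs)
    n = len(xs)
    for i in range(n):
        for j in range(n - 1 - i):
            if True:  # the comparison that always claims the pair is out of order
                xs[j], xs[j + 1] = xs[j + 1], xs[j]
    return xs

print(broken_bubble_sort([1, 2, 3, 4, 5]))  # [5, 4, 3, 2, 1] -- the exact wrong answer
```

Run on an already sorted list, it returns the reverse: the worst possible answer, but at least it returns.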
Starting point is 01:10:45 Oh, now I would need to compare it to merge sort and quick sort. I would actually bet they probably do the same thing-ish, but it's hard to know. So what's the name of the sort that does the thing that I was talking about? I think Bogosort is the most robust sort. No. No. Am I the only person who finds Bogosort not funny?
Starting point is 01:11:23 Yes. I would say it's... I don't find Bogosort funny. I find it dreadfully unfunny. How is it unfunny? It's like one of the most recommended things for Dreamberd. Yeah. It doesn't make the list.
Starting point is 01:11:36 Sorry. Oh, you just ran... You throw your data up in the air, and if it lands in, you know, a Mona Lisa or a Shakespeare or whatever, then you can shoot the monkeys, right? I don't... What about pancake sort? Okay, what's pancake sort? It flips the list repeatedly, and if it lands in the right sort order... It's actually, like, a legit one, it's not a fake one, you can actually use it, but it is based on the idea of flipping over stacks of pancakes. Yeah, that's fine, right? Like, okay, so Jonathan Blow has aphantasia. Oh god, does he? Yeah, so I was listening
Starting point is 01:12:16 to this Jonathan Blow talk, and this is going to segue into my other rabbit hole. It's going to segue. Jonathan Blow was talking about how he doesn't visualize things in his head and how that forces him to work with artists in a certain way, and I found this all very interesting and kind of unnerving. Forces him to make terrible mistakes, yeah, and to keep bringing up The Witness when he shouldn't, that he's, you know, sworn to stop bringing up The Witness. And so he has this game that he made called Braid. And in Braid, there's this time travel that you can do, where you can, like, you know, be running around
Starting point is 01:12:50 and jumping on enemies and doing all these things. And then you die. But when you die, you don't start over again. Just time pauses. And it's like, hold down this button to rewind time. And you rewind back. It's like playing the tape of your gameplay backwards. And whenever you want, you can stop rewinding and then go oh okay i'm gonna pick up from here and keep playing
Starting point is 01:13:08 and it does all sorts of wonderful things with this time travel mechanic. And the way that Blow had to implement this time travel mechanic was, you know, he took a couple of different attempts at it and settled on this idea that's very much like the way that H.264 and other video codecs encode their frames, where you occasionally have these full-state snapshots. I think in Blow's case it's every two seconds or something like that, every 60 frames if the game's running at 30 frames a second, you have these full-state snapshots. And then in between those full-state snapshots, you have a series of diffs.
Starting point is 01:13:49 So the full-state snapshots are useful because if you're rewinding in the game, you can rewind just at normal speed, or you can rewind, like, super-duper fast. And if you're rewinding super-duper fast, it's handy to be able to say, jump to this arbitrary position in the past, grab the full-state snapshot, and then, if you didn't mean to land exactly on the moment in time
Starting point is 01:14:12 that that snapshot corresponds to, you can just apply some diffs onto that snapshot and get to the exact point in time you want. So if you're going, you know, in between two snapshots, you just use some diffs to get there. Which is also how video encoding works, right? Where you have full frames that are stored, I can't remember if they're the B-frames or the I-frames, I should have looked this up, but you have these full frames that are stored, and then in between storing those full frames, you store diffs that apply on top of each other. And in the case of video encoding, those full-frame snapshots are sometimes unevenly spaced.
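A rough sketch of that keyframe-plus-diffs idea, with invented details (a dict for the game state, a snapshot every 60 frames); rewinding to an arbitrary frame means grabbing the nearest earlier snapshot and replaying diffs forward from there:

```python
# Hypothetical sketch of snapshot + diff recording, in the spirit of the Braid /
# video-codec scheme described above. The details (dict state, 60-frame interval)
# are assumed, not taken from Blow's implementation.

SNAPSHOT_EVERY = 60

snapshots = {}   # frame number -> full copy of the game state (a "keyframe")
diffs = {}       # frame number -> {key: new value} changes since the previous frame

def record(frame, state, prev_state):
    if frame % SNAPSHOT_EVERY == 0:
        snapshots[frame] = dict(state)                # full-state snapshot
    else:
        diffs[frame] = {k: v for k, v in state.items()
                        if prev_state.get(k) != v}    # only what changed

def rewind_to(frame):
    base = (frame // SNAPSHOT_EVERY) * SNAPSHOT_EVERY  # nearest earlier snapshot
    state = dict(snapshots[base])
    for f in range(base + 1, frame + 1):               # replay diffs forward
        state.update(diffs.get(f, {}))
    return state
```

Apply those diffs on top of the wrong base state and you get exactly the kind of glitchy, datamosh-style corruption being described here.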
Starting point is 01:14:49 Sometimes they only store one at the very beginning. Sometimes they store one whenever they detect that there's a big change, like, you know, oh, we switched from an indoor night scene
Starting point is 01:15:00 where it's very dark to an outdoor broad daylight scene where it's very bright. Instead of packing in a gigantic diff that's going to handle that scene change we'll just put a full state snapshot there and that's that's nicer the thing that this video encoding has that is neat is data moshing which is this practice of only applying diffs, but applying them to the wrong content. And it's this technique that's emerged in, you know, like people doing video collages
Starting point is 01:15:33 and remixes and art and all of these things where they take this video encoding and they apply the diffs on the wrong content and you get just weird, surrealistic, melting, trippy, glitchy visuals out of it it's a case of using the fact that diffs are not robust for an expressive quality where if the video encoding had only ever stored full snapshots that would be the like optimally robust thing but it would be slow right that's a lot of extra data you're
Starting point is 01:16:05 storing every frame as a full frame that's kind of like an unencoded video instead most video frames are like very similar to the previous frame and so they just store little tiny diffs and that was an easy efficiency win but that efficiency win gives you an opportunity to hack the matrix and and come up with some cool visual result and we don't often want that in our software because that's like malware or corrupt data or whatever but there are cases where you actually do want to give up robustness because it it gives you a generative interesting result that you wouldn't otherwise get so that's like that's taking like the opposite view of this trade-off and seeing what can you do with it that's that's interesting the jonathan blow time
Starting point is 01:16:50 travel thing was just in there to irk jimmy that's not actually relevant to the example i wanted to give but and me and me and get out of here didn't irk me so oh oh just i'm robust against those kinds of no i'm efficient against it yeah i think that example is yeah explains the difference that's basically what happened with my little game engine ordering thing i was trying to do the diffs thing it didn't work so i ended up with some data moshing esque thing but but unintentionally so i just gave up on it and went back to full snapshots yep so i'm curious i think i mean i i there's you know little quotes in here in this you know position paper that we haven't talked about but i think we've gotten the idea here across of like this is let's put the robustness first let's not care about correctness and efficiency only if we have
Starting point is 01:17:53 to make these trade-offs that's it's important to do and that these things are related that like efficiency is going to come in tension with robustness. But I'm curious to hear your all's thoughts on like, what does a robust future of coding look like? Like what do you see as something that we could be doing where we started robust first? And on that note, I wrote some things down, some computing concepts slash things. And I want to know if you think they're robust first or not.
Starting point is 01:18:30 Okay. Yeah? So maybe this could help inform our answers to Jimmy's questions. I love this total non sequitur. Right. So first of all, is local first computing robust? It desires robustness against losing your network connection or against, you know, other other things that can happen. It's not just about unreliable networking, but it's also about like, you know, you don't surrender your data to somebody else. So if somebody else doesn't want to give you your data, Google, then well, too bad because you still have your data.
Starting point is 01:19:22 So it's like robust against the evils of capitalism, which is good. So I would say yes. Yeah, every time I try to think about this, I think there's so many different dimensions to be robust on that it's hard to give an answer. And so much of it comes down to implementation details. So like CRDTs, with the assumption that you always, you know,
Starting point is 01:19:47 end up eventually getting those messages, they are robust in that they'll have a, you know, they'll converge, right? We are not going to have non convergent data at the end, assuming we get all of the data. But if we never get that data data we might be wildly off what the final result is supposed to be um so that feels non-robust there uh and then though there's things like if you look at like the details of implementing these if you do like a hybrid logical clock for example it's a popular mechanism for trying to implement how do you get ordering in these things and hybrid logical clocks are not robust against somebody just arbitrarily saying it is the year 10 000 because from now on your whole system thinks it is the year 10 000 and it's sending all messages
Starting point is 01:20:39 that way well this brings me to my next question perfectly. Is permacomputing robust? I would like you to answer this question, Lou, because I don't know enough. Right. Well, I think permacomputing is a kind of computing that attempts to be sustainable and it takes it extremely seriously. And there's an emphasis on using old hardware. There's an emphasis on simplicity. There's an emphasis on being able to rebuild things from scratch. So I think in some extents, it is very robust. It's robust from like capitalism. It's robust from the end of the world as we know it because of climate change.
Starting point is 01:21:20 And it's robust against like complexity. You know, it's robust against things getting you know is it robust against things getting lost in translation stuff being forgotten but i think that it also goes at odds with robustness a little bit because there's this huge emphasis on being energy efficient but in robustness there's this assumption that we have energy we can use redundancy because you know we have electricity and computers are powerful but permacomputing is more focused on low energy it's more focused on slower computers so there's like a slight tension but there is an element of robustness there i think it would be good to try and make them compatible in some way yeah i'm glad you brought that up because that's one of the few
Starting point is 01:22:05 pink as in bad, bad Ackley highlights that I have in this paper, which is that Ackley says efficiency is key when demands exceed resources, yet many computers are usually idling; they could have been checking their work, but that's off the radar for CEO software. And just to refresh the memory of the listener, CEO: correctness and efficiency only. And that, like, oh, many computers are idling, they could be using that time to be doing useful work. To which I say, I turn them off. Yeah, switch them off. Yeah, like, why? Maybe? Why? I don't think that that's a compelling argument in favor of robustness versus efficiency. I would rather, and the permacomputing thing gets to this, I would say that there are many things that we do in computing that have a tremendous cost against both efficiency and robustness, and that identifying those things
Starting point is 01:23:09 and getting rid of them is something that the permacomputing culture seems really good at. Because of their relentless focus on simplicity and their desire to be usable on older computers, they've gotten rid of tremendous amounts of that stack, that, like, you know, Tower of Babel that is modern operating systems and software frameworks and all of that. Where it's, you know, the xkcd about the x86 CPU screaming along, doing, you know, billions of calculations per second, and the Servo and Gecko inside Firefox are, like, you know, doing this tremendous amount of work to render what ends up being just a GIF of a cat jumping into a box. Like, the gigantic tower from CPU up to animated GIF is a place where robustness and efficiency both go to die. And if you thin
Starting point is 01:24:06 that stack out and start from you know much more primitive hardware with a very simple vm with you know some carefully written software on it you get a lot of robustness back and you get a ton of efficiency back and i think you can that's place where, where people are having some cake and eating it. And then even if they want to sacrifice some efficiency for robustness, they're still way more efficient than, than the modern stack. So I have a, I have my own creed that I just came up with like 30 seconds ago. Hot off the press. Yeah. It's the, I call it the better computing, computing creed, right? And I almost feel like I want to stick something even ahead of robustness. It's first be sustainable, then be robust, then correct, then efficient, in that order. Sustainable meaning?
Starting point is 01:25:01 Meaning, meaning permacomputing don't destroy the earth yeah then make sure your thing can't be destroyed yeah then don't destroy your data and finally do your damn job yeah that's that's great anyway i'm glad we settled that i have a i have one more thing to ask about in terms of is it robust? And it's a weird one, right? Is the Pfizer coronavirus vaccine robust? Whoa. This might be the first episode that gets censored
Starting point is 01:25:41 from a source other than Editor Ivan. Okay. I have no idea if it is okay because there's this insanely interesting blog post article it's an article and it's called reverse engineering the source code of the fiverr sars-cov- blah vaccine the fiverr covid vaccine somebody went on fiverr and was like hey can you gig workers come up with a covid vaccine yeah it's just like this horse stuff it's um it's this horse ivermectin no it's not ivermectin no no it's it's um ah pcp it's this horse tranquilizer and you take that and you have this wicked hallucination and you listen to the Fight Club soundtrack because it's really
Starting point is 01:26:29 groovy. I wish that happened. Dust Brothers, man, am I right? They're so good. They've got these retro sounding drum machines. They're like, Tung, tung, tung, tung. It's crazy though right this article is talking about this vaccine which is made with the programming language dna right and there is a hugely different set of priorities to this programming language than the ones we typically use right like let me read you this bit it says at the end of the protein we find a stop codon, a stop instruction, denoted here by a lowercase s.
Starting point is 01:27:50 This is a polite way of saying that the protein should end here. The original virus uses the UAA stop codon, but the vaccine uses two UGA stop codons, perhaps just for good measure. right? It gives the stop instruction twice. There's this redundancy because if there's something you really don't want to f**k up, it's stuff that you inject into your body, right? Yes. There's something else, right? The very end of the vaccine, it just ends quite humorously with loads of a's right it goes right it says the end of it all the very end of mrna is word i can't pronounce
Starting point is 01:28:42 this is a fancy way of saying it ends with a lot of ah even mrna has had enough of 2020 it appears oh so it doesn't need to have all this but basically every time this program this dna program gets run it loses some stuff at the end, just from wear and tear, right? So, they just chuck loads of ah on the end. And I just thought this was just a really interesting parallel because this is code that has been written on like an editor on your computer, but with a hugely different set of priorities. It's interesting that it's not running on a von Neumann architecture, right? Unless you want to get like, right? Because we're already in like metaphor land, right?
Starting point is 01:29:31 Like, oh, DNA is code, right? So we're already in the realm of metaphor to some extent, right? In the way that, and I guess it's equally metaphorical that like C is code, right? C is not code. Get out of here. Computer code is not code. I'm sorry um computer code is not code i'm sorry what no what um so it's interesting that we're already in this realm of metaphor and so we can we can probably say yeah the the human body is not a von neumann machine it might be a von neumann unless you are von neumann himself right yeah exactly yeah von neumann's body i mean it's not a machine
Starting point is 01:30:06 yeah well you know he's more machine now than man um this is like david actually has a definition for what a living system is and he has this it's like a dynamic persistent system and so he gives an example of uh little eddies in the water that are swirling around or living yeah in his definition yeah because they're these dynamic persistent they only last for a little while but they take energy from other places and so yes and he wants these robust first systems to be living systems in the same way that makes me so happy i'm like i you know what this paper is short and it's kind of weird it's not what we usually do i love david ackley this like this stuff makes me so happy i'm glad that there's you know somebody like like actually out there
Starting point is 01:30:57 pushing for computing to be different in this way and not afraid to get freaky with it like we spend i i in particular spend so much time on this podcast like lamenting how everything is a monoculture and how different i want things to be and here's somebody who's like actually genuinely doing it but like backing it up with the like you know look i did this like actual computer science work you know analyzing these different algorithms and submitting a thing to acm and like you know he's he's he's putting on the costume of a computer scientist to go out there and be like actually i just want to make life we stan dave ackley we stan We stan a complex king. You know, I think that's what I like. Like, I don't even know how far I agree. You know, I haven't even sort of come to terms with, wait, am I fully robust pilled?
Starting point is 01:32:01 But it really, it just makes my imagination go it makes me think beyond my normal like bounds when it comes to coding and it makes me to see things in a different way and that's what i appreciate about it and and i see what he's saying you know it's like i agree but like am i am i brave enough to truly dedicate my life to robust first find out next time on the patreon exclusive anyway patreon.com slash future of coding can i can i read something uh from from another one of his papers which is about von neumann von neumann von neumann for whatever all right right. Randy Newman. Randy Newman. 65 years ago, Randy Newman predicted that hardware determinism would soon be supplanted. But with design lock-in and network effects, it remains virtually unchallenged today.
Starting point is 01:33:03 Randy Newman said that, right? That's from Indefinite Scalability for Living Computation, one of Dave Ackley's other papers. I think it's kind of funny, like, even Randy Newman was against Randy Newman architecture, right? Even Randy himself, you know? So if the creator himself is against it, and maybe we should question it a little bit too. Yeah. Mic drop. I have to say, I think since we're giving kind of like closing statements or general opinion statements here on the thing,
Starting point is 01:33:41 I love that there's somebody focused on something very different. I love seeing different avenues being explored for sure. But this paper left me wanting more of an argument. I wanted to be convinced. I'm like, okay, robust first, but I don't know what we're aiming at and it reminded me of a conversation i had with kartik on the slack where he was arguing against dependencies basically like we shouldn't depend on like anything yeah like even the operating system that you're using or the hardware that you're making which feels like a almost like a similar kind of you know vibe here warms my heart not exactly the same but a similar vibe and i didn't get the argument until it was kind of just put as like a instead of practical terms more of a moral stance right like this is just how things ought to be
Starting point is 01:34:42 like this is a better world where we don't have to depend on all of these things. And I guess I just don't know how to take this. Is Ackley giving us a new way of living? A new way of being that is kind of a moral life way? Or is he saying, we're going to have better practical applications at the end if we go this route? And if it's the latter, like this is going to be better and more practical, I don't know that I'm convinced of that. If this is instead more like I see things like permacomputing and these sorts of things,
Starting point is 01:35:15 right, like an ideologically driven way of doing this, I might be more on board with that. Like even if I don't follow it I might see the logic more and that's what this paper really lacked for me was like how do I get into this mindset how do I decide that robust first is for me is it that I'm looking for a problem a solution to a problem or is it looking for you know some vibe, some different way of being? And when I listened to some of the other things from Ackley, I definitely got more of that sense of like this is about vibe. This is about changing our relationship to computing. It's about that sort of, and I wish this paper had more of that.
Starting point is 01:35:58 I just want to know, like what is the argument here? Why should I buy into this? And if it's just on practical considerations it just seems to fall a little flat i think it's got to be more than that ivan are you robust pilled and by the way yeah i i totally see where you're coming from from there jimmy i agree but at the same time i'm reminded of the upton sinclair quote that it's very hard to get uh jimmy miller to understand something when his jimmy millerness depends on him not understanding it and that's embodied in the quote from this paper today's computing base is optimized for the perfect logician that's jimmy jimmy is the perfect logician like have you seen him he's perfect
Starting point is 01:36:47 he's also very logical um and uh yeah that that that explains to me why he is not so easily robust pilled as the as the two of us you know me and lou who are not very logical beings excuse me no it's probably true yeah this is how we got here i mean i'll happily admit like when i read that i was like but you can do all of this with logic too right like that is my first my first reaction right that doesn't strike me as like oh but why why give up those parts right that uh-huh your your argument's irrational it's not comprehensive enough i need more supporting data and you know yeah please explain your thinking in a more robust way i will just say i also recoil at the word robust but i i got rid of that for this one because in industry when people say robust it is just it stands for everything i dislike aka like java enterprisey stuff right
Starting point is 01:37:47 i know that's not what we're talking about here but i had to like yeah at first i was like robust no and then i realized what what this means i usually recoil at it when i see it too because you know i think no but you're not really robust are you you know where's you can't slice your computer in half what if i drop a nuke on you yeah you know um when i when i presented cell pond in at splash one of the one of the most common comments that i got was no but it's not really non-deterministic, is it? It can't be. You know, how do you, but you had it doing multiple steps. You can't do that. You had it being true and complete. You can't do that on a non-deterministic system. And the number of times I said, no, no, no, you can. Look, let me show you.
Starting point is 01:38:41 There's something that just doesn't sit well with with some people and me at first as well i think that's why it intrigued me so much it doesn't sound like it can work and there's an element of that that can put me off and there's an element of that that makes me want to find out more and through finding out more i've to my surprise discovered oh my god it can work yeah it's it's this thing like actually here is doing what i often try to do with visual programming which is to say there's this way that people approach writing software and building things on computers where there's there are trade-offs and dimensions along which they can vary their approach that they're not even conscious of and so they they automatically do things in a certain way
Starting point is 01:39:30 and it is first of all value to point valuable to point out that those things could be different right like we could instead of using text as the interface through which we express our intention to the computer we could do it through something not text. Or instead of prioritizing efficiency and correctness, we could also consider robustness. And that first step is like establishing that there is a choice that has been made here at some point and that you have the option to make a different choice. But then the next thing that I think most people when they're confronted with that immediately write it off it's not a binary choice it's not like an either or it's that you can now that you've identified that there's this other way that things could be you can try to have the best of
Starting point is 01:40:18 both you can you know why are we comparing bubble sort versus quick sort and merge sort well because nobody's made robust sort yet right like there might be a like not you know n squared billion billion comparisons bubble sort slow awful thing there might be something that is robust in the ways that we care about and fast not as fast as the thing where you give up robustness but fast enough for most practical purposes and the same with visual programming versus textual programming you don't have to completely get rid of textual programming you can have a little bit of visual expression maybe in your debugging tools or in how you visualize live data or in how you break up the problem domain
Starting point is 01:41:05 and say, okay, my overall architecture is going to be expressed visually. And then my small, like, you know, in the small programming is going to be done textually or vice versa. There's like lots of different things that you can do once you're aware that this trade-off exists. And so I appreciate this Ackley paper
Starting point is 01:41:24 and this idea of robustness not as a you know oh hey i have to give up on one thing in order to have the other thing in totality but rather that this is a a dimension along which i can make a conscious choice and i really like that this is like avengers end game for me You have to give up half of all programming. No, it's like, you know, Smash Bros Ultimate or Spy Kids 3D at the end when they get everyone to come. It's because this is like this crazy crossover episode for me because I'm a huge fan and follower of all of the robust first stuff.
Starting point is 01:42:03 And I listen to the future of coding podcast and now it's like both in one it's like the crossover episode you know in comic books where like spider-man saves someone else you know i'm i don't know i have not watched any marvel movies but yeah i haven't it's's Ivan Rees talking about robust first computer. What's going on? What's going on? It's crazy. Well, it was nice having you on as a guest for these two episodes, Lou.
Starting point is 01:42:33 Oh yeah. Where can people find you on the internet? I know you're on Patreon and some other places. Toadpond.com, baby. Toadpond.com. There's only one place now. Toadpond.com. Getting slippy. Getting slippy. Yeah. Yeah.com, baby. Toadpond.com. There's only one place now. Toadpond.com. Getting slippy.
Starting point is 01:42:46 Getting slippy. Yeah. Yeah. Yeah. Yeah. It was good having you. And you'll never guess who our surprise guest will be next episode. You'll have to just tune back in and see who it is.
Starting point is 01:42:59 Who are you getting? Rhymes with Onathan Rowe. It rhymes with Jesus Christ. Jesus Christ.esus christ we can't wait yeah jesus christ cheese and rice i just like the fact that people's names rhyme with their own names so you can just say their names and it rhymes with and you're still logically correct look at that perfect logic luke either's fine rhymes with luke any rhyme did the crime did the crime oh this is this is scraping a very new low for us the episode ended half an hour ago stop recording stop recording stop recording stop recording stop recording stop recording Stop recording.
