CoRecursive: Coding Stories - Story: Portal Abstractions with Sam Ritchie

Starting point is 00:00:00 Several years ago, Twitter had this problem that may sound familiar. The problem is big data versus fast data, or batch processing versus real-time stream processing. Probably because of the scale that they operate at and the real-time nature of the Twitter feed, they hit this problem earlier than the rest of us. Batch is very efficient. You can calculate things on years of data, but doing the calculation might take a day. Real-time is much faster, but you can sort of only work forward in time. Really, you need both these things. If I want to look at my most liked tweet, I need to look at all the old batch data, but also any real-time tweets that I'm doing. So it's 2013. Sam Ritchie, he's a mechanical

Starting point is 00:00:46 engineer by training. His uncle is Dennis Ritchie and he's working at Twitter and his job is just translating from one system to the other, from real-time to a batch job. And there's just so many jobs that looked like that. And it kind of looked like that was going to be my life, like just coding these things. He just said, enough is enough. We need to figure out how to write one piece of code that will run in real-time world and also in batch world. I guess I've come at the problem from that side where it's like, oh, I can calculate this thing, but I just started calculating it and the world existed before. Yeah. The backfill problem is hard and it's hard like it just consumes your life writing, you know, backfill jobs. And you kind of, you stick to really simple things because

Starting point is 00:01:29 you know, you're gonna have to write them twice. You're gonna have to maintain them twice. I mean, it's really not a nice way to live. Hello, and welcome to CoRecursive. I'm Adam Gordon Bell. How did Sam get away from these one-off jobs? How did they solve this fast versus big data issue? An issue that many are still struggling with. The answer isn't some new data processing system that he's here to shill. The answer is actually abstract algebra

Starting point is 00:01:54 and probabilistic data structures. If you don't know what those are, don't worry, we're gonna walk through it. We're also gonna talk about what Sam calls portal abstractions. That's finding abstractions that let you leverage work from other fields. But we'll do that at the ending. Let's start at the beginning when Sam was working away at Twitter.

Starting point is 00:02:14 I was on the revenue team. I had a colleague named Oscar Boykin, who I didn't know that well. We both maintained one of these libraries I mentioned before that lived on top of Hadoop. He had the Scala version. I had this Clojure version. And kind of yet again, I had a task for work that was like building one of these dashboards. You're trying to count something like tweets per user per day. You're just basically grouping on some key and then adding numbers to a database. And there's just so many jobs that looked like that. And it kind of looked like that was going

Starting point is 00:02:44 to be my life, like just coding these things. Through this or that, Oscar and I teamed up to work on some shared piece of machinery for serialization between the code we both maintained. And we realized that we both were doing this sort of thing. And he's got a lower tolerance for BS. He just said, enough is enough. We need to figure out how to write one piece of code that will run in real-time world and also in batch world. This happens with compilers. This is a compiler problem. Let's just back off and solve it. So they solved it. They built version one of this open source analytics system. System could do

Starting point is 00:03:21 batch. It could do real-time, sum the results together. They called it Summingbird. It's a pun, you know, it sums things and it's from Twitter. So it's a bird, Summingbird. But then they had some interesting revelations about what they had created. You know, if you really simplify what we're dealing with here, you're writing code that is generating, for some key, tweets per user per hour, something that happens. It might just be tweets per user.

Starting point is 00:03:48 It might be how many users we have. And then you have some value. You have some counter that you're ticking. So you're just incrementing this thing up. And so many machine learning features and dashboards, everything is just like ticking counters. That's like the secret of analytics work, is you're just adding ones. That seems suspicious too. That's something you can maybe break out of. Like, really? Is that all we can do? Like how would you do more complicated things? But so this software package, Summingbird, was a library that let you write a logical declaration of what you wanted to happen.

Starting point is 00:04:17 And then the second component of it would take that data structure that you've built and go run it on any number of these different underlying platforms. So I can power a dashboard. I can do backfills. I have this boundary that's maintained transparently to me that behind the scenes will give a hard line between the massive multi-year database and then the last couple hours that are stored

Starting point is 00:04:39 in some much harder to manage, much more fragile, but very, very fast online processing system. And phase one was just write something that can do... You know logically you're doing the same thing. You've been rewriting the same code. Just have the machine write it for you. But it does open this door. And this is the topic I wanted to get to. It opens the door. You start to look at this thing and think... You know, you have these two buckets. One is like all time before a couple hours ago. And then you have a bucket for each of the recent hours. So you're doing this addition of some numbers, but you're sort of putting parentheses in one case around like years and years of data. And then another set of parentheses around like each of the previous few hours.
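The two-bucket picture above can be sketched in a few lines. This is a minimal illustration, not Summingbird's actual API: the user names, the counts, and the `merge_counts` helper are all made up for the example. One counter stands in for the multi-year batch result, and a list of small counters stands in for the recent hourly buckets.

```python
from collections import Counter

# Hypothetical batch result: tweets per user for all time up to a few hours ago.
batch_counts = Counter({"alice": 1200, "bob": 45})

# Hypothetical real-time results: one small counter per recent hour.
recent_hours = [
    Counter({"alice": 3}),
    Counter({"bob": 1, "carol": 2}),
]


def merge_counts(batch, hourly):
    """Add the batch bucket and every hourly bucket together, key by key."""
    total = Counter(batch)  # copy so the batch result is left untouched
    for hour in hourly:
        total.update(hour)  # Counter.update adds counts rather than replacing them
    return total


totals = merge_counts(batch_counts, recent_hours)
```

Where the parentheses go, around years of data or around a single hour, does not change the final totals, which is the property the rest of the conversation builds on.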

Starting point is 00:05:26 And then you take this final step of adding them together. If you're calculating how many times I've tweeted, like Hadoop gets like everything up to yesterday. And then I'm adding that to something that's actually getting the real-time data and just counting. That's right. Those are nice ideas, but they're not that, they're sort of obvious and not that powerful. But this idea of adding things and putting parentheses wherever you want

Starting point is 00:05:51 also seems kind of innocuous. It's like this little lamp you pick up, right? Like it looks kind of, you know, you rub the lamp and what you find is like, this idea is not, it's not your new idea. It's very simple, but it actually exists in this field of abstract algebra. All right. Abstract algebra, if you're not familiar with it, it's a subfield in math. Stick with me here. I'll explain a little bit. A lot of concepts from abstract algebra can be implemented in a programming language.

Starting point is 00:06:20 Semigroup is one of these. It's an interface. It has one method. That method is add, or sum if we want to stick with the Summingbird pun. It also has a rule that things have to be associative. You might remember associativity from a high school math class. What happened here is Sam had a system where, to calculate analytics, a calculation had to implement a certain interface, and then the system could run it in both worlds.

Starting point is 00:06:50 The interface had an add method, which meant it was a semi-group, which meant this subfield of math with people talking about seemingly obscure constructs could suddenly enable him to answer interesting questions in his real-time analytics dashboard thing at Twitter. So yeah, you can start slimming things back. And when you've slimmed it back almost to nothing, you're left with this object called a semi-group, which is an idea of, okay, I need some set of things. So numbers in what we've been doing, which is just counting. Some way to quote, add two of them together and get something out that's like still the same type. And then a test that goes along with that. So the test is that I have to be able to do that associatively. So this is kind of, it sort of sounds like a pedantic mathy thing. Like there's this impulse that I'm sure we can get into in the functional programming

Starting point is 00:07:35 world to like, like see ideas that seem mathy and like slap math names on them and just tweet about it and start like spraying that. And that, that sucks. But I think it's the, the, the thing you get when you do that, when you identify this like mathematical concept is it's not your, because it wasn't your core idea. People have been thinking about this. It turns out for a long time, you have potentially hundreds of years of work that have gone into answering questions of what can I do with types that are able to implement a plus method

Starting point is 00:08:06 and then a single test of associatively calling plus. That is a very, very tiny interface to satisfy, but there's a huge amount of work on all the things you can, one, all the things you can do just relying on those two properties and two, like a zoo of data structures that all hold those properties. And yeah, so by backing out what we built and making it not just about numbers, but about this thing in Scala, you might say, I want a type where I can implement a type class called semi-group for that type. You know, if you just make that one change, suddenly you've kind of opened this portal into, again, this whole zoo of data structures. Anything that matches this tiny contract will fit into your model.
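As a rough sketch of that tiny contract, here is one way to write it down in Python rather than as a Scala type class. The names `Semigroup`, `plus`, and `is_associative` are chosen for illustration and are not taken from any library; the associativity check only tries the sample values it is given, so it is a test, not a proof.

```python
from dataclasses import dataclass
from typing import Callable, Generic, Iterable, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Semigroup(Generic[T]):
    """A type plus one operation that combines two values into a third."""
    plus: Callable[[T, T], T]


def is_associative(sg: Semigroup, samples: Iterable) -> bool:
    """Check (a + b) + c == a + (b + c) for every triple of sample values."""
    values = list(samples)
    return all(
        sg.plus(sg.plus(a, b), c) == sg.plus(a, sg.plus(b, c))
        for a in values
        for b in values
        for c in values
    )


int_addition = Semigroup(lambda a, b: a + b)
set_union = Semigroup(lambda a, b: a | b)
# Has a two-argument "plus", but fails the associativity test.
subtraction = Semigroup(lambda a, b: a - b)
```

Integer addition and set union pass the check on any samples, while subtraction fails on almost any, which is the difference between something that merely has a plus method and something that fits the model.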

Starting point is 00:08:53 That's true of any abstraction, but this one was special because when you turn and look at the computer science literature, you find things I never would have thought about or never expected to work in the context of an ad dashboard job. Suddenly, we were able to plug into this thing. So the work became less about how do we go manage real-time and batch and this kind of boring suit and tie stuff to, oh my God, we've gone into Narnia and suddenly there's these like approximate, you know, sketching data structures where I can maintain, I can, I can see, I can feed items into it and it'll give me a count for the unique number of items seen. And it can do that up to billions and billions and trillions of items. And it just won't get any bigger. Like it doesn't actually have to store them. Like how the fuck does that work? That's a different question.

Starting point is 00:09:46 All you know is that you can take two of these things, add them together. It works associatively. And so they suddenly become candidates for running on years and years of tweet data or Twitter data or any large-scale data set. And the results you know will make sense, will not have any errors, and will be real-time updatable. So you went out to solve this problem of real-time analytics, and your solution is a semi-group. That doesn't seem obvious to me, I guess, that that's a solution to the problem of analytics. Yeah, that's a semi-group and then the monoid.

Starting point is 00:10:24 I mean, it doesn't stop at the semi-group. Yeah, it doesn't seem obvious that that's the problem to, that that's the solution to analytics, but it is. What started to happen was you start to realize that, okay, well, it's not the solution to analytics per se. What it is though, is, you know, adding things together associatively, this seems to be the key that unlocks being able to store data in multiple places and merge together, you know, your results when you want. So being able to distribute in space or time is tied somehow intimately to the associative property that we know from elementary school. Like that's kind of odd.
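To make the link between associativity and distribution concrete, here is a small sketch. The numbers and the helper `combine_in_chunks` are invented for the example, and the averaging case is an added illustration, not something discussed in the episode. Each chunk stands in for data held on a different machine or in a different time window.

```python
from functools import reduce


def combine_in_chunks(plus, items, chunk_size):
    """Reduce each chunk separately, then reduce the partial results.

    Assumes items is non-empty. Each chunk plays the role of one machine
    or one time window.
    """
    chunks = [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]
    partials = [reduce(plus, chunk) for chunk in chunks]
    return reduce(plus, partials)


values = [4, 8, 15, 16, 23, 42]


def add(a, b):
    return a + b


# Associative: chunked and unchunked results agree.
sum_whole = reduce(add, values)
sum_chunked = combine_in_chunks(add, values, 2)


def naive_avg(a, b):
    # Not associative: averaging pairwise depends on the grouping.
    return (a + b) / 2


avg_whole = reduce(naive_avg, values)
avg_chunked = combine_in_chunks(naive_avg, values, 2)


def avg_plus(a, b):
    # Associative repair: carry (sum, count) pairs and divide only at the end.
    return (a[0] + b[0], a[1] + b[1])


total, count = combine_in_chunks(avg_plus, [(v, 1) for v in values], 4)
mean = total / count
```

The pairwise average gives different answers depending on how the data was split, so it cannot be safely spread across machines or time. Carrying a (sum, count) pair restores associativity, and the chunking stops mattering.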

Starting point is 00:11:04 And why do I say it's intimately tied? Just because if you ask what you need to go do that, it's really just this one simple property. Let's do a tiny recap. So if we want to run calculations in real time and in batch, we need a common interface. That interface turns out to be semigroup from abstract algebra. And that interface is important whenever you need to distribute calculations. All right, next up in solving analytics is how you deal with data that may be missing. We're going to change our interface to handle that, which will bring us to another interesting concept. There's other properties like the idea of a missing value. If you're querying multiple databases and data might be missing, like how do you deal with that? Well, okay. You probably need some notion of like,

Starting point is 00:11:49 you know, a zero. So if I'm adding numbers, like adding zero doesn't do anything. That's fine. If I have like lists of tweets, I'm merging together. I can have all these nil checks or check for none or have optional types. Or I can know that for a list, if I, if I concatenate an empty list, nothing happens. So my code becomes simpler now because I've got this idea of an empty data, an empty element of the type. So you can start to see that a lot of data types have this idea, like a set has this idea. Numbers, of course, have this idea. For multiplication, you kind of have this idea, or you do.
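The "empty element" idea can be sketched as a monoid: the same add operation plus one value that does nothing when added. This is an illustrative Python version with invented names (`Monoid`, `identity`, `sum_all`, `lookup`) and made-up stores, not a real library's API.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Callable, Generic, Iterable, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Monoid(Generic[T]):
    """A semigroup plus an identity value: plus(identity, x) == x."""
    identity: T
    plus: Callable[[T, T], T]

    def sum_all(self, items: Iterable[T]) -> T:
        # Starting from the identity means an empty input needs no special case.
        return reduce(self.plus, items, self.identity)


int_sum = Monoid(0, lambda a, b: a + b)
int_product = Monoid(1, lambda a, b: a * b)  # identity is one, not zero
list_concat = Monoid([], lambda a, b: a + b)


def lookup(store, key, monoid):
    """Return the stored value, or the identity when the key is missing."""
    return store.get(key, monoid.identity)


# Hypothetical stores: one filled by batch, one by the real-time side.
batch_store = {"alice": 1200}
realtime_store = {"alice": 3, "carol": 2}


def total_for(key):
    return int_sum.plus(
        lookup(batch_store, key, int_sum),
        lookup(realtime_store, key, int_sum),
    )
```

A key that is missing from one store, or from both, falls back to the identity, so the merging code needs no nil checks or optional types.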

Starting point is 00:12:25 It's just one instead of zero, so that's kind of odd. But in fact, there's another thing called a monoid, which again has this in-your-face name, but it's just the same as the semi-group, but added on is this extra method you implement called identity, which is giving back a thing that if I pass it into plus with something else, it just won't do anything. So again, a very, very simple idea, but what you get out of it is suddenly the ability to handle missing data. And that comes up all the time in dashboards. How do you represent data that's not there yet? All right. So we have semi-groups so that we can distribute work. We add one more method to our interface and we get monoids

Starting point is 00:13:05 so that we can handle the absence of data. This is still a very small interface to enable distributed computing. It's a very well-defined interface, but I really think of it like it is like a portal into some interdimensional transport system. I used to think when I was thinking about

Starting point is 00:13:23 how do I become creative? What do I want to do in software? And you want to make original things that no one's done before, right? You want to make these things, like crack the door open on something no one's thought about. But I think more lately that that, I mean, that's kind of lonely. If you do that, you actually succeed and make some, like, 2001 obelisk, and that's exciting. But contrast that with if I managed to build, like, a transporter gateway from, like, Star Trek, right? And you look through and, like, you weren't creative at all. Like, you just made a thing that, like, you know, millions of other galactic

Starting point is 00:13:58 civilizations have made before. Like, that's good. Now you get to plug into the network. Like, coming up with abstractions, like, you know, figuring out the endpoint to a website or the packet format required to talk to the web, that's what picking an interface out of some incredibly well-trod field like abstract algebra does for you. Okay, this metaphor is great, but I think it needs a little explanation. The monolith is from 2001: A Space Odyssey. I don't think anyone understands it, but it's powerful. This is like coming up with your own unique solution to a problem, a solution that no one has thought of. But the transporter gateway from Star Trek or an HTTP interface is less creative. Perhaps you're implementing something that someone else already built. It's not really a new discovery,

Starting point is 00:14:50 but you get to draw on all the existing solutions that exist for that interface. You transform your unique problem into a known type of problem where known solutions exist. This is what Sam is calling a portal interface. What was on the other side of this portal behind your, your add function? I mean, we, what, what we got, we built a library of all the things we found. The library is called Algebird. You know, very concretely we got, I mean, the thing I'd never seen before was this whole zoo of data structures that the core idea is that if you don't really care about your exact value that you're accumulating. So for numbers, maybe I want a counter, but I don't really care

Starting point is 00:15:31 that it's exact. I'm happy with 0.1% error, maybe a hundredth or a thousandth of a percent. It turns out there's this whole field of research on data structures like this, where if you can give up a little bit of error or a little bit of accuracy, you can get often two orders of magnitude. You can get 100X space savings on this thing. And that's so outrageous that, I mean, that took me a while to even understand what the hell was going on. So why does the amount of space matter? I can make a monoid that just adds every Twitter user together. Okay. This is a great point. It's not even that it doesn't have anything to say about what you can add. It's that you can plug things in that will just shatter the system.

Starting point is 00:16:16 So this example you gave, if I'm trying to go, say, I just want to keep lists of everybody's tweets. I decide to group on a user. And every time a tweet comes in, I make a list with the single tweet in it. That's my thing. How do you add lists? You just concatenate them together. No problem. So what you find is that most of your database is empty because most people don't tweet the tweets they're putting out anyway. And then some people just have these huge amounts of tweets they're pumping out. I mean, some are bots that are just hammering out tweets every couple of minutes. And so you get these incredibly skewed keys in your database. Some of the values are just getting bigger and bigger and bigger. And there's nothing in your system that has limited this from

Starting point is 00:17:00 happening. So when you're running some system that sometimes is fetching nothing, the default value, and sometimes it's fetching like dozens of megabytes of, you know, tweets and then filtering on them. Like this is in some sense, orthogonal from your original problem. Like that's totally logically fine to do. It still fits the interface, totally fits the interface, and it'll fit the database for a while, but it's not everything you meant. There's some problem there. And the problem is that in almost all these systems, definitely at Twitter, there's just skewed keys everywhere. Somebody's got the most followers. And so when they tweet, you've got to fan it out to everybody. And that just

Starting point is 00:17:42 hammers the system. Whereas maybe when I tweet, no big deal. The system doesn't notice. Okay. So why would you accept like an accuracy loss? Like, yeah, I want the total result. Like I want the full thing. I want to know how many followers I have. I don't want to know how many followers I have, like plus or minus 1%. Maybe not though. So it turns out if you can, well, the problem you're trying to solve is like, how can I track counters and deaden the effect of these massive explosions of a particular key value pair? You get it for free with something like a counter, because people have done a lot of work to make sure that, okay, all our numbers, like up to some massive amount are going to use the same amount of bits. Yeah. If it's a long or something,

Starting point is 00:18:28 it can only get so big. And if you want to double its size, like just add a bit, no problem. Why do we just count numbers? Like it's easy. Well, why is it easy? Like, well, a lot of problems are solved for you just because of the architecture you're inheriting about how numbers are represented. Like if numbers actually took a ton more bits, if we hadn't figured out like how to write things in binary. Counting would be harder. Yeah. Yeah. So counting lists, like adding lists is pretty hard or sets. Let's have that example. If I want the set of how many followers I have, how many unique people have seen my tweet today? Well, what's one, how would you implement that?

Starting point is 00:19:02 I just add them to the set and then I can combine sets by just getting rid of the, like doing a distinct. Yep, exactly. You know, if everybody had roughly the same number and it was small of people that saw their tweets, but sometimes, you know, there's just huge amounts. So the distinct set of people that have seen your tweet is just massively larger than, than the average. So you get this massive set in memory and you're serializing and deserializing it every time. And there's two ways you can go. One is you can start to build in these special cases into your system where the abstraction starts to leak

Starting point is 00:19:37 and you say, well, I can't really tolerate this. So it's not just a type with a semi-group, it's like this other thing and there's more constraints. That's fine. If you can accept a little bit of error, like if I don't really care if my count of people that have seen my tweet is off by 10, which honestly I don't like, I mean, in that example, like data gets dropped all the time. Like if you see, if you hit like on my tweet and then your phone's offline, like there's already error just built into the universe. So if I accept that and I just live with it, I can reach for a data structure like, here's

Starting point is 00:20:12 the buzzword. There's this thing called the HyperLogLog, where if you allocate this thing, some very small amount of memory, you can get something like 99.9% accuracy on a count of how many unique things you've dumped into this. So it's an approximate set. You add things to it, or you sort of put things into the set, and then you can ask it the question, how many unique things have I seen before? And it'll tell you, and it'll be almost right, and it won't get any bigger. It doesn't seem like it should be possible. It doesn't seem like it should be possible. And if you try, if you thought of that idea,

Starting point is 00:20:51 when you were working on your analytics system and you said, yeah, it'd be really nice if I could just like count this thing and like not have the set grow at all. Like you're not going to go take a few months and go off and figure that out. Yeah. It just sounds impossible, but somebody figured it out. And then somebody, maybe the same person, but somebody figured out that, oh, if I have two of these things, I can add them together. So I can track, you know, I can track like users for a few hours. I can track my distinct counts. And then if I have another set that represents like stuff I've seen before,

Starting point is 00:21:27 you know, I can merge or at a later time, I can merge those two together. And the result of the merge set will also satisfy the properties that I had with either of the two side ones. And then we can distribute it. Yes. Yeah. Then you can stop and you can save. You can like save your state and then you can load it up again later and keep going. And that's really all we want to do.

Starting point is 00:21:46 We want things where you can pause and wait a while and then load it back out and keep going. And yeah, these approximate data structures get you that ability. If they have that ability, then you can plug them into a system like Summingbird that's running these massive analytics jobs and things will just work. And you'll solve again, your system's problem of heavily skewed key distributions that will just go away the same way it does when you use counts.
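For readers who want to see why the merge works, here is a deliberately small HyperLogLog sketch. It is a teaching version written for this transcript, not the implementation used at Twitter or in any library: the choice of SHA-1 as the hash, the default of 4096 registers, and all names are assumptions. With 4096 registers the expected error is on the order of one or two percent, looser than the figure quoted in the conversation; more registers tighten it.

```python
import hashlib
import math


class HyperLogLog:
    """Approximate distinct counter with a fixed number of small registers."""

    def __init__(self, p=12, registers=None):
        self.p = p                      # number of hash bits used to pick a register
        self.m = 1 << p                 # number of registers (4096 when p is 12)
        self.registers = list(registers) if registers is not None else [0] * self.m

    def add(self, item):
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")          # treat as a 64-bit hash
        index = h >> (64 - self.p)                     # top p bits choose the register
        rest = h & ((1 << (64 - self.p)) - 1)          # remaining bits
        # Rank = position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[index] = max(self.registers[index], rank)

    def merge(self, other):
        """The 'plus' of the semigroup: register-wise maximum."""
        if self.p != other.p:
            raise ValueError("sketches must use the same number of registers")
        merged = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return HyperLogLog(self.p, merged)

    def estimate(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)          # bias constant, valid for m >= 128
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            return self.m * math.log(self.m / zeros)
        return raw
```

Because merging is a register-wise maximum, it is associative and commutative, and merging two sketches yields exactly the registers a single sketch would hold after seeing both streams. The register list never grows, however many items are added, which is what lets it sit behind a skewed key without blowing up.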

Starting point is 00:22:15 All right. So we have our simple interface for real-time and batch, and it turned out that it already existed in abstract algebra. It was the monoid or the semigroup. We found this portal abstraction. We raided the research papers and found probabilistic data structures like the HyperLogLog that were monoids and run in fixed space. But I wanted to ask Sam about this pet topic of mine.

Starting point is 00:22:39 Do names for math help or hinder adoption in software? I just imagine you, you know, standing up and being like, HyperLogLog is a semi-group and everybody, nobody knows what the hell you're talking about. But you're like, no, this is important. I absolutely have the reaction that you're saying. Like at first I was kind of like, I had to write this job.

Starting point is 00:23:00 Fine, we can do it this way. But then it just started to get like more and more clear that we'd gone down some rabbit hole that was actually not just abstraction for abstraction's sake. I had a few experiences of going out and finding papers that, again, implemented these. There was an approximate sliding window counter. Would I have found the paper? No. Would I have taken the time to implement it? No, absolutely not. But aiming to implement these interfaces and pass these tests and then being able to immediately turn around and have like an approximate sliding window counter that would just work with Stripe's like entire machine learning feature generation interface. Like I could take this thing, put it in the cupboard, write a nice doc string for it, like write a little pitch for why you might want to use it. And it would just work. There's no

Starting point is 00:23:49 sort of, that doesn't look like it would work in an analytics system. Like that just goes out the window. It just will, you know, we've got the test to prove it and, uh, you know, pull it off, see, see what you can think of. Yeah. Yeah. Like, it seems so non-obvious to me, and I don't really live in this world, so maybe it's not non-obvious. But, um, yeah, I don't know. I hear people talk about, like, fast data and big data and pipelines. I never hear anybody say, like, hey, if you can make something a monoid, then you can, like, calculate it either in batch or in real time and you can combine it, and all you need to do is meet this interface, and that's it. Yeah, well, you heard it here. No, I, look, I'm with you. And I think, so

Starting point is 00:24:33 I listened to your podcast with, uh, DHH, and he was talking about Ruby and, you know, when he first picked up, you know, Ruby, like, this emotional sense he had. And that really got me thinking about, like, why is it that this idea is not more out there? I mean, it's not a tough idea, in that if you didn't need it to, yeah, if you just write the test down and you encountered it, you wouldn't find it to be, you could do that code review, no problem. But there's this, like, aesthetic sense with certain abstractions, and there's something about, like, pulling abstractions from math that sounds, I don't know. I mean, I'd love to hear, you're in the functional programming world. Like, functional programming has this bad rap of

Starting point is 00:25:17 just, you know, like, it's all about category theory. We need to shove functors and monoids and monads. And if you don't get it, like here's this category theory textbook, you go figure it out. What we were trying to advertise was here are the names of these things and the names themselves are important because you're going to find these names when you go on the hunt for stuff you can plug in. Right. If you call it addable, you have this problem of, okay, what do you solve?

Starting point is 00:25:46 You make it more comfortable. And if I have a preexisting library of things that I can plug in, like this is great. This, I can look at the name addable in the function slot, the parameter, you know, type, I can go look at the library and I know what can fit into what. But what you lose is this sense that you're plugging into this larger, you know, this mine, like, that you can go down and find new things. And so for someone who's actually looking to, like, expand the range, you know, I think it would be not wise to change the name to something more comfortable, because what you might do, and here's something that happens, you know, I might pick up, or Adam, you might pick up, like, this thing and you might go,

Starting point is 00:26:27 well, okay, I'm going to make a new data type. Like I have addable. That looks pretty easy. It's got a plus method on it. And I can implement my thing. You know, I don't pass the associative test, but like that doesn't really matter. I'm still an addable. You know, I can still implement plus.

Starting point is 00:26:44 So I'll make my thing work. And like, I'm just going to ignore the tests and no problem. I just won't implement that test for me. But like now you're in dangerous territory. It was so tight. It was such a poetic little interface that when you ditch one of the two lines, like you're totally off the map now. But I, it really is tied. I think to this idea of like

Starting point is 00:27:05 the aesthetics of an abstraction, like there's an aesthetic response you have to some people have an aesthetic response to these mathematical abstractions and go like, holy shit, like this, I'm plugging into something big. And this is a, I'm so happy this post was here. I have no intimidation at all. And some people go, I kind of remember getting my ass kicked in eighth grade, like in algebra, you know, like, is it really that again? I think, yeah, there's cases where people are like, maybe like overly extraneously using terminology, but here it's like, it's actually the key to running things. It is paying weight, I guess, in actual business use cases. That's right. I mean, a thing I'm really passionate about and the reason this stuff's important is you want to go mine the literature

Starting point is 00:27:51 of what other people have done. You want to go be able to plug these things into your work and really just benefit from this incredible community that's been cranking for, again, maybe hundreds of years. But then you're turning around and you're presenting this aesthetic thing. And yes, it matters like what the references are to the past, but it also needs to kind of present itself, uh, you know, as its own thing to use. Like you should ideally like good design is about giving people an on-ramp at every level of engagement they want. Yeah. You know, like experts only is like fine. But if you're trying to build something that's accessible across the, you know, the entire

Starting point is 00:28:33 range of experience and like you find yourself confused about why monoid and semi-group and field are not like doing it for people. I think there's, there's more we need to learn there about how to go use this incredible mine of abstraction resources in modern code. I think this makes sense. These concepts are super valuable and these concepts already have names. Maybe the names aren't the problem here. Now I know how semi-groups can model distributed calculations, how HyperLogLog can give me fixed overhead. But how do I find these abstractions on my own? Like, how do I repeat this trick and find my own

Starting point is 00:29:10 portal as Sam calls it? Like if I, if I go through and I extract some interface for everything that has a name, like, you know, a dog is a name, a car is a name. And like, how do I know if that's a valuable thing or just me, yeah, wasting time? Yeah, it's hard. That's the thing we all deal with as programmers. Like, how do you know? I was thinking this the other day on a walk that, like, I wonder if conspiracy theorists would be like great software developers, would just be so sensitive to like abstraction and, you know, you're seeing patterns everywhere. Like there's probably some dial in our brains that cranks up or down and it's hard. I don't think an abstraction can like tell you like, you know, what you just

Starting point is 00:29:57 described, like extracting a name for everything. Like maybe it's good, maybe not. You need a thousand examples that you look at and go like, I think I've got something really powerful here. And if that gets you excited, you should do that. But if you simply, if you want to make your search process faster, then there are these other fields where people have been thinking that way for a while. So there's this great talk about like Richard Feynman. Yeah, yeah, totally. Tell the story though. Tell the story. I think I know what you're doing. So Richard Feynman, like, collected all these problems over the course of his life, um, and he said, like, that was the secret to him being so successful is, like, he had all these problems and then whenever somebody mentioned some new

Starting point is 00:30:42 solution like he would just go through his list of problems and see like if it solved them all, which I guess is kind of what you're talking about, right? It's like, will Monoids solve this? Try it on. Maybe it's a horrible fit. Wow. I love that. That's great. Yeah. That's a brilliant Feynman story. It's like, yeah, he says he'll get a click sometimes and go, ah, here's the connection. People go, how did he do that? Well, most, you just don't tell anyone when, when you don't get a hit. Yeah. Yeah. I love that. Yeah, absolutely. You have this, you have a solution, you have some interface. If you, if you learn about some abstraction that seems powerful in another field, like go backwards, say, does this apply to what I'm doing? Is there, you know, forget if it seems natural or obvious, but like, what would it mean if I forced it in? What does this all say about the future of software development? How should we think about this idea of importing these portal

Starting point is 00:31:38 concepts? Yeah, I think that the clue I get from this is that, I mean, you're trying to solve interesting problems. You're trying to go expand the range of what is possible for you to build. If you buy this idea that these things just kind of lead toward greater complexity and interest, there's always more to learn. There's always more to do. One way to make progress is to go make new artifacts, like new examples, new kind of works of art almost. We're trying to build these things, like, spun out of our thought. And, you know, that's really powerful. That's really, like, what it's all about. But in fact, there are other fields

Starting point is 00:32:22 that have been obsessed with this idea of, you know, structure and relationships between things. And, you know, physics is one, math is another. I think that all of these are just these incredible cupboards we can raid of ideas that most of which were invented before, like the modern software era. You know, one way to move forward is to really use the hundreds and hundreds of years of work that have already been done to give ourselves hints about, you know, we effectively have like an alien civilization that we can raid.

Starting point is 00:32:57 And that's like our own work before the 60s when structured programming just became a thing. So I think to go forward, like there's always going to be new discoveries to be made, but one very, very fruitful thing to do is to turn around, look back and find these things and say, well, is there an interface I could discover that someone's already found that would let me just plug into this incredible, almost battery of human creativity that just exists waiting for the taking in maybe dusty old papers and books, but it's there. No one's hiding it. We started with fast data versus big data. We hit abstract algebra and probabilistic

Starting point is 00:33:40 data structures, but these were all just examples for Sam's idea of finding another field that's already solved your problem and pulling in those ideas. Sam is actually working on this right now in his latest side project. He's looking for more of these portal interfaces into math. So I've got a project that I'm about to... I spent three or four months in this before my current job and I'm about to restart it. But it's this like re-implementation of a lot of the core reinforcement learning algorithms, but using like totally hardcore functional programming style. So it's like you're pulling the same trick or you're attempting- Trying. Yeah. Yeah. To see if I can be like a one trick pony, but from,

Starting point is 00:34:20 you know, like with my math trick. I don't mean it in a dismissive way. I mean- No, I'm on purpose doing it to test this theory we're talking about, about like, if you're a one trick pony, but like your trick is like opening the portal, like I just keep doing that. Very cool, sir.

Starting point is 00:34:37 Well, good luck surviving the- Yeah, this is all assuming. Assuming we survive. Yeah, you too, Adam. Good luck with this, man. So that was the show. If you have an interesting story of a solution to a problem like Sam's, let me know. It doesn't have to involve math.

Starting point is 00:34:54 Adam at co-recursive.com or find me on Twitter or the website or wherever. If you liked this episode, like really enjoyed it, then tell your co-workers about it. I've been trying to improve the quality of the episodes and hopefully it shows. Thank you for listening.
