CoRecursive: Coding Stories - Story: Portal Abstractions with Sam Ritchie
Episode Date: April 17, 2020. Buckle up! On today's episode, Adam interviews Sam about how abstract algebra and probabilistic data structures helped solve the fast-versus-big-data issues that many are struggling with. Sam Ritchie is a machine learning researcher and a mechanical engineer by training. Stop in to hear Adam and Sam's conversation about portal abstractions that let you leverage work from other fields. You cannot miss this episode!
"And that's really all we want to do. Like, we want things where you can pause and wait a while and then load it back out and keep going." - Sam Ritchie
"I'm aiming to implement these interfaces and pass these tests and then being able to immediately turn around and have like an approximate sliding window counter that would just work with Stripe's, like, entire machine learning feature generation interface." - Sam Ritchie
"I'm really passionate about and the reason this stuff's important is you want to go mine the literature of what other people have done. You know you want to go be able to plug these things into your work and really just benefit from this incredible community that's been cranking for, you know, again, maybe hundreds of years." - Sam Ritchie
Links: Sam's Blog, Summingbird, Algebird, Reinforcement Learning
Transcript
Several years ago, Twitter had this problem that may sound familiar.
The problem is big data versus fast data, or batch processing versus real-time stream processing.
Probably because of the scale that they operate at and the real-time nature of the Twitter feed,
they hit this problem earlier than the rest of us.
Batch is very efficient. You can calculate things on years of data, but doing the calculation might take a day. Real-time is much faster, but you can
sort of only work forward in time. Really, you need both these things. If I want to look at my
most liked tweet, I need to look at all the old batch data, but also any real-time tweets that
I'm doing. So it's 2013. Sam Ritchie, he's a mechanical
engineer by training. His uncle is Dennis Ritchie and he's working at Twitter and his job is just
translating from one system to the other, from real-time to a batch job. And there's just so
many jobs that looked like that. And it kind of looked like that was going to be my life,
like just coding these things. He just said, enough is enough. We need to figure out how to
write one piece of code that will run in real-time world and also in batch world.
I guess I've come at the problem from that side where it's like, oh, I can calculate this thing,
but I just started calculating it and the world existed before.
Yeah. The backfill problem is hard and it's hard like it just consumes your life writing, you know, backfill jobs. And you kind of, you stick to really simple things because
you know, you're gonna have to write them twice. You're gonna have to maintain them twice.
I mean, it's really not a nice way to live.
Hello, and welcome to CoRecursive. I'm Adam Gordon-Bell. How did Sam get away from these
one-off jobs? How did they solve this fast versus big data issue?
An issue that many are still struggling with.
The answer isn't some new data processing system
that he's here to shill.
The answer is actually abstract algebra
and probabilistic data structures.
If you don't know what those are,
don't worry, we're gonna walk through it.
We're also gonna talk about
what Sam calls portal abstractions.
That's finding abstractions that let you leverage work from other fields.
But we'll do that at the ending.
Let's start at the beginning when Sam was working away at Twitter.
I was on the revenue team.
I had a colleague named Oscar Boykin, who I didn't know that well.
We both maintained one of these libraries I mentioned before that lived on top of Hadoop.
He had the
Scala version. I had this Clojure version. And kind of yet again, I had a task for work that
was like building one of these dashboards. You're trying to count something like tweets per user per
day. You're just basically grouping on some key and then adding numbers to a database.
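To make the shape of these dashboard jobs concrete, here is a minimal Python sketch of "tweets per user per day": group on a key and add ones. The `(user, day)` event tuples and the function name are hypothetical; a real job would write the counts to a database rather than return a `Counter`.

```python
from collections import Counter
from typing import Iterable, Tuple

def tweets_per_user_per_day(events: Iterable[Tuple[str, str]]) -> Counter:
    """Group on the (user, day) key and add 1 for every tweet seen."""
    counts: Counter = Counter()
    for user, day in events:
        counts[(user, day)] += 1  # the "ticking counter"
    return counts

events = [
    ("alice", "2013-05-01"),
    ("bob", "2013-05-01"),
    ("alice", "2013-05-01"),
    ("alice", "2013-05-02"),
]
counts = tweets_per_user_per_day(events)
print(counts[("alice", "2013-05-01")])  # 2
```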
And there's just so many jobs that looked like that. And it kind of looked like that was going
to be my life, like just coding these things. Through this or that,
Oscar and I teamed up to work on some shared piece of machinery for serialization between
the code we both maintained. And we realized that we both were doing this sort of thing.
And he's got a lower tolerance for BS. He just said, enough is enough. We need to figure out
how to write one piece of
code that will run in real-time world and also in batch world. This happens with compilers.
This is a compiler problem. Let's just back off and solve it.
So they solved it. They built version one of this open source analytics system. The system could do
batch. It could do real-time and sum the results together. They called it Summingbird.
It's a pun, you know, it sums things and it's from Twitter.
So it's a bird: Summingbird.
But then they had some interesting revelations about what they had created.
You know, if you really simplify what we're dealing with here,
you're writing code that is generating some key, like tweets per user per hour,
something that happens.
It might just be tweets per user.
It might be how many users we have. And then you have some value. You have some counter that you're ticking. So you're just incrementing this thing up. And so many machine learning features and dashboards,
everything is just like ticking counters. That's like the secret of analytics work,
is you're just adding ones. That seems suspicious too. That's something you can maybe break out of.
Like, really? Is that all we can do?
Like how would you do more complicated things?
But so this software package, Summingbird,
was a library that let you write a logical declaration
of what you wanted to happen.
And then the second component of it
would take that data structure that you've built
and go run it on any number
of these different underlying platforms.
So I can power a dashboard.
I can do backfills.
I have this boundary that's maintained transparently to me that behind the scenes will give a hard
line between the massive multi-year database and then the last couple hours that are stored
in some much harder to manage, much more fragile, but very, very fast online processing system.
And phase one was just write something that can do... You know logically you're doing the same
thing. You've been rewriting the same code. Just have the machine write it for you.
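Here is a rough Python sketch of the "write it once, let the machine run it on both platforms" idea. `Job`, `run_batch`, and `StreamingRunner` are hypothetical names for illustration, not Summingbird's actual API: the logical job only says how to key an event, what value it contributes, and how to add two values, and two toy runners execute that same declaration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Generic, Iterable, List, TypeVar

E = TypeVar("E")  # event type
K = TypeVar("K")  # grouping key
V = TypeVar("V")  # aggregated value

@dataclass
class Job(Generic[E, K, V]):
    """Hypothetical logical job: key an event, extract its value, add two values."""
    key: Callable[[E], K]
    value: Callable[[E], V]
    plus: Callable[[V, V], V]

def run_batch(job: Job[E, K, V], events: Iterable[E]) -> Dict[K, V]:
    """Toy 'batch platform': group everything first, then reduce each group."""
    groups: Dict[K, List[V]] = {}
    for e in events:
        groups.setdefault(job.key(e), []).append(job.value(e))
    result: Dict[K, V] = {}
    for k, vs in groups.items():
        acc = vs[0]
        for v in vs[1:]:
            acc = job.plus(acc, v)
        result[k] = acc
    return result

class StreamingRunner(Generic[E, K, V]):
    """Toy 'real-time platform': update a key-value store one event at a time."""

    def __init__(self, job: Job[E, K, V]) -> None:
        self.job = job
        self.store: Dict[K, V] = {}

    def on_event(self, e: E) -> None:
        k, v = self.job.key(e), self.job.value(e)
        self.store[k] = self.job.plus(self.store[k], v) if k in self.store else v

tweets_per_user = Job(key=lambda t: t[0], value=lambda t: 1, plus=lambda a, b: a + b)
events = [("alice", "hi"), ("bob", "yo"), ("alice", "again")]

stream = StreamingRunner(tweets_per_user)
for ev in events:
    stream.on_event(ev)
print(run_batch(tweets_per_user, events), stream.store)
```

The job is declared once; only the runners differ, which is the part Sam describes handing off to the machine.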
But it does open this door. And this is the topic I wanted to get to. It opens the door.
You start to look at this thing and think... You know, you have these two buckets. One is, like, all time before a couple of hours ago. And then you have a bucket for each of
the recent hours. So you're doing this addition of a sum of numbers, but you're sort of putting
parentheses in one case around like years and years of data. And then another set of parentheses
around like each of the previous few hours.
And then you take this final step of adding them together.
If you're calculating how many times I've tweeted, like Hadoop gets like everything
up to yesterday.
And then I'm adding that to something that's actually getting the real-time data and just
counting.
That's right.
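The "parentheses" point can be checked directly. In this sketch the hourly counts are made-up data; wherever the boundary between the batch part and the real-time part falls, adding the two partial sums gives the same total, because addition is associative (and 0 is a safe starting value for an empty part).

```python
import operator
from functools import reduce
from typing import List

hourly = [3, 0, 5, 2, 7, 1, 4]  # made-up tweet counts, one per hour

def batch_plus_realtime(counts: List[int], boundary: int) -> int:
    """Sum everything before `boundary` "in batch", the rest "in real time", then add the two."""
    batch = reduce(operator.add, counts[:boundary], 0)     # (h0 + ... + h[boundary-1])
    realtime = reduce(operator.add, counts[boundary:], 0)  # (h[boundary] + ... + h[-1])
    return batch + realtime

# Wherever the boundary lands, the answer is the same total.
totals = {batch_plus_realtime(hourly, b) for b in range(len(hourly) + 1)}
print(totals)  # {22}
```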
Those are nice ideas, but they're not that, they're sort of obvious and not that
powerful. But this idea of adding things and putting parentheses wherever you want
also seems kind of innocuous. It's like this little lamp you pick up, right? Like it looks
kind of, you know, you rub the lamp and what you find is like, this idea is not, it's not your new
idea. It's very simple, but it actually exists in this field of abstract algebra.
All right.
Abstract algebra, if you're not familiar with it, it's a subfield in math.
Stick with me here.
I'll explain a little bit.
A lot of concepts from abstract algebra can be implemented in a programming language.
Semigroup is one of these.
It's an interface.
It has one method.
That method is add, or sum if we want to stick with the Summingbird pun. It also has a rule that the method has to be
associative. You might remember associativity from a high school math class: (a + b) + c equals a + (b + c). What happened here is
Sam had a system where, to calculate analytics, a calculation had to implement a certain interface,
and then the system could run it in both worlds.
The interface had an add method, which meant it was a semigroup, which meant this subfield of math, with people talking about seemingly obscure constructs, could suddenly enable him to answer
interesting questions in his real-time analytics dashboard at Twitter.
So yeah, you can start slimming things back. And when you've slimmed it back almost to nothing,
you're left with this object called a semi-group, which is an idea of, okay, I need some set of things. So, numbers, in what we've been doing, which is just counting.
Some way to, quote, "add" two of them together and get something out that's still the same type.
And then a test that goes along with that. So the test is that I have to be able to do that associatively.
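A minimal sketch of that contract in Python. Sam describes it in Scala terms as a type class called semigroup; this version is an illustration, not Algebird's code. The associativity check only tests every triple from a handful of samples, so it can catch a bad `plus` but cannot prove a good one.

```python
import itertools
from typing import Callable, Generic, Iterable, TypeVar

T = TypeVar("T")

class Semigroup(Generic[T]):
    """The whole contract: one method, plus, which must be associative."""

    def __init__(self, plus: Callable[[T, T], T]) -> None:
        self.plus = plus

def is_associative(sg: Semigroup[T], samples: Iterable[T]) -> bool:
    """The single law, checked on every triple drawn from `samples`:
    plus(plus(a, b), c) == plus(a, plus(b, c))."""
    xs = list(samples)
    return all(
        sg.plus(sg.plus(a, b), c) == sg.plus(a, sg.plus(b, c))
        for a, b, c in itertools.product(xs, repeat=3)
    )

int_sum = Semigroup(lambda a, b: a + b)
str_concat = Semigroup(lambda a, b: a + b)
int_max = Semigroup(max)
int_minus = Semigroup(lambda a, b: a - b)  # not a semigroup: fails the law

print(is_associative(int_sum, range(-3, 4)))       # True
print(is_associative(str_concat, ["", "a", "bc"]))  # True
print(is_associative(int_max, range(-3, 4)))        # True
print(is_associative(int_minus, [1, 2, 3]))         # False
```

Subtraction fails because (1 - 2) - 3 is -4 while 1 - (2 - 3) is 2.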
So this is kind of, it sort of sounds like a pedantic mathy thing.
Like there's this impulse that I'm sure we can get into in the functional programming
world to like, like see ideas that seem mathy and like slap math names on them and just
tweet about it and start like spraying that.
And that, that sucks.
But I think it's the, the, the thing
you get when you do that, when you identify this like mathematical concept is it's not your,
because it wasn't your core idea. People have been thinking about this. It turns out for a long time,
you have potentially hundreds of years of work that have gone into answering questions of what
can I do with types that are able to implement a plus method
and then a single test of calling plus associatively. That is a very, very tiny interface to
satisfy, but there's a huge amount of work on all the things you can, one, all the things you can do
just relying on those two properties and two, like a zoo of data structures that all hold those properties.
And yeah, so by backing out what we built and making it not just about numbers, but
about this thing in Scala, you might say, I want a type where I can implement a type
class called semi-group for that type.
You know, if you just make that one change, suddenly you've kind of opened this portal into, again, this whole zoo of data structures.
Anything that matches this tiny contract will fit into your model.
That's true of any abstraction, but this one was special because when you turn and look at the computer science literature, you find things I never would have thought about or never expected to work in the
context of an ad dashboard job. Suddenly, we were able to plug into this thing.
So the work became less about how do we go manage real-time and batch and this kind of boring
suit-and-tie stuff to, oh my God, we've gone into Narnia and suddenly there are these, like, approximate,
you know, sketching data structures where I can maintain, I can, I can see, I can feed items into
it and it'll give me a count of the unique number of items seen. And it can do that up to billions
and billions and trillions of items. And it just won't get any bigger. Like it doesn't actually
have to store them. Like how the fuck does that work? That's a different question.
All you know is that you can take two of these things, add them together.
It works associatively.
And so they suddenly become candidates for running on years and years of tweet data or
Twitter data or any large-scale data set.
And the results you know will make sense, will not have any errors, and will be real-time updatable.
So you went out to solve this problem of real-time analytics, and your solution is a semi-group.
That doesn't seem obvious to me, I guess, that that's a solution to the problem of analytics.
Yeah, that's a semi-group and then the monoid.
I mean, it doesn't stop at the semi-group.
Yeah, it doesn't seem obvious that that's the problem to, that that's the solution to analytics,
but it is. What started to happen was you start to realize that, okay, well, it's not the solution
to analytics per se. What it is though, is, you know, adding things together associatively,
this seems to be the key
that unlocks being able to store data in multiple places and merge together, you know, your results
when you want. So being able to distribute in space or time is tied somehow intimately to
the associative property that we know from elementary school. Like that's kind of odd.
And why do I say it's intimately tied? Just because if you ask what you need to go do that,
it's really just this one simple property. Let's do a tiny recap. So if we want to run
calculations in real time and in batch, we need a common interface. That interface turns out to
be semigroup from abstract algebra. And that interface is important whenever you need to distribute calculations. All right, next up in solving analytics is how
you deal with data that may be missing. We're going to change our interface to handle that,
which will bring us to another interesting concept. There's other properties like
the idea of a missing value. If you're querying multiple databases and data might be missing, like how do you deal with that? Well, okay. You probably need some notion of like,
you know, a zero. So if I'm adding numbers, like adding zero doesn't do anything. That's fine.
If I have like lists of tweets, I'm merging together. I can have all these nil checks
or check for none or have optional types. Or I can know that for a list, if I,
if I concatenate an empty
list, nothing happens. So my code becomes simpler now because I've got this idea of an empty data,
an empty element of the type. So you can start to see that a lot of data types have this idea,
like a set has this idea. Numbers, of course, have this idea. For multiplication, you kind of
have this idea, or you do.
It's just one instead of zero, so that's kind of odd.
But in fact, there's another thing called a monoid, which again has this in-your-face
name, but it's just the same as the semi-group, but added on is this extra method you implement
called identity, which is giving back a thing that if I pass it into plus with something
else, it just won't do anything. So again, a very, very simple idea, but what you get out of it
is suddenly the ability to handle missing data. And that comes up all the time in dashboards.
How do you represent data that's not there yet? All right. So we have semigroups so that we can
distribute work. We add one more method to our interface and we get monoids
so that we can handle the absence of data.
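Extending the same idea with an identity element gives a monoid. This Python sketch is illustrative: the `merge_shards` helper and the use of `None` for "no data yet" are assumptions, and the point is that a missing value can be replaced by `zero` instead of special-case nil checks.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Callable, Generic, Iterable, List, Optional, TypeVar

T = TypeVar("T")

@dataclass(frozen=True)
class Monoid(Generic[T]):
    """A semigroup plus an identity: plus(zero, x) == x == plus(x, zero)."""
    zero: T
    plus: Callable[[T, T], T]

    def sum(self, xs: Iterable[T]) -> T:
        # Starting from zero means an empty input is not a special case.
        return reduce(self.plus, xs, self.zero)

count = Monoid(0, lambda a, b: a + b)
product = Monoid(1, lambda a, b: a * b)       # for multiplication the identity is 1
tweet_lists = Monoid([], lambda a, b: a + b)  # list concatenation; identity is []

def merge_shards(m: Monoid[T], shards: List[Optional[T]]) -> T:
    """Hypothetical helper: a shard with no data yet (None) contributes zero."""
    return m.sum(m.zero if s is None else s for s in shards)

print(merge_shards(count, [3, None, 4]))                  # 7
print(merge_shards(tweet_lists, [["t1"], None, ["t2"]]))  # ['t1', 't2']
print(product.sum([]))                                    # 1
```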
This is still a very small interface
to enable distributed computing.
It's a very well-defined interface,
but I really think of it like
it is like a portal
into some interdimensional transport system.
I used to think, when I was thinking about
how do I become creative,
what do I want to do in software? And you want to make original things that no one's done before,
right? You want to make these things, like crack the door open on something no one's thought about.
But I think more lately that, I mean, that's kind of lonely. If you do that and
you actually succeed, you make some, like, 2001 obelisk, and that's exciting. But contrast that with if I managed to build, like,
a transporter gateway from Star Trek, right? And you look through and, like, you weren't creative
at all. Like, you just made a thing that, you know, millions of other galactic
civilizations have made before. Like, that's good. Now you get to plug into the network.
Like, coming up with, you know, figuring out an endpoint to a website or the packet format required to talk to the web: that's what picking an interface out of some incredibly well-trod field like abstract algebra does for you. Okay, this metaphor is great, but I think it needs a
little explanation. The monolith is from 2001: A Space Odyssey. I don't think anyone understands
it, but it's powerful. This is like coming up with your own unique solution to a problem,
a solution that no one has thought of. But the transporter gateway from Star Trek or an HTTP
interface is less creative.
Perhaps you're implementing something that someone else already built.
It's not really a new discovery,
but you get to draw on all the existing solutions that exist for that interface.
You transform your unique problem into a known type of problem where known solutions exist.
This is what Sam is calling a portal abstraction.
What was on the other side of this portal behind your, your add function?
I mean, we, what, what we got, we built a library of all the things we found. The library is called
Algebird. You know, very concretely we got, I mean, the thing I'd never seen before was this whole zoo
of data structures that the core idea is that if you don't really care about your exact
value that you're accumulating. So for numbers, maybe I want a counter, but I don't really care
that it's exact. I'm happy with 0.1% error, maybe a hundredth or a thousandth of a percent.
It turns out there's this whole field of research on data structures like this,
where if you can give up a little bit of error or a little
bit of accuracy, you can get often two orders of magnitude. You can get 100X space savings
on this thing. And that's so outrageous that, I mean, that took me a while to even understand
what the hell was going on. So why does the amount of space matter? I can make a monoid that just adds every Twitter
user together. Okay. This is a great point. It's not even that it doesn't have anything to say
about what you can add. It's that you can plug things in that will just shatter the system.
So this example you gave, if I'm trying to go, say, I just want to keep lists of everybody's
tweets. I decide to group on a user. And every time a tweet comes
in, I make a list with the single tweet in it. That's my thing. How do you add lists? You just
concatenate them together. No problem. So what you find is that most of your database is empty
because most people barely tweet anyway. And then some people just have
these huge amounts of tweets they're pumping out. I mean, some are bots that are just hammering out tweets every couple of minutes.
And so you get these incredibly skewed keys in your database. Some of the values are just
getting bigger and bigger and bigger. And there's nothing in your system that has limited this from
happening. So when you're running some system that sometimes is fetching
nothing, the default value, and sometimes it's fetching like dozens of megabytes of,
you know, tweets and then filtering on them. Like this is in some sense, orthogonal from
your original problem. Like, that's totally logically fine to do. It still fits the interface,
totally fits the interface, and it'll fit the database for a while,
but it's not everything you meant. There's some problem there. And the problem is that in almost
all these systems, definitely at Twitter, there's just skewed keys everywhere. Somebody's got the
most followers. And so when they tweet, you've got to fan it out to everybody. And that just
hammers the system. Whereas maybe when I tweet, no big deal. The system doesn't notice. Okay. So why would you accept like an accuracy
loss? Like, yeah, I want the total result. Like I want the full thing. I want to know how many
followers I have. I don't want to know how many followers I have, like plus or minus 1%.
Maybe not though. So it turns out if you can, well, the problem you're
trying to solve is like, how can I track counters and deaden the effect of these massive explosions
of a particular key value pair? You get it for free with something like a counter,
because people have done a lot of work to make sure that, okay, all our numbers,
like up to some massive amount are going to use the same amount of bits. Yeah. If it's a long or something,
it can only get so big. And if you want to double its size, like just add a bit, no problem.
Why do we just count numbers? Like it's easy. Well, why is it easy? Like, well,
a lot of problems are solved for you just because of the architecture you're inheriting about how
numbers are represented. Like if numbers actually took a ton more bits, if we hadn't figured out like how to write things in binary.
Counting would be harder. Yeah.
Yeah. So counting lists, like adding lists is pretty hard or sets. Let's have that example.
If I want the set of how many followers I have, how many unique people have seen my tweet today?
Well, what's one, how would you implement that?
I just add them to the set and then I can combine sets by just getting rid of the, like doing a distinct. Yep, exactly. You
know, if everybody had roughly the same number and it was small of people that saw their tweets,
but sometimes, you know, there's just huge amounts. So the distinct set of people that have seen
your tweet is just massively larger than, than the average. So you get this massive set in memory
and you're serializing and deserializing it every time.
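A small Python illustration of why the exact approach hurts under skew: set union is a perfectly good monoid (associative, with the empty set as zero), but its state grows with the number of distinct viewers. The user IDs and sizes are made up, and `pickle` stands in for whatever serialization the real store would use.

```python
import pickle

def union(a: frozenset, b: frozenset) -> frozenset:
    """Exact distinct-viewers monoid: set union, identity = frozenset()."""
    return a | b

# Hypothetical skew: a typical account vs. a very popular one.
typical = frozenset(f"user{i}" for i in range(50))
popular = frozenset(f"user{i}" for i in range(200_000))

merged = union(typical, popular)
print(len(merged))  # 200000

# The serialized state scales with the number of distinct viewers,
# and it would be re-serialized on every update to that key.
print(len(pickle.dumps(typical)), len(pickle.dumps(popular)))
```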
And there's two ways you can go.
One is you can start to build in these special cases
into your system where the abstraction starts to leak
and you say, well, I can't really tolerate this.
So it's not just a type with a semi-group,
it's like this other thing and there's more constraints. That's fine. If you can accept a little bit of error,
like if I don't really care if my count of people that have seen my tweet is off by 10,
which honestly I don't like, I mean, in that example, like data gets dropped all the time.
Like if you see, if you hit like on my tweet and then your phone's offline, like
there's already error just built into the universe.
So if I accept that and I just live with it, I can reach for a data structure like, here's
the buzzword.
There's this thing called the HyperLogLog, where if you allocate this thing some very
small amount of memory, you can get something like 99.9% accuracy on a count of how many unique
things you've dumped into this. So it's an approximate set. You add things to it,
or you sort of put things into the set, and then you can ask it the question,
how many unique things have I seen before? And it'll tell you, and it'll be almost right,
and it won't get any bigger. It doesn't seem like it should be possible.
And if you, if you thought of that idea,
when you were working on your analytics system and you said, yeah, it'd be really nice if I
could just like count this thing and like not have the set grow at all. Like you're not going
to go take a few months and go off and figure that out. Yeah. It just sounds impossible, but somebody figured it out.
And then somebody, maybe the same person, but somebody figured out that, oh, if I have
two of these things, I can add them together.
So I can track, you know, I can track like users for a few hours.
I can track my distinct counts.
And then if I have another set that represents like stuff I've seen before,
you know, I can merge or at a later time, I can merge those two together.
And the result of the merge set will also satisfy the properties that I had with either of the two side ones.
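To show how a HyperLogLog stays fixed-size and still merges, here is a minimal Python sketch of the standard estimator from Flajolet et al., with 2^p registers and a linear-counting correction for small counts. This is a teaching sketch, not Algebird's implementation. It assumes the first 8 bytes of SHA-1 as a 64-bit hash and keeps registers in a plain list (real implementations pack them into a few bits each). With p = 12 (4,096 registers) the expected relative error is about 1.04/√4096, roughly 1.6%; more registers buy more accuracy.

```python
import hashlib
import math
from typing import List

class HyperLogLog:
    """Minimal HyperLogLog sketch: 2**p registers, fixed memory however many items are added."""

    def __init__(self, p: int = 12) -> None:
        self.p = p
        self.m = 1 << p
        self.registers: List[int] = [0] * self.m

    @staticmethod
    def _hash64(item: str) -> int:
        # Assumption: the first 8 bytes of SHA-1 serve as a 64-bit hash.
        return int.from_bytes(hashlib.sha1(item.encode("utf-8")).digest()[:8], "big")

    def add(self, item: str) -> None:
        x = self._hash64(item)
        width = 64 - self.p
        idx = x >> width                       # top p bits pick a register
        rest = x & ((1 << width) - 1)          # remaining bits
        rank = width - rest.bit_length() + 1   # position of the leftmost 1-bit
        self.registers[idx] = max(self.registers[idx], rank)

    def count(self) -> float:
        alpha = 0.7213 / (1 + 1.079 / self.m)  # bias constant for m >= 128
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            return self.m * math.log(self.m / zeros)  # linear counting for small sets
        return raw

    def merge(self, other: "HyperLogLog") -> "HyperLogLog":
        """Register-wise max: associative and commutative; an empty sketch is the identity."""
        if self.p != other.p:
            raise ValueError("sketches must use the same p")
        out = HyperLogLog(self.p)
        out.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return out

# Two overlapping windows of (hypothetical) viewers, tracked separately, merged later.
earlier, recent = HyperLogLog(), HyperLogLog()
for i in range(60_000):
    earlier.add(f"user{i}")
for i in range(40_000, 100_000):
    recent.add(f"user{i}")
merged = earlier.merge(recent)
print(len(merged.registers), round(merged.count()))
```

The register count never changes, and because `merge` takes a register-wise maximum, the merged sketch is identical to one built directly from the union of the two windows, so sketches from different hours or machines can be combined in any grouping.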
And then we can distribute it.
Yes.
Yeah.
Then you can stop and you can save.
You can like save your state and then you can load it up again later and keep going.
And that's really all we want to do.
We want things where you can pause and wait a while
and then load it back out and keep going.
And yeah, these approximate data structures
get you that ability.
If they have that ability,
then you can plug them into a system like Summingbird
that's running these massive analytics jobs and things will just work. And you'll solve again, your system's problem
of heavily skewed key distributions that will just go away the same way it does when you use counts.
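The "pause, save, and keep going" property follows from the same associativity. This sketch uses plain per-key counts as the state and JSON as the checkpoint format, both assumptions for illustration; any monoid value whose state can be serialized (HyperLogLog registers are just a list of small integers) could be checkpointed the same way.

```python
import json
from collections import Counter
from typing import Dict, Iterable

def add_events(state: Dict[str, int], keys: Iterable[str]) -> Dict[str, int]:
    """Fold a batch of events into the running state (per-key addition is the monoid)."""
    out = Counter(state)
    out.update(keys)
    return dict(out)

def save(state: Dict[str, int]) -> str:
    return json.dumps(state, sort_keys=True)

def load(blob: str) -> Dict[str, int]:
    return json.loads(blob)

events = ["alice", "bob", "alice", "carol", "alice", "bob"]

# One pass over everything.
all_at_once = add_events({}, events)

# Process part of the stream, checkpoint, "wait a while", reload, and keep going.
checkpoint = save(add_events({}, events[:3]))
resumed = add_events(load(checkpoint), events[3:])

print(resumed == all_at_once)  # True
```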
All right. So we have our simple interface for real and batch, and it turned out that it already
existed in abstract algebra. It was the monoid or the semigroup.
We found this portal abstraction.
We raided the research papers
and found probabilistic data structures
like the HyperLogLog that were monoids
and run in fixed space.
But I wanted to ask Sam about this pet topic of mine.
Do names for math help or hinder adoption in software?
I just imagine you, you know,
standing up and being like,
HyperLogLog is a semi-group and everybody,
nobody knows what the hell you're talking about.
But you're like, no, this is important.
I absolutely have the reaction that you're saying.
Like at first I was kind of like, I had to write this job.
Fine, we can do it this way.
But then it just started to get like more and more clear that we'd gone down some rabbit hole that was actually not just abstraction for abstraction's sake. I had a few experiences of
going out and finding papers that, again, implemented these. There was an approximate
sliding window counter. Would I have found the paper? No. Would I have taken the time to implement
it? No, absolutely not. But aiming to implement these interfaces and pass these tests and then being
able to immediately turn around and have like an approximate sliding window counter that would
just work with Stripe's like entire machine learning feature generation interface. Like I
could take this thing, put it in the cupboard, write a nice doc string for it, like write a little pitch for why you might want to use it. And it would just work. There's no
sort of, that doesn't look like it would work in an analytics system. Like that just goes out the
window. It just will, you know, we've got the test to prove it and, uh, you know, pull it off,
see, see what you can think of. Yeah. Yeah. Like, it seems so non-obvious to me, and I don't really live in this
world, so maybe it's not non-obvious, but, um, yeah, I don't know. I hear people talk about, like,
fast data and big data and pipelines. I never hear anybody say, like, hey, if you can make something a
monoid, then you can calculate it either in batch or in real time, and you can combine it, and
all you need to do is meet this
interface and that's it. Yeah, well, you heard it here. No, look, I'm with you. And I think, so
I listened to your podcast with DHH, and he was talking about Ruby and, you know, when he first
picked up Ruby, like, this emotional sense he had. And that really got me thinking about, like, why is it
that this idea is not more out there? I mean, it's not a tough idea, in that if you didn't
need it to, yeah, if you just wrote the test down and you encountered it, you wouldn't find it to be...
you could do that code review, no problem. But there's this, like, aesthetic sense with certain
abstractions, and there's something
about, like, pulling abstractions from math that sounds... I don't know. I mean, I'd love to
hear, you're in the functional programming world. Like, functional programming has this bad rap of
just, you know, like, it's all about category theory. We need to shove functors and monoids and
monads, and if you don't get it, like, here's this category theory textbook, you go figure it
out.
What we were trying to advertise was here are the names of these things and the names
themselves are important because you're going to find these names when you go on the hunt
for stuff you can plug in.
Right.
If you call it addable, you have this problem of, okay, what do you solve?
You make it more comfortable. And if I have a preexisting library of things that I can plug in,
like this is great. This, I can look at the name addable in the function slot, the parameter,
you know, type, I can go look at the library and I know what can fit into what.
But what you lose is this sense that you're plugging into this larger
you know, this mine that you can go down into and find new things. And so for someone who's actually
looking to, like, expand the range, you know, I think it would be unwise to change the name to
something more comfortable, because what you might do, and here's something that happens, you know, I
might pick up, or Adam, you might pick up, like, this thing and you might go,
well, okay, I'm going to make a new data type.
Like I have addable.
That looks pretty easy.
It's got a plus method on it.
And I can implement my thing.
You know, I don't pass the associative test, but like that doesn't really matter.
I'm still an addable.
You know, I can still implement plus.
So I'll make my thing work.
And like, I'm just going to ignore the tests and no problem.
I just won't implement that test for me.
But like now you're in dangerous territory.
It was so tight.
It was such a poetic little interface that when you ditch one of the two lines, like
you're totally off the map now.
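To see why dropping the associativity test puts you "off the map," here is a hypothetical `addable` whose plus averages two numbers. Grouping the same three readings two ways gives two different answers, so a batch/real-time split would depend on where the boundary happened to fall. Carrying a `(sum, count)` pair and dividing only at the end is one common associative fix; it is shown as an illustration, not something from the episode.

```python
from fractions import Fraction
from typing import Tuple

def naive_avg_plus(a: Fraction, b: Fraction) -> Fraction:
    """Hypothetical 'addable' that never ran the associativity test: average two values."""
    return (a + b) / 2

x, y, z = Fraction(10), Fraction(20), Fraction(60)
left = naive_avg_plus(naive_avg_plus(x, y), z)   # ((10 + 20) / 2 + 60) / 2
right = naive_avg_plus(x, naive_avg_plus(y, z))  # (10 + (20 + 60) / 2) / 2
print(left, right)  # 75/2 25

# An associative alternative: carry (sum, count) and divide only at the end.
Mean = Tuple[Fraction, int]

def mean_plus(a: Mean, b: Mean) -> Mean:
    return (a[0] + b[0], a[1] + b[1])

p, q, r = (x, 1), (y, 1), (z, 1)
assert mean_plus(mean_plus(p, q), r) == mean_plus(p, mean_plus(q, r))
total, n = mean_plus(mean_plus(p, q), r)
print(total / n)  # 30
```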
But I think it really is tied to this idea of, like,
the aesthetics of an abstraction. Like, there's an aesthetic response you have; some people
have an aesthetic response to these mathematical abstractions and go like,
holy shit, like, I'm plugging into something big. I'm so happy this portal was
here. I have no intimidation at all. And some people go, I kind of remember getting my ass kicked in eighth grade, like in algebra, you know, like, is it really that again? I think, yeah, there's
cases where people are like, maybe like overly extraneously using terminology, but here it's
like, it's actually the key to running things. It's pulling its weight, I guess, in actual business
use cases. That's right. I mean, a thing I'm really
passionate about and the reason this stuff's important is you want to go mine the literature
of what other people have done. You want to go be able to plug these things into your work and
really just benefit from this incredible community that's been cranking for, again,
maybe hundreds of years. But then you're turning around and you're presenting
this aesthetic thing. And yes, it matters like what the references are to the past,
but it also needs to kind of present itself, uh, you know, as its own thing to use. Like you should
ideally like good design is about giving people an on-ramp at every level of engagement they want.
Yeah. You know, like experts only is like fine.
But if you're trying to build something that's accessible across the, you know, the entire
range of experience and like you find yourself confused about why monoid and semi-group and
field are not like doing it for people.
I think there's, there's more we need to learn there about how to go use these
incredible mines of abstraction as a resource in modern code. I think this makes sense. These concepts are
super valuable and these concepts already have names. Maybe the names aren't the problem here.
Now I know how semigroups can model distributed calculations, how HyperLogLog can give me fixed
overhead.
But how do I find these abstractions on my own? How do I repeat this trick and find my own
portal, as Sam calls it? If I go through and extract some interface for everything that
has a name, like, you know, a dog has a name, a car has a name, how do I know if that's a valuable thing or just me
wasting time? Yeah, it's hard. That's the thing we all deal with as programmers.
Like, how do you know? I was thinking the other day on a walk: I wonder if
conspiracy theorists would be great software developers. They'd be so sensitive to
abstraction, you know,
seeing patterns everywhere. There's probably some dial in our brains that cranks up
or down, and it's hard. I don't think an abstraction can tell you whether what you just
described, extracting a name for everything, is worthwhile. Maybe it's good, maybe not. You need a
thousand examples that you look at and go, I think I've got something really powerful here. And if that gets you excited, you should do that. But
if you want to make your search process faster, there are these other fields where people
have been thinking that way for a while. So there's this great talk about Richard
Feynman. Yeah, yeah, totally. Tell the
story, though. Tell the story? I think I know what you're doing. So Richard Feynman collected
all these problems over the course of his life, and he said that was the secret to him being
so successful: he had all these problems, and whenever somebody mentioned some new
solution, he would just go through his list of problems and see if it solved any of them, which I guess is kind of what you're
talking about, right? It's like, will monoids solve this? Try it on. Maybe it's a horrible fit.
Wow. I love that. That's great. Yeah, that's a brilliant Feynman story. He says he'd
get a click sometimes and go, ah, here's the connection, and people would go, how did he do that? Well, mostly, you just don't tell anyone when you don't get a hit. Yeah. I love that.
Yeah, absolutely. You have a solution, you have some interface. If
you learn about some abstraction that seems powerful in another field, go backwards and
ask, does this apply to what I'm doing? Forget whether it seems natural or obvious; what would it mean if I forced it in? What does this all say about
the future of software development? How should we think about this idea of importing these portal
concepts? Yeah, I think the clue I get from this is that you're trying to solve
interesting problems.
You're trying to go expand the range of what is possible for you to build.
If you buy this idea that these things just kind of lead toward greater complexity and interest, there's always more to learn.
There's always more to do.
One way to make progress is to make new artifacts, new examples, new kinds of works of art,
almost. We're trying to build these things spun out of our thought, and
that's really powerful. That's really what it's all about. But in fact, there are other fields
that have been obsessed with this idea of structure and relationships between things.
And, you know, physics is one, math is another.
I think all of these are incredible cupboards of ideas we can raid, most of which were invented before the modern software era.
One way to move forward is to really use the hundreds and hundreds of years of work
that have already been done
to give ourselves hints.
We effectively have an alien civilization
that we can raid,
and that's our own work from before the '60s,
when structured programming became a thing.
So I think, to go forward,
there are always
going to be new discoveries to be made, but one very, very fruitful thing to do is to turn around,
look back, find these things, and ask: is there an interface that
someone's already found that would let me just plug into this incredible battery of human creativity that just exists, waiting for the taking, in maybe dusty old papers and books? It's there. No
one's hiding it. We started with fast data versus big data. We hit abstract algebra and probabilistic
data structures, but these were all just examples for Sam's idea of finding another field that's
already solved your problem and pulling in those ideas. Sam is actually working on this right now
in his latest side project. He's looking for more of these portal interfaces into math.
So I've got a project that I'm about to... I spent three or four months on this before my
current job, and I'm about to restart it. It's a re-implementation of a lot of the
core reinforcement learning algorithms, but in a totally hardcore functional programming style.
So it's like you're pulling the same trick, or you're attempting-
Trying. Yeah. To see if I can be a one-trick pony, but
with my math trick.
I don't mean it in a dismissive way. I mean-
No, I'm doing it on purpose to test this theory
we're talking about:
if you're a one-trick pony,
but your trick is opening the portal,
I just keep doing that.
Very cool, sir.
Well, good luck surviving the-
Yeah, this is all assuming.
Assuming we survive.
Yeah, you too, Adam.
Good luck with this, man.
So that was the show.
If you have an interesting story of a solution to a problem like Sam's, let me know.
It doesn't have to involve math.
Adam at co-recursive.com or find me on Twitter or the website or wherever.
If you liked this episode, like really enjoyed it, then tell your co-workers about it.
I've been trying to improve the quality of the episodes and hopefully it shows.
Thank you for listening.