Algorithms + Data Structures = Programs - Episode 192: Systems Programming & More with Kevlin Henney
Episode Date: July 26, 2024
In this episode, Bryce chats with Kevlin Henney about systems programming and more.
Link to Episode 192 on Website
Discuss this episode, leave a comment, or ask a question (on GitHub)
Twitter: ADSP: The Podcast, Conor Hoekstra, Bryce Adelstein Lelbach

About the Guest
Kevlin Henney is an independent consultant, speaker, writer and trainer. His software development interests are in programming, practice and people. He has been a columnist for various magazines and websites. He is the co-author of A Pattern Language for Distributed Computing and On Patterns and Pattern Languages, two volumes in the Pattern-Oriented Software Architecture series, and editor of 97 Things Every Programmer Should Know and co-editor of 97 Things Every Java Programmer Should Know.

Show Notes
Date Recorded: 2024-07-11
Date Released: 2024-07-26
Kevlin Henney ACCU 2024 Talk

Intro Song Info
Miss You by Sarah Jansen https://soundcloud.com/sarahjansenmusic
Creative Commons — Attribution 3.0 Unported — CC BY 3.0
Free Download / Stream: http://bit.ly/l-miss-you
Music promoted by Audio Library https://youtu.be/iYYxnasvfx8
Transcript
Like I can dream in Xeon Phi assembly.
Welcome to ADSP: The Podcast, episode 192, recorded on July 11th, 2024. My name is Conor, and today, with my co-host Bryce, we chat with Kevlin Henney in part three of our five-part series about systems programming and more.
Here's a fundamental question.
Sorry, the dog was throwing up this morning, which is why I was tasked with keeping her in my eyesight.
Ah, right.
I was just assuming, given that we're all on Teams here, for those of you listening in podcast land, and I always thought it was compulsory: when you're on Teams, or Zoom, or whatever, there has to be an animal. And I just thought, yes, okay, Bryce decided it's my turn, here is the meeting dog.
Usually I would do that for a few minutes. But in this particular case, the thing is, she's so quiet. If I didn't sit here on the bed with her, she would wander off and potentially throw up again.
And then I wouldn't know about it.
And then I would get in trouble with mom.
So you got to stay here with me.
I know you would like to go elsewhere, but sorry. Anyways, the fundamental question.
Fundamental question.
Does software get better with time?
Or sorry, do we get better at software over time?
Are things better now than they were and will they continue to get better?
I think that's actually more than one question, interestingly, although you're offering different perspectives on it. I think collectively, as individuals and as developers and development organizations, we do get better. I think there are a number of things that are taken for granted now, things that people just do, that they never realized were a struggle, that they never realized were a thing to worry about. In fact, digging around, I did a talk at the ACCU conference in April, basically on data abstraction. I wanted to focus on the history of that, but also to bring it up to date.
One of the things I looked at was Barbara Liskov's work. I've been a fan of Barbara Liskov's work for many years, in terms of language design, the CLU language, and her original work on data abstraction.
But I'm nerdy like that.
So I already knew this stuff,
but I saw a couple of her talks, and she said that when she was given the Turing Award, I think in 2008, one of the things a critic online said was: why is she getting an award for this? Everybody knows this stuff. And they know it because she did it. Prior to 1973, 1974, data abstraction was not a thing. And although there was an idea of object orientation around, her work on abstract data types had the greatest influence, particularly on statically typed languages, on the stuff that Bjarne later did, and on some of the evolution of it.
And everybody just regards it as like, yeah, yeah, of course.
It's always been like this.
It's like, no, it hasn't.
Somebody had to do this.
And so people are able to build on a lot of features in languages, in libraries, in ecosystems, but also in the way we disseminate knowledge.
I think, you know, okay, this is slightly egotistical from the developer's point of view. So, collective egotism is a thing.
A lot of things have happened as a result of developers wanting to communicate.
The internet, email. I mean, literally, it is developers trying to talk to other developers.
How do we communicate? Usenet, and therefore, ultimately, all forms of sharing of data.
Open source, the idea: oh, guess what, I've got a piece of code; I wonder if there's a standard way we could share that so others can take advantage of it. How can we share our knowledge? Well, tell you what, let's put this in a place that we can all get to. It was FTP servers, and then it was the World Wide Web. It's just that we're very good at solving some of these problems. Our ability to disseminate knowledge is actually really good.
So I think in the 2020s, we are definitely building on a lot of benefits that somebody from former eras would go like, wow, you know, you guys have really got it together.
But when they look closely, they might also be disappointed: yeah, you're a bit better, but not as much better as we thought you could be, given all of this. So I think that's the thing.
I think that we have a benefit, but also we are not in a position where we're making the best of
it. And then what's the statistic? I don't know. It varies. But half of all developers have been
developing for less than a year. It's something like that, you know. So the influx of people is huge.
And then trying to communicate this, if we were slightly older and stuffier, and that was the whole of the demographic, then actually we might have a different story.
But the point is that we've got a lot of people who are coming into development, and they don't have the backstory.
They don't have the history. They don't see all of this or understand why such a thing is a big deal. They're learning it for the first time, and so most developers are on the earlier part of their learning curve. And I think that might characterize our profession,
perhaps in a different way to many other professions
where there's much more of a steady state.
I think we're operating much more in a power law.
So that's what may prevent us from taking full advantage
of all the benefits that we actually have created.
But all of that, all of this progress, all of this greater interconnectedness, it comes with downsides too. Take one: security. In this new, bold world where we have all this open source code and everything's connected, when everything's on the internet, the internet itself opened up these attack surfaces to almost all digital technology. And things like... hey baby, it's okay, somebody's at the door, presumably... things like open source and package managers have opened up supply chain attacks. That's a whole new thing that we have to think about that we didn't have to deal with before.
Yeah. And I think that's an important thing, because... I'm going to have to... this may be my fridge; they were supposed to deliver a new fridge yesterday. And there's a whole story about why we have to get another new fridge, because the fridge that we had was a new fridge, but
I'll be right back. Talk amongst yourselves. So I'm going to pick up the point that Bryce
made there. We've seen it on the people front as well. You know, there's a social implication here as well. When you start
increasing connection, there's a whole lot of benefits. There's a lot of really good network
effects, but it's also an amplifier for other things and it creates new possibilities. And security is one
of them. And the language or the focus on security in modern development has shifted hugely, even in
just the last five years. And that's not going to change, but that comes as a consequence
of connecting everything together. But the assumptions that we have are really interesting.
So yeah, do you remember I mentioned I worked in the electricity industry? And that was in the
1990s. And one of the things that we were working on was SCADA systems: supervisory control and data acquisition systems that sit in substations and monitor stuff.
Every electricity network has these things.
OK, so you've got all of that.
And we were developing one of these.
And we were literally communicating between substations and kind of headquarters
using wet pieces of string.
It was kind of like modem dial-up speed.
Wet pieces of string.
Wet pieces of string.
Modem dial-up speed, okay?
So therefore, for us, having a compact representation
for our wire-level protocols was really important
because we worried about bandwidth in a way that people now simply wouldn't understand. Right. And so we worried about that; it was one of our main things: okay, it's got to be really compact. And then the question of security came up, and I remember pretty much the decision was: well, we probably don't need to worry much about security, because nobody would be stupid enough to put their fundamental infrastructure onto the internet where it could be attacked. We were already aware of worms and other issues, and it was just like, yeah, but you wouldn't put the electricity grid online, would you? No, of course not. You'd make sure of that. Yeah. Now, that's charming from the modern perspective. But also,
it does betray the assumptions that we had because our driver was, well, the minute we start using
any security, then that's going to involve encryption, and that's extra stuff. If we were to use secure sockets, then actually we lose a whole load of bandwidth by doing that.
So, having made everything really, really compact, we'd immediately lose that and there's a whole
load of issues that we'd have with that that were performance related. Now, we took those decisions consciously.
And hindsight suggests, you know what? It turns out people do put this stuff online. The Stuxnet worm was an example of the fact that, yeah, you can disable another country's
nuclear power stations via the internet. Oops. Consequences of connectivity: that connectivity went even further than people anticipated, and there are all of these issues. But it does mean everybody now has to worry about security in a way that they were only paying lip service to even five or ten years ago. It's one of those things where everybody's concerned: your language has to be safe; top to bottom, the whole stack, we have to know what's going on. Because
the fact is that most of your software build is not yours. It comes from other places. Who knows
what's there? And we're seeing that that has changed the landscape of what people are wary of.
But it's also changed the way they build. And yeah, you're going to get one with the other. But I think that is an important consequence. And we're going to see that for pretty much any language.
Again, if I enter the language space today with a new language,
you can bet, you know, 20 years ago, nobody would have said,
so tell me about security and memory safety of your language.
It's not likely that would have been a big issue.
But now that's going to be one of the first five questions they're going to ask.
Hey, you've got a new programming language.
Tell me about this.
How do you do that?
So the priorities have shifted, which obviously puts some languages in a better position to take advantage of that, or to be able to say, yep, already sorted.
Other languages, oh, we have to do a bit of catching up. But that has now become such a concern that if you were creating a new language that you wanted
people to take seriously, you have to answer that question almost by the time you've read the first
half of the landing page for that language. You've got to have answered that question.
Well, it's interesting. Yeah, it's exactly that: the priorities and also the constraints have shifted. You know, if you think about it, in the 70s, even if the power company had had the money, they may not have been able to get the sort of connection between substations that the programmers would have wanted, because the technology may not have existed for that.
Whereas, computing started off in this era that was
very heavily resource constrained. Today, we are much more cost constrained. And that cost can be
either the actual dollar cost for compute. Like, okay, we have an unlimited amount of compute.
We can get as much from AWS as we'd want,
but it'll cost you.
And the cost can also be on the power side.
Like, okay, you could do a whole lot of compute,
but if you do a whole lot of compute,
you're going to drain the power on your phone real fast.
And so it's much more efficiency, rather than resource constraints, that limits us. And then also,
we have these safety, security, and reliability concerns that are much more of a priority than they used to be. And I mean, maybe all of the things that we deal with today, like the cost and the energy concerns and safety, maybe those would have been more of a priority back in the day, but they just weren't, because there were bigger problems, like: oh, we only have this much memory. We only have this much bandwidth.
So, yeah, I think that kind of speaks to something else.
We're always going to hit some kind of constraint.
And I would also argue that power is itself a resource.
If anything, it's the original resource.
And I love the fact that the mobile phone, I've got a Samsung.
It's got better battery life than my previous Samsung.
And we're now actually still not quite at the levels that my Nokia had in the mid-90s. In other words, in the sense of: how long can I go without recharging?
You know, we hit different boundaries at different times. And as you said, memory was an issue. In fact, there have been some really interesting cases where we've ended up with this curious mismatch. We saw it with the PC at one point: suddenly you've got all this memory, but it's not addressable because it's a 16-bit system, and you have to do weird things. Then we kind of got over that, and then in the 2000s we hit the issue of: well, yeah, I've got loads of memory, but it's only 32-bit addressable. And then it's a case of, well, maybe it's actually cheaper to run things across multiple machines. So you've got the Hadoop kind of approach: yeah, let's just break this computation up and scatter it across the network. And then it's a case of people saying, well, actually, it's always faster if you can put it on the same machine. We've got 64-bit addressing.
So I've worked with one team. They'd hit the 32-bit limit, even though their machine clearly had a lot more than 4 gigabytes. And so we had to solve the problem with various optimizations: compacting, really messing up the C++ data structures, to kind of go, oh, we're going to save a bit here and a bit there, so that when we multiply it up to large data, so not big data, but large data, it will still fit in memory. The other option, which one of the other systems they built used, was: we have to run it across separate processes, because that's the only other way we can take advantage of the memory. You hit 64 bits and suddenly you can address it, but now we start hitting the other limits. Each time we do something, you hit the other limits. And to go back to connectivity, we hit light speed as an issue. It's one of those things: it's 20 milliseconds between London and New York.
So if you've got a trading system and somebody says, oh, we need to have New York and London in sync to within 10 milliseconds, you need a Nobel Prize, because you are not going to break that limit. This is one of those laws you don't get an option on. You can't open up a config file and say, you know, I'm just going to change the speed of light today, I think it should be faster; well, let's reboot the system with a faster speed of light. And this light limit is genuine in the sense of the systems we're working with. I'm not just talking space travel and stuff like that, where it messes with a whole load of protocols and timeouts: you can't have a TCP/IP connection to the probes on Mars, because you'll have timed out. But even if I've got something like a geosynchronous satellite, that incurs a round trip of over 100, 150 milliseconds. And that's really slow, noticeably slow, for certain classes of system, and even for phone calls. So we're always going to build a system, and then we're going to hit the limits. And as you say, there's power. We're hitting light-speed limits on a number of things, and that actually proves to be impractical.
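For a sense of the numbers behind that light-speed budget, here is a rough back-of-the-envelope sketch; the distance figure and the fibre refractive index are illustrative assumptions, not values quoted in the episode.

```cpp
// Back-of-the-envelope: can London and New York be kept in sync to within 10 ms?
// The distance and refractive index below are rough assumptions for illustration.
#include <cstdio>

int main() {
    constexpr double distance_km   = 5570.0;                // approx. great-circle London to New York
    constexpr double c_vacuum_km_s = 299792.0;              // speed of light in vacuum
    constexpr double c_fibre_km_s  = c_vacuum_km_s / 1.47;  // typical optical fibre

    const double one_way_vacuum_ms = distance_km / c_vacuum_km_s * 1000.0;  // ~18.6 ms
    const double one_way_fibre_ms  = distance_km / c_fibre_km_s  * 1000.0;  // ~27 ms

    std::printf("one-way, vacuum: %.1f ms\n", one_way_vacuum_ms);
    std::printf("one-way, fibre:  %.1f ms\n", one_way_fibre_ms);
    // Even the theoretical vacuum figure already blows a 10 ms synchronisation
    // budget, before any routing, switching, or serialisation delay is added.
    return 0;
}
```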
Yeah, and it's not just on communication. It's also on how many transistors, how densely can I pack transistors. We're starting to reach the point with process technology where there's just not a lot of headroom for us to make smaller transistors. We've started to reach a scale where just the physical limitations, the quantum effects, are becoming... it's just like, no, you cannot build smaller transistors.
And that's the point, and therefore we have to, as it were, squeeze the toothpaste in a different direction: multi-core, etc. And I think that's the point: every era is going to hit its own limits and regard something as normal and given.
So I mentioned Hadoop earlier on, the whole idea of the MapReduce
architecture, the idea of pushing stuff out into the network, just parallelizing the task: we break it up into small tasks and do that. And I highlighted something to somebody when I was running a workshop for a company. Where was it? About 2017. And I made an observation to somebody. I said, okay, so we could break this up and look at this. And then I said, look, somebody just wrote a script to solve this problem. And they did it in memory. It was based on a blog post from 2014. They did it all in memory, whereas historically they would have used a Hadoop solution, and they'd have said, okay, we're going to use a compiled language and we're going to spread it across the network. Instead they just used a scripting solution, and because it was all in memory, it was over 200 times faster than the so-called fast, optimal solution. And I said, that's simply
down to things like the speed of light and the cost of the network. And I had one person in the
workshop say, oh, that's because our architectures are better now, we know more, going back to this idea. And I said, yeah, that's not it. Actually, you couldn't have done this solution 10 years in the past.
It was not available to you because you wouldn't have been able to address that memory.
It simply wasn't available to you.
But he said, yeah, we know that we're doing good architecture now. And I said, I want you to write a letter to yourself. Send yourself an email in 10 years' time, and describe to your future self what your current architecture is. And then your future self will laugh and say, oh, you thought you had it all worked out. That's the problem: some things are improving, but for other things we're just chasing the horizon. We're always going to be chasing the horizon; the shape of the problem space moves. And what one generation thinks... so, going back to the fact that a lot of developers are coming into the industry now: they think there's a whole lot of things that are normal. And in 10 years' time, they're going to go, well, that's not normal anymore. Or, what I assumed was normal is now weird or antiquated. And I just dug out this wonderful quote from Douglas Adams. He said: I've come up with a set of rules that describe our reactions to technologies. Anything that is in the world when you're born is normal and ordinary and is just a natural part of the way the world works. Anything that's invented between when you're 15 and 35 is new and exciting and revolutionary, and you can probably get a career in it. Anything invented after you're 35 is against the natural order of things. And the point there is that this is also a social thing, and it causes older, grumpy people to go around going, oh, in my day... just like, oh, this is unnatural, this is newfangled and it's not necessary. But being inside the technology space, I think this is also quite interesting. It probably operates on smaller timescales.
A lot of things that people regard as a given, somebody else had to fight for.
But also, at some point in the future, you're going to have to recognize it as either part of the furniture, very much a given, or actually no longer relevant.
Not because we've progressed, but because we've kind of moved up and sideways.
We're constantly moving up and sideways. If the problem space had remained fixed,
we'd always be moving up, but we're moving up and sideways. We're hitting new boundaries. We're
hitting new constraints. Every time we solve a particular problem, somebody creates software
in a different way, or we use software in a different way. Going back to the beginning of this, where we talked about connectivity: the fact that we're all connected changes the very software we're trying to create, but it also changes the problems we're experiencing. Had we stuck ourselves in the early 90s kind of mode, then actually that would all be solved and very stable. But we didn't just move up; we moved sideways as well.
You know, not necessarily for CPUs yet,
but it'll come: every modern GPU is a multi-chip module. And by that, I mean it is separate little chips that are stuck together on some interposer. Everybody remembers NUMA systems; they used to be all the rage back in the day in HPC, before GPUs. You'd have a four-socket or an eight-socket system connected with some crazy interconnect. You'd have this flat memory space in your process, but in reality, there'd be some memory that some chips could access faster than others. And that's rapidly becoming how we build each individual processor now. So much of the engineering that goes into building a modern chip goes not into the chips themselves, but into this interconnect that connects each one of those individual modules on the interposer into a single thing.
And what we take for granted today is this sort of flat view of memory, which has been an incredibly useful, incredibly potent abstraction throughout the life of programming. We've had chips where it breaks down, and just the utility of having a single view of memory has been so great that we've accepted the performance penalties of it not actually being how the hardware works under the hood, rather than take on the complexity of having to deal with it.
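Where that abstraction isn't good enough, the placement question today mostly has to be answered outside the language, for example with libnuma on Linux. A minimal sketch, assuming a Linux machine with libnuma installed; the node number and buffer size are arbitrary illustrations, not anything from the episode.

```cpp
// Sketch: asking for memory on a specific NUMA node with libnuma.
// Build with: g++ numa_sketch.cpp -lnuma
#include <numa.h>
#include <cstdio>
#include <cstdlib>

int main() {
    if (numa_available() < 0) {
        std::fprintf(stderr, "no NUMA support on this system\n");
        return EXIT_FAILURE;
    }
    const std::size_t bytes = 64 * 1024 * 1024;  // 64 MiB, arbitrary
    // Place the buffer physically on node 0, regardless of where this thread runs.
    void* buf = numa_alloc_onnode(bytes, 0);
    if (buf == nullptr) {
        std::fprintf(stderr, "allocation failed\n");
        return EXIT_FAILURE;
    }
    std::printf("allocated %zu bytes on node 0 (system has %d node(s))\n",
                bytes, numa_max_node() + 1);
    numa_free(buf, bytes);
    return EXIT_SUCCESS;
}
```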
Ten years from now, we may live in a very different world
where people who are doing systems programming
have to think a lot more about not just allocating memory,
but, when I allocate this memory, where is this memory being allocated? You know, it could be the case that 10, 15 years from now, the way we program systems will be very different than it is today.
Yeah, I think that point about systems programming is really key, because that idea, the illusion of a flat memory model,
a consistent memory model,
is baked into something that we find in C++.
And that's why C and C++ had such a difficult time
on segmented architectures historically.
Yes, yes.
There's kind of an implicit notion there that you have this.
And exactly as you say,
what we're doing is maintaining an illusion.
We're maintaining this very powerful, very convenient abstraction. But that illusion has to be propped up on stage.
There's a whole load of stuff that happens behind it. And there's a cost for that.
But in practice, we're also looking at, I think, another idea that,
as you described, every single chip is a distributed system.
So we normally think of distributed systems,
we think we look out to the world and there are the distributed systems.
Actually, it's running right next to you.
You've got a distributed system.
And it's got this very different view of how memory is organized.
And that, I think, if you're not at the systems level,
then the illusion is okay. It's fine. But if you are at the systems level, this is the system.
And it challenges you with a lot of complexity. And it's having the languages or the paradigms
that will actually align with that and say, here's an easy way to think about it or work with this. And as you say, another decade could shift that at the lowest level.
It's definitely not standing still.
But at the higher level, people might not have to worry about it.
In other words, the upper level is actually potentially more stable in this respect.
Yeah.
You know, it's interesting.
I think we're all people who identify to some degree as systems programmers.
And it is, I think, to some degree, a nebulous and vague term.
But to me, systems programming is any form of programming in which you need, at least to some degree, to care about the underlying architecture of the platform. And that might
be operating system architecture, it might be hardware architecture. But it's a type of
programming where you need to think about the whole view of the system, not just application
logic. You need to think about what is it that I'm programming to? Yeah. How does this get organized?
When this is run, how is this organized at runtime
in terms of sequence of instructions,
the fact that pipelines matter and cache lines matter
and stuff like that?
And how is this organized in memory?
And then to start worrying about that.
And I think what's interesting from the C++ perspective is that C++, although we say it's a systems language and it allows us a lot of this access, it doesn't allow us a lot of this access. In other words, we're up at L4 caching at the moment, I think, and in 10 years' time, who knows, L57 caching, who knows what my processor is going to have. But C++ has no opinion on this. It can't talk about that, because it doesn't have an idea that that's actually what's going on and how my memory is organized. So it can't speak to that. In other words, the system that C++ is written against, so to speak, is the flat memory model that C grew up under, with processors where, really, what you saw is what you got. You didn't have to worry about all of this magic happening, this illusion being maintained. And so C++ has yet to cut through that. And it's very hard to do so.
We now talk about that as: that's the hardware people who worry about that. We no longer think of it as part of systems programming, but in one sense it is the system. And we often worry about the cache-friendliness of our data structures, and yet we can't talk about the cache in the language. We have to step outside the language to say, oh, caching matters, I profile it. But I can't talk about the caching inside the language. It's kind of removed from it.
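About as close as standard C++ gets is the pair of implementation-defined cache-line hints added in C++17. A minimal sketch, assuming a compiler and standard library that define them; these are hints about interference granularity, not a model of the cache hierarchy being described here.

```cpp
// Sketch: the little cache awareness standard C++ does expose (C++17, <new>).
#include <new>
#include <cstdio>

struct Counters {
    // Keep two frequently written counters on separate cache lines
    // to avoid false sharing between threads.
    alignas(std::hardware_destructive_interference_size) long reads  = 0;
    alignas(std::hardware_destructive_interference_size) long writes = 0;
};

int main() {
    std::printf("destructive interference size hint: %zu bytes\n",
                std::hardware_destructive_interference_size);
    std::printf("sizeof(Counters): %zu bytes\n", sizeof(Counters));
    return 0;
}
```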
So, you know, we're doing systems programming, yet without enough access to the system.
I have an interesting story on that subject, which is: I was at Berkeley Lab back when, gosh, I can't remember... Cori. Cori was the name of the supercomputer that was installed at Berkeley Lab.
And this was a Xeon Phi system.
And it was the second or third generation of Xeon Phi.
And this was the first one that was a standalone system.
So the Xeon Phi would boot up the OS and run the OS.
It wasn't an accelerator card.
And it was basically an x86 architecture, but with a lot more cores.
And this particular chip that we had in Cori, it had this HBM memory, this high bandwidth memory,
in addition to its sort of regular memory. And there were a few different modes for the HBM
memory. One mode was an explicitly programmable mode where you could explicitly allocate this
memory and use it as fast memory, because it was higher bandwidth, although maybe a little bit higher latency than other memory. And then there was another mode where it just acted as another layer of caching. And to switch between the modes, you needed to reboot the nodes. It was a
boot time option. And there's this big prize in the HPC space. It's basically the Nobel Prize for the
HPC world called the Gordon Bell Prize. And to win a Gordon Bell Prize, you have to do some runs
on a big, big supercomputer. And so, when Cori was installed, it was a top 10 supercomputer.
And so, right after installation, there were a number of teams, I think it was eight teams, that got exclusive access to this supercomputer
for a period of time to do full-scale, full system runs on this 10,000 node cluster to
try to get results for a Gordon Bell submission.
So, these teams, these were the best of the best.
These were the ninjas.
They had a ton of developers.
These were folks who had been
spending two years prior to this machine being installed,
learning this architecture.
Like I can dream in Xeon Phi assembly.
You know, I knew this chip very well
and everybody else who was working
on one of these teams knew this very well. And yet, out of those eight teams, seven of the teams used this chip
in the cache mode. Some of those teams were like, it's not worth the effort
of us having to change our code to explicitly use this separate pool of memory.
We get a good enough performance
boost just using this as cache, just using this implicitly. And only one of the teams tried
to use it explicitly, but they ran into too many issues and too many quirks, and they
ended up just using it in the cache mode too. And it was such a smart move of Intel. They were like,
you know what, we're going to try
something new. We're going to have this explicitly programmable mode. But somebody at Intel was like,
hey, wait, hang on a second. We got to have a fallback just in case this is really hard for
people to use and they don't switch to using this thing. We got to have something so that this thing
will be usable. And boy, was that a smart decision, because I'm sure almost everybody who used that chip
used it in the implicit mode.
And it just speaks to how pervasive
the flat memory space abstraction is.
And hiding caches from users,
hiding differences in memory hierarchy from users,
so much work in hardware goes into that.
And it's worth it because it's very hard to program otherwise.
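For context, the explicit mode described here was typically reached on Knights Landing through the memkind library's hbwmalloc interface rather than through the language. A minimal sketch; the DDR fallback policy is an illustrative assumption about how a team might structure it, not something from the episode.

```cpp
// Sketch: explicitly placing a hot buffer in high-bandwidth memory via memkind.
// Build with: g++ hbm_sketch.cpp -lmemkind
#include <hbwmalloc.h>   // hbw_malloc / hbw_free / hbw_check_available
#include <cstdio>
#include <cstdlib>

int main() {
    const std::size_t n = 1u << 20;  // element count, arbitrary for illustration
    const bool in_hbm = (hbw_check_available() == 0);  // 0 means HBM is present

    double* buf = nullptr;
    if (in_hbm) {
        // Flat mode: put the bandwidth-bound buffer in MCDRAM explicitly.
        buf = static_cast<double*>(hbw_malloc(n * sizeof(double)));
    } else {
        // No HBM exposed (e.g. cache mode): fall back to ordinary DDR.
        buf = static_cast<double*>(std::malloc(n * sizeof(double)));
    }
    if (buf == nullptr) return EXIT_FAILURE;

    for (std::size_t i = 0; i < n; ++i) buf[i] = 0.0;  // touch the memory
    std::printf("buffer of %zu doubles placed in %s\n", n, in_hbm ? "HBM" : "DDR");

    if (in_hbm) hbw_free(buf); else std::free(buf);
    return EXIT_SUCCESS;
}
```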
Be sure to check these show notes either in your podcast app
or at ADSPthepodcast.com for links to anything we mentioned in today's episode
as well as a link to a GitHub discussion
where you can leave thoughts, comments, and questions.
Thanks for listening. We hope you enjoyed and have a great day.
Low quality, high quantity.
That is the tagline of our podcast.
It's not the tagline.
Our tagline is chaos with sprinkles of information.