Algorithms + Data Structures = Programs - Episode 80: C++ Multidimensional Arrays and GPUs
Episode Date: June 3, 2022
In this episode, Bryce and Conor talk about C++ multidimensional iterators, mdspan, GPUs and more!
Twitter: ADSP: The Podcast, Conor Hoekstra, Bryce Adelstein Lelbach
Show Notes
Date Recorded: 2022-05-26
Date Released: 2022-06-03
ARRAY 2022
PLDI 2022
EURO-PAR 2022
Phoenix Park
C++ std::mdspan
Bryce's index_iterator.hpp
C++ std::mdarray
Leading Axis Theory
Array Cast Episode 28: Rank and Leading Axis Theory
The Programming Language Podcast
Thread block (CUDA programming)
Using CUDA Warp-Level Primitives
Co-dfns
ArrayFire
BQN
Intro Song Info
Miss You by Sarah Jansen https://soundcloud.com/sarahjansenmusic
Creative Commons — Attribution 3.0 Unported — CC BY 3.0
Free Download / Stream: http://bit.ly/l-miss-you
Music promoted by Audio Library https://youtu.be/iYYxnasvfx8
Transcript
So wait, let me ask, potentially, a question that will show my ignorance about how GPUs work.
Welcome to ADSP the podcast episode 80 recorded on May 26, 2022. My name is Connor and today with
my co-host Bryce, we talk about C++ multidimensional iterators, MD-SPAN, MD-ARRAY, GPUs, and more.
Then, sir, we shall continue.
All right, what are we talking about?
You're keynoting at array.
Breaking news.
As of when?
24 hours ago?
36 hours ago?
Yeah.
Bryce will be keynoting at the Array 2022 conference taking place in San Diego on June 13th, 2022.
Why do I know the exact dates of this conference at which Bryce is keynoting?
Because I will also be presenting.
Not keynoting though. Because I actually did the heavy lifting of writing a paper, getting it accepted, and therefore now I get to present.
I just got an email from people and I was like, sure.
And then I booked some flights.
This is the difference.
Remember that one time when I was like, you know, you'll go on to be CEO of some company one day and I'll do things.
This is the difference between you and me, Bryce. You get invited to do things.
I'll be, I'll be keynoting. I'm going to be keynoting at a workshop at Euro-Par in August.
Euro-Par? Yeah, it's an HPC conference.
Nice, nice. Where's that taking place?
In Scotland. Glasgow.
Well, I was about to do a Scottish accent, but I decided not to.
That's probably wise.
When we, the last time that Connor and I were in Ireland, in Northern Ireland.
Visiting my sister, who is now recently married.
Congratulations, Kieran.
But your sister is not in Northern Ireland.
But yes, congratulations.
Right.
Not Northern Ireland.
But I mean, we did go down to Dublin.
We took a bus down to Dublin.
And Connor, who is what percentage Irish are you?
We'll round up and say 50%.
50% Irish.
Connor, who is some percentage Irish.
If you go by the number of names in my name,
Conor with one N, Patrick,
those are like two of the most Irish names,
and then my last name is Dutch, Hoekstra. So I'm 67% Irish, by name.
By name. Dear listener, you may be beginning to see the problem here.
I also love mashed potatoes. My favorite food in the world. Potatoes are the best.
Conor seemed to be under the impression that he was a lot more Irish than he actually was. And so we got to Dublin. And Conor, during most of this trip, had been rolling out his Irish accent in Northern Ireland. And I kept telling him, dude, you're going to get us killed. You are going to get us killed.
And then we meet up with Conor's sister, Kieran, and... it's Jack?
Yeah, Jack.
Good memory.
And her boyfriend, now husband, Jack.
And Connor, like, does the accent one more time.
And I ask Jack, Jack, is it a good idea for him?
Isn't it a bad idea for him to be doing that?
Jack is like, yeah.
And Connor retired the accent for the rest of the trip.
Actually, the picture of us that at least used to be on the ADSP landing page. It's still there.
It's still there.
50% is still there.
It is from that Ireland trip.
Yeah.
We were in Phoenix Park with the deer.
Yes.
Yeah.
We were very excited about the deer.
And I seem to recall both of us, our shoes got quite soaking wet, but I wisely had brought
a spare pair of socks because I knew that my shoes were susceptible to becoming wet.
We went on a day trip, like a single-day trip, to Dublin, and Bryce brought, not bought, brought with him an extra pair of socks. This is the guy we're dealing with.
I knew it was going to be rainy, and I knew that if there was any amount of water, my shoes were going to get inundated.
And Connor did not.
And Connor had a less than pleasant bus ride back to Northern Ireland.
It's true.
It's true.
My feet were pretty cold the rest of the day.
When was that?
This was November of 2019, a few months just before the pandemic. The last time that we had crazy travel plans. See, I didn't think that it was ever going to go back to, you know, going abroad for three or four weeks at a time and doing multiple trips back to back.
But I just booked a, I just finished making my London, Tokyo, Toronto travel arrangements.
Oh, yeah.
I'm not actually flying directly to Toronto because, I'm sure Toronto is a lovely and big city, but there's just not a lot of flights between Tokyo and Toronto.
I mean, it is halfway across the world.
So instead I'm flying to a real city, New York, and then I'm flying to Toronto from New York.
I mean, many people say that Toronto is the New York of Canada.
So I, you know.
Right.
But that's exactly my point, Connor.
New York doesn't have to be the New York of anywhere, because it's just New York. If your city has to be the X of something else, then that's just saying that, in the pyramid of cities, you're not at the very top.
That's true.
I mean, New York's got to be, you know, of the iconic cities of the world, it's either number one or top three, no doubt.
Yeah, and I say that even acknowledging that I'm a Westerner, like, even taking into account, you know, Beijing and other cities.
New York's up there.
So what are you gonna be talking about at this keynote?
I will be talking about C++ standard parallelism.
I think I will, given the conference is called Array,
I'll probably be talking a good bit
about things like mdspan. And, well, if I had time, I would talk more about the problem of multidimensional iterators in C++.
How long is your keynote?
It's an hour.
Damn, look at that. You get three times as much talking time as me. I got 20 minutes. Oh, actually, I'm not entirely sure, because I haven't registered for the conference yet and it's a couple weeks away, so I should do that. But I'm pretty sure academic papers, at least in years past, only get 20-minute presentation slots.
Oh, yeah, I wouldn't be surprised. It is more of an academic conference.
Yeah, for those of you that don't know, ARRAY is a conference that focuses on array languages, multidimensional libraries, and such. And it's co-located with PLDI, which is a Programming Language Design and Implementation conference that happens every year.
So, multidimensional iterators.
People who take a look at mdspan, which is going into C++23, God willing, might notice that it
doesn't have any iterators, which is a bit unfortunate because, you know, we're C++
programmers. We like to be able to take our
data structures and data-structure-like things and be able to sort them. Well, maybe sort makes less sense in this context, but do things like count them, or do reductions over them, or find things in them. And for that, we would get iterators or ranges to those things, and then plug those iterators or ranges into the C++ standard algorithms.
But you can't do that with MD-SPAN
because it doesn't give you iterators.
And the question that's probably in your mind is why not?
Well, it comes down to optimization.
We can write multidimensional iterators.
And there's sort of two approaches to how you can do it.
And neither of them is particularly palatable.
And both have performance problems.
And we're going to use the 2D case, the 2D matrix case, as an example.
So the first way to write a multidimensional iterator is to collapse the indexing.
So you sort of end up losing information here.
You don't keep track of both i and j.
You just keep like one,
like where am I in the sequence number?
And if you ever need to like recover the i and j indices, you do modulo stuff using the size of the extents of the dimensions to figure out the coordinates of your point.
Now, this assumes, of course, that all the elements are in some contiguous storage and that you can just iterate through. So this approach is nice because your element-wise iteration
using your iterator is nice and fast.
You just do an increment every time.
It's just like a pointer, basically, or a std::vector iterator.
But any time that you need to recover the indices, you have to do some expensive modulo math.
And that can be slow.
Those can become divisions, which are notoriously slow operations.
So we don't generally like that approach. Although, if we wanted to have iterators solely for the purpose of doing element-wise operations, like reductions or finds or stuff like that, it would probably be a pretty good approach. The other approach
is to do a multidimensional iterator that actually keeps track of the indices.
So instead of it just iterating through the underlying data,
it maintains the indices and updates them.
Now, the problem with this is, and I'll actually show you, because back in the day, when I worked at Berkeley Lab, I spent a substantial amount of time doing research on this general problem space.
So the problem is, when you're doing this, you are basically embedding one for loop, potentially multiple for loops, inside of your iterator. So I'm going to show you an example of what I mean here.
I'm going to share my screen. I'm not going to show you, the listener. I'm going to show Conor, the co-host.
I'm sure the listener understands that when you say you, you mean me, not the listener, because this is a podcast and they're not seeing anything. Although, this looks like it's open source on your page, so we'll include a link to index_iterator.hpp in one of Bryce's repos.
Let's imagine we've got some for loop that's,
you know, iterating over some two-dimensional iterator, you know, like a range-based for loop
over some two-dimensional iterator. So one of the things it's going to do, it's going to call
increment on those iterators. So it's going to do
operator++ on those iterators. And then it's going to do some amount of comparisons to check
whether it's reached the end iterator. But let's just focus on the increment right now.
So what happens when you increment one of these multidimensional iterators? Well, you can't just increment both of the indices, or just one of the indices. You also have to check whether you've reached the end of one of the rows. And so, like, what you do is, in the code that I have before Conor, I do increment the i, or the first index,
and then check whether we've reached the end of that extent.
And if we have, then increment the j index and reset the i index to zero.
Now, that means that there's an if statement in there.
So conditional or control flow, a branch.
And I've put these little comments next to some of these operations here.
Like next to that first increment of the i-index,
I've written inner loop iteration expression.
And then next to the check of whether the first index
has reached the end of the row,
I've written a comment that says inner loop condition.
And then I've got the increment of that j index, and I've called that the outer loop increment. And then the reset of the i index to be zero, I've called that the inner loop init statement.
And I've added these comments here to make it clear
to somebody who's reading this code
that inside of this increment operator is the logic or most of the logic of what would
be a language for loop if you were instead writing this as like a C style nested for
loops.
So if, instead of using this fancy 2D index iterator, you were just to write one outer for loop over the j indices and then one inner for loop over the i indices, then part of the logic of that inner C for loop over the i indices gets put into this increment operator.
And then some of the other logic.
So I mentioned the outer loop increment is inside of this increment operator.
Now, the outer loop init statement is going to happen in the actual for loop where you use this index iterator.
And ditto for the condition where you check whether the index iterator has reached the end.
But the point here is that we've embedded a for loop into this iterator.
And the problem is that the compiler doesn't understand that for loop, that inner loop. Compilers are pretty good at understanding loops and optimizing based on loops, especially multidimensional loops. But all of that relies upon a compiler being able to lower loops to some canonical form that it understands, and be like, hey, this is a loop, and here's an inner loop within it. At the IR level in compilers, like at the LLVM IR level, there is no loop construct. There's just, you know, a series of lower-level, you know, assembly-ish operations. But the LLVM backend understands how to recognize a loop in that form.
One of the problems with these multidimensional iterators is that the compiler doesn't understand
that this control flow logic that's
inside of this increment operator
is actually an inner loop.
And so it can't do things like avoid
having to do that if check every iteration.
And it can't do things like vectorize or unroll the loop
or any of the loop optimizations that actually make loops fast.
So one of the cool things that I experimented with back in the day was,
well, maybe we could trick the compiler by using coroutines,
specifically using generators. So what if we had a coroutine where the body of the coroutine would just be two for loops, an outer for loop over the j indices and an inner for loop over the i indices? And inside of that innermost loop, you would just co_yield the current indices.
And then this coroutine would return you
some generator type.
And then every element of this generator type
would be one of these indices.
And the idea behind this was,
well, if you do it this way,
in the cases where the compiler
does all the coroutine optimizations and inlines essentially the code in the coroutine body
in place, so it does the heap elision and the de-virtualization optimizations, then essentially
what it's going to do is it's going to take
these two for loops and just sort of inline them into the place where you use them.
And then because it's done that, the loops should get lowered in a way where the compiler
backend understands them.
So you can write code that's just, you know, C++, you know, std::for_each over some iterators, but it will be as if you'd written the efficient code of the nested C-style for loops.
And that technique actually kind of worked. But, you know, that's not really a production solution, because you don't want the performance of your multidimensional iterators to be entirely dependent upon coroutine optimizations kicking in. My hope is that we could come up with some set of compiler extensions
that you could use inside of an iterator to say like,
hey, I'm signaling to you
that this is a multidimensional iterator
and that I've got this logic of a for loop
embedded inside of this iterator.
And you, compiler, should be aware of that
so that you can recognize that for loop
and optimize accordingly.
But until we have that,
it is unlikely that we will add multidimensional iterators
to things like mdspan
because we would rather not have it at all than have a slow version of it.
So here is a question.
In a world where the generic algorithms provided by the C++ standard library are able to work on these mdspans, and, I've heard, mdarrays. There's also a proposal for that. So in some future C++ standard, there will potentially be both an mdarray and an mdspan. And in a world where you have one of these future multidimensional iterators that enables you to use the generic algorithms on these mdarrays and mdspans, for shape-preserving algorithms, such as transforms, I think there's no question how those work, right? Because it's an atomic operation that you're applying to each atom, or element, in your multidimensional array. But how does, like, a reduction work? Because in the array language world, you have a way of specifying the rank, or the axis, on which the reduction occurs.
Because when you do a reduction on a matrix, you can sum row-wise, you can sum column-wise, and then what happens as soon as you start to deal with cubes or hypercubes? Is there a way of using generic algorithms?
That's a great question.
And it gets to the other more subtle part of multidimensional iterators.
So when you're iterating through
a one-dimensional set of elements,
how many ways are there to iterate through?
I mean, it depends on the exact version of that question you're asking, but it's either one or two.
Forwards or forwards and backwards.
Wrong.
Wrong.
What you answered was the two reasonable ways of doing that.
But you could imagine other ways.
Like you could start in the middle and spiral out.
Yeah, some pancake or shuffle thing.
You could do weird things like that.
But like that's not, you know, that's not going to be super common.
But your answer of two, like there's two reasonable ways, forward and backwards.
Yeah, that's pretty spot on.
And I'm going to posit that in a lot of cases, maybe 80% or 85%, maybe even 90% of the time, you just really care about going forward, right? Now, in multidimensional space, there's a lot more iteration orders. There's a lot more options. And more importantly, there's a lot more reasonable options. And perhaps even more importantly, unlike in the one-dimensional case, there's not a clear default. In the one-dimensional case, going from start to end, going forward, that is, I think everybody will agree, a pretty reasonable default. If you don't ask for something special, that's what should happen.
But it's a lot more complicated for even something like a matrix, a two-dimensional thing.
Do you go row-wise first or do you go column-wise first?
There's not one of those two that's a right default.
I mean, array language people would disagree, but continue.
No, there's something that's coming out the day after this gets released, the next episode of ArrayCast. There's a whole discussion on rank and something called leading axis theory, which is basically a theory that came out of array languages that asserts that your generic algorithms, or your algorithms, should operate, or be implemented, such that the default is the leading axis. And that way you can use something like a rank operator to drill down to any level of granularity that you want. And if you do it any other way, you've basically limited the scope of functionality that you can implement.
And that is a great answer for theorists. But in the real world...
What do you mean? APL is in the real world.
In the real world, where performance matters, and where the memory access patterns often are the deciding factor in whether your code is fast or slow, what you've just said is not necessarily true.
I mean, I'm not the right person to answer this question
because I have not fully implemented an array language interpreter or compiler
that includes a rank operator.
But Marshall Lochbaum, who has implemented multiple array languages and was one of the subject matter experts on that episode, said that leading axis theory necessitates that you get the best performance in terms of, like, contiguous elements being operated on.
Right, but what if you're on a platform
where you don't necessarily want every thread
to be accessing contiguous elements?
For example, a GPU.
On a GPU, you want all of the threads in a
warp to be accessing different elements. Or you want them to be
processing different streams. You don't want, in a single iteration, you don't want all 32 threads of the warp to be accessing 32 contiguous elements that are right next to each other. Because what you instead want is, you want thread zero to be accessing, you know, some element, and then thread one to be accessing some element that's at a stride away from that element, because you want thread zero to be able to prefetch all of the subsequent elements that it's going to be touching.
Like, whether or not you want contiguous access or strided access, like, these memory patterns, it tends to change depending on whether you're on a SIMT-type architecture or a SIMD-type architecture. Now, sure, within a single thread, it may be the case that you always want contiguous access. But, like, just as a counterpoint to the example I just gave: if I'm on a CPU and I have a bunch of threads on the same core, I might want them all to be accessing elements that have a great deal of locality, because I want it to stay within cache.
But if I'm on a GPU, I might care more about being able to overlap and hide my memory latency.
So wait, let me ask, potentially, a question that will show my ignorance about how GPUs work. Are you saying that for GPUs... so, say you've got... let's try and make a concrete example that'll be easier for you to answer.
If you have a matrix, whatever, say 10 by 10 elements, or 10 million by 10 million elements, and it's stored in row-major order, which is, I think, the naive, intuitive way to think about it, but if it's not for you, it's the counterintuitive way. So that means if you've got the numbers from, you know, one to a hundred, they're stored just as an iota sequence from one to a hundred, as if in a vector. And then, you know, your shape is 10 by 10. And so say you want to sum these row-wise, and you want to sum them column-wise. In the row-wise way, each set of 10 elements that you want to sum is stored contiguously in memory. In the case where you want to do it column-wise, you're skipping every 10 elements, so it's the exact opposite: instead of having each set of 10 elements you want to sum up together stored next to each other, they are 10 elements away from each other. You're saying that, like, a GPU might be more performant on the column-wise data, because threads can do some sort of prefetching for the data that they want, versus having it stored contiguously? Like, I would assume that whether you're on a CPU or a GPU, the row-wise, the row-major reductions would be more performant.
Remember that the...
So on the...
Think about how the GPU core works.
So all 32 threads within a warp are...
Hopefully Olivier isn't listening. They're real threads, but like,
they're sort of virtualized. Under the hood, it's like one execution unit.
So what's going to happen? Okay, we go and load data, right? We're going to go and load,
you know, 32 elements of data. All of those threads are going to do it at the same time.
And the reason they're going to do it at the same time is because there's actually only one hardware unit that does the load.
Okay. And so we want to load 32 elements of contiguous data, right? That makes sense.
So that means that each one of those threads is going to operate on one of those 32 elements,
right? And then what are we going to do next? Well,
we're talking about a reduction. So then we're going to go load the next element
that we're going to add in. And so we'll load another 32 elements. So because those loads have to happen at the same time, and we want them all to load, you know, contiguous data, well, in fact, they have to. Which elements should thread zero be adding up?
Should thread zero be adding up element zero and one, the elements that are contiguous and next
to each other? That doesn't actually make a lot of sense in the GPU programming case,
because we want every element, every thread to be working
with one element from that first load. Right. So no thread is going to be adding up two contiguous
elements. So no, what we want is, we want that first thread to be adding up element zero and the element 31, or index 32, sorry. So the element at the zeroth index and the element at the 32nd index, which will be from the second load. And then thread one will add up the element at the first and the 33rd index.
So it's not contiguous, but it's a more efficient memory access pattern, because we're going to do one load of, like, the 32 elements, one of which each thread will use, and we're going to do two of those back to back.
Interesting. So basically you're saying that, yeah, the column-wise way lends itself better.
Yeah. On the CPU, each thread is going to be able to independently do its load. And so on the CPU, of course, we want that zeroth thread to be able to load and add together elements zero and one. And one of the reasons why we want it to be contiguous on the CPU is because we want to use vector operations. And vector operations, for the most part, want to operate on contiguous data. So, whether you want each thread to be accessing your data contiguously for performance really does depend on the underlying hardware architecture.
So once again, I will state that your claim
may be correct in theory, but in actual hardware,
it's not the case. And this is one of the reasons why it's
important to have an iteration model where you can iterate the same structure
in different iteration orders, because you might want to change that iteration
order depending on what hardware you're running on. Now, I will assert that the direction of iteration, or the strategy for iteration, is just a property of whatever iterator you're using.
If you've got forward iterators and you want to go in reverse, well then you go and get yourself reverse iterators.
And that same logic applies in the multidimensional case.
Like you might be able to get multiple different kinds of iterators
to the same structure.
It sounds like I need to come on ArrayCast
and explain some things to your co-hosts and buddies.
Well, I mean, yes and no.
It's just that there is no array language that's ever really been implemented targeting a GPU, specifically, like, directly targeting, not going through something. Like, there is Co-dfns, which uses ArrayFire to compile down to run on the GPU. And there's been talks of other array languages trying to do similar things,
but there's not been actually like an array language
that is directly like implemented in CUDA
or some library that uses CUDA or something like that.
Meaning that I don't think people have thought about it
to this degree.
It's like, someone with your knowledge about how GPUs work has not implemented an array language.
You're going to make me, like, now I'm going to have to go redo a part of my talk for ARRAY.
What do you mean, you're going to have to go redo? Let's be honest, Bryce. I was just going to reuse my existing slide deck, but now I'm going to have to go do something.
Good.
Good.
All right.
Well, we will continue this topic.
I have to think.
I also am, actually, now I'm actually confused, because what I was thinking in my head is that I'm going to have to talk to Marshall about what he meant. Because really, leading axis, depending on the rank that you specify, like, leading axis theory is actually going to do the thing where you're strided. But he said that it always leads to the best memory access patterns, because your cells are always stored contiguously, a cell being a chunk of the array based on rank. And, I mean, what he said, I'm sure, is right.
It's not even strictly true for CPUs.
No, no, no.
I think I understand what he's saying. I think whatever he was trying to say, I'm sure he is correct. I'm probably misunderstanding what he said.
I want... No, no, no.
I want to talk to this guy. I want to talk to this guy.
Marshall can come on ADSP. I'm sure
our C++ listeners...
Alright, alright. Bring him on.
Bring him on. You set that up.
All right. All right, I'll make it happen. I've also got my third podcast. He's going to be a guest on that. So Marshall's just...
You have a third? What, you're cheating on me?
I told you about the third podcast one time in a previous episode, but then you interrupted and then it never... I put it in the show notes, so I've got a few people that signed up, but also I don't have any episodes right now.
What is your third?
My third podcast is the Programming Language Podcast.
We'll end it here. Stay tuned, listener. Great podcasting. Some great podcasting. We're bringing
Marshall on. We're going to be talking about
multidimensional
iterators in the future,
multidimensional algorithms,
and CPU versus GPU.
This is going to be
exciting.
I'm going to have to schedule time to make that keynote talk.
Anyways, thanks for tuning in.
I'm making notes.
We at ADSP: The Podcast plan to stick around for at least as many episodes as CppCast: 294.
I think that's how many they had.
And more apparently,
because Bryce committed to you in the last podcast
that until one of us drops dead
or is murdered by the other person,
we're going to continue to do this.
That means through kids apparently.
I mean, you're going to have to talk to my-
I will make an exception if either of us has triplets.
Twins, no, we can handle twins.
If either of us has triplets,
we can take a break.
I appreciate that. Whatever percentage of our listeners are parents, they are laughing to themselves extremely hard right now at the idea that you think anything below triplets is going to be just like a walk in the park.
very well-behaved children.
Oh, God.
That are going to do everything I say.
Look, look, look.
When I was a child, my mother gave me, there was one room in the house that was the designated room where I could have my toys.
And the rest of the house, you'd never think that a child lived there.
That's how it's going to be with my kids.
They're going to be very neat and orderly, I say.
You are welcome, to the listeners that are also parents, that are having a good chuckle, having a good laugh. The entertainment value is brought to you by Bryce Adelstein Lelbach, future parent. And we'll check back in with Bryce in several years, a couple years, it depends, and see how that's going.
I'll be dead, because my children will kill me, I'm sure.
Does your mom listen to this podcast? She's probably having a good chuckle right now too.
She does not listen to this podcast.
It's too bad.
Nor does the girlfriend, thankfully.
Yeah, it gives us more freedom.
Like, you joking about how easy having anything but triplets will be, I look forward to my solo podcasts when I'm recording on a Saturday morning by myself because Bryce couldn't make it due to the fact that he was up all night with his young babies.
which one of us is going to have kids first?
Oh, you definitely.
We've already had this conversation.
Oh, yeah, we did have this conversation.
And then we saved it for episode 200.
Ah, yes.
Yeah, so in three years' time.
Actually, less than three years at this point.
Two and a half years' time.
We're going to have to dig into the vault.
Yeah.
I've got a very aggressively labeled episode, whatever, 60 something point five.
Please do not delete.
Because it's gold and we can't delete that.
All right.
We've gone over.
And thanks for listening.
Thanks for listening.
We hope you enjoyed and have a great day.