CppCast - Catch2 v3 and Random Numbers
Episode Date: December 1, 2023
Martin Hořeňovský joins Timur and Phil. Martin returns to talk about v3 of Catch2 and how it is different to v2. We also revisit the topic of random numbers and how Martin is still working on portable distributions and why that is important to testing and other domains.
News:
P2662R3 - "Pack Indexing"
P1673R13 - "A free function linear algebra interface based on the BLAS"
P2546R5 - "Debugging Support"
P2996R0 - "Reflection for C++26"
"Why I think C++ is still a desirable coding platform compared to Rust" - Henrique Bucher
CLion Nova
Links:
Martin's NDC TechTown 2021 talk on Catch2 v3
Transcript
Episode 371 of CppCast with guest Martin Hořeňovský, recorded 27th of November 2023.
In this episode, we talk about the ISO C++ committee meeting in Kona,
whether C++ is still valuable given that we have Rust,
and a new C++ IDE.
Then we are joined by Martin Hořeňovský.
Martin talks to us about Catch2 and random numbers.
Welcome to episode 371 of CppCast,
the first podcast for C++ developers by C++ developers.
I'm your host, Timur Doumler, joined by my co-host, Phil Nash.
Phil, how are you doing today?
I'm all right, Timur. It's good to be back together. How are you doing?
I'm doing all right, actually. I had a few pretty intense few months, which is also why I haven't
been on the show. But right now I'm doing all right. So people have probably been wondering
why I haven't been on the show since, I I think September. And also we haven't released new episodes in a little while.
So first of all, I was in hospital for a few weeks. Then we had a newborn baby and everybody
who has kids knows how intense the first month with the newborn baby is. So I had that going on.
And then I went to the C++ standards committee
meeting in Kona, which we're going to talk about a bit later. And after that, while I was there,
I actually caught COVID. So then I was basically sick in bed for another two weeks. And I kind of
just recovered from that. You might still hear me coughing. So I was out again for another two
weeks. So it was one thing after another and yeah like there wasn't
really any possibility for me to do anything about C++, to devote any time to CppCast. But
things are going well, the baby is doing great, I have recovered from COVID, and so now I'm back.
And, yeah, Phil, what about you? I was going to say, honestly, Timur, the lengths you'll go to to avoid
being on the show.
But I could talk.
I've been pretty tied up as well.
It's always busy this time of year with conferences and things,
and I've had extra ones.
And then on top of that, I'm also starting two new conferences,
and it's just that period of time where everything's got to come together. So I've had negative free time for a little while,
which has not been the best timing, given everything that's been going on with you.
So that's the backstory.
We are now back and we are sort of committed to going forward.
So let's get on with the show.
Right.
So at the top of every episode, we like to read a piece of feedback.
So this time, I don't want to mention a particular single item of feedback
but kind of just in general say that
we have received kind of mainly
feedback about whether the show
was still running and why there are no episodes
coming out and
so it's really great to know that people want us to
keep going, thank you so much
that means a lot and you will
be pleased to know that we are back
on our usual every two week schedule as of right now.
And we have some great episodes lined up actually for the next few episodes and into the new year.
So we have some exciting guests lined up that I'm very excited about.
So, yeah, you can expect us to be back in business.
That'd be great.
And we'd like to hear your thoughts about the show,
and you can always reach out to us on X, Mastodon, or LinkedIn,
or email us at feedback at cppcast.com.
Joining us today is Martin Hořeňovský.
Martin currently works in the R&D team at PEX
on audio, video, and melody content recognition.
He also used to teach C++ at CTU in Prague
and still maintains Catch2.
This leaves him with lots of opinions on testing and, strangely, also on randomness and C++.
Martin, welcome to the show.
Hi.
So we'll get to the randomness and testing in the main interview,
but I hadn't heard that you were working at PEX.
In fact, I've not heard of that before, but the melody recognition sounds interesting. Is that something like Shazam? Kind of, like, yeah, we try to recognize
the same like audio and video in the content it doesn't really matter whether like it's
a file or stream or anything like this and we also try to do like melodies so you know you can
like recognize cover versions
of a song and you can say hey
this is the cover version of this different song
and
like it's the same one actually
so will it recognize
parody songs?
only if they are like very similar
you know like
they might, they might not, based on, like, whether it's just inspired.
You know, if it's an inspiration,
like Weird Al doing something
that has a similar style of music but different lyrics,
then that's maybe borderline.
If it's very similar with some word changes, then almost definitely yes.
If it's, like, a style parody, then no.
You know, it depends on how close they are.
Just be interested to see how
it detects my I'll build
myself song.
Let's try that sometime. Right. So Martin
we'll get more into what you've been up to
in just a few minutes but first we have a couple of news articles to talk about.
So feel free to comment on any of these, OK?
The first one is that we had our autumn ISO C++ standards meeting from the 5th to the 11th of November.
So a few weeks ago in Kona, Hawaii, in the US. That was the second meeting dedicated to C++26.
A lot of stuff happened there,
which we cannot possibly go over in the show.
So it's going to just be about a small selection
of what happened there.
We had 170 people attending,
two thirds of which actually made it in person to Hawaii
and the rest was on Zoom
because now you can also participate remotely.
We already formally adopted a few papers for C++26. The ones that I found most interesting: the first is P2662, "Pack Indexing", by Corentin Jabot and Pablo Halpern. So that's actually really cool. In C++26, when you have a parameter pack, you know, template <typename... Ts>, and then inside the template you have Ts..., which is the pack, you will be able to subscript into that. You can basically write Ts...[0] and then you get the first template parameter. So that's really cool.
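For reference, here is a minimal sketch of what the pack indexing described above looks like. This is my illustration, not from the episode, and it needs a compiler with C++26 pack-indexing support (recent GCC or Clang with -std=c++2c):

```cpp
#include <type_traits>

// C++26 pack indexing (P2662): subscript directly into a pack.
template <typename... Ts>
using first_t = Ts...[0];                 // index into a type pack

template <typename... Args>
constexpr auto first(Args&&... args) {
    return args...[0];                    // index into a value pack
}

static_assert(std::is_same_v<first_t<int, double, char>, int>);
static_assert(first(1, 2.0, 'c') == 1);
```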
Then we have P1673, which is a huge paper that people have been working on for a very long time: "A free function linear algebra interface based on the BLAS". So the BLAS is a library that has been around for decades to do linear algebra. It has
lots of vectorized algorithms, which are tuned for blazing speed on your target hardware architecture.
And now we will be able to use that directly
from standard C++ 26 through this new interface.
And the third paper that stood out to me because I was like,
yeah, this is so cool.
Everybody's doing this, but everybody's
doing these ugly macros and platform-specific hacks
and everything, is P2546, "Debugging Support". So we're going to get std::breakpoint,
std::breakpoint_if_debugging, and std::is_debugger_present, which give you a portable way to trigger
a breakpoint if you happen to have a debugger attached, and also to query whether that's the case.
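A quick sketch of how those C++26 facilities are meant to be used, assuming the proposed <debugging> header; no mainstream standard library ships this yet, so treat it as a preview rather than working code:

```cpp
#include <debugging>  // proposed C++26 header (P2546)

void validate(int value) {
    // Query whether a debugger is attached, and only then stop.
    if (value < 0 && std::is_debugger_present()) {
        std::breakpoint();
    }
    // Or the combined form, which is a no-op without a debugger:
    if (value < 0) {
        std::breakpoint_if_debugging();
    }
}
```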
Finally, right? That paper has also been in the works for quite a while, but it finally got adopted, so I'm very happy. Now, I also want to talk about two study groups. One of them, I'm a bit biased
here because I'm actually a co-chair of this study group. So I'm particularly excited about what's
going on there, but I hope that I'm not the only one excited about this. SG21, the contract study
group is making great progress. So as you may or may not know, contracts is a feature that has been
at the last minute removed from C++20
and ended up not being standardized.
And ever since we've been,
so we formed this new study group
and ever since we've been trying to like,
you know, put it back together
and actually come up with a design
that actually works for everybody.
And we actually, we've been at it for how long now?
Three, four years or something like
that. And we're almost design complete. And so it now looks like there's a very high chance that
we're going to be actually done in time for C++26. And we're going to hopefully get contracts in
C++26. I'm pretty confident at that point that's going to happen. We do have a new syntax that we
agreed on in Kona. So we ditched the double-square-bracket, attribute-like syntax, which quite a few people were unhappy
about so now we have a new syntax which is basically just keyword pre and then in parens
you put like the predicate and then post and again in parens put the predicate and for asserts it's
a bit more complicated, because we really want to write assert(...), but unfortunately that's the existing C assert macro. So we spent quite a lot of time
trying to figure out whether we can kind of shoehorn that into the macro thing so that like
it's either the macro or the contract depending on what you're doing and somehow so we can use
kind of like a keyword assert there for this nice syntax. But then after quite a few discussions,
we found out that it's not really going to work,
not in a way that doesn't break.
So either you break existing code
or you have code that means two different things
in two different contexts and it's not obvious.
And both of those things are very bad.
So we decided to not mess with
basically the existing assert macro.
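To make the pre/post syntax described above concrete, here is a hedged sketch. The contracts proposal was still in flight at the time, so no shipping compiler accepts this and the details may change; the separate in-body assertion keyword that was eventually chosen is discussed just below:

```cpp
// Sketch of the agreed C++26 contracts syntax (subject to change).
int safe_divide(int num, int den)
    pre(den != 0)              // precondition, checked before the body runs
    post(r : r == num / den)   // postcondition, `r` names the return value
{
    return num / den;
}
```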
And then we had another very long discussion
about what alternative keyword to adopt. We settled on contract_assert, after having considered about 50 different keywords. I hope one of them was co_assert. Yes, yes, there were quite a few suggestions, some of them rather hilarious. So we agreed on that syntax. We just have a very
very few open design questions left at this point,
kind of little to-do items.
One of them is the semantics of contracts on Lambdas.
There are a few questions about how they interact with captures.
On virtual functions, again, there's a bit of disagreement
on how they should work on virtual functions,
whether you should inherit the contract from the base class function,
and there are reasons why you probably don't want to do that um how contract checks should behave during constant
evaluation so basically at compile time and how they should interact with like the no accept
operator and like deduced exception specifications and things like that uh whether you know a contract
check actually uh can make something not know except
because it can technically throw because you can attach a throwing violation handler,
but you don't really know at compile time if somebody is going to do that.
And if they do that, it's kind of only going to throw if it's incorrect anyway.
So it's a huge wasp's nest of questions, which you kind of have to tackle.
But I'm sure we can deal with that
and other than that we're pretty much done and the other study group that has made lots of
interesting progress in kona is sg7 the reflection study group which actually is alive and kicking
again so so very pleased to hear about that i think a lot of feedback that i heard people say
about like the committee and what the committee is doing is what about reflection?
Everybody needs reflection. Nothing happened about reflection.
And indeed, I think there was the, I think there was the TS, right.
Which had this kind of quite cumbersome,
like template metaprogramming syntax where you have to do like colon,
colon type and colon, colon value for everything,
which I think somebody implemented.
But like, it's not really nice to work with.
So then we decided you want kind of something better,
something that's a bit more like value oriented
rather than type oriented.
And then there was this big paper,
I think by Andrew Sutton,
which introduced lots of new syntax.
But then work on that just stopped.
And I think there was pretty much nothing happening
kind of the last couple of years.
But now a bunch of people have picked it up again,
which is very exciting.
There's a new paper, P2996R0,
reflection for C++26.
And the exciting news is that that, in Kona,
was actually forwarded to EWG and LEWG.
So that's kind of the next level where they kind of look at the design,
but like it's making progress.
And it's a paper that provides kind of a core of static reflection
that's useful enough to solve many important problems like, you know,
enum to string and things like that, iterating over members,
like typical things that you want to do,
while also letting us plan to continue building on it post-C++26.
So it's not going to give you everything,
but it's going to give you something that's very useful.
And so basically the four things that it does is
it gives you a reflection operator,
which is the prefix.
How do you call this character in English?
It looks like a roof or something. Caret?
Yeah, caret, or hat. Yeah, so that thing. So if you put that in front of any entity, like a
type, or, you know, an object or something, it produces a reflection value that's going to
have an opaque type, std::meta::info. So not dedicated types like std::meta::variable or std::meta::type; there are reasons
why you don't want to do that. So you're going to get an object of type std::meta::info.
Then you can do stuff with that object. There are a number of consteval meta functions to work
with them. You can say std::meta::members_of, and then you can loop over that in a range-based
for loop or something, and it gives you the members.
And you can turn these reflections back
into C++ language elements with something
that's called a splicer.
There's a bit of a novel syntax,
like kind of a square bracket, colon,
and then the thingy, and then colon, square bracket.
So you can do things like: you write the caret and then a type, and you get a reflection of that type. But then
you can say typename, square bracket, colon, the reflection, colon, square bracket, and it's going
to give you back a type. And then you can write typename, that thing, x equals 42, and
declare a variable of that type, and kind of turn it back into something that you can use in
regular C++ syntax. So you get this kind of round trip. That is quite cool, quite comprehensive.
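A very rough sketch of the P2996-style interface as described here. The syntax and names were still under discussion at the time, this will not compile on mainstream compilers, and everything in it should be treated as provisional:

```cpp
#include <meta>   // assumed header for std::meta::info and the meta functions

struct Point { int x; int y; };

constexpr std::meta::info r = ^Point;   // the reflection operator (the caret)

// Consteval meta functions such as std::meta::members_of(r) let you query the
// reflection, e.g. loop over the members in a compile-time range-based for.

// A "splicer" turns a reflection back into ordinary C++ source:
typename[:r:] p = {1, 2};               // equivalent to: Point p = {1, 2};
```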
I think it works really nicely.
And yeah, it's been forwarded
kind of to the next level in the committee.
So again, hopefully, I think it looks good
for getting it done in time for C++ 26,
which will be very, very exciting.
Our next committee meeting will be in Tokyo, in Japan,
18th to 23rd of March,
2024.
And I hope that,
you know,
both of those study groups will make further progress.
So I'm very excited about that.
I am too.
And the,
yeah,
the syntax for reflection,
I'm not entirely sold on it yet.
I haven't read all of the rationale,
but we'll see whether that,
that survives,
but it was interesting. The, some of the Reddit discussion on it, I tend to agree with.
At this point, if we get it at all, I'll take it.
Yeah, I haven't read the Reddit.
I have been staying off Reddit for the last few months.
Very wise.
So then I have one more, one blog post that caught my attention, by Henrique Bucher, who writes
why i think c++ is still a desirable coding platform compared to rust and obviously many
many people have said their opinion on this over the last year or so with this whole safety debate
going on and they've been countless talks about this and stuff like that but i think it's an
interesting take on it which which I kind of like.
So Henrique is going into a bit more detail
about performance.
They're talking about intermediate representation
and undefined behavior and cache locality
and things like that.
And concludes that Rust doesn't really give you
performance benefits.
I mean, probably that's not what you're expecting anyway,
but most of the time,
the performance benefits, as he writes,
are either inconclusive, non-existent,
or more likely negative.
So performance probably isn't the reason
why you would switch to Rust.
Rust gives you safety benefits,
which is kind of the big selling point
that everybody was talking about coming from C++.
But for many domains, they're not really that pressing, you know, for most applications.
So that's kind of the point that they're making.
So the question is, is it really worth moving to a completely new language?
Especially if it has such a kind of long and hard learning curve as Rust.
And his personal answer is no, which I found kind of interesting.
The one thing that he says Rust is, you know, objectively an improvement on is that it's a much more modern language than C++ in many aspects.
And I kind of have to admit, that's probably right, even though, you know, you talk about modern C++ and we put a lot of work into modernizing C++ on the committee.
But Henrique writes that, you know, modern C++ is not really modern.
It's just lipstick on a pig.
I'm not sure I would like put it quite like that,
but maybe there is some truth to that.
Not even lipstick.
So maybe there's some truth to that.
So for him, like Rust has that one advantage,
which is relevant,
but it doesn't
quite provide enough benefit to jump the boat
so you know it's
another opinion piece one of many but I thought
that was kind of noteworthy or interesting
but like my takeaway from this
is, it sounds like C++ is not doomed
yet which I'm very happy about obviously
I
disagree with this
All right. I think that safety is performance, right?
Because right now I have places in our code base
where I would like to reduce the sharing that we do.
Or rather, I would like to just share and read references
to some data rather than copy them around.
But I don't trust myself,
nor do I trust people who will have to refactor
the code after me, not to
mess it up. In three months, when I
need to make a quick change, will
they get all the references correctly?
I don't know.
And I would rather make a copy than
in six months to find out that we are
overwriting our data and crash.
So
there is a bunch of performance you get just from
better safety guarantees so you're what you're saying is the intersection of
safety and performance where it really shines yeah like like being able to make more complex
refactoring is that does give you better performance. Right.
Yeah, I think that's fair.
So I think it's a very hotly debated question. And actually, Enrique also posted a link to their blog post on Reddit
with the tag roastme, which people did.
There's like 443 comments there on that thread,
which I have not read.
But yeah, it's a hotly debated topic, obviously.
And so I think there isn't really one
right or wrong answer here.
All right, so the last news item for today is
there's actually a new C++ IDE.
It's called CLion Nova, by JetBrains, and it's just been released as a free early preview.
So it's kind of a new IDE, but also a version of CLion at the same time. It's kind of a new
version of CLion, which is available for free through the Toolbox app as a preview.
And the difference to the previous CLion,
which is now called CLion Classic,
is that CLion Nova uses the
same language engine as ReSharper C++ and Rider, which are the
other products
by JetBrains that do
C++, instead of
the legacy CLion C++
engine. So basically they completely exchanged
the whole backend
that does like all the language stuff,
like the parsing and refactoring.
And so that's a major change.
I know that they've been working on this
for quite a long time, like many years.
And finally they released a version of CLion
that uses the ReSharper engine.
It's actually a major change.
It's much faster, it's better. It's more reliable.
It's also experimental.
So the plan is not actually, as far as I understand, to have a standalone new IDE separate from
CLion. So the plan is to then merge CLion Nova and CLion Classic again into, like, one product.
So there's not going to be like a new product or anything.
But right now it's kind of a standalone thing.
So you can try and experiment with a new engine and see if you like it so i'm very happy
to see that kind of version see the light of the world because i know they've been working quite
hard on this for for quite a while and i know it's a it's a huge improvement to kind of swap out the
the whole language engine underneath the ide so that's that's really cool and it's i think it's
technically quite a feat they pulled off
because you basically have a JVM front end
talking to a .NET Core component that's then parsing C++.
So you've got these three ecosystems all playing together,
but completely seamless.
So I haven't tried it yet, actually,
because as I said, I was in hospital and I had a baby.
Then I was in Kona and then I was sick with COVID for two weeks.
So has anybody tried it already?
No.
All right.
But yeah, maybe let's do that because it sounds like it's pretty cool.
All right.
So that concludes our news items for this week.
And I think we're ready to transition to our main topic.
So Martin, hello again.
Hi.
So, Timur, you've done a lot of the talking so far.
So I'll pick up most of these questions since,
I don't know if a lot of people know this,
but I used to be involved with Catch 2 as well.
Aren't you the guy who originally wrote it or something?
There is a rumor that that might be the case.
Martin's been doing a much better job with it since
he took over, so I'm glad we've got you here.
Good to hear that you're still maintaining
Catch 2.
It's on v3 now, but has been for
a few years. What is the difference
between v3 and v2?
What's the
latest that's gone into it?
The big difference is that now
it behaves more like an actual library.
So, you know, when you want to compile it, you compile a static library that you link later,
which gives you two big advantages. One, if you happen to use a bad linker like BFD, then it no longer takes 10 minutes to actually link your project with
tests against Catch2. And the other one is that since it's now multiple headers,
people don't pay the compilation costs for features they don't use. So if you just use, like, TEST_CASE and REQUIRE and SECTION, you include one header,
and you include, I don't know, a third of the code you would with the single-header model, so
it compiles faster for you. And this also means that you can now add features to Catch2,
and you no longer have to go, okay, so this sounds useful, but I don't know whether
it's useful enough
that I want to make
everyone's compilation slower
because now you can just say, well, it's useful
it has its own header
so only the people who use it pay the cost
and it also works better with
vcpkg and Conan, because
they expect you to have a static library or to be header-only, not to have this hybrid model where you actually need to define some macro in your own translation unit to compile the implementation part of the library.
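As a concrete sketch of the v3 include model Martin describes (my example, not his): include only the header you need and link against the Catch2 static library, for example via the Catch2::Catch2WithMain CMake target, which also supplies main():

```cpp
#include <catch2/catch_test_macros.hpp>  // TEST_CASE, SECTION, REQUIRE, ...
#include <vector>

TEST_CASE("vector grows") {
    std::vector<int> v;
    SECTION("push_back adds an element") {
        v.push_back(1);
        REQUIRE(v.size() == 1);
    }
}
```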
Wait, so is it header-only again now?
No, it's not.
Okay, because I need to say something
that's probably a bit controversial or not nice.
So I was using Catch2 for my private projects for years.
So mostly what I do in my free time if I write code
is generic library
stuff or like header only stuff mostly.
And, or like little, little projects here and there, not like a big app that I'm compiling
and linking together.
And so I always, I always do TDD.
I write tests for everything, but I always liked that you could just do #include <catch2/catch.hpp>, and then I'm done.
And now that, then when that stopped working,
I kind of tried to, okay,
now I have to include this header or that header.
And I actually have to like add stuff to my CMake file.
And very quickly I was like, no, I don't,
this is not worth it for me.
Like, because I'm not using any of the kind of advanced features.
I just want to have the kind of basic functionality.
So I basically moved on to doc test
because that gives me back this ability
to just say #include.
And that's it.
That's all I have to do.
And it still gives me all the macros that I need.
So I basically moved on from catch.
So do you think that was a bad move
or should I come back?
What do you think?
I mean, if it works for you,
you can still mostly use cache like that.
We now provide, you know, like we do what SQLite does
where it gives you one huge header
and one huge CPP file that you drop into your project.
And so you can include catch_amalgamated.hpp
and compile the
cpp file and it still works.
But you will
pay the cost of
just including every single
header that there is in cache,
rather than including just the headers you need.
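For reference, the SQLite-style option Martin mentions looks roughly like this; the two files are shipped in the Catch2 repository under extras/, and you compile the .cpp once alongside your tests:

```cpp
// Drop catch_amalgamated.hpp and catch_amalgamated.cpp into your project,
// compile the .cpp, and include the single header everywhere you write tests.
#include "catch_amalgamated.hpp"

TEST_CASE("still just works") {
    REQUIRE(2 + 2 == 4);
}
```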
I wonder
if you had any more feedback like this
of people who kind of moved on
despite the new cool features because they couldn't just hash include it anymore.
I know that some people are all about that one header.
But I think a lot of these people actually moved away sooner
because if you use the basic features only,
then doctest does provide this,
and it plays a lot of, quite frankly, stupid games
to give you faster compilation time.
It will go and forward-declare the standard library,
and it will use the macros that the
standard library uses to define its current version namespace. Wait, what? Yeah.
Oh, God. Okay. This is what doctest does to give you the compilation times, right?
Okay. Interesting. I mean, that's mean, if you have, like, the person
who... Maybe that's something I probably didn't need
to know.
Okay, so, like, if you have the person who only needs, like,
the basic macros, this gives you, like, really
nice compile and performance.
You know, and I know, like, a lot of people have moved
away from Catch before because they only want,
like, this basic feature.
And they see, like, all the extra
features as extra bloat yeah you know like
I'm fine with it. Okay, I'm not fine with all the UB in doctest, but hey, you know. All right. But, like,
i'm fine with the idea of hey i just want like the basics and i want one header so i will use
something different yeah i think UB in dog test
is something that you could usually get away with
as long as it's maintained.
And I heard recently that it's not been kept up with lately.
It just keeps going.
And I know that there are issues where...
I think it was... somebody moved to doctest... no, wait.
Yeah, they moved to
Basically, look, people do a lot of
weird things with C++, like
#define private public in their tests
and then, like,
then the compiler tells them, okay, guys,
no, like, you have just
broken everything. They are like, but why? I want to do this.
But there is still the way to get the single header and single CPP file.
Yes.
Which does sound like a reasonable middle ground to get you back most of the convenience,
whilst also still having some compiler performance benefit
because you're not recompiling the implementation every time.
Yeah, only the headers.
Yeah, if you want this, you can do it.
And I won't stop you, and I do support this use case.
Right, but it could be a gateway to using Catch2
in its full glory because you can then start breaking down the headers
where you see that's going to be more beneficial.
So that seems like a good migration tactic, I think.
So there's currently no way to use it fully header-only
in V3, is that correct?
Let's be real.
K2 was never header-only.
It was CPP file in your header.
I never really looked at implementation.
My litmus test was always,
do I have to go into my CMake file
and mess with that in order to add it?
And the answer used to be no,
and now the answer is yes.
You just have to add one CPP file at minimum.
Yeah.
But I think that's probably, you you know for those people who need the
or if you care about compile times or for those people who need the extra features
like it sounds to me that it's very well worth the trade-off
and i think that's the thing the people that really needed this to be moved to a more sort
of standard library model is um it was a real showstopper for them before.
So I think it's a good move overall,
but I do get a lot of feedback as well that,
oh, we really missed the header-only version.
But I believe Catch 2 version 2 is still maintained.
Not necessarily new features, but it's all maintained.
It has been for a bit, but now the model is that if a user makes a fix and it passes the CI,
I will merge it and when someone reminds me I will also release it as a full version.
But I don't maintain it anymore. Right, okay. But it's still an option for now?
Yes.
It's still there.
You can still download it.
The last release was 2022.
So if only v3 is getting new features, inclusion model aside,
what else are you getting with v3 now that you're not going to get in v2?
Okay, I think I gave a talk about this, but it was two years ago.
I think the big one is that you can have multiple reporters.
You can, in parallel, have one that is human-readable,
the one that gives you nice little output for users,
but you can also have an XML reporter that gives you all the structured data
for your CI and similar things.
So you can have multiple reporters next to each other.
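On the command line this looks roughly like the following; the flag and reporter-option spelling is from memory of the v3 docs, so check the documentation for your exact version:

```sh
# Run the human-readable console reporter and an XML reporter side by side,
# sending the XML to a file for CI.
./tests --reporter console --reporter xml::out=results.xml
```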
Some fixes in generators, as always.
The listings are structured, like the next release
will likely have remade the CMake script that automatically registers all the tests from your
test binary. And it will do so by basically using CMake to parse JSON because the old scripts that were parsing the human readable
output always ran into some edge cases that they wouldn't handle.
And there is also like new
generic matcher
system so you can like
match various
ranges and
types that aren't like
standard, let's say.
Like your matcher now can match
templated types against other templated types.
You know, you can say I will get some range.
I have no idea what the range will be,
only just that like it behaves like C++ range
and I want to know whether it contains some element.
Right.
That's like, I think the big changes.
That's quite a bit to give you a carrot to move you forward.
I'll put a link to that talk in the show notes as well.
Was that the NDC Tech Town one?
Yeah.
Yeah.
So I was there... I'll find that out. And actually, I want to complain about C++. I just remembered a change I made for v3. So, in C++ we have some types that are only comparable with a zero literal, right? For example, the standard ordering types (std::strong_ordering and friends), the result
of the new spaceship operator.
Sorry, what do you mean by zero literal?
If you have
the result of a
spaceship operator, you can say, is this
equals to zero, or is this less than
zero? But you can't compare it with
an int variable
that has value zero. It has to be
zero literal.
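A small illustration of what "comparable only with a literal zero" means in C++20, using the ordering types returned by operator<=>:

```cpp
#include <compare>

void example(int a, int b) {
    std::strong_ordering r = a <=> b;  // result of the spaceship operator
    if (r < 0)  { /* a is less than b */ }
    if (r == 0) { /* a equals b */ }
    // int zero = 0;
    // if (r == zero) {}  // not allowed: only a literal 0 may appear here
}
```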
Oh, okay, yes. So I added support for this in v3.2. And doing this together with
changing the decomposition macros to handle the C++20 rules for equals
was so much pain.
Because in C++20, now if you say A equals equals B,
the compiler goes and looks at A equals equals B,
but also at B equals equals A.
And when it considers B equals equals A, which you didn't write, it doesn't do this in like
a SFINAE context or similar.
So if it tries to fill in the overload for B equals equals A, which
you didn't write and it fails, that's a compilation error.
So I spent multiple days trying to figure out why it doesn't work.
And then I realized that the compiler looks at the type on the left-hand side,
like the expression-left-hand-side type,
and goes, well, what if I try B equals equals the expression left-hand side?
Then I will substitute this for the template arguments
in the equal operator.
And well, this doesn't work.
So obviously, this is like compilation error.
So I had to make those overloads SFINAE away
if the expression left-hand side is found on the right-hand side instead, because C++20.
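A rough illustration of the C++20 issue Martin describes — this is not Catch2's actual code, and ExprLhs here is a hypothetical stand-in for Catch2's decomposer type:

```cpp
template <typename T>
struct ExprLhs { T lhs; };

template <typename T, typename U>
bool operator==(ExprLhs<T> const& l, U const& r) { return l.lhs == r; }

// Pre-C++20, only `ExprLhs<A> == b` would consider this template.
// In C++20 the compiler also considers the reversed candidate `b == ExprLhs<A>`
// for the same expression, substituting the user's types into an operator the
// author never wrote. As described above, a failure there is not always a
// friendly "no match", so the overloads have to be constrained (SFINAE'd away)
// whenever an ExprLhs shows up on the right-hand side.
```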
Do you know if that was the intended behavior?
Was that deliberate or an oversight, do you think?
I mean, that was the intended behavior.
So you could, you know, like the idea originally was to have just one overload
if you have, like, heterogeneous comparisons, right? The idea is that if I have
type A and type B, and I say I can do equals for type A and type B, then in the code,
you know, why do I have to define it the other way around as well? It's just busy work.
The issue is that sometimes you do have order-sensitive comparison operators. And yeah.
That sounds like fun.
It was a lot of fun.
It added like 2% to 3% slowdown
on the assertion macros, actually.
Because there is now, I think,
six overloads
between which we have to select for the equals operator.
When you write require A equals equals B,
I think there are currently six overloads for that
to handle all this.
Yeah, I had a look in the source code recently,
so I haven't touched it for a while. You've got all these macros stamping out different overload sets; is that part of that? Yeah,
that is part of it, because they're all the same, just a different operator. Yeah.
Well, that's what you've done so far. Is there stuff that you're working on in Catch2, new features you're adding at the moment?
Yeah, I was writing reproducible random generators.
Ah.
Well, it's an exercise because I wanted to write reproducible random distributions.
And so what do I use them for?
Well, I will use them in Catch, so I have some idea of what to use them for.
That's a topic I want to dig into a bit more.
So it's a great time to pause if you hold that thought.
And we'll just have a word from our sponsor,
which is once again, Sonar, the home of clean code.
So SonarLint is a free plugin for your IDE
and helps you to find and fix bugs and security issues
from the moment you start writing code.
You can also add SonarQube or SonarCloud
to extend your CI/CD pipeline
and enable your whole team to deliver clean code
consistently and efficiently
on every check-in or pull request.
SonarCloud is completely free for open-source projects
and integrates with all the main cloud DevOps platforms.
So that's quite fitting, given that we're talking about testing and testing pipelines.
But you brought up random numbers.
Now, we did talk with Frances Buontempo back in episode 369, quite a lot about random numbers.
And your name actually did come up. You were proposing, a few years ago now, something to do with,
you know, predictable distributions of random numbers,
which I think is what you were starting to talk about. Before we do get into that, I think Timur said
that he had some feedback on the episode as well so it's a good time for you to jump in with that
feedback yes so so i actually forgot most about most of it already because that
was like months ago but i remember this was the first episode where which i missed and um so then
i was kind of listening back to it and i was just screaming at the at the thing i was like no no like
said you would be uh quite a few times uh um i don't remember most of those things but one thing that i do
really remember is i think at some point somebody said that like there isn't really anything that's
truly random or something like that and that's just not the case like i just want to point this
out very clearly that the scientific consensus is that things like for example uh nuclear decay or something like that like quantum
processes like that are to the best of our knowledge of course that might be wrong but to
the best of our knowledge scientific consensus truly random so if you have a i don't know
something like a uranium nucleus which is going to decay at some point like that is truly a random
process and um there's actually a whole thing about that in physics where, you know,
Einstein at the time said, no, that can't be true.
God doesn't play dice, you know, and that there must be some hidden mechanism.
I think you were talking about this a little bit there.
Like there must be some kind of hidden mechanism that we just can't observe
that kind of determines when the nucleus decays.
So it's not actually random.
It just looks random.
But like that has been quite comprehensively disproven since then.
There's this thing called Bell's theorem,
which says that you can't have any local hidden variables
in a quantum system like that.
And it is truly a random process.
So there is actually true randomness in nature.
And I think that's quite profound.
And I think, oh, now I remember there was another thing.
There was like somebody saying that
like there isn't really a random number generator
that is truly random, right?
Because it always relies on something
that your machine does.
And I just remembered about this
random number generator,
which actually exists somewhere.
And I think you can, there's like,
you can actually use it.
There isn't like an API for it,
where I think they have this room with like lots of lava lamps and like a webcam directed at
the lava lamps and then it just kind of computes like a random seed based on like where the blobs
of, like, liquid are in the lamps and stuff like that. So things like that do exist.
Probably safer than radioactive decay.
It was
a propagation video
from Cloudflare, I think.
It's called LavaRand.
That's what it's called.
Yeah, they have a...
I just found it here on the
internet. It's like at the office of
Cloudflare. They have a wall with lava lamps
and they use that for random number generation.
So I think that's pretty random,
if you ask me.
I think to an extent,
it's like a philosophical question.
Like with my physics hat on,
yes, true randomness exists.
We just need to harness it.
That's good to know. So Martin, is that what you're
proposing for the standard? That we're going to get lava lamps
into C++?
Or is it radioactive decay?
Yeah, it would be fun to say
every compiler vendor has to
have a set of lava lamps,
but no.
Right, so what's the problem
that you're trying to solve here?
So,
which specific one do you mean?
With random, because...
Which problems do you have
that you're trying to solve?
With random, everything.
No, okay, so,
it's interesting because, like,
every time I look into random,
like, in more detail
I find more things that I
dislike
you know hate is a strong word
I don't care about random that much
but that I dislike
and so my original papers
were about three things
one was that
when you have like distribution
like uniform in distribution from minus 10 do 10,
tak dostanete různé rezultaty na různých platformách,
specificky různých STD library.
A také dostanete různé rezultaty na různých verziích
samého standardního library.
Protože například msvc změnil algoritmu, kterou používají, different versions of the same standard library. Because, for example, MSVC changed the algorithm they use
to do the rejection sampling,
and so now you will get different results on MSVC
between, I don't know, I think 2019 uses the old code
and 2022, like Visual Studio, uses the new code.
So, you know, it's not even stable
between the versions of the standard library.
So that was one thing.
Before we move on,
because that was the problem
that we were specifically talking about
in the episode with Francis
and has an impact when it comes to testing.
Do you want to remind us what that is?
So actually I should interject
because I ran into this with with audio stuff uh where you
have an audio algorithm a lot of them use like some form of random number generator right you
generate white noise or whatever and you run into this problem if you want to unit test these things
you want to have like the same output right if you change something so as long as you just use
a random number generator it's fine because the standard says this is this algorithm so it needs to produce this sequence but with the uh with the as soon as you use a distribution like
uniform float distribution uniform in distribution like that's just not guaranteed anymore and you
get a different sequence on every um on every compiler and so that makes testing it pretty
much impossible and you have to basically write your own distribution,
basically to get a predictable sequence,
which is hard because then it's like a lot of maths.
And yeah, that's a problem that I ran into very concretely,
you know, while developing like an audio product.
So I guess that's kind of what you're talking about, right?
That problem.
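A small sketch of the portability gap being discussed: the engine's output sequence is fully specified by the standard, but what a distribution does with those bits is not, so only the first value printed below is guaranteed to be identical across standard libraries:

```cpp
#include <iostream>
#include <random>

int main() {
    std::mt19937 engine{42};
    std::cout << engine() << '\n';                  // same on every conforming implementation

    std::mt19937 engine2{42};
    std::uniform_int_distribution<int> dist{-10, 10};
    std::cout << dist(engine2) << '\n';             // may differ between MSVC, libstdc++, libc++
}
```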
Yes, but you know, it's not just testing.
If you do procedural generation in a game or something like this,
it's the same issue.
Or repeatability.
You are doing physics simulations,
and you can just say,
okay, so we had this seed,
we generated some random numbers,
and the simulation did something really weird.
Then you have to use the same seed and go investigate what happened there.
You know, like, is this just... does the result make sense?
It's just not likely to happen, or is this something that is an issue with our simulation. And if you don't have repeatable random numbers,
then it's much harder to do this than it would be otherwise.
So isn't the solution for the distributions and the standard,
we just basically say, instead of specifying just the general mathematical properties,
we specify the exact formula that they're supposed to use?
That's one approach that we can do.
This got an OMDB against it from Walter Brown in Prague.
Okay.
He is a strong believer in distributions should be mathematically described,
but not specified exactly.
And I also talked about this with some other people.
And I think that the better approach would be to have named distribution.
We don't say this is uniform int distribution with this implementation.
But you say, okay, this is like,
I don't know, let me, like,
this is like Lemire's distribution,
or, you know, this uses the Lemire's
rejection method
for uniform int ranges.
Because then you are not locked into
like the one implementation.
Because if we standardized the implementation in 2000, I think the random was very early in
C++ 11, so I don't know, 2006, then we would be forever locked into bad implementation.
Because the one that we use right now, Lemire is, I think, two years
old, you know, and it's much faster than most of the ones that we used before.
So, like, the reproducer is good, but we should really, like, make named distributions after
the implementation.
You know, similarly, you would have, like, the Box-Muller transform, but you would also have,
like, the Ziggurat algorithm and so on
each has a name distribution rather than what it does than what it is so i really like this
approach because it also solves another problem which i also ran into uh while writing like audio
code which is sometimes you don't actually care about all the mathematical properties of these distributions, right?
For example, if you roll a dice, you want random numbers one to six, right?
Over and over and over again.
So if you use like the mathematical properties,
the way they are in the standard, it guarantees that it's uniform, right?
But that actually prevents it from being O(1),
because six doesn't fit into INT_MAX an integer number of times, so then you don't have a guarantee that it's gonna, like...
a distribution is allowed to take a random number throw it out and and try again right to guarantee
this uniformness but sometimes if in high performance code you don't care about that so
much, like you're generating noise or something.
You don't care about perfect uniformity mathematically,
but you care definitely about the thing not running twice as long occasionally.
You want it to take the same time always.
So you want a distribution that isn't perfectly uniform,
but it's only nearly uniform, but is O(1),
like you just take the modular or something,
obviously not the actual modular,
like something faster than that,
but you know, something like that, right?
So, and that wouldn't be compatible
with like the standard stuff.
But like, if you say, well,
this is like the modular distribution or whatever,
which isn't quite uniform, but it's fast.
You could standardize it.
That's pretty cool. I you could standardize it. That's pretty cool.
I won't standardize it.
Because the big advantage of random in C++ is that it's not beginner-friendly, but because it's decomposed, you can just write your modulo distribution in five minutes and say: look, this is my modulo distribution. It's not uniform, it's not like anything, but it's mine, it works, go. And all the, like, compatible random number generators and so on will work with it. So you don't have to standardize this, because it's easy to write and has only some uses.
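A hedged sketch of that "write your own distribution in five minutes" point: a deliberately simple (and slightly biased) modulo distribution that still plugs into any standard engine, precisely because <random> is decomposed into engines and distributions. The names here are made up for illustration:

```cpp
#include <cstdint>
#include <random>

struct modulo_int_distribution {
    std::uint64_t lo, hi;   // inclusive bounds
    template <typename URBG>
    std::uint64_t operator()(URBG& g) {
        // One engine call, no rejection loop: constant time, but not perfectly
        // uniform unless the range size divides the engine's output range.
        return lo + g() % (hi - lo + 1);
    }
};

int main() {
    std::mt19937_64 rng{42};
    modulo_int_distribution d{1, 6};   // a biased-but-fast "die"
    auto roll = d(rng);
    (void)roll;
}
```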
But actually, the big issue that we have with distributions is, okay, they are not repeatable. The other
issue that is in the standard and that I
ran into basically
this month when I was implementing
my own is that
there is actually
no one size fits all.
You want a different distribution, a different implementation of a distribution, for the use case where you generate one random number and for the use case where you generate a lot of them.
Imagine that we are
talking about something simple like
uniform int. This is
well-studied, we know how to do it quickly
and so on.
If you are writing a shuffle
Fisher-Yates algorithm,
you will create like
n different distributions, always
with different bounds.
Because every time you need a random number,
you need it from different bounds
for the shuffle.
And this has like
very different properties from what you want to happen
if you are generating a lot of numbers
in the same range.
You know, my most common use
case for random number generation currently
is writing input data for
benchmarks.
You know, like, okay, so I want a vector of
10 million integers
in some range that makes sense
for what I'm doing.
And this is a different problem.
You want a different
implementation for this
than if you are shuffling
a vector of 10 million integers.
Because for the shuffle,
you will always need a different
bounds on the range.
And this
is interesting because
you can see that all the
big implementations of the standard library
basically implement distributions as free functions that kind of hold onto some data.
If you create a uniform distribution of ints,
sorry, it's a uniform integer.
If you create a uniform distribution of ints from minus 10 to 10,
then every time you call the operator
on, like, the call operator with RNG,
it will compute the image
of the minus 10 and 10 range
in unsigned ints.
It will subtract the new bounds
to determine that actually, like,
the range is 21, I think. I might be off by one, you know.
So it will... Okay, so I need an integer between 0 and 21. Then it will generate an integer between
0 and 21 using some well-described algorithm that we understand and know is uniform.
And then it will take the integer that is generated and subtract it back to the original range.
And a lot of this work that you are doing
is fixed given the range you are working on.
You know, like the images,
like the distance in your range
and the images that you need to do,
like the adjustments that happens
to get from the integer to unsigned integer.
These are all fixed, right?
But you will recompute them every time you call the call operator
on your range, on your distribution,
because the API is such that it's easier to do this,
to implement it like this,
than to do a distribution that is optimized.
Because then you would have to basically
write it twice.
But, you know, nobody really wants
to do that. So instead
we have these weird
distributions that are
basically pieces of...
They are free functions that hold a little bit of data
for you, but
are not actually optimized for being
objects that are
reused.
So, like, the end
result would be that if you wanted to
make proper distributions for your
random library, you would have to
implement both. You would have to have, like,
you know,
uniform integer distribution
that is an object
that will optimize all the calculations
that it can, assuming you will call it
over and over again.
And then you will have a free function that just doesn't.
Because it assumes that you will
call it once and not ever again.
So what you're saying is
just calling rand is not enough?
No.
And we also have a lot of other issues
in the standard.
We know that the generate canonical is wrong.
We have wording that...
Okay, so the wording contains implementations
that you are supposed to use.
If you use it,
it will not return the values that it's supposed to use. If you use it, it will not return the values
that it's supposed to return, right?
Because generate_canonical
is supposed to give you
a random number between 0 and 1 non-inclusive.
If you implement the algorithm
that's written in the standard,
you can get 0 to 1 inclusive
because of rounding.
And then the same happens for the uniform
real distributions, where they are supposed to give you from A inclusive to B non-inclusive.
But if you use the suggested implementation, which is called GenerateCanonical, then even
if you fix GenerateCanonical, you will find yourself with inclusive range, again
due to rounding in floats.
So actually I use a completely different algorithm for that in cache, so I don't have to deal
with this.
Nice.
So if you've not really been in the world of random numbers, this probably all sounds
much more complicated than you were expecting, but it's all...
I mean, like, the really hard part
is that basically, you know,
floating points aren't, you know, real numbers.
You can't do, like, the idea of,
okay, so I want random number between 0 and 1,
non-inclusive, then, you know, in, like, real numbers,
you could say, well, I'm going to generate
number between 0 and 2 to 64 minus 1, and I will divide it by 2 to the power of 64.
And mathematically, this will never give you
exactly 1. Right? And this happens for every exponent you can use. You will never get exactly one. The math
just doesn't do that. Floating-point numbers do, because of the rounding. So you have to
basically handle them differently. And we kind of aren't ready for that or other people forget
that you can't just do math
with floating point numbers
well thousands of audio people
have been doing maths
including that kind of stuff
with floating point numbers
to generate noise and other things
it works
but I think it's different
because there's no requirement
for it to be mathematically precise
the requirement is that it sounds nice so for that it's different because there's no requirement for it to be mathematically precise.
The requirement is that it sounds nice.
So for that, it's definitely good enough.
What other applications are there for random numbers where getting this stuff right really matters?
That's a good question.
There is always like,
I'm going to go back to generate canonical for a moment.
So it gives you a random number between 0 and 1, non-inclusive.
And it's supposed to do this by algorithm that basically takes up to,
let's speak in concrete numbers, so for double,
it will take 53 random bits and then divide them to get a double in
that range. It gets 53 bits because that's how many bits you have of precision in a double in
the mantissa. And this is uniformly distributed. It works, with the one exception that you actually generate less than 0.2% of all the doubles in that range.
Of all the doubles, you can only generate that fraction this way.
For every one you can generate, there are roughly 500 that you cannot.
It's not that you're just unlikely to generate them;
you physically can never produce them.
And then the question again becomes,
does that matter to you?
It depends on what you then do with the numbers.
If you're doing something like, you have a game and something has to
happen with a 10% chance,
it doesn't matter.
It doesn't matter, because it's still uniform,
so that 10% is still 10%.
That's fine.
On the other hand, if you are using it
to generate inputs,
say you have this new, fancy
math library you want to test,
so you want to generate random inputs
to basically fuzz it, because
you can generate all of the, roughly, 2 to the power of 64 doubles to test it.
So you know you will try to
generate all the different numbers eventually.
That's an issue, because
you skip a lot of possible
numbers that you could
test your function with, right? And it's hard to say whether this matters to you without knowing what you then do with the numbers.
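To make the 53-bit scheme described above concrete, here is a small sketch of the idea — this is not std::generate_canonical itself, just the "take 53 bits and scale them into [0, 1)" approach:

```cpp
#include <cstdint>
#include <random>

double canonical_from_bits(std::uint64_t bits) {
    const std::uint64_t mantissa = bits >> 11;          // keep the top 53 bits
    return static_cast<double>(mantissa) * 0x1.0p-53;   // scale by 2^-53
}

int main() {
    std::mt19937_64 rng{123};
    double x = canonical_from_bits(rng());
    // x is uniform over a grid of 2^53 points in [0, 1) -- which is also why
    // the vast majority of representable doubles in [0, 1) can never come out.
    (void)x;
}
```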
I guess cryptography is another area, right?
Anything that has to do with cryptographic
security and stuff like that, where you use random
numbers and you kind of have to rely on certain
mathematical properties.
Yeah, but for those, you usually
just get
all the, like, so many random bits
from your source of strong
randomness, and then just say, yeah, these are my bits.
You know, you don't transform them into floating point numbers.
Yeah.
Hopefully.
Yeah, I think that's more a case of not being able
to reverse engineer it than how uniformly they are distributed.
I mean, the uniform distribution is important,
but it's over bits, not over sub-ranges.
Right, right.
Well, we are randomly running a little bit long,
so we're back into the flow.
But we do need to wrap up.
There's a couple of questions I wanted to ask you
before we do wrap up.
First of all, are you using Catch 2 in your day job?
I actually don't at PEX. And I miss it regularly. Because I became so used to a lot of
little things in Catch that I don't have in gtest. And it's really annoying. You know, like, if you use catch and you pass it some filter that gives you nothing,
then catch will actually return non-zero error code.
I think it's three, but, you know, don't quote me on that.
So if you, like, mess up your CI
and suddenly you don't run any tests, it will fail.
But if you do this with gtest, it's like,
yeah, it's fine.
You know, no tests run, no tests failed. Nebylo testování, nebylo testování, co chceš? Je to v pohodě.
A také se mi zkoušela šafla v káči být stabilní přes návrhy.
Když máte set káči a říká you say, okay, I give you seat 42, now shuffle the tests.
And then you say, okay, I give you still the same seat,
but now shuffle the tests, but don't shuffle in,
I don't know, the uprox tests,
then you will still get the same relative order of tests.
So if you are shuffling your tests randomly
and you run into an error, you can quickly
bisect which order of the tests is the issue.
Because you can remove or add more tests and they will not change the relative order of
the other tests.
So the filtering is done after the shuffling?
No.
It wouldn't work because then you couldn't add in more tests.
This is actually stable even if you write completely new tests.
Okay, interesting.
It's a neat trick. Basically, instead of shuffling, we hash all the information about the test that we have, like name, tags, class name, and so on.
And then we basically add in the random seed and sort the hashes.
And because like the hash is stable and like, you know, the random seed is fixed, then you
will get like different orders with different random seeds.
But with the same random seed, you will always get the same order of tests compared to each
other.
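A rough sketch of that idea — not Catch2's actual implementation — hashing each test's metadata together with the seed and sorting by the hash, so that adding or filtering tests never changes the relative order of the remaining ones for a given seed:

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

std::uint64_t hash_with_seed(const std::string& name, std::uint64_t seed) {
    // FNV-1a-style mix; the exact hash doesn't matter, only that it's stable.
    std::uint64_t h = 14695981039346656037ull ^ seed;
    for (unsigned char c : name) { h ^= c; h *= 1099511628211ull; }
    return h;
}

void order_tests(std::vector<std::string>& tests, std::uint64_t seed) {
    std::sort(tests.begin(), tests.end(),
              [&](const std::string& a, const std::string& b) {
                  return hash_with_seed(a, seed) < hash_with_seed(b, seed);
              });
}
```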
Oh, there's something about catch I didn't know.
Nice.
So
moving away from
catch and randomness for a
moment then, what else
in the world of C++ are you interested in
at the moment?
I'm actually
looking into Rust
mostly.
I'm kind of
over C++ a lot
because
I've been
working with Rust a bit and
with the exception of having
existing codebase,
there is very little
of C++ that I miss
in Rust.
Yes, technically
Rust doesn't really do
some kinds of
genericity that you
could do in C++.
You can do
variadic templates and so on,
but I don't use those much.
And in return, i get a lot more
safety so you know things that in c++ i would have to think really hard about whether this is
a good idea in the long term whether this will be maintainable i can just be like yeah this is fine
the compiler will tell me if it's not so it sounds like you know if the question is whether somebody
should move away from c++ to rust the answer is going to be different for everybody, right?
Like, we discussed a blog post at the beginning of the episode where somebody said, well, it's not quite worth it for them.
But yeah, the answer is probably going to be different for other people.
So that's really interesting. Like, for example, I wouldn't want to actually move our current main product to Rust
because there is like seven years worth of C++ and we use a lot of C++ dependencies and so on.
So, you know, we would have to find something equivalent like in Rust for every one of our dependencies.
And then we would have to make sure that the outputs are compatible
between the Rust and C++ versions
for every one of our dependencies,
because we promise a certain kind of compatibility,
and one of the things we do
is decode video.
And believe me,
having compatible video decoding
with FFmpeg
is hard.
I burned the last three weeks
updating FFmpeg.
Not fun.
So this would be painful to move.
But if we were making a new product,
I would say, hey, let's
try Rust.
It will take a bit to learn,
but in the long term it will be nicer
because we already have to take a lot of care
to optimize
our allocations by knowing when to just pass string views and spans and so on.
And that helps with this.
So let's move to that, but I wouldn't want to move to legacy product.
So what about using CPP2 or carbon,
assuming that they are stable?
Maybe.
Assuming they are stable,
it's doing a big lift in there.
Yeah, well, I mean, they're not ready yet,
but just looking ahead,
is that going to be a viable route for someone like you
that thinks, well, Rust
would be better
in your experience, but
getting there is the hard part.
So I'm just interested to
see whether those successor languages
have a role to play there.
I mean, it depends on how the details
shake out.
Right.
We target a lot of different platforms,
so we would so I think
I don't
think Carbon
I actually don't remember.
Well, basically the point is
that we would need to have
the toolchains target all the different
platforms that we use, and
for it to be sufficiently stable, and so on,
and so on.
I'll check in when I think it's, like, stable enough, and we will see. We'll get you back on in a year or two to
see how you got on all right well i think we do need to to wrap up there uh anything else you
want to tell us before we let you go, Martin? Yeah, I think
it's funny that last time I was here
in 2020,
it was also about
Catch2 and random.
It's definitely not random, you know?
Yeah, it's not a very
uniform distribution.
I'm afraid.
Where can people reach you if they want to
stay in touch
yeah I have twitter
I have email
it's like
basically the main two
places to find me
okay
I think the links will be in the description
they will
yep so we'll put those there.
In which case, thank you very much for coming back on to tell us again about Catch-2 and random numbers and how they've evolved in the meantime.
Yeah.
Thank you so much, Martin.
It was a lot of fun talking about these things, and I hope to have you back on the show again at some point.
And to everybody else, we will be back in two weeks.
So yeah, looking forward to that.
Stay tuned.
Right.
Bye everybody and have a great weekend.
Bye.
Thanks so much for listening in as we chat about C++.
We'd love to hear what you think of the podcast.
Please let us know if we're discussing the stuff you're interested in.
Or if you have a suggestion for a guest or topic, we'd love to hear about that too.
You can email all your thoughts to feedback at cppcast.com.
We'd also appreciate it if you can follow CppCast on Twitter or Mastodon.
All those links, as well as the show notes, can be found on the podcast website at cppcast.com.
The theme music for this episode was provided by podcastthemes.com.