CppCast - AI and Random Numbers
Episode Date: September 15, 2023

Frances Buontempo joins Phil and returning guest co-host, Matt Godbolt. Frances talks to us about her new book on modern C++ as well as the topic of her previous book on machine learning. We discuss the differences between LLM-based AI and more statistical approaches, as well as where random numbers fit into all this and the limitations of their current support in C++.

News
CppCon programme announced
C++ on Sea videos
"Inside STL" - The Old New Thing (August archive)
Open source C++ projects that use modern C++ features (Reddit)

Links
"C++ Bookcamp" (title may change) - Frances' new book
"Genetic Algorithms and Machine Learning for Programmers" - Frances' previous book
Overload issues (submit articles on the main ACCU site)
Frances' paper bag escapology certificate
Shannon's mind reading paper
ERNIE (Electronic Random Number Indicator Equipment)
P2059R0 - "Make Pseudo-random Numbers Portable" (defunct)
"Using, Generating and Testing with Pseudo-Random Numbers" - Frances' ACCU 2023 talk
"Program your way out of a paper bag" series:
"How to program your way out of a paper bag" (slides)
"How to Evolve Your Way Out of a Paper Bag" (video)
"Diffuse your way out of a paper bag" (video)
"Crowd Your Way Out of a Paper Bag" (video)
Transcript
Episode 369 of CppCast with guest Frances Buontempo, recorded 7th September 2023.
This episode is sponsored by Sonar, the home of clean code. In this episode, we talk about the CppCon program and C++ on Sea videos, Inside STL,
and modern C++ in open source.
Then we're joined by Frances Buontempo. Frances talks to us about modern C++, machine learning,
and random numbers.
Welcome to episode 369 of CppCast, the first podcast for C++ developers by C++ developers.
I'm your host, Phil Nash, joined by our last-minute guest co-host,
Matt Godbolt. Matt, how are you doing today? I'm doing very well, thanks, Phil. How are you?
Ah, yeah, I'm okay. I'm okay. We pulled this together a little bit at the last minute because
Timur was meant to be here, but I don't want to go into too much detail. We'll let him fill everyone
in when he gets back. But let's just say he's had an early paternity situation. Right, so I was going to say, why am I here again? I'm
very honored to be here. I'm very glad to be back, and it's good to be here. But yeah,
fingers crossed all's going well with Timur, and we'll look forward to hearing all about it
when he's back on the show. Yes, absolutely. Well, at the top of every episode, we like to read a piece of feedback.
This one's from Reddit user neithermango8264 (you know who you are), regarding our last episode with
Abbas Sabra about static analysis: "One of the best episodes. The point made at the beginning
resonates powerfully with my experience working on open
source projects. One of the worst feelings is when you spend days tracking down a bug that static
analyzers can easily detect." So I'm pleased about that feedback, both as a CppCast host and as a
colleague of Abbas and someone who works on static analyzers. Now, we did also have a small handful
of people who emailed to say that they found the hidden chapter art
that I sneakily inserted in the last episode.
When I was talking about adding chapter art,
I actually put one in just to see who had noticed,
and I think we had like two or three people.
That was about the fact that you can put things into the view.
I see.
Yeah, even though we said how much hard work it was.
I think that was the episode I was on, and then you responded to that.
And now, ironically,
I'm here again.
Cool.
These things come around.
Yeah.
Not a lot of people saw that, or at least responded.
Bear in mind, it's only been a week since the episode went live.
So maybe we'll get a few more.
All right.
Well, we'd like to hear your thoughts about the show. You can always reach out to us on X (formerly known as Twitter), Mastodon, or LinkedIn,
or email us at feedback@cppcast.com. So joining us today is Frances Buontempo.
Hello.
Frances has many years of C++ experience, along with various other languages, including Python
and C#. She's worked as a programmer at various companies, mostly in London, with a focus on finance. She enjoys testing and deleting code
and tries to keep on learning. She's given talks on C++ and more besides, which you can find on
YouTube. She is the editor of ACCU's Overload magazine and will happily consider articles
from anyone listening. Frances, welcome to the show.
Thank you. Nice to be here.
So I noticed that you worked in finance. I did not know that. What did you do in finance,
the finance world?
Deleted some code.
Deleted some code? Why is deleting code so important? What's up with deleting code?
Sometimes you can make things compile quicker if you get rid of loads of bits you aren't using.
Sometimes you can refactor stuff and make it simpler to understand.
My background is in mathematics.
I started life as a secondary school maths teacher.
Oh, my gosh.
That's a very brave profession.
The kids were all right.
I think I was a bit young at the time.
Trying to keep everyone quiet was difficult because I got really excited about maths.
I made loads of noise.
But anyway, that was a lifetime ago.
I learned to code and program some embedded devices up in Leeds after I finished my degree. Then I took time out and went back to university
and did a PhD in some machine learning stuff,
which meant I knew about things like Monte Carlo simulations
and some stuff that gets used in finance.
I managed to get a job in London.
My aim was to try and end up on one of the quant teams
and do the rocket science math stuff.
I got close, but I kept ending up being asked to
test other people's code. So I've left huge amounts of unit test frameworks that ran really
quickly and found loads of bugs. We deleted code, and I've just had to do the maths in my spare time.
So here we are. Well, we are going to touch on some of those subjects again in the main interview, but
before we get into that, we've just got a couple of news articles to talk about, so feel free to
comment on any of these as I review them. So, first of all, CppCon announced their program
just recently. It's all going to be on site this year, so no hybrid conference. There are
80 breakout sessions. There are a couple of keynotes announced so far: one
from Bjarne Stroustrup, a regular there, and another one from Andrei Alexandrescu, who's going to be
talking about AI, so topical. And there are going to be three more announced in the coming days and
weeks. There are the usual things, like the Back to Basics track and various other tracks, the committee
fireside chat, and pre- and post-conference classes;
there's going to be a safety and security panel, so again, quite on topic; and
lightning talks are going to be hosted by me this year, so you are warned.
And talking of conference news, the C++ on Sea videos are starting to be released, so I'll put some
links in the show notes for that. That includes our very own Matt Godbolt's talk on what's new
in Compiler Explorer, the lightning talks (in this case they were hosted by Frances, our guest today),
and Timur's Safety and C++ talk, which includes the results of that survey that he
talked about on some
previous episodes. So it seems like we're keeping everything in-house on all these videos.
I was going to say, there's accusations of nepotism going on here.
Well, they're just the ones I thought were relevant to mention now. There's plenty of
good content that doesn't involve any of us. So do check those out. Now, there was a series of
posts by Raymond Chen on his
Old New Thing blog. I mean, he writes every few days, so that's no surprise, but he did this series,
"Inside STL", in August. It's actually a whole load of posts; I'm not going to list them all, but it
covers everything from pairs, vectors, strings, lists, maps, and sets, including the unordered
varieties, deques, arrays, and smart pointers.
And for each one, it actually goes deep into the implementation of them,
comparing the three major standard library implementations
and their trade-offs and consequences.
For example, in Microsoft Visual C++,
they use a really tiny block size in deques,
but they probably want to change that,
but they have to wait for an ABI break
to be able to do that.
So all those sort of little things
you don't really think about necessarily
when you're using these types.
It's really good to see
what's actually happening on the other side.
I thought this was a fascinating series of articles
because, yeah, a lot of stuff in the STL
we sort of just take for granted
and you don't realize
that there are still engineering trade-offs
even down in the STL.
I thought it was all very pure algorithms, you know, like how many ways are there to write a
linked list? But apparently there's a lot more to it than that, and it gives you a
certain amount of sympathy for the poor folks who have to write the STL and keep it
performant for all cases. And then our final item was something I saw on Reddit. It's actually a question from
somebody who's a university student, who wanted to know what open source projects use modern C++
features. And it was mostly for their university projects they're asking, but obviously there's a
practical application to that question. So I think it's quite interesting seeing the responses there.
Lots of libraries mentioned, including some of the usual suspects.
Facebook's Folly, and SerenityOS, which we've had on the show before.
Some lesser-known libraries as well.
Interestingly, coincidentally, I've just recently been dabbling
in some ideas for something I'm calling Catch-23.
Oh?
Which is not a promise for a new library or anything,
but it is taking some of the ideas
from the Catch testing framework and saying, what would that look like if we did them in
thoroughly modern C++, you know, using C++23 features? I've even got my eyes on some C++26 features,
but I'm not quite going to go there yet. So it's more of a playground at the moment; we'll see what
happens in the future. You couldn't resist a new pun-based library name. That's the real reason you're doing it, isn't it?
It is, I admit. I've had my eye on this name for some years now, and I suddenly realized, oh, it's actually
here. This is an excuse to actually get started on it. So it's not actually on GitHub yet, but
maybe I'll put it up. So, listeners, watch out for
Catch-23. Yep. Okay, so those are the news items we wanted to discuss today. So we're going to move over to
our interview with Frances. Now, Frances, one of the reasons we have you here is because
you do have a new book coming out soon, C++ Bookcamp. So let's start with that. Now, I presume the "bookcamp" there
is just a play on boot camp, in a book, but what is it all about, and why did you feel you had
to write it? So, one caveat: Manning have a series of books called Bookcamp, and you're quite
right, it's playing on the idea of boot camp, just trying to get you up to speed with things. But we are discussing maybe
having a different title that's more precise about what's covered. I think we've discounted the idea
of "modern C++", because there are several books with that sort of title, and that's not particularly
helpful in the long run. Naming is hard. Yes, always. What I'm trying to do is just give people a short, small book that you could possibly read in
maybe a weekend or so. Which I think Matt did manage? I did, this weekend in fact,
this weekend just passed. Yes. And I've just tried to pick some salient things that have changed
since before C++11. So I know loads of people who, 20 years ago,
used to be doing C++ full-time,
stepped away and learnt bits of other languages,
and then tried to come back and have just gone,
where do I start?
Because there are loads of books out there
that are really detailed on each of the different standards.
And if you tried to read all of those,
a new standard would have come along by the time you'd finished.
So I just tried to pick some salient points to help you get enough knowledge to feel a bit more confident
to start using some of the newer standards. You'll still need to go away and learn loads more
afterwards, but I feel like maybe it's enough to just get you back in the driving seat again, if
that's what you want to do.
Yeah, and it allowed me to practice some stuff, and I realized there's loads I didn't understand
properly, so I've learned loads as well. The learning-by-teaching approach to learning.
Right, right. So yeah, I thought that was a really interesting spin on teaching C++ to people who
already have some grounding and background in C++, because
very quickly you introduced a whole bunch of things that were interesting and fun, as opposed
to, as you say, if you just read the standard, you're like, good grief, I have no idea. What is
an rvalue something-something? I don't even know why this is important to me. I don't care. I'm
bored. I want to do something else now, right? So what sort of went into the thinking behind that?
How did you come up with the ideas to teach this in the way that you did?
By a lot of head scratching.
I guess, actually, because I did start life as a maths teacher,
even when I was at school, I tended to be the person people turned to with "I don't get
this", and I'd go, "All right", and I got quite good at thinking of games or silly little examples that
are quite fun. I'm particularly proud of my blobs racing out of a paper bag. I mean, that's loads of
fun to sit back and watch, but if you can find a fun, engaging example that's fun to play or look at
afterwards, then you start thinking about it more and tinker with it more.
But yeah, trying to come out with examples was really hard.
I'm quite proud of falling across the mind-reading machine.
With that one, I was thinking about statistics and predicting the future, and years ago I wondered if you could
make a game that would be able to play rock, paper, scissors. Now, we should talk about AI and robotics
later on for that. Exactly. But from the gestures, you can guess what people are going to do, to an
extent. And I found a web page that just went, no, let's keep it simpler, we'll predict
whether you're doing heads or tails. And it kept beating me. I'm like, what's going on? And it had a
link in there to a very in-depth Russian paper from years ago, which I put some of through Google
Translate, which then pointed me back to Shannon's paper from the 1950s, where he had built a mind-reading machine, an actual machine with hardware rather than software.
So I thought, right, let's do this.
And this machine tries to anticipate a human guessing heads or tails.
Yeah.
So, as you say, mind reading. And this is presumably, and this is foreshadowing
perhaps, but humans aren't very good at random numbers, at randomly choosing things, and so computers
can go, "I know your trick. You did three heads in a row, so the next one's almost certainly
going to be tails." Right. Yeah, it's just tracking some state using very little memory. But yeah.
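The "I know your trick" predictor described above can be sketched in a few lines. This is an illustrative sketch, not Shannon's circuit or the code from Frances's book: the `MindReader` name and the choice of the player's previous two calls as the context it remembers are assumptions made here.

```cpp
#include <array>
#include <cstddef>

// Hypothetical heads-or-tails predictor (not Shannon's actual design):
// for each combination of the player's previous two calls, count which
// call came next, and predict whichever has been more frequent.
class MindReader {
public:
    // Guess the player's next call: true = heads, false = tails.
    bool predict() const {
        const auto& counts = counts_[context()];
        return counts[1] >= counts[0];  // ties go to heads
    }

    // Record what the player actually called.
    void observe(bool heads) {
        if (seen_ >= 2) {  // only learn once there is a full two-call context
            ++counts_[context()][heads ? 1 : 0];
        }
        prev2_ = prev1_;
        prev1_ = heads;
        ++seen_;
    }

private:
    // Map the last two calls onto an index 0..3.
    std::size_t context() const {
        return (prev2_ ? 2u : 0u) + (prev1_ ? 1u : 0u);
    }

    std::array<std::array<int, 2>, 4> counts_{};  // [context][tails, heads]
    bool prev1_ = false;                          // most recent call
    bool prev2_ = false;                          // the call before that
    std::size_t seen_ = 0;                        // calls observed so far
};
```

Against a player who strictly alternates heads and tails, this sketch guesses wrong only twice in the opening rounds; after that it has learned the pattern and never misses. The entire memory is four pairs of counters plus the last two calls.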
So yes, lots of having too many tabs open in my browser, trying to read the whole internet,
scratching my head, going for long walks, coming up with crazy ideas. I mean,
yeah, basically, don't ever look at my collection of tabs that are open, or my piles of books.
None of us would do very well if people saw what tabs we had open. Like, the amount of
cppreference and Stack Overflow in my case; people would probably not
let me write code. Yeah, I've got too many tabs open in my brain. So, your book's currently in early
access. What does that mean? So you can buy it now directly from Manning, and you'll get an ebook, and
you can join; they've got what they call a liveBook page where you can read it,
but you can leave me comments or just call out bits you'd like further explanations of,
because we haven't hit the printing stage.
I have now written all nine chapters, so the final chapter will be up shortly.
And we're trying to get ready for a production run this month,
so I'm waiting for some final feedback.
There will be a physical book.
Oh, yes.
A real actual book.
Maybe from next month, depending.
So real soon now.
Got it.
So you kind of like, you incrementally buy the book
and then you get it early while it's still being made,
and then you get a chance to comment on it
before you get the final physical copy in your hands.
Yeah, and it helped me make it better
and call out any nits you might have seen
whilst you were reading it because there are some.
But I mean, that's always good, having early feedback and things.
And yeah, I do look at the live book version every now and again.
People have left some really helpful comments
saying, "I don't understand this", and I've tried to explain better.
So I know a lot of our listeners are really great
at finding little faults in things,
so all you have to do is buy the book, the early access version,
find all those nitpicks, and send them on.
Now, you do have another book that you wrote a few years ago now
called, I've got the full title here, Genetic Algorithms and Machine Learning for Programmers: Create AI Models and Evolve Solutions.
So capturing that AI craze, but that was written in 2019.
So long before the current LLM craze.
How do these things compare?
How have things moved on in the meantime? Well, I deliberately, when I
wrote that book, steered clear of neural networks and things that you might need a little bit more
maths for. I was trying to just find some simple examples to allow people to get the basic
idea of how, well, at least how I think, loads of AI models work, which is you loop around something and you
go left a bit, right a bit, you tinker a bit, and there's usually some randomness involved in there.
So I found several examples, from genetic algorithms to particle swarm optimization
and more besides, which I'd given talks on, all called things like "Diffuse Your Way Out of a Paper Bag"
or "Evolve Your Way Out of a Paper Bag". So those are on YouTube. I hit a point where I'd got
about five or six talks I'd given at conferences; that's got to be enough material for a book. So
then I had to find some more things to learn about and pull them together. So even at the time, I didn't go into neural networks.
There are plenty of other people who've written books about that kind of thing.
And you do need to be able to follow some calculus to understand what's happening there,
which isn't hard, but that's a busy space.
There are plenty of good books out there about that.
Since then, we've seen the large language models really kick off.
And even things with image recognition stuff, some of it's because there's more processing power going on and people are learning to do CUDA and things like that.
Because there's a lot of grunt work needed.
If you start looking at the amount of calculations that are happening, it is absolutely mind blowing.
Someone somewhere has got a huge
electricity bill for ChatGPT, and what are we doing to the planet? But maybe that's a
discussion for another day. Yeah, I mean, definitely the thing of the last several months has been
all about various different chat app type things. So, talking about all those calculations and
computations you just mentioned, is that
mostly during the training phase, or, every time you submit a new part of a query, does it have to do
a whole load of computation again? I'm not an expert, but I would stake money on
the training being the worst bit. But I'm not sure. Okay.
So when I think of machine learning or AI type stuff,
that is what I think.
I think about neural nets and things only because that's what I've seen in popular literature.
So what kind of machine learning is not neural networks?
Okay.
I mean, if you look at a lot of data scientist roles and things, you blur the lines between
statistics and AI. A lot of it gets rudely called curve fitting, or just regression stuff, and
that's fitting a y = mx + c line through some points, which you might have done
when you were 14 or 15. Right, shoe size and height. Yeah, a kind of prediction. But if
that model works, that model works. But then there's another space of building some
simulations of things. So I talked the other year about how to "Crowd Your Way Out of a Paper Bag".
I misappropriate... Is this a problem in your life, being stuck in paper bags?
It's just become a little bit of a theme,
and it's quite a nice way of having a little visualisation
of what's going on to see what's happening.
I did try a hardware version once,
but it turns out I'm not very good at soldering.
But that's another story.
So, like, the cellular automata,
I don't know if you've come across things like Conway's Game of Life.
And that kind of thing: you've got some simple rules, and things just move, or materialize and then dematerialize,
but you can actually use that to simulate people moving through space, and that kind of thing
gets used to model the best routes for fire escapes through buildings and that kind of thing. And that's not a neural network at all.
You just have some rules that say,
if you get near someone else, pause so you don't stand on them
and try and go to the nearest exit.
And then you just simulate and see what happens.
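Conway's Game of Life, mentioned above, is the classic example of simple rules making cells materialize and dematerialize. Here is a minimal sketch of one generation. It assumes a fixed-size grid where cells beyond the edge count as dead (a wrap-around grid is an equally common choice). A crowd simulation would swap these birth and survival rules for movement rules like "pause if you are about to walk into someone, otherwise head for the nearest exit".

```cpp
#include <vector>

using Grid = std::vector<std::vector<bool>>;

// One generation of Conway's Game of Life. The grid has a fixed size and
// cells beyond the edge count as dead. Assumes a non-empty, rectangular grid.
Grid life_step(const Grid& grid) {
    const int rows = static_cast<int>(grid.size());
    const int cols = static_cast<int>(grid[0].size());
    Grid next(rows, std::vector<bool>(cols, false));
    for (int r = 0; r < rows; ++r) {
        for (int c = 0; c < cols; ++c) {
            // Count live neighbours among the (up to) eight surrounding cells.
            int live = 0;
            for (int dr = -1; dr <= 1; ++dr) {
                for (int dc = -1; dc <= 1; ++dc) {
                    if (dr == 0 && dc == 0) continue;
                    const int rr = r + dr;
                    const int cc = c + dc;
                    if (rr >= 0 && rr < rows && cc >= 0 && cc < cols && grid[rr][cc]) ++live;
                }
            }
            // A live cell survives with 2 or 3 neighbours; a dead cell is born with exactly 3.
            next[r][c] = grid[r][c] ? (live == 2 || live == 3) : (live == 3);
        }
    }
    return next;
}
```

A horizontal row of three live cells (a "blinker") turns vertical after one step and returns to horizontal after the next. Nothing in the code is learned: all the behaviour comes from the two rules.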
And you can do this kind of thing if you're modeling a spread of diseases.
So you do things like Monte Carlo
simulations there, which is just doing some random stuff inside a model and seeing what happens. And,
for example, the R number in the COVID modelling was driving the randomness that was
happening, and then you can predict what might happen if your model's right. Obviously, if your
model's wrong and your R number's wrong,
it's not going to be accurate,
but that's why you then measure and say,
is this close or not?
Is it AI or machine learning?
Not quite.
It's kind of stats, but no neural networks involved.
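Here is a minimal Monte Carlo sketch of "doing some random stuff inside a model and seeing what happens". The model is a toy branching process chosen for illustration, not the COVID modelling mentioned above. Each case infects a Poisson-distributed number of new people with mean `r`, which stands in for the R number. Many outbreaks are simulated, and the sketch reports the fraction that die out. The function name, the `cap` cut-off and the seeds are all assumptions.

```cpp
#include <random>

// Toy branching-process model (illustrative only): every case infects a
// Poisson(r)-distributed number of new people. Runs `trials` simulated
// outbreaks and returns the fraction that die out. An outbreak that
// reaches `cap` total cases is counted as having taken off.
double extinction_probability(double r, int trials, unsigned seed, long cap = 1000) {
    std::mt19937 gen(seed);
    std::poisson_distribution<int> new_cases(r);
    int died_out = 0;
    for (int t = 0; t < trials; ++t) {
        long active = 1;  // cases in the current generation
        long total = 1;   // cases so far
        while (active > 0 && total < cap) {
            long next = 0;
            for (long i = 0; i < active; ++i) next += new_cases(gen);
            active = next;
            total += next;
        }
        if (active == 0) ++died_out;
    }
    return static_cast<double>(died_out) / trials;
}
```

With `r` below 1, essentially every simulated outbreak fizzles out. With `r = 2`, the estimate should land near 0.2, which is what branching-process theory predicts (the smallest root of q = e^(r(q - 1))). As Frances says, the answer is only as good as the model and the R value you feed it.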
This is a much bigger space than people realise.
I feel like the folks who are with the LLMs
have kind of cornered the idea
of what constitutes AI and machine learning, whereas, as you say,
statistics is kind of the underpinning of the whole thing
and fitting a line through a multidimensional space, presumably,
rather than just your X and Y, is what most of these things are doing.
That's really interesting.
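The "curve fitting" Frances mentions is, in its simplest form, an ordinary least-squares fit of y = mx + c. Below is a minimal sketch for a single input variable. Fitting through a multidimensional space, as Matt puts it, generalises the same idea to more inputs. `Line` and `fit_line` are illustrative names.

```cpp
#include <cstddef>
#include <vector>

// A fitted straight line y = m * x + c.
struct Line {
    double m;  // slope
    double c;  // intercept
};

// Ordinary least-squares fit through the points (xs[i], ys[i]).
// Assumes xs and ys have the same size (at least 2) and that the xs are
// not all equal, otherwise the slope is undefined.
Line fit_line(const std::vector<double>& xs, const std::vector<double>& ys) {
    const double n = static_cast<double>(xs.size());
    double sum_x = 0, sum_y = 0, sum_xx = 0, sum_xy = 0;
    for (std::size_t i = 0; i < xs.size(); ++i) {
        sum_x += xs[i];
        sum_y += ys[i];
        sum_xx += xs[i] * xs[i];
        sum_xy += xs[i] * ys[i];
    }
    const double m = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x * sum_x);
    const double c = (sum_y - m * sum_x) / n;
    return {m, c};
}
```

Points lying exactly on y = 2x + 1 give back m = 2 and c = 1. Noisy points give the line that minimises the sum of squared vertical errors.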
Yeah, some of it.
There are other things you can do. I mean,
in the PhD I did, a lot of that was focused on what's called decision trees. You give
the thing some data, and it spews out basically a flow chart for you. I mean, I was supposed to
be investigating whether chemicals are toxic or not, so: has it got a benzene ring in it, yes or no? Not by consuming them? Oh no. I think we all did
that as students. Well, I couldn't possibly comment. I did have the odd pint of Guinness.
So there, instead of getting a neural network that just comes out with "toxic" or "not toxic",
it shows you the rules it's using, so that means you've got a slightly comprehensible model. And
there's a whole area of research about trying to comprehend or understand what neural networks are
actually up to, to see whether it makes any sense in terms of the domain you're using it in. But
your ChatGPT is like a big black box, and if someone said, why did it give me this answer?
No one can answer it.
But if you have a tree that says, well, if X is greater than 10,
then if Y is more than 300, it's, yeah, got it.
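A decision tree can be read as a flow chart, with each step a rule like "if X is greater than 10". Here is a minimal hand-built sketch of the "if X is greater than 10, then if Y is more than 300" example. The node structure, the feature indices and the "toxic"/"safe" labels are hypothetical. A real decision-tree learner would choose the splits from training data rather than by hand.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// A decision tree node: either a leaf carrying a label, or a rule of the
// form "x[feature] > threshold" with a branch for each answer.
struct Node {
    std::string label;              // used only by leaves
    int feature = -1;               // -1 marks a leaf
    double threshold = 0.0;
    std::unique_ptr<Node> yes, no;  // taken when the rule is true / false
};

std::unique_ptr<Node> leaf(std::string label) {
    auto node = std::make_unique<Node>();
    node->label = std::move(label);
    return node;
}

std::unique_ptr<Node> rule(int feature, double threshold,
                           std::unique_ptr<Node> yes, std::unique_ptr<Node> no) {
    auto node = std::make_unique<Node>();
    node->feature = feature;
    node->threshold = threshold;
    node->yes = std::move(yes);
    node->no = std::move(no);
    return node;
}

// Follow the rules from the root down to a leaf.
const std::string& classify(const Node& node, const std::vector<double>& x) {
    if (node.feature < 0) return node.label;
    const Node& next = x[node.feature] > node.threshold ? *node.yes : *node.no;
    return classify(next, x);
}
```

Building `rule(0, 10.0, rule(1, 300.0, leaf("toxic"), leaf("safe")), leaf("safe"))` encodes the spoken example directly. Anyone can trace why a given input got its label, which is exactly the comprehensibility Frances contrasts with a black box.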
Reverse engineer that out by curve fitting.
I have played around with trying to generate some rules
from some simple feedforward neural networks.
And you can actually do that to a point.
Trying to do it with these really deep neural networks, I think that would be too difficult.
But it's an active research area and it's absolutely fascinating.
And I'm going to have more tabs open in my browser if I'm not very careful.
I'm going to sit on my hands.
So, yeah, you mentioned the paper bag thing. Where did that come from? How has that become a thing
for you? Why the paper bag? I think the first time I did this, I gave my first talk at the ACCU
conference a very long time ago now, and I wanted to just give a bit of an
overview of what some kinds of machine learning were, partly to remind myself how some of this
worked, because I'd done my PhD a while ago and I was kind of missing it. It was interesting.
So I played around with a couple of simple examples, but I wanted something visual to show during the talk, right? So I came
up with, I think I used ant colony optimization there, where you just have some ants tracing a path
round, and I went, well, okay, oh no, but what if we give them a challenge: when you're somewhere
in the bag, can you get out of the bag or not? And then the title was "can you code your way out of a
paper bag", and I printed off
a certificate, and I got members of the audience to sign it if they thought I'd done well enough,
and that's up on my wall just around the corner. And if I find you the link at some point to the
conference, it's got, on the schedule, a link to my bio, a link to the slides, and a link to the picture
of my certificate. And once I'd done that,
it just became a bit of a gag, you know? That's amazing. So you're a certified paper bag escapologist,
and in software, not hardware, as explained. Right, right. And presumably, again, was that all
in C++, what you were doing, or was it in other languages? No, it's been... so the book had a bit of JavaScript
and a bit of Python, which some people hated.
They went, well, if you're doing machine learning,
you should do everything in Python.
I actually maliciously did some of it in JavaScript
because why not?
And I do know one or two younger people
who've picked up the book, like still school-aged kids
who don't know how to set up a C++ compiler,
aren't quite sure how Python works.
But the JavaScript stuff, you just chuck it on your laptop or whatever,
and then you can play with it without the ramp up.
So, like I said, some people didn't like that, but hey.
No, absolutely.
No, I mean, making it accessible and teachable is more important at some level.
Obviously, I think if you're then going to write this stuff in production,
which I don't think your ant colony is necessarily production code,
it'll end up being C++.
And nowadays, as you mentioned earlier, like CUDA and things like that
and what you need to do in order to do the millions and billions of calculations
to actually train a system, or even fit stuff to a decent amount of data,
it needs to be performant enough. Yeah, yeah. I have seen most of those talks, not all of them.
They are very fun and accessible, so I'm going to put links to them all in the show notes, and
I encourage people to go off and watch some of those; that will be your gateway into the world of ML.
Is there anything else you want to say about that book in particular and just ML in general?
I could say hundreds of things.
Just listen to some of my talks, have a play with some of these ideas.
Some of this machine learning stuff,
you can actually code up from scratch yourself.
It's not all hardcore maths.
Just find something fun and play with it, and that's all. One of the things in your book, the
example, the mind-reading example, I thought was fascinating, because it is such a relatively
small piece of code to get the effects that you get. And also, from my own sort of
performance-based background, I'm thinking, is this how the branch predictor works inside the
core? It's the similar kind of statistics being kept: patterns are being found in your code. Is a
branch taken, is it not taken? So I was like, oh, I wonder if I can use this to model the branch
predictor. And so, yeah, it was a cool bit of code to see. Awesome. All right,
well, we're going to continue the conversation in a moment.
Hold that thought, because we're going to have a little sponsor break. As we said before, this episode is sponsored by Sonar, the home of clean code. SonarLint is a free plugin for
your IDE. It helps you to find and fix bugs and security issues from the moment you start writing
code. And you can also add SonarQube and SonarCloud to extend your CI/CD pipeline and enable
your whole team to deliver clean code consistently
and efficiently on every check-in or pull request.
SonarCloud is completely free for open source projects
and integrates with all of the main cloud DevOps platforms.
So back to our interview with Frances.
Now, one thing that did come up in both of your books,
and you did mention it in connection with ML a few times, actually, is the subject of random numbers.
And this is something else we wanted to talk about today.
It turns out there's a lot to this.
First of all, how does randomness relate to ML and AI?
And what do you think of the current state of random number support in C++?
So a bit of a two-parter there.
Two huge questions. On a
real simple level, in terms of AI, if you're trying to solve a problem, you could brute force it,
and that becomes really self-evident with, say, genetic algorithms. Someone's designed a genetic algorithm framework to do seat
planning for weddings. Now, I don't know if you've ever tried doing that yourself.
Oh my goodness. So you've got some constraints: these two cannot sit next to each other, these do
have to sit next to each other, I've got this many tables. And if you were to brute force it, listing every possible combination, you hit millions
really quickly with very few guests, and your brain explodes, and you just go, I don't know, I'm
going to pay a wedding planner to do it for me. Or perhaps, rather than brute forcing everything,
you try a few random initial points and go, well, that one's rubbish, that one's rubbish,
and that one's okay here, but we've got this other bit that needs improving on. So you start with some
random stuff, and then you have your better ones. You maybe nudge left a
bit, right a bit, tinker a bit. But what do you choose next? Well, you randomly try something,
rinse and repeat. And then, if you've got the objective function right and you use the randomness
to steer towards something better (and we as humans need to be
involved in this modeling and define "better"), then it might come out with an acceptable solution.
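The "try a few random starting points, then nudge" loop described above can be sketched as random-restart hill climbing. This is a simpler stand-in for a genetic algorithm, not the wedding-planning framework Frances mentions. The single long table, the `dislike` penalty matrix and every name in it are illustrative assumptions. The `cost` function plays the role of the objective function that humans have to define.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

using Matrix = std::vector<std::vector<int>>;

// Objective: guests sit along one long table, and dislike[a][b] is the
// penalty for seating guest a next to guest b. Lower is better.
int cost(const std::vector<int>& seats, const Matrix& dislike) {
    int total = 0;
    for (std::size_t i = 0; i + 1 < seats.size(); ++i)
        total += dislike[seats[i]][seats[i + 1]];
    return total;
}

// Random-restart hill climbing: start from a random shuffle, repeatedly
// swap two random guests, keep the swap unless it makes things worse,
// and remember the best arrangement across all restarts.
// Assumes at least one guest.
std::vector<int> plan_seating(const Matrix& dislike, int restarts, int steps, unsigned seed) {
    std::mt19937 gen(seed);
    const int n = static_cast<int>(dislike.size());
    std::uniform_int_distribution<int> pick(0, n - 1);

    std::vector<int> best(n);
    std::iota(best.begin(), best.end(), 0);  // guests 0..n-1 in order
    int best_cost = cost(best, dislike);

    for (int r = 0; r < restarts; ++r) {
        std::vector<int> seats = best;
        std::shuffle(seats.begin(), seats.end(), gen);  // random starting point
        int current = cost(seats, dislike);
        for (int s = 0; s < steps; ++s) {
            const int a = pick(gen);
            const int b = pick(gen);
            std::swap(seats[a], seats[b]);              // nudge
            const int candidate = cost(seats, dislike);
            if (candidate <= current) {
                current = candidate;                    // keep: no worse
            } else {
                std::swap(seats[a], seats[b]);          // undo: worse
            }
        }
        if (current < best_cost) {
            best = seats;
            best_cost = current;
        }
    }
    return best;
}
```

Instead of enumerating every permutation, the search only ever looks at `restarts * steps` arrangements. Randomness picks where to start and which nudge to try next.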
So basically, instead of trying everything, it tries a few things randomly. But we haven't defined
random, and that is a big topic. And then the second part of that question was
about the current state of random number support in C++, which is a big topic, I think. Right, so
C++11 introduced all the random engines and distributions, and there have been a few little
tinkers, and there are several different engines, but that was all
C++11, and we started to see other things introduced in the algorithms, like shuffles and stuff like that,
but nothing fundamental. A lot of people by default use the Mersenne Twister, but it's got loads of state, so that makes it pretty much impossible to do CUDA stuff with,
because it's just too much to pipeline backwards and forwards. I've seen a proposal, which I don't
have the link to to hand (I mentioned it in my ACCU talk this year), to come up with something with much
less state, so you can do the GPU stuff much more easily and make things more efficient. Now, I
don't know if that proposal's still being actively worked on, and I have noticed in, I think, some of
the Python numeric libraries, they've moved away from using the Mersenne Twister because it's got all
that extra state, and are using some newer ideas. So I think we have a really good foundation from C++11,
but we haven't seen anything radically change since. So yeah, we'll see what happens.
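Here is a minimal sketch of the C++11 `<random>` split described above. An engine (here `std::mt19937`) produces raw bits, and a distribution shapes them, so `std::uniform_int_distribution` gives each face of a die the same probability. `make_engine`, `roll_die` and `modulo_counts` are illustrative names. `modulo_counts` is a hypothetical helper that counts how the naive `value % 6 + 1` mapping, discussed next, spreads a generator's outputs over the six faces. The last line shows where the Mersenne Twister's "loads of state" comes from.

```cpp
#include <array>
#include <cstddef>
#include <random>

// Seed a Mersenne Twister from the system's entropy source.
std::mt19937 make_engine() {
    std::random_device rd;
    return std::mt19937(rd());
}

// Engine + distribution: every face from 1 to 6 is equally likely.
int roll_die(std::mt19937& engine) {
    std::uniform_int_distribution<int> die(1, 6);
    return die(engine);
}

// For a generator returning 0..max_value inclusive, count how many raw
// values land on each face under "value % 6 + 1" (index 0 is face 1).
// Unless (max_value + 1) is a multiple of 6, the low faces get one extra value.
std::array<long, 6> modulo_counts(long max_value) {
    std::array<long, 6> counts{};
    const long range = max_value + 1;
    for (int face = 0; face < 6; ++face) {
        counts[face] = range / 6 + (face < range % 6 ? 1 : 0);
    }
    return counts;
}

// std::mt19937 carries 624 words of 32 bits each: about 2.5 KB of state.
constexpr std::size_t mt19937_state_bytes =
    std::mt19937::state_size * std::mt19937::word_size / 8;
```

The C standard only guarantees `RAND_MAX` is at least 32767. For a 15-bit generator, faces 1 and 2 each get 5,462 of the 32,768 possible values, and faces 3 to 6 get 5,461. That is a small but real bias, and `uniform_int_distribution` avoids it.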
Call me old-fashioned, but what's wrong with calling old C's rand function? It's absolutely fine. But if you want to, for example, roll a dice
and make sure you get the numbers one to six
with equal probability,
you'll probably get it wrong the first time you do it
because you need to watch out for...
You mean you just modulus it with six, right?
Well, it'd be close,
but I'm not playing dice or cars with you mate because you're
cheating obviously you we know this don't we six doesn't go into max int equal number of times so
some your six would be slightly less likely than the others right don't do that so there are there
are ways and means of i mean i i also know that the brand at least some of the
computers i've checked the rand rand function only returns like a value between zero and six
five five three five or something like that anyway so yeah it's it's not really very hasn't got much
range in the first place so um but you mentioned mercenar twister has a bunch of states so um
and there are newer algorithms and and so what is the problem with the state then with with something like CUDA?
Is it is it just the computation for a number if you're going to be doing thousands of them in a row or.
Right. So with the basic random number generators, like C's, the linear congruential ones, you start with a number,
you multiply it by something and then do a modulus calculation, so it looks like it's
wanging backwards and forwards, but you've only got to store one number there. Maybe that's it.
But if you don't get the modulus right, or the things you're multiplying by
right, you'll end up cycling really quickly. And also, once you hit a number you've seen before,
you know what's going to come next.
That is the state, right?
Yeah. But with the Mersenne Twister, well, it uses
some bits from the last number, some bits from the number before, some bits from another number, some
bits from another number. I can't remember the size, but it's a lot of state, which you can't shovel onto your GPU, because there won't be enough space,
necessarily, or if you do that, you haven't got space for the things you actually want to use
the GPU for. Got it. So something somewhere in between the full complexity of the Mersenne
Twister and just having the last number, multiplying it by 13
and taking it modulo 255 or whatever the heck you did before.
But I see. Got it.
And I was reading that another problem with the Mersenne Twister state
is that, because it is so big,
if you're seeding it with a 32-bit or even a 64-bit integer,
the difference in size is so big that, given the first few numbers,
you should be able to reverse engineer what the seed was.
I mean, obviously there are cryptographically secure
random number generators, which is one thing,
and the Mersenne Twister is not trying to be that.
But even so, if you can guess it with a couple, then maybe...
Or possibly even one.
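A minimal sketch of why a small seed gives the game away: with a 32-bit seed only 2^32 starting states are reachable, so an observer can simply try seeds until the outputs match. `find_seed` is a hypothetical helper that searches a deliberately small window of seeds so it finishes quickly, and it matches on two outputs to make an accidental match vanishingly unlikely.

```cpp
#include <cstdint>
#include <random>

// Brute-force seed recovery: try each candidate seed in [lo, hi) and return the
// one whose first two outputs match what was observed, or -1 if none does.
// An attacker could scan all 2^32 possible 32-bit seeds; the window keeps this quick.
long long find_seed(std::uint32_t out1, std::uint32_t out2,
                    std::uint32_t lo, std::uint32_t hi) {
    for (std::uint32_t s = lo; s < hi; ++s) {
        std::mt19937 candidate(s);
        if (candidate() == out1 && candidate() == out2) return s;
    }
    return -1;
}
```

Seeding with more material, for example several 32-bit words through `std::seed_seq`, makes this kind of exhaustive search impractical, though it still doesn't make the Mersenne Twister cryptographically secure.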
Yeah, the pseudo-random numbers we're talking about are not cryptographically secure. That's an entire other
space that I know very little about. But at a simple level, a lot of these pseudo-random number generators
tend to just do some multiplying and modulus. The cryptographic ones start doing crazy stuff like
raising to the power of two and then doing some other stuff as well, and then that
just gets numerically much harder to figure out what's going on.
So you mentioned the difference
between a true random number generator and a pseudo-random number generator.
I have not said the word "true" at any point, I don't think. Right, first: there's no such thing as a random number.
Listen to my ACCU talk from earlier.
I'm sure there was an XKCD that proved otherwise.
Well, they claim the number 42 is random, but you can't have a random number.
You can have a sequence of things that seems a bit arbitrary,
where it's quite hard to spot the pattern, right?
You can't generate one with software.
I was going to say hardware: software trying to generate something that's random or arbitrary
is almost impossible. You can do things where it's hard to figure out what's coming up next,
which means we can have fun with coding your way out of a paper bag, or writing a
mind-reading machine, or whatever. That's fun. You can try things with hardware. So I guess it was ERNIE
that was doing the numbers for the Premium Bonds, right? That was a big machine that just randomly dropped
ping-pong balls down tubes and stuff.
Which, for non-UK residents, was the sort of government savings
scheme slash lottery kind of thing that we had in the UK. Is it still going? Maybe it is.
I think they still do Premium Bonds. I don't know if they're still using the hardware machine
that was there. But there's another thing: if I toss a coin, it might come up heads or tails. Is that
random? Or, if I knew
the precise initial conditions
(which is why it would be nice if Timur
was here as well, perhaps;
opinions another day),
if I knew the precise initial conditions,
in theory
our physics is good enough to work out
what's going to happen.
So is anything random?
We've got to philosophy pretty quickly.
That's a bit deep.
Is it a deterministic universe or not?
Or does quantum get us out of jail in this?
Did I mention my undergraduate degree was in maths and philosophy?
So you walked right into this one.
So what is the answer, then? Tell us.
I don't know. I think this raises some very profound questions about how you look at the
universe. I've been trying to think and speak without using the word random for the last year,
and that's been really interesting, because it's made me start thinking about how I think about things.
So, I mean, this whole train of thought was kicked off by me rereading Ian Stewart's book Does God Play Dice?, which is partly about chaos. The title comes from Einstein's response to the idea of nuclear decay, and it appearing to decay randomly:
you couldn't predict when. You could state a half-life, so you've got a property,
but you don't know exactly when something is going to happen. And Einstein's opinion, as far
as I understand, is that there must be something more precise going on here.
Just having this statistical model wasn't proper physics.
But most people nowadays go, oh, well, he's wrong.
And it is like that.
I don't know.
That's why we needed Timur on.
I know.
He's probably having that experience of shouting at the podcast right now.
Yes.
I'm sure.
Poor Timur.
Yeah, he did say he was quite excited to talk about that.
So maybe we'll do a follow-up at some point.
But for practical purposes, though, we can sort of make a distinction between the pseudo-random number generators that we have purely in code,
and some which have an external source of entropy.
Usually we put those two things together to some degree,
but do you have anything else to say about that?
All I can say at that point is that, I mean,
obviously you can use a random device to seed the pseudo-random numbers,
and that's using the entropy on your system.
And it will lead to unexpected results, let's say unexpected or hard to predict,
and then you can have fun writing a game or something, and I think that is great. So what are
the limits of that, then? I mean, for games it's going to be fine, but we've talked about
cryptographically secure randomness. Is it enough for that? Yeah, again, I'm not an expert
on the cryptographically secure stuff. And certainly using some hardware entropy would make things harder to predict.
So, yeah, you need an expert on the cryptographic stuff, or give me time to read up more on it.
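For the seeding point above, a minimal sketch of seeding `std::mt19937` from `std::random_device`. What `random_device` draws on is implementation-defined (often the operating system's entropy source), and the standard even allows it to be deterministic where no non-deterministic source is available, so this is fine for games but says nothing about cryptographic strength. `fresh_seed_words`, `engine_from`, and `roll` are illustrative names; recording the seed words is what lets you replay a run later.

```cpp
#include <array>
#include <cstdint>
#include <random>

// Eight 32-bit words of seed material from random_device. Record these
// if you might need to reproduce the run later.
std::array<std::uint32_t, 8> fresh_seed_words() {
    std::random_device rd;
    std::array<std::uint32_t, 8> words{};
    for (auto& w : words) w = rd();
    return words;
}

// Build an engine from recorded seed words; the same words give the same stream.
std::mt19937 engine_from(const std::array<std::uint32_t, 8>& words) {
    std::seed_seq seq(words.begin(), words.end());
    return std::mt19937(seq);
}

// A game-style dice roll.
int roll(std::mt19937& engine) {
    std::uniform_int_distribution<int> die(1, 6);
    return die(engine);
}
```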
I have seen other situations.
I mean, something else that's not easily there in modern C++: if you're doing some Monte Carlo simulations, you
might want to tweak things to make sure you get as many things above average as below average. So
if you go plus one, you want to go minus one as well, and it's hard to adapt some of these to what are called quasi-random numbers.
At one of the finance places I worked at in London,
I was on a risk team, and they were running Monte Carlo simulations.
Now, you need to be able to report to the Financial Conduct Authority how you came up with the numbers you've come out with.
So you can't just say, here's the seed to my random number generator,
because, well, this is a discussion we probably need to have next:
that doesn't necessarily mean you can regenerate all the numbers you need.
So part of the overnight batch process was generating enough random numbers for everything,
doing that thing I said of: if I go above average by this much,
I want to go below average by this much.
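The "above average by this much, below average by this much" trick is usually called antithetic variates. A minimal sketch, assuming standard-normal draws and a caller-supplied function `f` (hypothetical, standing in for whatever the model computes): every draw `z` is paired with `-z`, so the sample is exactly symmetric about zero. This illustrates the idea only; it is not the batch process described here.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Draw 2 * pairs standard-normal samples as antithetic pairs (z, -z), so every
// move above the mean is matched by an equal move below it.
std::vector<double> antithetic_normals(std::size_t pairs, unsigned seed) {
    std::mt19937 engine(seed);
    std::normal_distribution<double> normal(0.0, 1.0);
    std::vector<double> samples;
    samples.reserve(2 * pairs);
    for (std::size_t i = 0; i < pairs; ++i) {
        const double z = normal(engine);
        samples.push_back(z);
        samples.push_back(-z);
    }
    return samples;
}

// Monte Carlo estimate of E[f(Z)] from a set of samples.
template <typename F>
double monte_carlo_mean(const std::vector<double>& samples, F f) {
    double sum = 0.0;
    for (double z : samples) sum += f(z);
    return sum / static_cast<double>(samples.size());
}
```

For any odd function, such as `f(z) = z`, the pairs cancel and the estimate is zero up to floating-point rounding; for other functions the pairing typically reduces variance rather than removing it.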
So I've got that symmetric distribution, to be fair. And we'd generate these massive files of random numbers, and once in
a while we'd get a more complicated model come in, and one of the quants, the rocket scientists, would
go, "We've run out of numbers." The number of times I heard a mathematician go, "I've run out of numbers." "We've put a purchase order in to get some more.
We'll be back with you in a few weeks."
But yeah, that was hard, just trying to get the coverage and things and not hit a point where you start repeating.
And even with the Mersenne Twister, you can nudge towards things that start repeating:
if you're running a million simulations per financial instrument,
and then you bump the interest rate by one percent as you do it again and see what happens, you're
like, "But now it's repeating itself. So what are we going to do?" Oh yeah, gosh. And I bet those giant,
those huge files of numbers, presumably you could just, you know, GZip them to make them a bit smaller.
That was the domain of the database people.
Well, there's a thing.
And of course, if you start GZipping things,
you can see how compressed they get:
if they don't compress very well,
then they're really random.
If it compresses a lot, then you're in trouble.
You know, your random number generator is rubbish.
Yes, nought, nought, nought.
Yes, it is actually,
that's actually quite a serious thing that some people do sometimes,
just as a ballpark figure.
Is it random or not?
If I compress it and it gets slightly bigger,
we're probably in a good spot.
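The C++ standard library has no compressor, so as a stand-in for the "does it compress?" ballpark check, here is a minimal sketch that estimates Shannon entropy in bits per byte from byte frequencies. This is a swapped-in technique, not gzip: a good generator's output should land very close to 8, a stream of noughts lands at 0, and because it only counts how often each byte value appears (not their order), it is an even rougher screen than compression. `bits_per_byte` and `random_bytes` are illustrative names.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Plug-in estimate of Shannon entropy in bits per byte, from byte frequencies.
// 8.0 is the maximum (every byte value equally common); a constant stream gives 0.
double bits_per_byte(const std::vector<std::uint8_t>& data) {
    std::array<std::size_t, 256> counts{};
    for (std::uint8_t b : data) ++counts[b];
    double entropy = 0.0;
    for (std::size_t c : counts) {
        if (c == 0) continue;
        const double p = static_cast<double>(c) / static_cast<double>(data.size());
        entropy -= p * std::log2(p);
    }
    return entropy;
}

// Low byte of each std::mt19937 output, as sample data to check.
std::vector<std::uint8_t> random_bytes(std::size_t n, unsigned seed) {
    std::mt19937 engine(seed);
    std::vector<std::uint8_t> out(n);
    for (auto& b : out) b = static_cast<std::uint8_t>(engine() & 0xFFu);
    return out;
}
```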
I always sort my random numbers before I compress them.
It's much better.
Deduplicate.
You touched on one subject there that I did want to get onto, so maybe we'll come to that now, which
is portable random seeds, something of interest to me. So I'll explain my use case
first. I originally wrote, and used to maintain, the Catch and then Catch2 test frameworks,
and one of the
things they do is randomize test runs, the idea being you want one test to
run before another one time and after it another time, so you don't get dependencies between tests,
or if you do, you find out sooner rather than later. And there are a few other places where randomness
comes into the actual running of tests, and even test data, which we might come onto as well in a bit.
So there's randomness even with unit testing,
but you also want to be able to control that.
So we print out the random seed, or at least we did,
so that you can then reproduce this random run if you did actually find a problem.
Trouble is, a lot of the time,
where you originally get the problem is running on a build server.
And then you try to reproduce it locally.
And it turns out that, at least the way random number generation (or rather, not generation, as we'll get onto)
is specified in the current C++ standard,
it's not portable, which means that that random seed is going to give you different results,
potentially, on different machines or with other variations in environment. And there was a paper a few years ago from Martin Hořeňovský,
who now maintains Catch2, for this reason,
to make the random number distributions in C++ portable,
which unfortunately didn't make it through.
But what's the solution to this?
How big a problem is it?
I don't know.
It'll be interesting to see what happens.
A lot of people
assume that if I tell you the seed,
then you can reproduce this.
If you
reuse the same computer and you
don't upgrade anything in between,
maybe you'll be alright.
Yeah,
that's surprisingly difficult.
But I guess if implementers are free to
implement things however they want, as long as it gives you a uniform distribution so you're not
cheating when you roll your dice, then we aren't specifying how the seeds actually work.
Yeah, I don't know what the solution is. I shall sit back and watch and learn more. But this is not
a problem that's inherent in random number generators in general; it's specific to the implementation?
Exactly. Yeah, I mean, if you did the maths yourself with pencil and paper, then yeah, it would work out.
But you haven't seen my maths ability; it would be different every time. In theory. Or maybe get an AI to do it for you.
Yeah, I mean, it's not mathematically impossible, but I guess if implementations are doing slightly
different things, and it's not specified how, then there we go. Right. And my understanding is
that most of that lack of specification is
in the distribution stuff rather than the seeds and the randomness itself, because I think we can
all see that if you multiply by 372 and divide by whatever, that's going to get you the same answer.
But then the thing with getting a fair dice roll out of it is, well, how do you
take the modulus by six in a fair way? You need to
keep some state from the previous time and then mix it in the next time, to make sure that the
distributions are still the same over time. And I think that's the unspecified, underspecified
part, so that implementers can choose to do it whichever way they like. And maybe there's
floating-point precision inside some of these things, who knows, and unspecified things.
Yeah, I don't know the details. I mean, again, mathematically, the distribution...
and cppreference, for all the distributions, tells you the maths it should be doing
to spew the numbers out. Yeah, now we're talking about it, I'm not sure precisely
where the differences would fall out.
I'm going to be spending an afternoon thinking and reading now.
More tabs.
More tabs open.
Sorry.
That's all right.
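A small illustration of where the line falls, as a minimal sketch: the standard pins down engine output exactly (it requires the 10,000th value from a default-constructed `std::mt19937` to be 4123659995), while nothing comparable is required of distributions, so `first_die_roll` may legitimately return a different face for the same seed on different standard library implementations. The function names are illustrative.

```cpp
#include <cstdint>
#include <random>

// Engines are pinned down exactly: the standard requires the 10000th output of a
// default-constructed std::mt19937 (seed 5489) to be 4123659995 everywhere.
std::uint32_t mt19937_10000th() {
    std::mt19937 engine;   // default seed
    engine.discard(9999);  // skip the first 9999 outputs
    return static_cast<std::uint32_t>(engine());
}

// Distributions carry no such requirement: for the same seed, this face may
// legitimately differ between standard library implementations.
int first_die_roll(unsigned seed) {
    std::mt19937 engine(seed);
    std::uniform_int_distribution<int> die(1, 6);
    return die(engine);
}
```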
We threw around the terms generation and distribution.
But what is the difference then?
What does each part mean?
So with C's rand, as we said, we get something between nought and whatever, so you just get a number:
a "random" number is generated. But if you viewers (the listeners, sorry) can't see the air quotes,
the little pause around the word is air quotes. Yeah, thank you for that. But if I want to end up with a dice roll, so a number between one
and six, I then need to, to use a technical term, splurge out or bucket up the numbers,
and it's that splurging or bucketing function that gives you the distribution. So with a uniform distribution, each number between a lower and upper bound is equally likely. If I want
a normal curve or a bell curve, I want things to be bunched up, so I've got more of them around the
mean in the middle and fewer things at the extremes. And then there are all kinds of other distributions,
but it's just splurging, smearing out the numbers, or bucketing
them. That's all right. Interesting. And that's the part that doesn't seem to be portable between
machines. Do you know why that is? Is it to do with the architecture, or...? No idea. I'm going to find out and
report back. OK, you heard it here first. And I mentioned testing would come up again. I know this is
something we were talking about just before the show: property-based testing,
which is another area where random numbers and testing collide. What can you tell us about
property-based testing and other forms of randomness in testing? Well, something that I pulled on a bit in the first book,
the genetic algorithms one,
was if you're doing random stuff,
there goes the air quotes again.
Sorry, listeners.
If you're writing code that's doing some random stuff,
now, whether that's larking about coding your way out of a paper bag
or doing some serious stuff like COVID modelling
or some finance stuff,
how do you test what you've done?
Now, you can mock out the random numbers, and just returning zero instead flushes out all kinds of problems.
I could tell you all sorts of stories there.
You wouldn't believe it.
But that's not sufficient.
That's enough to find some silly edge cases. But how do you know
that you're actually coming out with the right kind of maths if something different happens every
time? Well, if you're simulating stuff and you've got an R number for some COVID modelling,
you ought to be able to recover that R number from the results you get. So you can say, if I do this model and send in
R of 1.5, then all the results I get out should have an R value of 1.5.
And then a framework can try several simulations for you and see whether that happens or not.
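A minimal sketch of that "recover the parameter you put in" check, using a deliberately oversimplified, hypothetical stand-in for an epidemic model in which each infected person infects a Poisson-distributed number of others with mean R. The property is that the mean of the simulated secondary cases lands close to the R fed in, across many seeds. The population size and tolerance are assumptions you would tune so the test is neither flaky nor toothless.

```cpp
#include <random>
#include <vector>

// Hypothetical, heavily simplified stand-in for an epidemic model: each infected
// person goes on to infect a Poisson-distributed number of others, with mean r.
std::vector<int> simulate_secondary_cases(double r, int people, unsigned seed) {
    std::mt19937 engine(seed);
    std::poisson_distribution<int> infections(r);
    std::vector<int> cases(people);
    for (int& c : cases) c = infections(engine);
    return cases;
}

// Recover R from the output: the mean number of secondary cases per person.
double estimate_r(const std::vector<int>& cases) {
    double total = 0.0;
    for (int c : cases) total += c;
    return total / static_cast<double>(cases.size());
}

// The property: whatever the seed, the estimate lands within tolerance of the R fed in.
bool recovers_r(double r, unsigned seed, double tolerance) {
    const double estimate = estimate_r(simulate_secondary_cases(r, 100000, seed));
    return estimate > r - tolerance && estimate < r + tolerance;
}
```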
Or you might say the average height of the people in this modeling scenario should be this: does that happen?
Or I only get outliers one time out of a thousand. So you can start looking at averages, stats, and
properties around things, and that will flush out some cases you've missed, in many cases. I've
repeated myself, you know what I mean. So when I deal with those R values, I never give them
a name to make sure the R value is
actually an R value.
Am I missing a pun there?
There's a joke in there.
I think it's too early in the morning for me
to work that one out.
Thank you, Phil. I'm sure
I've missed the joke there, but I'll catch up on the
replay. I will. So yeah, property-based testing is something that I have a big interest in as well,
so I've tried to build at least some of that into Catch2, but there are separate property-based
testing frameworks available for C++.
So not just limited to more sort of statistical types of things where you want to check distributions,
but anything where you just want to test a property of something
that always holds regardless of the inputs you give it.
Property-based testing is a really good way to go there.
And that involves random numbers,
which gets some people nervous when they think unit tests have to be reproducible.
That's the thing, it's not a unit test,
it's a property-based test.
Yeah, I mean, so when I've done some property-based testing,
it sometimes flushes out some edge cases.
So if something fails,
you can then use that as a unit test
to pin down some bad behavior that's going on.
I guess an extension of property-based testing
is fuzzers, to just blam your code with all kinds of nonsense and see if it falls over or not,
and that's still flushing out cases that you need to be aware of, test for, and maybe defend
against, and that's using randomness as well.
And that's exactly why the portable random numbers would be so useful,
because at some point you do need to be able to reproduce
a particular run, which gets difficult otherwise.
But as you say, just pinning it down to a specific unit test
gets you a long way there.
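A minimal hand-rolled sketch of that workflow (not Catch2's API or any particular framework's): random inputs come from an engine seeded with a value you can print, and the first failing input is returned so it can be pinned down as an ordinary unit test. `random_input`, `check_property`, and the sorting property are illustrative. Because it uses `std::uniform_int_distribution`, the exact inputs for a given seed can still vary between standard libraries, which is the portability gap discussed above.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <random>
#include <vector>

// Hypothetical input generator: a vector of 0..20 ints in [-100, 100].
std::vector<int> random_input(std::mt19937& engine) {
    std::uniform_int_distribution<int> length(0, 20);
    std::uniform_int_distribution<int> value(-100, 100);
    std::vector<int> v(static_cast<std::size_t>(length(engine)));
    for (int& x : v) x = value(engine);
    return v;
}

// Run a property against `runs` random inputs from an engine seeded with `seed`.
// Returns the first failing input, so it can be reported with the seed and then
// pinned down as an ordinary unit test; std::nullopt means no failure was found.
template <typename Property>
std::optional<std::vector<int>> check_property(Property holds, std::uint32_t seed, int runs) {
    std::mt19937 engine(seed);
    for (int i = 0; i < runs; ++i) {
        std::vector<int> input = random_input(engine);
        if (!holds(input)) return input;
    }
    return std::nullopt;
}

// Example property: sorting yields an ordered permutation of its input.
bool sort_is_ordered_permutation(const std::vector<int>& input) {
    std::vector<int> out = input;
    std::sort(out.begin(), out.end());
    return std::is_sorted(out.begin(), out.end()) &&
           std::is_permutation(out.begin(), out.end(), input.begin(), input.end());
}
```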
Yeah.
I actually find that you can have a bit of a dance
between unit tests and property-based tests.
So sometimes you can write some unit tests
and think, well, these are quite good.
And in fact, they're not specific to these examples.
I can extract a property from that.
You write that as a property-based test.
And if at some point it then finds an issue
that you hadn't thought of,
one of those edge cases,
then you can capture a unit test back out of that
and you go around the loop. Yeah, that's quite a good way to think of it. But it all depends on
good random numbers. Yeah. So, coming back to the C++ Bookcamp book, which is hard
to say. You said they might be changing the name of the book? Yeah, it's tricky, isn't it? Yeah. And we kind of alluded to this at the beginning, but, you know,
we talked about modern C++ and why you didn't call it modern C++, and obviously
the thing about "modern" is it's out of date before you finish the sentence. Yes. But what kind
of things are you excited about coming up in the future? So the book is C++20, right? Or is it 23? A tiny, tiny bit of 23,
because there wasn't a 23 when I started writing last September, and now, this September, there is.
So there's a little bit of C++23. But are there other things that you're looking
forward to, maybe in a future revision of the book, or a new book for 23 or 26? Ideas in the future? What
are you looking forward to? I'm spending my time at the moment looking backwards, trying to
go through the corrections on things, to be honest. It would be interesting to see if we get
beyond the standard generator for co-routines, so I can do things I'm doing in Python
quite easily, like yield and stuff, and do it off somewhere else. We'll see if that grows or not.
And also there are loads of little bits with ranges and views that I've not played with in anger.
Those are more syntactic sugar, or nice ways of doing things, and I've seen more and more of
that happening. I've only used a small amount of ranges and views in the book, but
the whole book was just little flavors of things to get people started. But yeah, it'll be interesting to see
where co-routines end up, but that's a big topic. Oh yeah, I thought it was very brave
of you to even bring them up in the book, right?
Co-routines are the thing that I haven't touched myself yet.
It's the thing that I found most interesting when I was reading through it at the weekend
because it's a huge topic, as you say, and most people, well, I speak for myself,
have gone, oh, that looks complicated.
La, la, la.
Just forget about it.
Just don't even think about it.
But you have a pretty comprehensive example, as you say,
a flavor of all the bits and pieces you need in order to actually use co-routines.
It was entirely gratuitous.
But after I'd introduced you to the mind-reading machine,
I thought, might as well put it in a co-routine.
It doesn't bring any benefit to the mind-reader at all.
So I've had several goes at trying to write co-routines
I'd really like to thank Phil Nash for doing the workshop at C++ on Sea. That helped me realize some
bits I'd missed and make that part of that chapter better. I think that was a really useful
workshop. That was actually Mateusz Pusz's material that I presented.
And I saw you cramming the night before from a restaurant table just opposite us,
trying to get your head straight.
So thanks for stepping in and doing that.
That was so useful.
Yeah, I think there are two types of people when it comes to co-routines.
There's those that look at it and think, oh, that's too complicated.
I'll learn it later.
And then those that look at it and say, that's too complicated.
I should do a talk on it. Or somebody else's workshop. Indeed.
OK, so we should start wrapping up. Is there anything else you want to tell us before
we let you go, Frances? Well, just thanks for having me on. If anyone can define random properly, then, like, find me on X or Mastodon or something
and let me know what you think.
And maybe that could be Timur.
Yes, looking forward to it.
So where can people reach you
if they want to let you know what randomness is
or reach out to you for anything else?
So I'm on what used to be called Twitter.
I'm on Mastodon and I'm on LinkedIn, and I shared some links with Phil, so I guess you can put them up somewhere. They will be
in the show notes. Or just Google me, because you'll probably find me; I've got quite a unique surname.
Yes, which will also be in the show notes as part of the title. Well, thanks very much for coming on the show today
and being a random guest.
It's been great to have you on the show, Frances.
It's been really interesting talking about random stuff.
And I'd never really thought about randomness
in terms of testing.
So I've learned something today,
both from you, Phil, and you, Francis.
So that's been very interesting.
And thank you for having me as a guest co-host.
Again, it's been fun to be here.
Sending our best wishes to Timur, obviously.
So thanks for stepping in at short notice, Matt.
Thanks so much for listening in as we chat about C++.
We'd love to hear what you think of the podcast.
Please let us know if we're discussing the stuff you're interested in.
Or if you have a suggestion for a guest or topic,
we'd love to hear about that too.
You can email all your thoughts to feedback at cppcast.com.
We'd also appreciate it if you can follow CppCast on Twitter or Mastodon.
You can also follow me and Phil individually on Twitter or Mastodon.
All those links, as well as the show notes,
can be found on the podcast website at cppcast.com.
The theme music for this episode was provided by podcastthemes.com.