CppCast - AI and Random Numbers

Starting point is 00:00:00 Episode 369 of CppCast with guest Frances Buontempo, recorded 7th September 2023. This episode is sponsored by Sonar, the home of clean code. In this episode, we talk about the CppCon program and C++ on Sea videos, Inside STL, and modern C++ in open source. Then we're joined by Frances Buontempo. Frances talks to us about modern C++, machine learning, and random numbers. Welcome to episode 369 of CppCast, the first podcast for C++ developers by C++ developers. I'm your host, Phil Nash, joined by our last minute guest co-host, Matt Godbolt. Matt, how are you doing today? I'm doing very well, thanks, Phil. How are you?

Starting point is 00:01:12 Ah, yeah, I'm okay. I'm okay. We pulled this together a little bit at the last minute because Timur was meant to be here, but I don't want to go into too much detail. We'll let him fill everyone in when he gets back. But let's just say he's had an early paternity situation. Right. So I was going to say, why am I here again? I'm very honored to be here, I'm very glad to be back, and it's good to be here. But yeah, fingers crossed all's going well with Timur, and we'll look forward to hearing all about it when he's back on the show. Yes, absolutely. Well, at the top of every episode, we like to read a piece of feedback. This one's from Reddit user neithermango8264, you know who you are, regarding our last episode with Abbas Sabra about static analysis. One of the best episodes. The point made at the beginning

Starting point is 00:02:02 resonates powerfully with my experience working on open source projects. One of the worst feelings is when you spend days tracking down a bug that static analyzers can easily detect. So I'm pleased about that feedback, both as a CppCast host and as a colleague of Abbas and someone who works on static analyzers. Now we did also have a small handful of people who emailed to say that they found the hidden chapter art that I sneakily inserted in the last episode. When I was talking about adding chapter art, I actually put one in just to see who had noticed,

Starting point is 00:02:35 and I think we had like two or three people. That was about the fact that you can put things into the view. I see. Yeah, even though we said how much hard work it was, I think that was the episode that I was on that then you responded to saying that. And now ironically, I'm here again. Cool.

Starting point is 00:02:52 These things, these things come around. Yeah. Not a lot of people saw that or at least responded. Bear in mind, it's only been a week since the episode went live. So maybe we'll get a few more. All right.

Starting point is 00:03:03 Well, we'd like to hear your thoughts about the show. You can always reach out to us on X, formerly known as Twitter, Mastodon, LinkedIn, or email us at feedback at cppcast.com. So joining us today is Frances Buontempo. Hello. Frances has many years of C++ experience, along with various other languages, including Python and C Sharp. She's worked as a programmer at various companies, mostly in London, with a focus on finance. She enjoys testing and deleting code and tries to keep on learning. She's given talks on C++ and more besides, which you can find on YouTube. She is the editor of ACCU's Overload magazine and will happily consider articles

Starting point is 00:03:42 from anyone listening. Frances, welcome to the show. Thank you. Nice to be here. So I noticed that you worked in finance. I did not know that. What did you do in finance, the finance world? Deleted some code. Deleted some code? Why is deleting code so important? What's up with deleting code? Sometimes you can make things compile quicker if you get rid of loads of bits you aren't using. Sometimes you can refactor stuff and make it simpler to understand.

Starting point is 00:04:12 My background is in mathematics. I started life as a secondary school maths teacher. Oh, my gosh. That's a very brave profession. The kids were all right. I think I was a bit young at the time. Trying to keep everyone quiet was difficult because I got really excited about maths.

Starting point is 00:04:32 It made loads of noise. But anyway, that was a lifetime ago. I learned to code and program some embedded devices up in Leeds after I finished my degree. Then I took time out and went back to university and did a PhD in some machine learning stuff, which meant I knew about things like Monte Carlo simulations and some stuff that gets used in finance. I managed to get a job in London. My aim was to try and end up on one of the quant teams

Starting point is 00:05:00 and do the rocket science math stuff. I got close, but I kept ending up being asked to test other people's code. So I've left huge amounts of unit test frameworks that ran really quickly, that found loads of bugs. We deleted code, and I've just had to do the maths in my spare time. So here we are. Well, we are going to touch on some of those subjects again in the main interview. But before we get into that, we've just got a couple of news articles to talk about, so feel free to comment on any of these as I review them. So first of all, CppCon announced their program just recently. It's all going to be on site this year, so no hybrid conference. There's

Starting point is 00:05:41 80 breakout sessions. There's a couple of keynotes being announced: one from Bjarne Stroustrup, a regular there, and another one from Andrei Alexandrescu, who's going to be talking about AI, so topical. And there's going to be three more announced in the coming days and weeks. There's the usual things like the Back to Basics track and various other tracks, the committee fireside chat, pre- and post-conference classes. There's going to be a safety and security panel, so again, quite on topic. And lightning talks are going to be hosted by me this year, so you are warned. And talking of conference news, the C++ on Sea videos are starting to be released, so I'll put some

Starting point is 00:06:25 links in the show notes for that. And that includes our very own Matt Godbolt's talk on what's new in Compiler Explorer. The lightning talks in this case were hosted by Frances, our guest today. And Timur's safety and C++ talk, which includes the results of that survey that he talked about on some previous episodes. So it seems like we're keeping everything in-house on all these videos. I was going to say, there's accusations of nepotism going on here. Well, they're just the ones I thought were relevant to mention now. There's plenty of good content that doesn't involve any of us. So do check those out. Now, there was a series of

Starting point is 00:07:04 posts by Raymond Chen on his The Old New Thing blog. I mean, he writes every few days, so that's no surprise, but he did this series on Inside STL in August. So it's actually a whole load of posts. I'm not going to list them all, but it covers everything from pairs, vectors, strings, lists, maps, and sets, including the unordered varieties, deques, arrays, and smart pointers. And for each one, it actually goes deep into the implementation of them, comparing the three major standard library implementations and their trade-offs and consequences.

Starting point is 00:07:36 For example, in Microsoft Visual C++, they use a really tiny block size in deques, and they probably want to change that, but they have to wait for an ABI break to be able to do that. So all those sort of little things you don't really think about necessarily when you're using these types.

Starting point is 00:07:52 It's really good to see what's actually happening on the other side. I thought this was a fascinating series of articles because, yeah, a lot of stuff in the STL we sort of just take for granted and you don't realize that there are still engineering trade-offs even down in the STL.

Starting point is 00:08:04 You think it's all very pure algorithms and, you know, how many ways are there to write a linked list? But apparently there's a lot more to it than that, and it gives you a certain amount of sympathy for the poor folks who have to write the STL and keep it performant for all cases. And then our final item was something I saw on Reddit. It's actually a question from somebody who's a university student, who wanted to know what open source projects use modern C++ features. And it was mostly for their university projects they're asking, but obviously there's a practical application to that question. So I think it's quite interesting seeing the responses there. Lots of libraries mentioned, including some of the usual suspects.

Starting point is 00:08:46 Facebook's Folly, SerenityOS, which we've had on the show before. Some lesser-known libraries as well. Interestingly, coincidentally, I've just recently been dabbling in some ideas for something I'm calling Catch-23. Oh? Which is not a promise for a new library or anything, but it is taking some of the ideas from the Catch testing framework and saying what would that look like if we did them in

Starting point is 00:09:11 thoroughly modern C++, you know, using C++23 features. I've even got my eye on 26 features, but I'm not quite going to go there yet. So it's more of a playground at the moment. We'll see what happens in the future. You couldn't resist a new pun-based library name. That's the real reason you're doing it, isn't it? It is, I admit. I had my eye on this name for some years now, and I suddenly realized, oh, it's actually here, this is an excuse to actually get started on it. So it's not actually on GitHub yet, but maybe I'll put it up. So listeners, watch out for Catch-23. Yep. Okay, so that's the news items we wanted to discuss today, so we're going to move over to our interview with Frances now. Now, Frances, one of the reasons we have you here is because

Starting point is 00:09:59 you do have a new book coming out soon, C++ Bookcamp. So let's start with that. Now, I presume the Bookcamp there is just a play on, like, boot camp in a book, but what is it all about, and why did you feel you had to write it? So, one caveat: Manning have a series of books called Bookcamp, and you're quite right, playing on the idea of boot camp, just trying to get you up to speed with things. But we are discussing maybe having a different title that's more precise about what's covered. I think we've discounted the idea of Modern C++ because there's several books with that sort of title, and that's not particularly helpful in the long run. Naming is hard. Yes, always. What I'm trying to do is just give people a short, small book that you could possibly read in maybe a weekend or so, which I think Matt did manage. Maybe. I did, this weekend in fact,

Starting point is 00:10:53 this weekend just passed. Yes. And I've just tried to pick some salient things that have changed since before C++11. So I know loads of people who, 20 years ago, used to be doing C++ full-time, stepped away and learnt bits of other languages, and then tried to come back and have just gone, where do I start? Because there are loads of books out there that are really detailed on each of the different standards.

Starting point is 00:11:19 And if you tried to read all of those, a new standard would have come along by the time you'd finished. So I just tried to pick some salient points to help you get enough knowledge to feel a bit more confident to start using some of the newer standards. You'll still need to go away and learn loads more afterwards, but I feel like maybe it's enough to just get you back in the driving seat again, if that's what you want to do. Yeah. And it allowed me to practice some stuff, and I realized there's loads I didn't understand properly, so I've learned loads as well. The learning by teaching approach to learning.

Starting point is 00:11:56 Right, right. So yeah, I thought that was a really interesting spin on teaching C++ to people who already have some grounding and background in C++, because very quickly you introduced a whole bunch of things that were interesting and fun, as opposed to, as you say, if you just read the standard, you're like, good grief, I have no idea. What is rvalue something, something? I don't even know why this is important to me. I don't care. I'm bored. I want to do something else now, right? So what sort of went into the thinking behind that? How did you come up with the ideas to teach this in the way that you did? By a lot of head scratching.

Starting point is 00:12:36 I guess, actually, because I did start life as a maths teacher, even when I was at school, I tended to be the person people turned to with "I don't get this", and I'd go, all right. And I got quite good at thinking of games or silly little examples that are quite fun. I'm particularly proud of my blobs racing out of a paper bag. I mean, that's loads of fun to sit back and watch. But if you can find a fun, engaging example that's fun to play or look at afterwards, then you start thinking about it more and tinker with it more. But yeah, trying to come up with examples was really hard. Quite proud of falling across the mind-reading machine.

Starting point is 00:13:17 That one, I was thinking about statistics and predicting the future years ago, and I wondered if you could make a game that would be able to play rock paper scissors. Now, we should talk about AI and robotics later on for that, exactly, but from the gestures you can guess what people are going to do, to an extent. And I found a web page that just went, no, let's keep it simpler, we'll predict whether you're doing heads or tails. And it kept beating me. I'm like, what's going on? And it had a link in there to a very in-depth Russian paper from years ago, which I put some of through Google Translate, which then pointed me back to Shannon's work from the 1950s, where he had built a mind-reading machine, an actual machine with hardware rather than software. So I thought, right, let's do this.
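As an illustration of the heads-or-tails idea described here, the following is a minimal sketch of a predictor that remembers what the player did after each possible pair of previous choices. It is not Shannon's actual design and not the code from the book; the class name `MindReader`, the two-move window, and the tie-breaking rule are all assumptions made for the example.

```cpp
#include <array>
#include <cstddef>

// Hypothetical sketch: predicts a player's next choice (0 = heads, 1 = tails)
// from what they did after the same two previous choices in the past.
class MindReader {
public:
    // Guess the player's next choice from the counts seen so far.
    int predict() const {
        const auto& c = counts_[context_];
        return c[1] > c[0] ? 1 : 0;  // ties fall back to heads
    }

    // Record what the player actually chose and slide the two-move window.
    void update(int choice) {
        ++counts_[context_][static_cast<std::size_t>(choice)];
        context_ = ((context_ << 1) | static_cast<std::size_t>(choice)) & 3u;
    }

private:
    std::size_t context_ = 0;                     // last two choices packed into 2 bits
    std::array<std::array<int, 2>, 4> counts_{};  // counts_[context][next choice]
};
```

A game loop would call `predict()` before each turn and `update()` with the player's real choice afterwards. Against a player who simply alternates heads and tails, this sketch settles into guessing correctly after a handful of turns, using only eight counters and two bits of context.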

Starting point is 00:14:13 And this machine tries to anticipate a human guessing heads or tails. Yeah. So as you say, mind reading, and this is foreshadowing perhaps, but humans aren't very good at random numbers, randomly choosing things. And so computers can go, I know your trick, you did three heads in a row, so the next one's almost certainly going to be tails, right? Yeah. It's just tracking some states using very little memory. But yeah, so yes, lots of having too many tabs open in my browser, trying to read the whole internet, scratching my head, going for long walks, coming up with crazy ideas. I mean,

Starting point is 00:14:50 yeah, basically don't ever look at my collection of tabs that are open, or piles of books. None of us would do very well if people saw what tabs we had open. Like, the amount of cppreference and Stack Overflow in my case, people would probably not let me write code. Yeah, I've got too many tabs open in my brain. So your book's currently in early access. What does that mean? So you can buy it now directly from Manning, and you'll get an ebook, and you can join. They've got what they call a liveBook page where you can read it, but you can leave me comments or just call out bits you'd like further explanations on, because we haven't hit the printing stage.

Starting point is 00:15:33 I have now written all nine chapters, so the final chapter will be up shortly. And we're trying to get ready for a production run this month, so I'm waiting for some final feedback. There will be a physical book. Oh, yes. A real actual book. Maybe from next month, depending. So real soon now.

Starting point is 00:15:55 Got it. So you kind of like, you incrementally buy the book and then you get it ahead early while it's still being made, and then you get a chance to comment on it before you get the final physical copy in your hands. Yeah, and it helped me make it better and call out any nits you might have seen whilst you were reading it because there are some.

Starting point is 00:16:14 But I mean, that's always good, having early feedback and things. And yeah, I do look at the liveBook version every now and again. People have left some really helpful comments where they say, I don't understand this, and I've tried to explain better. So I know a lot of our listeners are really great at finding little faults in things, so all you have to do is buy the book in early access, find all those nitpicks, and send them on.

Starting point is 00:16:40 Now, you do have another book that you wrote a few years ago now called, I've got the full title here, Genetic Algorithms and Machine Learning for Programmers, colon, Create AI Models and Evolve Solutions. So capturing that AI craze, but that was written in 2019, so long before the current LLM craze. How do these things compare? How have things moved on in the meantime? Well, when I wrote that book I deliberately steered clear of neural networks and things that you might need a little bit more maths for. I was trying to just find some simple examples to allow people to get the basic

Starting point is 00:17:19 idea of how, well, at least how I think, loads of AI models work, which is you loop around something and you go left a bit, right a bit, you tinker a bit, and there's usually some randomness involved in there. So I found several examples, from genetic algorithms to particle swarm optimization and more besides, which I'd given talks on, always called something like diffuse your way out of a paper bag or evolve your way out of a paper bag. So those are on YouTube. I hit a point where I'd given about five or six talks at conferences, and that's got to be enough material for a book. So then I had to find some more things to learn about and pull them together. So even at the time, I didn't go into neural networks. There are plenty of other people who've written books about that kind of thing.

Starting point is 00:18:10 And you do need to be able to follow some calculus to understand what's happening there, which isn't hard, but that's a busy space. There are plenty of good books out there about that. Since then, we've seen the large language models really kick off. And even things with image recognition stuff, some of it's because there's more processing power going on and people are learning to do CUDA and things like that. Because there's a lot of grunt work needed. If you start looking at the amount of calculations that are happening, it is absolutely mind blowing. Someone somewhere has got a huge

Starting point is 00:18:45 electricity bill for ChatGPT, and what are we doing to the planet? But maybe that's a different discussion for another day. Yeah, I mean, definitely the thing of the last several months has been all about various different chat app type things. So talking about all those calculations and computations you just mentioned, is that mostly during the training phase, or every time you submit a new part of a query, does it have to do a whole load of computation again? I'm not an expert, but I would stake money on the training being the worst bit, but I'm not sure. Okay. So when I think of machine learning or AI type stuff,

Starting point is 00:19:29 that is what I think. I think about neural nets and things, only because that's what I've seen in popular literature. So what kind of machine learning is not neural networks? Okay. I mean, if you look at a lot of data scientist roles and things, you blur the lines between statistics and AI. A lot of it gets rudely called curve fitting, or just regression stuff, and that's fitting, like, a y equals mx plus c line through some points, which you might have done when you were 14 or 15. Right, shoe size and height. Yeah, for prediction. But if

Starting point is 00:20:06 that model works, that model works. But then there's another space of building some simulations of things. So I talked the other year about how to crowd your way out of a paper bag. I misappropriate... Is this a problem in your life, being stuck in paper bags? It's just become a little bit of a theme, and it's quite a nice way of having a little visualisation of what's going on to see what's happening. I did try a hardware version once, but it turns out I'm not very good at soldering.

Starting point is 00:20:36 But that's another story. So, like, cellular automata, I don't know if you've come across things like Conway's Game of Life and that kind of thing. You've got some simple rules, and things just move or materialize and then dematerialize. But you can actually use that to simulate people moving through space, and that kind of thing gets used to model the best routes for fire escapes through buildings and that kind of thing. And that's not a neural network at all. You just have some rules that say, if you get near someone else, pause so you don't stand on them

Starting point is 00:21:13 and try and go to the nearest exit. And then you just simulate and see what happens. And you can do this kind of thing if you're modeling a spread of diseases. So you do things like Monte Carlo simulations there, which is just doing some random stuff inside a model and seeing what happens. For example, in the COVID modelling the R number was driving the randomness that was happening, and then you can predict what might happen if your model's right. Obviously, if your model's wrong and your R number's wrong,

Starting point is 00:21:45 it's not going to be accurate, but that's why you then measure and say, is this close or not? Is it AI or machine learning? Not quite. It's kind of stats, but no neural networks involved. This is a much bigger space than people realise. I feel like the folks who are with the LLMs

Starting point is 00:22:02 have kind of cornered the idea of what constitutes AI and machine learning, whereas, as you say, statistics is kind of the underpinning of the whole thing and fitting a line through a multidimensional space, presumably, rather than just your X and Y, is what most of these things are doing. That's really interesting. Yeah, some of it. There's other things you can do. I mean,

Starting point is 00:22:25 the PhD I did, a lot of that was focused on what's called decision trees. You give the thing some data and it spews out basically a flow chart for you. I mean, I was supposed to be investigating whether chemicals are toxic or not. So has it got a benzene ring in it, yes or no? Not by consuming them? Oh no. I think we all did that as students. Well, I couldn't possibly comment. I did have the odd pint of Guinness. Instead of getting a neural network that just comes out with toxic or not toxic, it shows you the rules it's using, so that means you've got a slightly comprehensible model. And there's a whole area of research about trying to comprehend or understand what neural networks are actually up to, to see whether it makes any sense in terms of the domain you're using it in. But

Starting point is 00:23:20 your ChatGPT is like a big black box, and if someone said, why did it give me this answer? No one can answer it. But if you have a tree that says, well, if X is greater than 10, then if Y is more than 300, it's, yeah, got it. Reverse engineer that out by curve fitting. I have played around with trying to generate some rules from some simple feedforward neural networks. And you can actually do that to a point.
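To make the point about a tree showing its rules concrete, here is a small sketch of a hand-built decision tree that returns both a label and the sequence of tests it applied. The `Node` layout, the function names, the labels "A", "B" and "C", and the tree itself are invented for illustration; only the two thresholds echo the spoken example. A real decision tree learner would build the nodes from data rather than have them typed in.

```cpp
#include <string>
#include <vector>

// One node of a tiny binary decision tree. Leaves carry a label;
// internal nodes test one feature against a threshold.
struct Node {
    int feature;        // index into the sample; -1 marks a leaf
    double threshold;   // go right when sample[feature] > threshold
    int left, right;    // child indices into the tree vector
    std::string label;  // only meaningful for leaves
};

// Walk from the root to a leaf, recording each test so the answer can be explained.
std::string classify(const std::vector<Node>& tree,
                     const std::vector<double>& sample,
                     std::string& explanation) {
    int i = 0;
    while (tree[i].feature >= 0) {
        const Node& n = tree[i];
        const bool greater = sample[n.feature] > n.threshold;
        explanation += "x" + std::to_string(n.feature) +
                       (greater ? " > " : " <= ") +
                       std::to_string(n.threshold) + "; ";
        i = greater ? n.right : n.left;
    }
    return tree[i].label;
}

// Made-up tree echoing the spoken example: is x0 > 10, and then is x1 > 300?
std::vector<Node> example_tree() {
    return {
        {0, 10.0, 1, 2, ""},     // root: test x0
        {-1, 0.0, -1, -1, "A"},  // x0 <= 10
        {1, 300.0, 3, 4, ""},    // x0 > 10: test x1
        {-1, 0.0, -1, -1, "B"},  // x1 <= 300
        {-1, 0.0, -1, -1, "C"},  // x1 > 300
    };
}
```

Calling `classify(example_tree(), {12.0, 350.0}, why)` walks two tests and lands on leaf "C", and `why` then holds the path taken, which is the kind of explanation a black-box model cannot give directly.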

Starting point is 00:23:49 Trying to do it with these really deep neural networks, I think that would be too difficult. But it's an active research area and it's absolutely fascinating. And I'm going to have more tabs open in my browser if I'm not very careful. I'm going to sit on my hands. So, yeah, you mentioned the paper bag thing. So where has that come from? How has that become a thing for you? Why the paper bag? I think the first time I did this, I gave my first talk at the ACCU conference a very long time ago now, and I wanted to just give a bit of an overview of what some kinds of machine learning were, partly to remind myself how some of this

Starting point is 00:24:34 worked, because I'd done my PhD a while ago and I was kind of missing it. It was interesting. So I played around with a couple of simple examples, but I wanted something visual to show during the talk. Right. So I think I used ant colony optimization there, where you just have some ants treading a path round, and I went, well, okay, what if we give them a challenge: when you're somewhere in the bag, can you get out of the bag or not? And then the title was, can you code your way out of a paper bag? And I printed off a certificate and I got members of the audience to sign it if they thought I'd done well enough, and that's up on my wall just around the corner. And if I find you the link at some point to the

Starting point is 00:25:16 conference, it's got on the schedule a link to my bio, a link to the slides, and a link to the picture of my certificate. And once I'd done that, it just became a bit of a gag, you know. That's amazing. So you're a certified paper bag... And so software, not hardware, as explained. So right, right. And presumably, again, that's all in C++, was what you were doing, or was it other languages? No, it's been, so the book had a bit of JavaScript and a bit of Python, which some people hated. They went, well, if you're doing machine learning, you should do everything in Python.

Starting point is 00:25:55 I actually maliciously did some of it in JavaScript because why not? And I do know one or two younger people who've picked up the book, like still school-aged kids who don't know how to set up a C++ compiler, aren't quite sure how Python works. But the JavaScript stuff, you just chuck it on your laptop or whatever, and then you can play with it without the ramp up.

Starting point is 00:26:16 So, like I said, some people didn't like that, but hey. No, absolutely. No, I mean, making it accessible and teachable is more important at some level. Obviously, I think if you're then going to write this stuff in production, which I don't think your ant colony is necessarily production code, it'll end up being C++. And nowadays, as you mentioned earlier, like CUDA and things like that and what you need to do in order to do the millions and billions of calculations

Starting point is 00:26:40 to actually train a system or even fit stuff to a decent amount of data, it needs to be performant enough. Yeah. Yeah, I have seen most of those talks, not all of them. They are very fun and accessible, so I'm going to put links to them all in the show notes, and encourage people to go off and watch some of those, and that will be your gateway into the world of ML. Is there anything else you want to say about that book in particular and just ML in general? I could say hundreds of things. Just listen to some of my talks, have a play with some of these ideas. Some of this machine learning stuff,

Starting point is 00:27:20 you can actually code up from scratch yourself. It's not all hardcore maths. Just find something fun and play with it, and that's all. One of the things in your book, the mind reading example, I thought was fascinating, because it is such a relatively small piece of code to get the effects that you get. And also, from my own sort of performance based background, I'm thinking, is this how the branch predictor works inside the CPU? It's a similar kind of statistics being kept. Patterns are being found in your code: is a branch taken, is it not taken? So I was like, oh, I wonder if I can use this to model the branch

Starting point is 00:27:57 predictor. And so, yeah, it was a cool bit of code to see. Awesome. All right, well, we're going to continue the conversation in a moment. Hold that thought, because we're going to have a little sponsor break. As we said before, this episode is sponsored by Sonar, the home of clean code. So SonarLint is a free plugin for your IDE. It helps you to find and fix bugs and security issues from the moment you start writing code. And you can also add SonarQube and SonarCloud to extend your CI/CD pipeline and enable your whole team to deliver clean code consistently and efficiently on every check-in or pull request. SonarCloud is completely free for open source projects

Starting point is 00:28:32 and integrates with all of the main cloud DevOps platforms. So back to our interview with Frances. Now, one thing that did come up in both of your books, and you did mention it in connection with ML a few times, actually, is the subject of random numbers. And this is something else we wanted to talk about today. It turns out there's a lot to this. First of all, how does randomness relate to ML and AI? And what do you think of the current state of random number support in C++?

Starting point is 00:29:02 So a bit of a two-parter there. Two huge questions. On a real simple level, in terms of AI, if you're trying to solve a problem, you could brute force it, and that becomes really self-evident with, say, genetic algorithms. Someone's designed a genetic algorithm framework to do seat planning for weddings. Now, I don't know if you've ever tried doing that yourself. Oh my goodness. So you've got some constraints: these two cannot sit next to each other, these do have to sit next to each other, I've got this many tables. And if you were to brute force it, list every possible combination, you hit millions really quickly with very few guests, and your brain explodes, and you just go, I don't know, I'm

Starting point is 00:29:54 going to pay a wedding planner to do it for me, perhaps. Or rather than brute forcing everything, you try a few random initial points and go, well, that one's rubbish, that one's rubbish, and that one's okay here, but we've got this other bit that needs improving on. So you start with some random stuff, and then you have your better ones. You maybe nudge left a bit, right a bit, tinker a bit. But what do you choose next? Well, you randomly try something, rinse and repeat. And then if you've got the objective function right, and you use the randomness to steer towards something better, and us as humans need to be involved in this modeling and define better, then it might come out with an acceptable solution.
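The "try something random, keep it if it is better, rinse and repeat" loop can be sketched for the wedding seating problem as follows. This is a simple randomized hill climb rather than a full genetic algorithm (there is no population or crossover), and the names `violations` and `plan_seating`, the constraint encoding, and the step count are all assumptions made for illustration. The objective function here is just the number of broken constraints, which stands in for the human-defined notion of "better".

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

using Pairs = std::vector<std::pair<int, int>>;

// Count broken constraints. seating[s] is the guest in seat s,
// and seat s belongs to table s / table_size.
int violations(const std::vector<int>& seating, int table_size,
               const Pairs& together, const Pairs& apart) {
    std::vector<int> table_of(seating.size());
    for (std::size_t s = 0; s < seating.size(); ++s)
        table_of[seating[s]] = static_cast<int>(s) / table_size;
    int bad = 0;
    for (auto [a, b] : together) bad += table_of[a] != table_of[b];
    for (auto [a, b] : apart)    bad += table_of[a] == table_of[b];
    return bad;
}

// Start from a random seating, then repeatedly swap two random seats
// and keep the swap whenever it is no worse than before.
std::vector<int> plan_seating(int guests, int table_size,
                              const Pairs& together, const Pairs& apart,
                              std::mt19937& rng, int steps = 10000) {
    std::vector<int> seating(guests);
    std::iota(seating.begin(), seating.end(), 0);
    std::shuffle(seating.begin(), seating.end(), rng);
    std::uniform_int_distribution<int> seat(0, guests - 1);
    int best = violations(seating, table_size, together, apart);
    for (int i = 0; i < steps && best > 0; ++i) {
        const int a = seat(rng), b = seat(rng);
        std::swap(seating[a], seating[b]);
        const int score = violations(seating, table_size, together, apart);
        if (score <= best) best = score;         // keep the tweak
        else std::swap(seating[a], seating[b]);  // undo it
    }
    return seating;
}
```

With six guests, tables of three, one pair that must share a table and one pair that must not, this finds a seating with no violations almost immediately. On harder instances a plain hill climb can get stuck, which is one reason genetic algorithms keep a whole population of candidates and mix them.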

Starting point is 00:30:53 So basically, instead of trying everything, it tries a few things randomly. But we haven't defined random, and that is a big topic. And then the second part of that question was about the current state of random number support in C++, which is a big topic, I think. Right. So C++11 introduced all the random engines and distributions, and there's been a few little tinkers, and there's several different engines, but that was all C++11. And we've started to see other things introduced in the algorithms, like shuffles and stuff like that, but nothing fundamental. A lot of people by default use the Mersenne Twister, but it's got loads of state, so that makes it pretty much impossible to do CUDA stuff with, because it's just too much to pipeline backwards and forwards. I've seen a proposal, which I don't

Starting point is 00:31:54 have the link to to hand, I mentioned it in my ACCU talk this year, to come up with something with much less state, so you can do the GPU stuff much more easily and make things more efficient. Now, I don't know if that proposal's still being actively worked on. And I have noticed in, I think, some of the Python numeric libraries, they've moved away from using the Mersenne Twister because it's got all that extra state and are using some newer ideas. So I think we have a really good foundation from C++11, but we haven't seen anything radical change since. So yeah, we'll see what happens. Call me old-fashioned, but what's wrong with calling old C's rand function? It's absolutely fine. But if you want to, for example, roll a dice and make sure you get the numbers one to six

Starting point is 00:32:50 with equal probability, you'll probably get it wrong the first time you do it because you need to watch out for... You mean you just modulus it with six, right? Well, it'd be close, but I'm not playing dice or cards with you, mate, because you're cheating. Obviously, we know this, don't we? Six doesn't go into max int an equal number of times, so your six would be slightly less likely than the others. Right. Don't do that. So there are, there

Starting point is 00:33:21 are ways and means of... I mean, I also know that rand, at least on some of the computers I've checked, only returns a value between zero and 65535 or something like that anyway. So yeah, it hasn't got much range in the first place. But you mentioned the Mersenne Twister has a bunch of state, and there are newer algorithms. So what is the problem with the state, then, with something like CUDA? Is it just the computation for a number if you're going to be doing thousands of them in a row, or...? Right. So the basic random number generators like C's, the linear congruential ones: you start with a number, you multiply it by something and then do a modulus calculation, so it looks like it's

Starting point is 00:34:06 wanging backwards and forwards, but you've just got state of one number there, maybe, and that's it. But if you don't get the thing you're doing the modulus with right, or the things you're multiplying by right, you'll end up cycling really quickly. And also, once you hit a number you've seen before, you know what's going to come next. That is the cycle. Right. Yeah. But with hidden state, the Mersenne Twister goes, well, use some bits from the last number, some bits from the number before, some bits from another number, some bits from another number. I can't remember the size, but it's several bytes of state, which you can't shovel onto your GPU because there won't be enough space necessarily, or if you do that, you haven't got space for the things you want to actually use

Starting point is 00:34:54 the GPU for. So, got it. So something somewhere in between the full complexity of the Mersenne Twister and just having the last number and multiplying it by 13 and modulusing it with 255 or whatever the heck you did before. I see. Got it. And I was reading that another problem with the Mersenne Twister state is that because it is so big, if you're seeding it with a 32-bit or even a 64-bit integer, the difference is so big that given the first few numbers,

Starting point is 00:35:25 you should be able to reverse engineer what the seed was. Oh, I mean, obviously there's cryptographically secure random number generators, which is one thing, and the Mersenne Twister is not trying to be that. But even so, if you can guess with a couple, then you maybe... Or possibly even one. Yeah, all the pseudo-random numbers are not cryptographically secure. That's an entire other space that I know very little about. But at a simple level, a lot of these pseudo-random numbers

Starting point is 00:35:54 tend to just do some multiplying and modulus. The cryptographic ones start doing crazy stuff like raising to the power of two and then doing some other stuff as well, and then it just gets numerically much harder to figure out what's going on. So you mentioned the difference between a true random number generator and a pseudo-random number generator. I have not said the word true at any point, I don't think. Right. First, there's no such thing as a random number. Listen to my ACCU talk from earlier. I'm sure there was an XKCD that proved otherwise. Well, they claim the number 42 is random, but you can't have a random number.
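
Editor's sketch of the "multiply, then take a modulus" generator discussed a little earlier, set against the Mersenne Twister's state. The class name `TinyLcg` is made up for illustration; the multiplier and modulus are the ones the C++ standard specifies for `std::minstd_rand`, so the two should agree value for value.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// A hand-written linear congruential generator.
// Its entire state is a single 32-bit number.
class TinyLcg {
public:
    explicit TinyLcg(std::uint32_t seed) : state_(seed)
    {
        assert(seed != 0 && seed < modulus);
    }

    std::uint32_t next()
    {
        // Multiply the last number, then take the modulus.
        state_ = static_cast<std::uint32_t>(
            (std::uint64_t{state_} * multiplier) % modulus);
        return state_;
    }

private:
    static constexpr std::uint64_t multiplier = 48271;
    static constexpr std::uint64_t modulus = 2147483647; // 2^31 - 1
    std::uint32_t state_;
};

// By contrast, std::mt19937 carries 624 words of state.
static_assert(std::mt19937::state_size == 624, "mt19937 state size");
```

At 32 bits per word, 624 words is roughly 2.5 KB per generator, against 4 bytes for the generator above. That gap is the state problem raised earlier in the conversation when the talk turned to GPUs.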

Starting point is 00:36:36 You can have a sequence of things that seems a bit arbitrary, where it's quite hard to spot the pattern, right? You can't generate with software... I was going to say hardware. Software trying to generate something that's random or arbitrary is almost impossible. You can do things where it's hard to figure out what's coming up next, which means we can have fun with coding your way out of a paper bag or writing a mind-reading machine or whatever. That's fun. You can try things with hardware. So I guess, was it ERNIE that was doing the numbers for the Premium Bonds? Right. That was a big machine that just randomly dropped

Starting point is 00:37:19 like ping pong balls down tubes and stuff. Which, for non-UK residents, was the sort of government saving scheme stroke lottery kind of thing that we had in the UK. Is it still going? Maybe it is. I think they still do Premium Bonds. I don't know if they're still using the hardware machine that was there. But there's another thing: if I toss a coin, it might come up heads or tails. Is that random? Or if I knew the precise initial conditions, which is why it would be nice if Timur was here as well, perhaps

Starting point is 00:37:53 opinions another day if I knew the precise initial conditions in theory our physics is good enough to work out what's going to happen so is anything random. We've got to philosophy pretty quickly. That's a bit deep.

Starting point is 00:38:10 Is it a deterministic universe or not? Or does quantum get us out of jail in this? Did I mention my undergraduate degree was in maths and philosophy? So you walk right into this one. So what is the answer then tell us i i don't know i i think this this has some very profound questions about how you look at the universe i've been trying to think and speak without using the word random for the last year and that's been really interesting because it's made me start thinking about how I think about things.

Starting point is 00:38:47 So, I mean, this whole train of thought kicked off by me rereading Ian Stewart's Does God Play Dice book, which is about chaos, partly. his response to the idea of nuclear decay and it appearing to randomly like decay down you couldn't predict when you could look at you could state a half-life so you've got a property but you don't know exactly when something is going to happen and einstein's opinion as far as i understand is that there must be something more precise going on here. Just having this statistical model wasn't proper physics. But most people nowadays go, oh, well, he's wrong. And it is like that. I don't know.

Starting point is 00:39:38 That's why we needed Timur on. I know. He's probably having that experience of shouting at the podcast right now. Yes, I'm sure. Poor Timur. Yeah, he did say he was quite excited to talk about that. So maybe we'll do a follow-up at some point.

Starting point is 00:40:01 But for practical purposes, though, we can sort of make a distinction between the pseudo-random number generators that we have purely in code, and some which have some external source of entropy. Usually we put those two things together to some degree, but do you have anything else to say about that? All I can say at that point is that, I mean, obviously you can use a random device to seed the pseudo-random numbers, and that's using the entropy on your file system. And it will lead to unexpected results, let's say unexpected or hard to predict,

Starting point is 00:40:25 and then you can have fun writing a game or something and i think that is great so what's the limits of that then so i mean for games it's going to be fine but we've talked about cryptographically secure the randomness but is it enough for that yeah again i'm not an expert on the cryptographically secure stuff. And certainly using some hardware entropy would make things harder to predict. So, yeah, you need an expert on about the cryptographic stuff or give me time to read up more on it. I have seen other situations. I mean, that's something that's also not there in modern C++ easily, if you're doing some monte carlo simulations you might want to tweak things to make sure you get as many things above average as below average so

Starting point is 00:41:14 if you go plus one, you want to go minus one as well. And it's hard to adapt some of these to what are called quasi-pseudo random numbers. One of the finance places I worked at in London, I was on a risk team and they were running Monte Carlo simulations. Now, you need to be able to report to the Financial Conduct Authority how you've come up with the numbers you've come out with. So you can't just say, here's the seed to my random number, because, well, this is a discussion we probably need to do next. That doesn't necessarily mean you can regenerate all the numbers you need. So part of the overnight batch process was generating enough random numbers for everything,

Starting point is 00:41:57 doing that thing I said of, if I go above average by this much, I want to go below average by this much. So I've got that symmetric distribution, to be fair. And we'd generate these massive files of random numbers, and once in a while we'd get a more complicated model come in, and one of the quants, the rocket scientists, would go, we've run out of numbers. And the number of times I heard a mathematician go, I've run out of numbers. We've put a purchase order in to get some more. We'll be back with you in a few weeks. But yeah, that was hard, just trying to get the coverage and things and not hit a point where you start repeating. And even on a Mersenne Twister, you can nudge towards things starting to repeat.
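
Editor's sketch of the "if I go above average by this much, I want to go below average by this much" idea, which is often called antithetic sampling. The function name `antithetic_normals` and the choice of a standard normal distribution are assumptions for illustration; the transcript does not say which distribution the risk team used.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Draws `pairs` standard normal samples and mirrors each one, so every
// above-average draw has a matching below-average partner.
std::vector<double> antithetic_normals(std::mt19937_64& engine, std::size_t pairs)
{
    std::normal_distribution<double> normal(0.0, 1.0);
    std::vector<double> draws;
    draws.reserve(pairs * 2);
    for (std::size_t i = 0; i < pairs; ++i) {
        const double z = normal(engine);
        draws.push_back(z);   // above (or below) the mean by z
        draws.push_back(-z);  // the mirrored partner
    }
    assert(draws.size() == pairs * 2);
    return draws;
}
```

Because each draw is immediately followed by its mirror image, the sample mean is exactly zero by construction rather than only approximately so.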

Starting point is 00:42:41 If you're running a million simulations per each financial instrument and then you bump the interest rate by one percent as you do it again and see what happens you're like but now it's repeating itself so what are we gonna do oh yeah gosh and i bet those those giant um those huge files of numbers um presumably you could just you know you, you GZip them to make them a bit smaller. That was the domain of the database people. Well, there's a thing. And of course, if you start GZipping things, you can see how compressed,

Starting point is 00:43:14 if they don't compress very well, then they're really random. If it compresses a lot, then you're in trouble. You know, your random number generator is rubbish. Yes, nought, nought, nought. Yes, it is actually, that's actually quite a serious thing that some people do sometimes, just as a ballpark figure.

Starting point is 00:43:32 Is it random or not? If I compress it and it gets slightly bigger, we're probably in a good spot. I always sort my random numbers before I compress them. It's much better. Deduplicate. You touched on one subject there that I did want to get onto, so maybe we'll come to that now, which is portable random seeds, which is something of interest to me. So I'll explain my use case first. I used to maintain, or originally wrote, the Catch and then Catch2 test frameworks,

Starting point is 00:44:04 and one of the things that does is randomize test runs, the idea being you want one test to run before another one time and then after another time, so you don't get dependencies between tests, or if you do, you find out sooner rather than later. And there's a few other places that randomness comes into the actual running of tests, and even test data, which we might come onto as well in a bit. So there's randomness even with unit testing, but you also want to be able to control that. So we print out the random seeds, or at least we did,

Starting point is 00:44:37 so that you can then reproduce this random run if you did actually find a problem. Trouble is, half the time, or a lot of the time, where you originally get the problem is running on a build server. And then you try to reproduce it locally. And it turns out, at least the way random number generation, or not generation, as we'll get onto, is specified in the current C++ standard, it's not portable, which means that that random seed is going to give you different results, potentially on different machines or other variations in environment. And there was a paper a few years ago from Martin Hořeňovský,

Starting point is 00:45:06 who now maintains Catch2, for this reason, to make the random number distribution in C++ portable, which unfortunately didn't make it through. But what's the solution to this? How big a problem is it? I don't know. It'll be interesting to see what happens. A lot of people

Starting point is 00:45:28 assume if I tell you the seed, then you can reproduce this. If you reuse the same computer and you don't upgrade anything in between, maybe you'll be alright. Yeah, that's surprisingly difficult.

Starting point is 00:45:44 But I guess if implementers are free to implement things however they want as long as it gives you a uniform distribution so you're not cheating when you roll your dice then we aren't specifying how the seeds are actually working yeah i don't i don't know what solution is i shall sit back and watch and learn more but this is not a problem that's inherent in random number generators in general specific to the implementation exactly yeah i mean if you did the maths yourself with pencil and paper then yeah it will work out but you haven't seen my ability of maths it would be different every time in theory or maybe get a sir ai to do it for you yeah i mean it's not mathematically impossible but i guess if implementations are doing slightly

Starting point is 00:46:35 different things, and it's not specified how, then there we go. Right. And my understanding is that most of that lack of specification is in the distribution stuff rather than the seeds and the randomness itself. Because I think we can all see that, like, you multiply by 372 and divide by whatever, that's going to get you the same answer. But then the thing with getting a fair dice roll out of it is like, well, how do you modulus by six in a fair way? You need to keep some state from the previous time and then mix it in the next time to make sure that the distributions are still, over time, the same. And then I think that's the underspecified

Starting point is 00:47:15 part so that the you know implementers can choose to do it whichever way they like and maybe there's floating point precision inside some of these things who knows and and unspecified things it's yeah yeah i don't know the details i mean again mathematically the distribution and a cpp reference for all the distributions tells you the maths that it should be doing right to spew the numbers out yeah i'm not now we're talking about it i'm not sure precisely where the differences would fall out. I'm going to be spending an afternoon thinking and reading now. More tabs.
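
Editor's sketch of one way to sidestep the portability problem for a dice roll. The standard pins down the exact output sequence of `std::mt19937` for a given seed, but leaves the algorithm inside distributions such as `std::uniform_int_distribution` to the implementation. The helper `portable_die_roll` below is hypothetical: it works on the raw engine output and uses rejection sampling instead of a standard distribution, so it does not reproduce what any particular standard library does.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Hypothetical helper: a fair die roll built directly on the engine's raw
// 32-bit output, so the same seed gives the same rolls on any conforming
// implementation of std::mt19937.
int portable_die_roll(std::mt19937& engine)
{
    constexpr std::uint64_t sides = 6;
    constexpr std::uint64_t range = std::uint64_t{1} << 32; // engine yields [0, 2^32)
    // Largest multiple of 6 that fits in the range. Raw values at or above
    // it are thrown away, which removes the bias of a plain modulus.
    constexpr std::uint64_t limit = range - (range % sides);

    std::uint64_t raw = engine();
    while (raw >= limit) raw = engine();
    assert(raw < limit);
    return static_cast<int>(raw % sides) + 1;
}
```

Only 4 of the 2^32 possible raw values are rejected, so the retry loop almost never runs. The cost is that the code no longer benefits from whatever a library's own distribution does, which is the trade the portable-distribution proposal mentioned above was trying to avoid.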

Starting point is 00:47:51 More tabs open. Sorry. That's all right. We threw around the terms generation and distribution. But what is the difference then? What does each part mean? So C's rand, as we said, we get between nought and whatever, so you just get a number generated, a random number. But if you... viewers can't, the listeners, sorry, can't see the air quotes. The

Starting point is 00:48:16 little pause between each word is air quotes. Yeah, thank you for that. So if I want to end up with a dice roll, so a number between one and six, I then need to, to use a technical term, splurge out or bucket up the numbers, and it's that splurging or bucketing function that gives you the distribution. So uniform: each number between a lower and upper bound equally likely. If I want a normal curve or a bell curve, I want things to be bunched up, so I've got more of them around the mean in the middle and fewer things on the extremes. And then there are all kinds of other distributions, but it's just splurging, smearing out the numbers or bucketing them. That's all. Right. Interesting. And that's the part that doesn't seem to be portable between machines. Do you know why that is? Is it to do with the architecture or...? No idea. Gonna find out and

Starting point is 00:49:19 report back okay you heard it here first and i mentioned testing will come up again i know this is something you've um we were talking about just before the show is uh property-based testing which is another area where random numbers and testing collide uh what can you tell us about property-based testing and other forms of randomness in testing. Well, something that I pulled on a bit in the first book, the genetic algorithms one, was if you're doing random stuff, there goes the air quotes again. Sorry, listeners.

Starting point is 00:49:54 If you're writing code that's doing some random stuff, now, whether that's larking about coding your way out of a paper bag or doing some serious stuff like COVID modelling or some finance stuff, how do you test what you've done? Now, you can mock out the random numbers and just returning zero instead flushes out all kinds of problems. I could tell you all sorts of stories there. You wouldn't believe it.
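
Editor's sketch of mocking out the random numbers as described above. The names `UnitSource`, `pick`, and `always_zero` are hypothetical; the point is only that the code under test takes its source of randomness as a parameter, so a test can swap in something that always returns zero.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// A source of numbers in [0, 1), injected so that tests can replace it.
using UnitSource = std::function<double()>;

// Hypothetical code under test: picks an element using the injected source.
template <typename T>
const T& pick(const std::vector<T>& items, const UnitSource& next_unit)
{
    assert(!items.empty());
    const double u = next_unit();
    assert(u >= 0.0 && u < 1.0);
    auto index = static_cast<std::size_t>(u * static_cast<double>(items.size()));
    index = std::min(index, items.size() - 1); // guard against rounding up
    return items[index];
}

// Test double: not random at all, it always returns zero.
inline double always_zero() { return 0.0; }
```

With `always_zero` plugged in, `pick` always returns the first element, and an empty container trips the assertion straight away. That is the kind of silly edge case a fixed value tends to flush out, and as the conversation goes on to say, it is not enough on its own.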

Starting point is 00:50:19 But that's not sufficient. That's enough to find some silly edge cases how do you know that you're actually doing coming out of the right kind of maths if something different happens every time well if you're simulating stuff and you you've got an r number for some covid modeling you ought to be able to recover that r number from the results you get. So you can say, if I do this model and send in R of 1.5, then all the results I get out should have an R value of 1.5. And then a framework can try several simulations for you and see whether that happens or not. Or you might say the average height of the people in this modeling scenario be this does that happen

Starting point is 00:51:05 or i only get outliers one time out of a thousand so you can start looking at averages stats and properties around things and that will flush out some cases you've missed in many cases i've repeated myself you know what i mean so when i deal with those R values, I never give them a name to make sure the R value is actually an R value. Am I missing a pun there? There's a joke in there. I think it's too early in the morning for me

Starting point is 00:51:35 to work that one out. Thank you, Phil. I'm sure I've missed the joke there. I'll catch up on the replay. I will. So yeah, property-based testing is something that I have a big interest in as well, so I've tried to build at least some of that into Catch2, but there are separate property-based testing frameworks available for C++. So not just limited to more sort of statistical types of things where you want to check distributions, but anything where you just want to test a property of something

Starting point is 00:52:15 that always holds regardless of the inputs you give it. Property-based testing is a really good way to go there. And that involves random numbers, which gets some people nervous when they think unit tests have to be reproducible. That's the thing, it's not a unit test, it's a property-based test. Yeah, I mean, so when I've done some property-based testing, it sometimes flushes out some edge cases.
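
Editor's sketch of a hand-rolled property-based test in the spirit of the R number example: whatever parameter goes into the simulation should be recoverable from the results. The functions `simulate_average` and `mean_is_recoverable`, the noise model, and the tolerance are all assumptions for illustration. A real property-based testing framework would also shrink failing inputs, which this does not attempt.

```cpp
#include <cassert>
#include <cmath>
#include <random>

// Hypothetical simulation: the average of `samples` noisy draws around `mean`.
double simulate_average(double mean, std::mt19937& engine, int samples = 20000)
{
    assert(samples > 0);
    std::normal_distribution<double> noise(mean, 1.0);
    double total = 0.0;
    for (int i = 0; i < samples; ++i) total += noise(engine);
    return total / samples;
}

// Property: for any input mean, the simulated average lands close to it.
// Returns false on the first counterexample and reports it in `failing_mean`,
// so it can be pinned down afterwards as an ordinary unit test.
bool mean_is_recoverable(std::mt19937& engine, int trials, double& failing_mean)
{
    std::uniform_real_distribution<double> any_mean(-100.0, 100.0);
    for (int t = 0; t < trials; ++t) {
        const double mean = any_mean(engine);
        if (std::abs(simulate_average(mean, engine) - mean) > 0.05) {
            failing_mean = mean;
            return false;
        }
    }
    return true;
}
```

The tolerance of 0.05 is about seven standard errors for 20000 samples of unit-variance noise, so a correct simulation should essentially never fail. A failure would point at the simulation, not at bad luck.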

Starting point is 00:52:35 So if something fails, you go, but then you can use that as a unit test to pin down some bad behavior that's going on. I guess an extension of the property-based testing is fuzzers to just blam your code with all kinds of nonsense and see if it falls over or not and that is that's still flushing out cases that you need to be aware of test for and maybe defend against and that's using randomness as well. And that's exactly why the portable random numbers would be so useful,

Starting point is 00:53:09 because at some point you do need to be able to reproduce a particular run, which gets difficult otherwise. But as you say, just pinning it down to a specific unit test gets you a long way there. Yeah. I actually find that you can have a bit of a dance between unit tests and property-based tests. So sometimes you can write some unit tests

Starting point is 00:53:29 and think, well, these are quite good. And in fact, they're not specific to these examples. I can extract a property from that. You write that as a property-based test. And if at some point it then finds an issue that you hadn't thought of, one of those edge cases, then you can capture a unit test back out of that

Starting point is 00:53:43 and you go around the loop. Yeah. It's quite good to think of it that way. But it all depends on good random numbers. Yeah. So, coming back to the C++ Bookcamp book, which, that's hard to say. You said they might be changing the name. Yeah, it's tricky, isn't it? Yeah. And we kind of alluded to this at the beginning, but we talked about modern C++ and why you didn't call it modern C++, and obviously the thing about modern is it's out of date before you finish the sentence. Yes. But what kind of things are you excited for coming up in the future? So the book is C++20, right? Or is it 23? It's a tiny, tiny, tiny bit of 23, because there wasn't 23 when I started writing last September, and now, this September, there is. So there's little bits. So a little bit of C++23. But are there other things that you're looking

Starting point is 00:54:39 forward to, maybe in a future revision of the book or a new book for 23 or 26? Ideas in the future: what are you looking forward to? I'm spending my time at the moment looking backwards, trying to go through the corrections on things, to be honest. It would be interesting to see if we get beyond std::generator for the coroutines, so I can do things I'm doing in Python quite easily, like yield and stuff, and do it off somewhere else. We'll see if that grows or not. And also there are loads of little bits with ranges and views that I've not played with in anger. Those are more syntactic sugar or nice ways of doing things, and I've seen more and more of that happening. I've only used a small amount of ranges and views in the book, but

Starting point is 00:55:32 the whole book was just little flavors of things to get people started but yeah interesting to see where co-routines does end up but that's a big topic oh yeah i thought it was it was very brave of you to to even bring them up in the book, right? Co-routines are the thing that I haven't touched myself yet. It's the thing that I found most interesting when I was reading through it at the weekend because it's a huge topic, as you say, and most people, well, I speak for myself, have gone, oh, that looks complicated. La, la, la.

Starting point is 00:56:00 Don't forget about it. Just don't even think about it. But you have a pretty comprehensive example, as you say, a flavor of all the bits and pieces you need in order to actually use co-routines. It was entirely gratuitous. But after I'd introduced you to the mind-reading machine, I thought, might as well put it in a co-routine. It doesn't bring any benefit to the mind-reader at all.

Starting point is 00:56:24 So I've had several goes at trying to write coroutines. I'd really like to thank Phil Nash for doing the workshop at C++ on Sea. That helped me realize some bits I'd missed and make that part of that chapter better, I think. That was a really useful workshop. That was actually Mateusz Pusz's material that I presented. And I saw you cramming the night before from a restaurant table just opposite us, trying to get your head straight. So thanks for stepping in and doing that. That was so useful.

Starting point is 00:56:56 Yeah, I think there's two types of people when it comes to coroutines. There's those that look at it and think, oh, that's too complicated. I'll learn it later. And then those that look at it and say, that's too complicated. I should do a talk on it. Or somebody else's workshop. Indeed. Okay, so we should start wrapping up. Is there anything else you want to tell us before we let you go, Francis? Well, just thanks for having me on. If anyone can define random properly, then find me on X or Mastodon or something and let me know what you think.

Starting point is 00:57:29 And maybe that could be Timur. Yes, looking forward to it. So where can people reach you if they want to let you know what randomness is or reach out to you for anything else? So I'm on what used to be called Twitter. I'm on Mastodon and I'm on LinkedIn, and I shared some links with Phil, so I guess you can put them up somewhere. They will be in the show notes. Just Google me, because you'll probably find me, because I've got quite a unique surname.

Starting point is 00:57:57 yes which will also be in the show notes as part of the title. Well, thanks very much for coming on the show today and being a random guest. It's been great to have you on the show, Francis. It's been really interesting talking about random stuff. And I'd never really thought about randomness in terms of testing. So I've learned something today, both from you, Phil, and you, Francis.

Starting point is 00:58:19 So that's been very interesting. And thank you for having me as a co-host, guest co-host. Again, it's been fun to be here. Sending our best wishes to Timur, obviously. So thanks for stepping in at short notice, Matt. Thanks so much for listening in as we chat about C++. We'd love to hear what you think of the podcast.

Starting point is 00:58:39 Please let us know if we're discussing the stuff you're interested in. Or if you have a suggestion for a guest or topic, we'd love to hear about that too. You can email all your thoughts to feedback at cppcast.com. We'd also appreciate it if you can follow CppCast on Twitter or Mastodon. You can also follow me and Phil individually on Twitter or Mastodon. All those links, as well as the show notes, can be found on the podcast website at cppcast.com.

Starting point is 00:59:06 The theme music for this episode was provided by podcastthemes.com.
