Two's Complement - Compile-Time Programming (with Hana Dusíková)
Episode Date: February 20, 2022
Ben and Matt are joined by Hana Dusíková and discuss panoramic photographs, Matt's career peak, and compile-time programming, including her ground-breaking regular expression library.
Links from the show: Hana's panoramic photos, CTRE library, Hana's slides
Transcript
Discussion (0)
I'm Matt Godbolt.
And I'm Ben Rady.
And this is Two's Complement, a programming podcast.
Hey, Ben.
Hey, Matt.
How are you doing?
I'm doing great.
Glad to hear it.
So we've been talking a bit about performance,
and I've realized that kind of my career has taken me from hacking around at assembly
through to trying to coerce compilers into doing all the work for me, right?
That seems to be the journey that I've taken.
And I realized that there are people still that are much, much better at making the compiler do all the heavy lifting for us than I am. And so I figured we should talk to somebody who can tell us all about the kinds of things you can do if you really, really work hard at the compiler, and some of the tricks you can do with C++.
Okay, yeah, well, that sounds cool.
So we have a guest today. We have a guest! There's another window on our screen which no one else can see, obviously. So, surprise to nobody on this call, but surprise to our listeners: we have a very special guest today. We're joined by Hana Dusíková. Hi, Hana!
Hi. How are you doing?
Thank you so much for agreeing to do this. We are excited to have you here.
It's too late here, but it's great.
And the last few days was kind of up and down, so I'm great.
Oh, good.
I'm glad to hear that.
And thank you as well for dealing with the time zone difference.
We're happily in our lovely early afternoon, whereas you're way into your evening.
So we appreciate you taking the time for this.
So I just want to introduce you to our guests.
Our guests? No.
I want to introduce us to our listeners.
You are our guests.
You can see how often we do this.
So, Hana, you're a staff scientist at Avast,
and you represent the Czech national body for C++, right?
Is that the right way to say that?
Yeah, that's correct.
And also I'm an officer in the C++ committee.
I'm chairing SG7, which is a study group
for compile-time programming and static reflection.
That's not a sci-fi series, right?
No, no.
I figured.
Maybe it's a prequel to Terminator, probably.
Well, we'll get to that in a minute, I'm sure.
But yeah, I'm always confused because, what is it,
WG21 is the C++ Standards Committee?
Is that what its name is?
Yeah, it's Working Group 21.
And then SG is Special Group?
Study Group.
Study Group.
Okay, so you're SG7, which is Compile Time Programming,
and what was the second part?
Static Reflection.
Static Reflection. Wow, these all sound like very exciting things that I want now in my programming language.
It's really a boring topic.
But that's... now you tell us! We just invited you here to talk about this kind of... well, we talk about all things, but we talk about a lot of the boring stuff on this podcast. Other people seem to find it interesting. I don't understand why, but this is just Matt and I talking about the stuff that we always talk about.
I mean, we can just listen to Matt talking and it's exciting. I think at CppCon, Matt, you did, like, 30 minutes of stand-up comedy.
I don't think that's true. I mean, maybe you just thought that my presentation was just so funny. But, uh, no.
Well, yeah, I don't know. You did great, because you didn't have any feedback and you just go and go.
Well, normally the feedback is 'please stop talking, we've had enough', or at least that's what I read on people's faces.
No.
Bless you.
So my understanding is when you're not hacking on C++
and dealing with all the committee stuff,
you're also a keen photographer and a Greyhound owner.
But there's something cool about the kind of photography you do, right?
Yeah, I'm doing panoramic photography.
It's like these spherical pictures, like Google Earth before Google Earth was cool, actually.
Right, uh-huh.
And it's quite technical. It's not like typical photography. You need to have a tripod, you need to have a special head for that, and also a wide lens. And you need to take, I think, usually 30 pictures and make sure no one is moving. Or, no, not me, I'm moving, like going between pictures, because then someone is cut in half and it looks...
Or you get two copies of the dog that was running through your picture at the same time, right?
Yeah, exactly. But you can also erase them.
If they move between pictures, there is always some overlap.
So you can remove almost the whole crowd.
That's amazing.
And if you have time and patience.
Which strikes me: if you're writing C++, especially the kind of template-heavy C++ that perhaps one writes, you might have a lot of spare time and patience.
Yeah, right.
So what got you into panoramic photography? Because I remember once taking photographs with my handheld and using some panorama stitching software to kind of put it together. Is that the same kind of thing, but just more?
Yeah, it's actually really similar. And the technical part is to rotate your lens around the nodal point, which is in the lens, so there is no parallax error. And then it's really simple.
Oh, I see. So the centre of the tripod creates a point, and the lens always rotates around that, as opposed to just turning the camera about some other point.
Oh, okay.
That makes sense.
That's interesting.
Usually when you do panorama with a mobile phone,
you take it in your hand and do stuff like this.
Yeah.
It's hard to describe on a podcast.
Yeah. No, but it's the thing where you hold your phone out and slowly pan it from one side to the other.
Yeah. It introduces errors, but actually Google is using it in Android to extract depth from pictures.
Oh gosh, interesting. I had no idea they were doing so much. I just take photographs, and then every now and then Google Photos says, hey, I made a panorama out of four photographs you just happened to take together that looked like they could have been put together.
And I have no idea how they do that.
Yeah, but, like, my approach is a bit different.
It's not machine learning at all.
It's just processing power.
And it's a lot of gigapixels.
Yeah, you've sent me some images before that have very nearly crashed my browser because they're so huge. So what kind of resolutions are we talking about?
We are talking about the limits of a JPEG.
Right. JPEGs have limits?
Yeah, it's 16,000-something, and it depends on the implementation.
So you hit 16,000-something as a width and height. Interesting.
Well, today I learned.
And there is actually a funny thing.
Not every implementation supports
the whole specification of JPEG.
And it depends on the last five pixels. That's the difference between a good implementation and a bad implementation. The last five pixels are in the middle of the square in JPEG.
The macro blocks, right? They're 16 by 16 or whatever.
Yeah, and not every implementation is able to handle it.
How peculiar.
I would have thought this was a completely solved problem by now,
but obviously not.
Yeah, I found out when I tried to load a JPEG in Apple Preview and it was only gray, nothing else.
That's amazing. I just thought, you know, a JPEG is a JPEG. I know it's a complicated format, and I'm really excited by the clever tricks that JPEG uses to get the compression levels that it does, but I didn't realize there was much space left over for them to get things wrong. Turns out this computing lark is pretty difficult after all, getting things right.
Yeah. Then I asked Sean Parent...
That's, like, Sean Parent of Adobe Photoshop fame, right?
Yeah, yeah, yeah. You should invite him here too.
Yeah. Yeah, he always has some story.
And he told me that he makes sure that the JPEG encoder and decoder in Photoshop handles all the corner cases. And it was really hard.
So have you been sending him test images now? Or was it already handling all of these things?
Not to him, but to Tony Van Eerd. He's working on a projector, and he asked me for some big pictures, so I sent him one, or one of them, and he told me it crashed their application and he needed some fixes.
That's really... I mean, images, right? You know, every now and... as I say, you crash browsers when you load up a big image. Like, I'm fond of looking at these high-resolution... similar panorama, actually similar technology, funnily enough, where people have decapped silicon.
So you use acid to etch off the top of the chip and then you use a microscope and you take loads and loads of photographs really highly zoomed in.
And then you stitch together the die shot of the chip, and it's super cool, because now you can start reading off how the chip actually worked, for, you know, 6502-era technology at least. At least that's the stuff that I can tractably look at and get some idea of what's going on. But those images typically are, you know, PNGs, and they are, you know, 60,000 by 60,000, and so, yeah, you load them into your browser.
And of course, the browser immediately shrinks it down
because it knows it can't fit that into memory.
But then it's kind of coming down like an old school modem,
you know, 14K4 modem.
You're like, why on earth in this day and age
is it taking so long for an image to come down?
And then it kind of goes boom, and it's like Chrome's dead.
And you're like, oh, I see.
Or sometimes, again, in Apple Preview,
you can open the image.
It's totally fine.
And when you zoom it, it's suddenly gray.
And if you unzoom it, it's still gray.
Oh, and gray is like, I've stopped responding gray.
Yeah, probably.
I see.
Right, that's like, I know Ubuntu does this thing
where sometimes if the application is not listening anymore,
it's like, well, I'm going to render the last picture you had,
but in black and white, to show the user that the picture's there but the application isn't.
Yeah. It's funny, I mean, it's really interesting, these things. Because, you know, in our sort of day jobs... well, I speak for myself and maybe for Ben here, and obviously Hana, your job is probably very different, but memory is kind of something we don't worry about as much anymore.
We work on server-side stuff for trading,
so we just put more RAM in the machines and we're fine, right?
But when you're dealing with images that are as large as that,
you have to write your software differently.
You don't necessarily want to or can even rely on, say,
the virtual memory system doing a good job
because it can't possibly
page things in and out as well as you can if you know how your software is going to access the image in the first place. And I don't think I'd ever really thought about that.
I think this applies to every domain. If you know the semantics of your algorithm well enough, you will always be better than some generic algorithm.
I guess so, I guess so. And that's probably true. So, talking of knowing more about your domain, and a very... gramophone record needle scratch noise... segue...
Yeah.
One of the really cool things that I know you do, as we said in the intro, is work on some compile-time programming things. And you're most well known for, or at least I know you most well through, the regular expression library, the compile-time regular...
That's not true. I'm actually most well known for my slides.
That is also true, yes. But the slides, if I'm not wrong, are the things that explain the tricks that you use in your CTRE library.
Yeah.
So, all right, we'll put some things in the notes about this. But Hana's slides are absolutely legendary and have basically changed the way people do presentations in the C++ space, because they are super interactive. She'll show, like, a parse tree of... what looks like a photograph, a nice rendered picture of a parse tree of a regular expression, including, like, 'this is this node, this type', 'here's the multiple of this one', or whatever, that kind of tree view. And then everyone's like, oh, this is cool, now we can see how this works. And then she'll single-step through a matching of a string, and different things are highlighted. So far, so 'you just hacked this up in PowerPoint and did it all manually'. And then she'll just edit live: give me a regular expression, and she'll type it in and the whole tree changes, and it's, like, live updating all the stuff. It's the best.
So that's pretty amazing. I mean, yeah, it's definitely something to aspire to.
i know when i did a redo of our presentation on memory accesses,
I spent a whole bunch of time trying to do similar highlighting of like,
this is where the cache comes in and this bit lights up and then this cache line lights across like this.
You remember that presentation, right?
Yes, I do.
It's a pale imitation of what Hana is able to do.
So, yeah.
Hana, how are you doing all of that stuff?
I'm actually programming it.
Right.
So I implemented a really simple version, and a really hacky version, of expression matching in JavaScript too, just to be able to hook them into the D3 library to draw the nice diagrams and visualize it. And I spent a lot of, like, a month...
You re-implemented the same thing you'd written in C++, in your own particular style, which we'll talk about in a second, about why it's so cool that you've done it that way. You wrote it in JavaScript again just so that you could draw the slides to then tell us how you did the C++ version?
Yeah, because it was less work than making slides with some animations, because then I change something and I need to redo everything.
Oh no. No way. Right.
I was going to say, and I guess because of the nature of the library, it would be difficult to, like, transpile that, right? Like with, um... what is the thing that I'm thinking of?
No, no, no, no. Like, the regular expression... you re-implemented the regular expression library to make this work in JavaScript.
Yes, yes.
So, like... I don't know why I'm blanking on the name of this thing right now, but there's this tool that you can use to basically compile C++ into JavaScript, so like Wasm and stuff.
Yeah, yeah, that's what I'm thinking of. Right. Yeah, that thing. Now I'm blanking on it.
Yeah, right. Emscripten and so on.
Yes, yes. But the whole point is that you're doing compiler tricks to make that work. So it's like the chances of that being supported by the, you know, Wasm compiler is basically zero.
The thing is, the Wasm compiler will happily do it. It's just that everything happens at compile time. So you have to actually put the compiler in the web page
in order to then actually change it.
Of course, yes. It's too late.
Who would be daft enough to put a compiler on a web page anyway?
So actually, for visualization of the compile time library,
you need to have runtime library with hooks to draw it.
So it's a different machine.
Tell us about the compile time nature of it.
And bear in mind, I'm pretty good at this C++ stuff.
Ben has got a working knowledge,
and we don't really assume that our listeners have much of a thing.
We try and draw them from a wide community.
So what is compile time programming?
Okay.
And why should we care, I guess?
Okay. Since I learned C++, at first I didn't like templates.
This is why we can be friends. Super complex, hard.
Yeah, exactly. And then I read somewhere that they are actually Turing complete. And someone in some article jokingly said,
you should be able to make regular expression in templates.
And it bothered me in my mind somehow.
How would I do that? And I was thinking, through many years, I was experimenting with some parts only. Then I finally was able to match only a string against another string, and then I learned more and more. And then suddenly I was able to match a regular expression, but not a regular expression in a string: the regular expression expressed as a template, like the expression tree of the regular expression, but encoded as a template type name in C++. So it's like repeat, angle brackets, string, character A, character B, character C.
Got it.
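For a flavour of what that hand-encoded form might look like, here is a minimal sketch; the node names (character, sequence, repeat) are illustrative, not CTRE's actual types:

```cpp
// A hand-written expression tree for something like "abc+", spelled out as a type.
// These node templates are placeholders for illustration only.
template <char C> struct character {};
template <typename... Parts> struct sequence {};
template <typename Inner> struct repeat {};   // one-or-more

using abc_plus = sequence<character<'a'>, character<'b'>, repeat<character<'c'>>>;
```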
So you had to kind of, as a programmer writing the regular expression,
you actually had to write the encoded regular expression
as the tree that it would ultimately be
rather than just the string that we know and love, maybe?
Yeah, the regular expression.
Tolerate. We love and tolerate.
We know and tolerate.
And actually, I like the form
because it's more expressive than regular expression,
which is like...
Right. They're very terse, aren't they?
And it's nice if I can just imagine seeing one of these
and knowing that one of the things that you could put in this
are comments inside your regular expression in C-style comments.
So at least explain what on earth is going on,
which is harder to do, at least with the standard regular expressions.
Yes, exactly.
And then I was thinking how to convert a string into this form.
And that's simple: an LL(1) parser.
Simple. You've just said 'simple'. You glossed over something which is very challenging.
Like, the only data structure you have in template metaprogramming is a stack.
Okay.
You can have a type list, and you can add a type in front and remove a type from the front.
That's all you have.
So this is kind of very Lisp-like primitives
that you have access to, right?
But when you have a stack,
you're actually Turing complete,
and you can do whatever you want.
Wow.
I don't think I registered that
that was the only thing you really needed
in order to be able to do this kind of stuff.
I mean, for Turing completeness, you need to have two stacks.
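As an illustration of those Lisp-like primitives, here is a minimal type-list sketch, assuming made-up names rather than anything from CTRE:

```cpp
// The only "data structure" available: a list of types you can push to and
// pop from at the front.
template <typename... Ts> struct list {};

template <typename T, typename List> struct push_front;
template <typename T, typename... Ts>
struct push_front<T, list<Ts...>> { using type = list<T, Ts...>; };

template <typename List> struct pop_front;
template <typename T, typename... Ts>
struct pop_front<list<T, Ts...>> { using type = list<Ts...>; };

// push_front<int, list<char>>::type  is  list<int, char>
// pop_front<list<int, char>>::type   is  list<char> again
```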
Okay. Right. Right. And just to be clear, the thing that you're sort of stacking here
is that you're making a new type, which is specialized in different ways,
and you're concatenating onto a type. So this is all in the type system of the compiler. There is
no actual stack that you're manipulating
that's going to be in your runtime program.
It's all in the compiler at compile time.
Yeah, it's a type. And you have a type representing your current state in the parser, and then you have a type representing one character of input.
Right.
And it modifies to another type, which represents a different state of the parser. And then you apply this repeatedly, in a recursive way, in a function.
Yep.
And at the end, it will return you a result,
which will give you only this is a regular expression,
this is not a regular expression.
That's like step one.
Okay. So let me just try to run that past you again. You're parsing this thing, and you're building this type up, and it's sort of pushing and popping as you're, yeah, following the LL(1) parser rules. And at the end it either succeeds and you get back... do you get back a bool, or do you just get back a type that is a tag type saying this is a valid regular expression or this is not a valid regular expression?
You can fail in multiple fashions. You can fail with some horrible, expert-looking error in the compiler. Also, you can propagate a boolean that says the regular expression is wrong, and then you can have only one static assert checking the boolean, with a nice error: this is wrong, this is an incorrect regular expression.
I see.
So static_assert is an assertion that runs at compile time, right? It's just either true or false, and there's a compiler error if whatever you pass to the static_assert is not a constant known at compile time and true.
Yeah, it must be known at compile time.
Right.
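A toy version of the 'one static_assert with a nice error' idea; is_balanced here is a made-up stand-in for whatever compile-time check the real library performs:

```cpp
// A stand-in compile-time check: are the parentheses in a pattern balanced?
constexpr bool is_balanced(const char* s) {
    int depth = 0;
    for (; *s; ++s) {
        if (*s == '(') ++depth;
        if (*s == ')') { if (depth == 0) return false; --depth; }
    }
    return depth == 0;
}

static_assert(is_balanced("(ab)+c"), "not a valid pattern: unbalanced parentheses");
// static_assert(is_balanced("(ab"), "...");  // would fail the build, at compile time
```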
And then next step, when you have a parser...
Okay, so this is just the beginnings.
And then next step is to have some context as a third parameter during parsing.
Another stack where you, based on the symbol in the grammar, do some modification on the stack.
Okay, so you've got one that's just like the parse stack, which you were using just to keep track of where you are,
how many open parens and closed parens, that kind of thing. And the other one is kind of where you're putting... I'm guessing this is going to be, like, where the actual sort of compiled regular expression is going to...
For, like, a typical expression grammar, you can have, like, number and number, and then an action, plus, comes, and you take two items from the top of the stack and wrap them into...
Into a plus operation.
And the plus operation... actually, you can easily map this to the first case, like the regular expression as an expression.
So you have like concatenation, so there is an operator concatenation, then you just take two elements from the top of the stack,
wrap them, and place them back to the stack.
Okay, so it's like a stack machine that you would write,
even if you're writing a very simple math parser type thing,
push two, push two, add.
Or if you've ever used a TI calculator.
Or a Texas Instruments calculator.
Yeah, exactly.
But instead of multiplying numbers from the input, you actually store the operation with the numbers themselves.
Okay.
So you kind of just keep it as that.
You don't actually add them.
You say this is an addition operation with one and one
or whatever the equivalent is.
Okay.
Yeah, the input, actually, from the grammar... the LL(1) parser will give it to you as a stream of operations in postfix notation.
Okay, yep.
So then it's just: two atoms, concatenation; atom, concatenation; atom, concatenation. And then you have another sub-expression, and then at the end there are multiple options, or, and...
Right.
And then you will have the regular expression as a tree, as a type.
Got it.
And then you can easily run it, because you just call the operator evaluate over the type, and it will go recursively into all branches, and at the end it will give you true or false: whether the input matches the required expression.
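A minimal sketch of that recursive evaluation, with illustrative node names rather than CTRE's real internals:

```cpp
#include <string_view>

// Each node type knows how to evaluate itself against the input, recursing
// into its children; an alternative tries one branch, rewinds, then the other.
template <char C> struct character {
    static constexpr bool evaluate(std::string_view& in) {
        if (!in.empty() && in.front() == C) { in.remove_prefix(1); return true; }
        return false;
    }
};

template <typename Lhs, typename Rhs> struct alternative {   // Lhs|Rhs
    static constexpr bool evaluate(std::string_view& in) {
        std::string_view saved = in;
        if (Lhs::evaluate(in)) return true;
        in = saved;
        return Rhs::evaluate(in);
    }
};

// alternative<character<'a'>, character<'b'>> matches "a" or "b":
static_assert([] { std::string_view s = "a";
                   return alternative<character<'a'>, character<'b'>>::evaluate(s); }());
```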
So at this point we have transitioned from compile time to run time, right?
Yes.
Again, I'm just going to read this back to you to make sure I've got this straight, because my head is spinning. The result of the parse is accumulated into this second stack that you're talking about, which effectively ends up being a bespoke type whose only responsibility in the entire world is to hold the compiled operations that will match the regular expression that was given. Right? So if I say 'a or b', it's a tree representation of the a and the b and the or, or something-ish like that. And then I can now, at runtime, give that type a string and it will do the matching of a or b in this instance. That's amazing, because presumably that means, well, as you just said, if I make a mistake in my regular expression, I don't have to wait till runtime for the thing to throw an exception saying, oh, by the way, that's not valid. It happens at compile time.
But that's actually the first phase.
Right, that's the bit we were just talking about before
with the Boolean or the tag type, right?
Yeah.
And then, so what implications does this have? What's the difference between this and, say, just using a normal regular expression?
Your compiler knows everything you are doing, and it can inline everything, and it's actually going down to assembly, like compare, compare, compare, compare, ret.
So it really does net out all this sort of very complicated-sounding, highfalutin, type-based stuff. Once it's all boiled down to, like, actual code generation, the compiler just essentially writes the handwritten code that you would have done yourself if you'd said, okay, this needs to match either a or b.
Yeah.
And it's just the two compares and an or, or whatever, and we're done. Wow.
My first approach, because I wanted regular expressions to be really performant, was... I was thinking about using LLVM to write the code and use the LLVM optimizer.
But LLVM is really hard to understand and get into.
So I knew C++, so I wrote it in C++, kind of, and then the optimizer will do it for me anyway.
So you still get to use LLVM: if you're compiling with Clang, you really are using LLVM, it's just you're using it through the medium of normal C++. As opposed to, like you were saying, if you were to use LLVM directly yourself, in which case you'd build up nodes
and you'd then build a DAG
and you'd build the comparison operations
and the basic blocks that are doing the compares
and whatever.
And then you'd say to LLVM,
hey, can you generate me the assembly for that?
Which would allow you to do some of that at runtime,
I suppose, which is the only advantage
that you might need
if you actually wanted to have dynamic regular expressions,
which is usually a security risk, right?
If you let somebody type in a regular expression, you're probably doing it wrong, like runtime
ways.
I think even evaluating a regular expression at runtime is a security risk.
Right, because depending on the regular expression, you could have some absolutely
catastrophic backtracking stuff going on where it could just blow out this...
Not just backtracking, you can just run out of stack.
Right.
Right.
I suppose so.
It's potentially unbounded.
Right.
I don't think I thought about that.
That's actually mildly terrifying given that my website is entirely regular expression based in terms of all of the horrible things that it does.
Mostly running at the clients, so it should be fine.
Oh, yeah.
No, there's plenty of stuff that's going on on the server, unfortunately.
Okay, good luck.
But, you know, like, I don't think a regular expression overrun
or stack smash is my biggest source of security risk,
given that I just take arbitrary bits of user code and run them
on my server, so, you know... Anyway.
That's also true.
Yeah. So that's amazing. So compile time is letting us push stuff that would traditionally be done with, like, a static offline parser generator and then a code generator, like you said, LLVM, or even just writing your own stuff, and being able to hoist it into mostly sane-looking C++.
You know, I have a very low threshold for template metaprogramming. From the user perspective, it's just ctre::match, angle brackets, and in the brackets there is a regular expression as you know it.
And so from a user's point of view, it's just a very slightly different way of looking at it: you use the angle brackets instead of the round brackets.
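Roughly what that looks like in use (C++20 syntax, assuming the single-header ctre.hpp from the library's repository):

```cpp
#include <ctre.hpp>
#include <string_view>

// The pattern goes in the angle brackets, the subject in the round ones.
constexpr bool looks_like_date(std::string_view s) noexcept {
    return static_cast<bool>(ctre::match<"[0-9]{4}/[0-9]{2}/[0-9]{2}">(s));
}

static_assert(looks_like_date("2022/02/20"));
static_assert(!looks_like_date("not a date"));
```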
I think Corentin said that they are called Hana brackets because of that.
Hana brackets! Oh, that's what I'm going to call them.
We've had all sorts of funny names for these things over the time. Like, I'll call them chevrons, you know. And then the operators like this... no one can see this, of course, on the podcast, but I'm doing, like, two crocodile hands: facing one way is like the output operator, and the other way is the input operator.
A good friend says it's like, this is alligator and this is crocodile, something like that.
Yeah, I don't know how that works. What do you call that? I mean, we've stopped using these, except, you know, as nature intended, to shift variables around, bit-shift stuff.
That's the right thing. Hopefully everyone's using fmtlib, fmt, nowadays for any kind of actual string manipulation. And then the only other one that comes to mind is something that a
colleague of mine and Ben's used to call mummy and daddy duck and their baby ducks, which is whenever you have to use double ampersands and then the dot-dot-dot in a template list. It's, you know, amp-amp-dot-dot-dot, and it's like the two mummy and daddy ducks and then the little baby ducks. And I can't unsee that now. It's not very often I can write code that needs that level of thing, but it's there.
What do you call that?
It's like a variadic R value.
It's a perfect forwarding
and variadic pack of variadic arguments.
Gosh, that sounds like a tongue twister.
And I'm probably incorrect, and probably someone from Core will say to me, you should call it universal references, or you should call it rvalue references, or no...
It's the double duck operator.
Stop talking about it, double duck.
Yeah.
It's the double duck operator.
Double duck is the best.
I think you're right.
No, that's cool.
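For anyone who hasn't met the ducks, this is the construct in question, sketched in a typical perfect-forwarding helper:

```cpp
#include <memory>
#include <utility>

// The mummy and daddy ducks (&&) and their baby ducks (...): a variadic pack
// of forwarding references, perfect-forwarded on to a constructor.
template <typename T, typename... Args>
std::unique_ptr<T> make(Args&&... args) {
    return std::make_unique<T>(std::forward<Args>(args)...);
}
```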
So, I mean, other things... I mean, regular expressions are probably only the beginning of this, right? In terms of, like, very small grammars that make sense to be able to compile into a program. You know, we've all kind of realized that string matching is painful if you have to write it yourself, and regular expressions, for all the warts that we were joking about earlier, can be a very efficient way of encoding something you want to get done. And certainly CTRE gives you a pragmatic way of saying, here's a string-matching thing that everyone understands, and yet I don't have to pay any kind of runtime performance costs: I get perfect assembly output that's going to match it as fast as anyone could possibly write handwritten stuff. What else can we do?
Yeah, I was going to say, does this general technique... I mean, obviously for regular expressions you've got this library, but does this mean that you can do more things generally at compile time using this technique? Like, you know, I can imagine a few situations in which you might want to have other things calculated at compile time and be like,
oh, it'd be super cool if I could have a regular expression here,
but I can't because it's too early in the process.
And I mean, I don't know, that's sort of just my naive impression,
but does this open the door to more of those kinds of things?
You can calculate anything you want.
If you don't have any input-output
and we are working on inputs too
so you can actually open a
file, read it and then
do some transformation and then
store it in your binary
or generate code based on it.
It's from JeanHeyd's std::embed.
The std::embed thing, right.
Oh, okay.
This is... you know, one has always wanted to have, like, a configuration-file kind of thing, where you want to have something which isn't code necessarily but does go into the code. That's a brilliant way of doing it. I can just imagine, you know, JSON parsers, YAML parsers, things being able to generate bespoke types potentially, I mean template-based types, I guess.
Not YAML. Please, not YAML.
Why? I like YAML. What's wrong with YAML, Hana?
You need to write a parser for it, and then I will ask again.
Okay, all right, a subset of YAML. I get that the ampersand thing that lets you refer to other parts is probably very, very tricky, but maybe there's something else I'm not thinking of.
Yeah, all right. But I don't like JSON. Can I just say it out loud? It's not very fashionable, I know. JSON is, like, the lingua franca of everything. But, like, I can't put comments in JSON. And that makes it really hard for me to write anything where I'm explaining why some... you know, like package.json for, like, a Node project.
Right.
You have to have a package.json. It has to be valid JSON.
Not only does it have to be exactly valid JSON,
like I can't have a trailing comma
and I have to always quote everything all the time.
I can't explain that I bumped a package version: please don't... you know, this is why this is here. Or: I don't directly depend on this package, I want a package, but I need version three, otherwise I get a security warning.
And I can't put a comment on the line saying,
I don't care about this package,
but it needs to be three or greater
or something like that.
And there's just written everywhere.
And I know I used to have a trick where I would always just put an extra key of, like, underscore-underscore-comment-underscore-underscore,
and then I could put a comment in the middle of a dictionary
and just hope that no one cared that there was the thing.
Anyway, rant about JSON over.
I think actually, like, parsing JSON is like 200 lines of code in C++. I recently wrote one.
So... I mean, I know Ben Deane and, ironically of course, Jason Turner did a presentation a long while ago on parsing JSON, and it kind of feels like, that's great, and it shows the technique off, but I would never want to do that, because I hate JSON so much. Well, I like Jason Turner, just to be clear, if you listen to this. Jason the person, not JSON the file format.
Yeah, right.
No, I mean, so I'm wondering what other things.
I mean, could one write a C compiler?
What?
I think someone did already. A few years ago there was someone on Reddit with a constexpr C compiler.
That's absolutely daft. I love it. I mean, I've seen people with, you know, the Brainf... language that I won't swear on our podcast, but that one I've seen, and that kind of makes... I can see how that could go together, even with my primitive understanding of template trickery.
But actually, it's not template metaprogramming. It's actually constexpr.
Oh, so now we... Yeah.
Yeah, go on. Tell us about...
So what's the difference between templates and constexpr?
Yeah, template metaprogramming is actually abusing
the properties of the language.
They came with templates to make it easier to write some containers, etc.
And then someone found they are actually Turing-complete.
I see.
So that was never intentional.
No, that's never intentional.
My naive thing of, like, I want to write an array or a vector, an expanding array of type T: that's what I think of as templates, right? And you're like, well, that's not Turing complete, that's just, yeah, fill in the dots with the T in all the places.
Yeah, right. Then there is constexpr programming, which is just a really bad name for 'this function should, or may, be run at compile time', and then you are writing just ordinary C++. So you can actually write really easy code, really simple code. And in most cases, you can just add a constexpr modifier to all your functions and it will work anyway. So if you implement that C compiler, just mark everything constexpr and it should probably work.
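As a trivial illustration of marking an ordinary function constexpr, so it can run in the compiler or at run time depending on how it is called:

```cpp
constexpr int multiply(int a, int b) { return a * b; }

int buffer[multiply(2, 5)];            // evaluated by the compiler: an array of size 10
static_assert(multiply(2, 5) == 10);   // also checked at compile time

int scale(int runtime_value) {
    return multiply(runtime_value, 3); // ordinary code generation at run time
}
```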
Wow. So that's where we've come along from: what started as a kind of amusing hack that was discovered by folks back in the day, of like, hey, this is accidentally Turing complete (which, I mean, I think so many things in life end up being accidentally Turing complete, right?), to constexpr being sort of brought into the language as a way of tagging just normal functions. And if I remember rightly, the first few editions of the language that had constexpr, it was super limited: you could only, like, have a single return statement. They were afraid of this, right? And so you could write your constexpr 'multiply two numbers together', which was, you know, constexpr int multiply(int a, int b) { return a * b; }, and that was all you could do.
And you go, oh, there you are.
That's really exciting.
I was like, well, what does that help me here?
You could do recursion, but you couldn't do conditions.
You couldn't do any cycles.
Then they allow it, and they allow it more.
In C++20, we allow allocations. So you can actually allocate; you can use a std::vector now.
In constexpr time. So this is just a normal C++ program that's running in the compiler, effectively, and the output of it can then be used in things that are in the compiled program.
Wow.
You actually cannot return a std::vector from the function out of compile time, because there is a problem: if you allocate something at compile time, it would have to live on at run time.
I see.
So there is like the distinction.
So yeah, dynamic allocation that you did during compile time can't live on into
the lifetime of the program because it's not dynamic anymore, right? How do you delete
the memory that was like, well, it was in the compiler.
Yeah. All your allocations must be freed when you are leaving the constant evaluation context.
Right. But you can use it inside of your function to make the algorithms much easier, simpler, than just using templates and tricks. You can actually implement your own stack with a vector, with actual normal code, rather than, like, consing onto the side of some Lisp-like thing that's getting bigger and bigger and bigger. And so you're no longer abusing the type system to hold the internal state of your algorithm; you can just use freaking variables, like a normal person would do. That's so cool.
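A small sketch of what that looks like in C++20, assuming a standard library with constexpr std::vector support:

```cpp
#include <vector>

// A vector can be used *inside* a constexpr function, as long as the
// allocation is gone before we leave constant evaluation; here it collapses
// to a plain int before returning.
constexpr int sum_first_n_squares(int n) {
    std::vector<int> squares;                     // compile-time allocation
    for (int i = 1; i <= n; ++i) squares.push_back(i * i);
    int total = 0;
    for (int s : squares) total += s;
    return total;                                 // the vector is destroyed here
}

static_assert(sum_first_n_squares(4) == 30);
```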
Exactly. And then you can implement the parser in a much more ordinary way, and it will be much faster. And I mean not, like, one order of magnitude, but like exponential versus linear.
And is this runtime speed, or are you talking compile-time speed? So the compiler itself is much better at, I don't know, understanding the code that it normally reads all day every day to compile our code, than it is at dealing with the abuse of the template system and all the instantiations that requires.
Yeah, for example, if you have like a regular expression
which is a few kilobytes long, which is crazy, by the way.
I've seen the ones for like the match email addresses
and they look a bit like that.
Yeah, and if you try to compile it with a CTRE,
I don't know if it will ever end, and usually it crashes on insufficient memory.
Right, I can imagine, because of all the...
But then I made a prototype of CTRE using constexpr parsing, which is also limiting the template instantiation, and then I tried a few-megabytes-long regular expression and it took like half a second or something to compile.
That's such a big deal.
That is huge. Yeah, like, we've talked a lot, Ben and I. One of the themes of this podcast is my impatience at waiting for code to build, and Ben's impatience at waiting for tests to run. That's, like, our shtick here.
That's kind of true, yeah.
And so having tools in our toolbox that allow us to use these advanced techniques that are coming down the pipeline, or even here already, that don't also suck up compile time, is very important and close to my heart. So I'm really excited to hear this.
That's amazing.
And you can actually easily test your code.
I was going to ask.
Yeah, exactly.
And your code won't ever compile if there is any problem.
You can just run it at compile time and place it into a static_assert, like a REQUIRE in a test. And if it's okay, it will compile. If not, your compilation will fail.
So the compiler runs the test implicitly
as it's compiling. That
sounds like Ben's heaven.
Yes.
So the answer to the question
of should I check things with the
compiler or should I check things with test is
yes.
Yes, both. All of the above.
I mean, that's great.
But actually, you should test both.
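In the spirit of what Hana describes, a sketch of tests written as static_asserts for a made-up constexpr function; the build itself becomes the test run:

```cpp
// If any assertion is false, the file simply refuses to compile.
constexpr int clamp_to_byte(int v) { return v < 0 ? 0 : (v > 255 ? 255 : v); }

static_assert(clamp_to_byte(-5)  == 0);
static_assert(clamp_to_byte(42)  == 42);
static_assert(clamp_to_byte(300) == 255);
// static_assert(clamp_to_byte(300) == 256);  // would fail the build
```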
In C++20, you can actually have different code
which will be run in compile time and in run time in the same function.
Oh, wow.
Yeah, this is something I wanted to talk about, actually.
That is one of the things about constexpr
is that any code you write that's constexpr can also be run dynamically.
So, for example, if you call a function like my multiply routine that I just sort of said, and you need it at compile time, the compiler will obviously evaluate it with all the constants available and promise you that, like, you know, you've got an array of size 10 because you did two times five, right?
End of story.
But if you call it with dynamic parameters,
it will just generate the code and do it like normal.
So therein... and then Hana's just reminded me,
or reminded us that you can now ask the compiler,
am I being evaluated inside the compiler
or am I being evaluated at runtime?
Like, am I in the matrix or not?
And then you can change the answer,
which of course, is terrifying.
And you shouldn't do that.
You shouldn't do that.
You shouldn't do that.
So when might...
All right, you said you shouldn't do it, but why would you need to know?
Yeah, sometimes it's really hard to implement something in a constexpr implementation. It's usually recursive algorithms. But at runtime, you can use some intrinsic, for example, or inline assembly.
I see. So, like, for example, if I had a matrix multiply routine, a four-by-four matrix multiply routine, then of course I want to be able to do that at constexpr time, because who knows, I might be rotating some points or something that's in a big list of things. But if I'm running on the real-time system, I actually want to use the underlying x86 architectural instructions myself to do it. So you need to be able to switch on those.
So you can have two paths, one for compiler and one for your runtime.
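A sketch of the two-path idea using C++20's std::is_constant_evaluated(), with a GCC/Clang-style builtin standing in for the hand-tuned runtime path:

```cpp
#include <type_traits>

constexpr unsigned popcount(unsigned v) {
    if (std::is_constant_evaluated()) {
        unsigned bits = 0;                 // portable loop the compiler can evaluate
        for (; v != 0; v &= v - 1) ++bits;
        return bits;
    } else {
        return __builtin_popcount(v);      // runtime path: use the intrinsic
    }
}

static_assert(popcount(0b1011) == 3);      // goes through the constexpr branch
```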
So back to your point that you made, which is that having the compiler run your test
is fine, provided you don't have any code that differs.
But if you do have code that differs, you should also test it with the normal techniques. Which brings me on to the next question I have, actually, about this, which is: how do you debug this? Because I'm used to putting breakpoints and printfs everywhere, and I don't think the compiler lets me do that. Or maybe it does.
Yeah, there is no tooling for that. And one trick I found is to create a template which is not defined. It's only declared, if I'm correct.
Okay, so you just say, there exists a thing called Bob,
but you don't give a body.
So it's like the compiler can't do anything with Bob if you try to do...
It's actually: template, typename Something, struct Bob.
Okay, so Bob is...
Yeah, sorry, this is terrible now.
I use Bob for anything, or Ian is the other one.
But Bob is of T, right?
And you know, so you've got a T that you can give it.
Okay.
And if you try to instantiate it somewhere,
like it's like print for debugging,
it will fail in compilation,
and it will tell you there is no instance of Bob T.
Print for debugging of,
and then whatever you put inside.
So, yeah, you can throw an exception kind of thing, with a helpful message, but the helpful message unfortunately is just a type name inside a deliberately-left-out body.
This is good for debugging, but...
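The trick in question, roughly; the name show_type is made up, and the commented-out line is the 'print' that deliberately fails to compile:

```cpp
template <typename T> struct show_type;        // declared, deliberately never defined

template <typename T>
void some_metaprogram(T value) {
    // Uncomment to "print" T during compilation:
    // show_type<T> debug;  // error: implicit instantiation of undefined template 'show_type<int>'
    (void)value;
}
```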
For... like, when I wrote CTRE, I started from the parser, and I wrote a lot of tests to make sure it's working properly, and then I built on that. So when I change something, it will immediately break, so I know something is not right, and it helps. So obviously tests help.
Well, that's very on brand for us.
So I'm glad to hear that.
Even when the compiler is doing all the work for you, testing is useful.
Yeah, you're actually writing code, so you need to test it. If you don't test it, then it's meaningless.
Absolutely.
So how long until we have a template programming language testing framework? Actually I think catch2 already has some
macros which will force evaluating something in compile time to check it. So there are definitely
already some tools. But in CTRE I was lazy and my tests are actually a bunch of static asserts. And if any of the CPP file fails to compile, then there is a problem.
Right, right.
And every time I get some issue or mostly issues,
then I try to make it happen, the problem, and then I fix it.
Right.
If I try to fix it before that, I don't know what I fixed.
Yeah, that never works out. Gotta watch it fail first.
Yeah, yeah. Start from a failing test, you know, and then work backwards. Otherwise, yeah...
So actually, it's exactly the same as normal programming.
Right, exactly.
Yeah, yeah. I guess I've gotten so lazy with debuggers, like actual GDB and whatever, that, you know, I'll happily write something without testing it as much as perhaps I ought to. Or at least, certainly if I'm trying to understand someone else's code, you can't really go far wrong with single-stepping through and kind of getting the feel of it. And I feel like if there were perhaps tooling for that, it would be useful.
Can I?
I think you can always hook into compiler with GDB.
That's true, if you can understand what's going on inside GDB... sorry, inside GCC, using GDB. I mean, it wouldn't surprise me that you can understand that, Hana, frankly.
No, no, I definitely can't.
Oh, I have to ask, by the way:
So there is a library called Hana already; that's a template metaprogramming library.
Who was first?
Who stole it?
Were you before the library or were you named after the library?
I was born before the library.
And then they stole your name.
No, no, no, no.
There is no connection at all.
I find that hard to believe.
I mean, there's certainly some nominative determinism going on here.
Surely you're now part of it.
It took me a while.
It was, I think it was at CppCon, I did my lightning talk about CTRE,
and people were like talking, like, you must know Louis.
Oh, yes, right.
It's Louis, right?
Yeah, yeah.
Yeah, yeah.
Louis, of Boost Hana.
And no, what's going on? I didn't know.
So you didn't know that there is a big compile-time sort of predecessor?
Actually, I don't even know what HANA is, I'll be honest with you.
I know you.
It's like a style of using templates,
but without you actually knowing that you are using templates.
It's using function argument deduction tricks.
Okay.
And actually last week we had, no, it might be this week, I don't know, I'm not
sure anymore.
You have no idea what time it is or what day yet.
We had a Study Group 7 meeting, and we were discussing the future of reflection, and we were reminding people that Boost Hana-style, or Hana-style, functions were rejected.
Right. And there's you saying this, right?
No, it was David actually.
It's funny.
But no, literally no connection.
That's funny. It is. It's amusing. Maybe it was prophecy.
Maybe it was prophecy. Well, take it as prophecy.
That seems like a great point to end on here, as we've got to time. But thank you so much for being with us today. This has been really, really interesting. And I think I've finally gotten... I mean, full disclosure: before now I've been to several of Hana's presentations on this kind of stuff,
and I've never really had it stick in my head the way, even when...
No, no, no.
Matt was my remote clicker.
So before it was fashionable to do presentations remotely,
Hana was ahead of the curve.
I forget, what was the reasoning you...
My flight was cancelled.
I couldn't go to the conference.
So I did my presentation remotely, but
something failed with remote control of his computer because I wanted
to run slides on Matt's computer because
there are animations and everything.
I was supposed to control it with VNC, but somehow it fails.
So Matt was my clicker. So I was supposed to control it with VNC, but somehow it fails. So Matt was my clicker.
So I was telling him, like, next slide, next slide, next slide.
And then there was, like, the audience,
and there was the slide with the expression tree.
And I asked Matt, click on the text box and write some expression.
So I got to do the cool thing with the presentation
and take all the glory
and then poor Hana did all the work.
No, no, that was fun. Oh man, that was fun. And that was actually a funny conference.
Yeah. Yeah, being the clicker, that's... I peaked. I've peaked in my career now. I've been Hana's clicker.
And I was very proud to be.
All right, friends.
Well, once again, Hana,
thank you so much for your time and for explaining this all to me.
As I say, I think I have
a much better understanding now
than I think I've had before.
And thank you for inviting me.
Yeah, this is great.
Super interesting.
You've been listening to Two's Complement,
a programming podcast by Ben Rady and Matt Godbolt.
Find the show transcript and notes at twoscomplement.org.
Contact us on Twitter at twoscp.
That's at T-W-O-S-C-P.
Theme music by Inverse Phase. Inversephase.com.