CppCast - C++ Compile Time Parser Generator
Episode Date: January 13, 2022
Rob and Jason are joined by Piotr Winter. They first talk about include guards vs pragma once, testing for constexpr, and the preview of Catch v3. Then they talk to Piotr Winter about CTPG, the C++ Compile Time Parser Generator.
News: Include guards or #pragma once; Test an expression for constexpr friendliness; Catch v3 Preview 4
Links: C++ Compile Time Parser Generator; Peter Winter's Blog; Deadline24 2013 | Future Processing
Sponsors: Use the #cppcast hashtag and request your PVS-Studio one-month trial license here: https://pvs-studio.com/try_free; C++ tools evolution: static code analyzers: https://pvs-studio.com/0873
Transcript
Episode 332 of CppCast with guest Peter Winter, recorded January 11th, 2022.
The sponsor of this episode of CppCast is the PVS Studio team.
The team promotes regular usage of static code analysis and the PVS Studio static analysis tool. In this episode, we talk about guarding your includes and testing for constexpr.
Then we talk to Peter Winter.
Peter talks to us about CTPG, the Compile Time Parser Generator. Welcome to episode 332 of CppCast, the first podcast for C++ developers by C++ developers.
My host, Rob Irving, joined by my co-host, Jason Turner.
Jason, how are you doing today?
I'm all right, Rob. How are you doing?
I'm doing okay. I don't think I have anything to share this morning.
How about you? Anything going on?
No, maybe just a quick reminder, though, that we have calls for papers out for C++ on Sea and C++Now at the moment. And, yeah, if you're interested in hoping that you'll be able to go to a conference in 2022, then I suggest you submit a talk to one of the open conferences.
Yeah, hopefully all those conferences can operate safely this year.
Yeah.
Things are a little crazy right now, but hopefully.
All right.
Well, at the top of every episode, I'd like to read a piece of feedback. We got this tweet from Matt Fernandez referring to last episode, saying, this was a great episode.
Slobodan is one of the first C++ practitioners I've heard from in a long time
who gets that modern C is a different beast from the 80s C and actually kind of good.
And, yeah, that was great talking to Slobodan.
And it's interesting learning about some of the new modern C stuff
because a lot of that was news to me.
Right.
Yeah.
Well, we'd love to hear your thoughts about the show.
You can always reach out to us on Facebook, Twitter,
or email us at feedback at cppcast.com.
And don't forget to leave us a review on iTunes
or subscribe on YouTube.
Joining us today is Peter Winter.
Peter started working as a C++ dev in 2005,
working on a SystemVerilog compiler.
Since 2010, he's been working for Future Processing in Poland. He's worked on many commercial projects since then, not always as a C++ developer; he had a minor detour as a full-stack .NET web developer, which he did not like. He was also an organizer of a 24-hour competitive programming marathon, Deadline 24, which was last held in 2018. His main interest as a C++ developer was always compilers and parsers, which eventually led to the creation of CTPG.
Peter, welcome to the show.
Hello. Glad to be here.
What is the Deadline 24 about? What was that?
Yeah, that was really cool.
So, Future Processing started it, I think, in 2009, a year before I even joined Future Processing.
It was a local, like, competitive programming marathon. It was always 24 hours. So it was teams of three members, and three tasks.
And you had 24 hours to do these tasks and no internet access.
Yeah.
Is it possible to program without internet access?
Yes, apparently it is. So first, it was a local thing, so like local colleges and participants.
And it turned out to be something really big with people like from Google and Yandex winning the competition.
Oh, wow.
The best algorithm experts from all over the world actually joining.
And so it was kind of, like, a victim of its own success, let's say, because the costs became almost bigger than the benefits. At first it was supposed to allow us to hire people that win these competitions, but then when the Google guys win it, they won't be working in Poland, right?
And it was really expensive for the company to actually prepare it, mainly because of our time, actually, because the venues weren't, like, the big issue. But it was offline. It wasn't the most interesting location... the venue was actually an underground mine, 300 meters underground.
Oh, wow.
So, no internet access.
Was the no-internet thing actually, like, chosen just to make sure you weren't secretly looking at some code, some Stack Overflow post on your phone or something like that?
I think this was just a cool location.
But yeah, that's the fact.
There just was no internet access, period.
That's funny.
That sounds like fun.
Yeah, and the tasks were actually, like... you were supposed to code a program. We were writing servers, game servers, and the programs were fighting with each other on an arena. Something like that. The tasks were heavily focused on algorithms.
But this is discontinued at the moment.
So was everyone actually on site in that mine location?
It was 90 people: 30 teams, 3 people each.
wow
Yeah, that sounds like it could be cool. All right, well, Peter, we've got a couple of news articles to discuss. Feel free to comment on any of these, and then we'll start talking about your CTPG project.
Okay, okay.
All right. So this first one is a poll on Reddit which I thought was interesting: do you prefer include guards or using pragma once? And I did get the Beautiful C++ book, and I haven't read through the whole thing yet, but this was in the first chapter, where Guy and Kate recommend using the include guards, because it's standard, and even though pragma once might be available in all the modern compilers, it's not standard.
What do you two think about this, though?
I'll let Piotr go first.
Sure. I've never used pragma once myself. I've seen it. I don't know why, really. Probably because it's not standard, so I just didn't feel like it.
I just thought that both do the same thing.
So I would
never use it intentionally,
but I didn't give it much thought, actually.
Maybe now I will.
The nice thing about pragma once is you can't make a mistake. You can't accidentally copy-paste an include guard and not get the name quite right.
Yeah. And I suppose the compiler can do a better job in terms of performance, because the preprocessor needs to actually go through the file and seek for the endif, right? So it takes time, maybe.
Yeah, that's an interesting question. I got the impression that it used to save time but maybe doesn't anymore, as our preprocessors have gotten better. But now I'm curious.
Yeah, well, for sure the include guards can do at most as good as pragma once. There's no way you can do better than pragma once.
Yeah, absolutely. I'd have to agree with that.
I've always been on the side of: it's not standard, therefore I don't use it. Plus, I've read the arguments for why it's not standard, because while it saves you from making a mistake, the compiler could still make a mistake, like if you include the same file two different ways. The common example is if you've got two different sub-libraries that each include zlib, but they have their own embedded zlib; then you might get zlib included twice if they're both using pragma once without include guards. But I mean, you probably have other problems in your code in that case as well. But there's a comment down here from Tom Swirly, at least half of the way through. And Tom says, after reading this page, he's concluded that the only option is to actually use both. Really? To save yourself from yourself and to save you from the compiler, you have to use both. And that's after reading all of everyone else's arguments. So this is why we can't have nice things, I guess.
Yeah.
And which order?
That's a good question.
Would that make a difference?
It would definitely have to make a difference, right?
Because if the pragma gets #ifdef'd out, then... I don't know. I think you would want to put the pragma first.
Yeah, anyhow.
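For readers following along, a minimal sketch of the belt-and-suspenders header being described, with the pragma placed first as suggested (the guard macro name here is hypothetical):

```cpp
// my_header.hpp - uses both mechanisms, pragma first
#pragma once
#ifndef MY_PROJECT_MY_HEADER_HPP  // guard name must be unique per header
#define MY_PROJECT_MY_HEADER_HPP

// ... declarations ...

#endif  // MY_PROJECT_MY_HEADER_HPP
```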
Okay.
Next thing we have is a post on Arthur O'Dwyer's blog,
and this is test an expression for constexpr friendliness.
You picked a meaty one here, Rob.
Oh, and I thought this one would be good
because of what we're going to be talking about
in a few minutes with Peter,
testing for constexpr.
But yeah, he's talking about this test you can write to see if a function is constexpr, but then also going into how you can't do that for consteval functions, which was
interesting. Yes. I think, for the sake of our listeners who, you know, can't see this code: it's effectively using SFINAE to rule out an expression that cannot be evaluated at compile time. But the much more readable version of it is... well, Ernesto Guiero Pena provides an alternative idiom using requires instead of SFINAE, a requires expression. And so that one is more readable. If you scroll down to that one, you can say: oh, requires that this thing results in... you know, that this thing is compilable. And the only way for it to be compilable is for the function that's passed in to be something that can be executed in a constexpr context. And if it can't be, then the requires fails and you just get false back.
Right.
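A rough sketch of the two idioms being described; the names here are illustrative, not Arthur O'Dwyer's exact code, and the lambda tricks need C++20 (stateless lambdas became default constructible there):

```cpp
#include <type_traits>

// SFINAE version: the default template argument forces F{}() to be
// constant-evaluated; if that fails, overload resolution falls through
// to the variadic fallback instead of producing a hard error.
template <class F, int = (F{}(), 0)>
constexpr bool is_constexpr_friendly(F) { return true; }
constexpr bool is_constexpr_friendly(...) { return false; }

// requires-based version: same idea, easier to read. The array bound
// must be a constant expression; if it isn't, the type-requirement
// fails and the whole requires-expression yields false.
template <class F>
constexpr bool is_constexpr_friendly2(F) {
    return requires { typename std::type_identity<int[(F{}(), 1)]>; };
}

int not_constexpr();  // no definition needed for the negative test

static_assert(is_constexpr_friendly([] { return 42; }));
static_assert(!is_constexpr_friendly([] { return not_constexpr(); }));
```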
But that version of it also still does not work in consteval.
Right.
Yeah.
And the thing that he then goes on to talk about is that that seems to be intentional, that you can't evaluate consteval this way. It's just always going to be compile time, and you're going to get errors if you try to use it otherwise.
Right.
It makes sense, actually.
If we are forcing that this function should be compile time,
then it should just be an error, right?
Not just, like, SFINAE'd away.
It makes sense, I think.
I think so, yeah.
Although I still sometimes wish that we had more ways of doing
introspection into the
code in a clean way of just asking
is this something that could be run at compile
time or whatever, you know, with
reflection
kind of mechanisms that we don't currently have.
And it doesn't look like we're getting that
in C++23, but yeah.
C++26?
29? 32? Yeah, depends on when they can start doing meetings again, I guess.
All right. And then the last thing we have is an update on Catch2. And is it actually Catch2 v3, or is it going to be Catch 3? I don't know, but either way, it's the v3 preview, the fourth preview.
And a big list of what's changing, what some of the breaking changes are.
I didn't have a chance to read through this too much, actually,
so I was hoping one of you could tell us a little bit more about Catch
and what's coming in V3.
I use it in CTPG.
In version 3, specifically?
I tried.
Okay.
Yeah, but I had struggles, and I just reverted to version 2.
I had struggles because of how I was compiling it.
So Catch2, this version 3, is a static library now. And it's not distributed as a binary artifact. So I built it using GCC, and I was testing a Clang build of CTPG, like the unit tests, and it just segfaulted right away.
And that's because Catch 2 is using C++ interfaces under the hood.
So you have to build both Catch and whatever test
that you're writing with the same compiler and the same compilation options. And that's something the maintainer told me on Discord, I believe. So I just went back to using version 2. But I was like: the whole point of it being a static library now is shorter compilation time.
But am I supposed to now compile it for each compiler
and each compilation option separately,
which sort of defeats the purpose?
But then I thought that maybe not really,
so maybe I'll give it another try.
Because you are really building it once for each compiler,
not for each file where you write the tests.
So if you have some sort of unit
test project, and you have like 10
CPP files with tests,
you don't have to necessarily
compile Catch-2 10 times, right?
For each file, you just do
it once for the static library, and then
multiply it by the number of
compiler options. So
maybe it will actually save me some
compilation time in my
GitHub action, because I'm using
it on the GitHub actions.
Yeah, anyway,
it's a cool idea.
I was just going to say, Jason, it sounds like this kind of
relates to the recent discussion
we had with the two Bloomberg guys
about distributing modules, right?
Yeah. It also makes me think... it's similar to how I use Catch2 in my own projects, because you can do the #define CATCH_CONFIG_MAIN or whatever. And I actually build a static internal library once
that has the catch main in it.
And then I link to that in each of my catch tests.
So I do save some compile time there.
So it might be a similar kind of setup with Catch2 version 3, that you would have to do that.
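For listeners, a minimal sketch of that Catch2 v2 pattern: one translation unit that only provides main, compiled once into an internal static library that every test target links against (file names are illustrative):

```cpp
// catch_main.cpp - compiled once into a small static library; each test
// executable links against it instead of recompiling Catch's main().
#define CATCH_CONFIG_MAIN  // tells Catch2 v2 to generate main() here
#include <catch2/catch.hpp>
```

```cpp
// my_test.cpp - an ordinary test file that links against that library
#include <catch2/catch.hpp>

TEST_CASE("addition works") {
    REQUIRE(1 + 1 == 2);
}
```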
All right. So, Peter, could you start off by telling us about the CTPG project that you're working on?
Yeah. The unfortunate name... the acronym is just stolen from CTRE. So, guilty.
Okay.
The naming convention is totally stolen. So it's Compile Time Parser Generator.
I mean, it's self-explanatory, right?
So you want to have a parser.
You define it using a grammar.
Sort of the same way you would do this using Bison and Flex. In theory. Because we are using C++, instead of the operators that are in the Bison syntax, you use C++ operators for this purpose.
Yeah, and the compiler does the job for you and generates a bunch of static arrays. So the parser is a constexpr object with a ton of numbers and C arrays, basically, in it. And then it has a parse method, which you use on some buffer, whether it's runtime or compile time. You can actually invoke the parse method of that parser in a constexpr context. Yeah, so that's basically it. So you have, like, Flex plus Bison inside the C++ compiler.
That's what it is, actually. Of course, it's nowhere near Flex plus Bison feature-wise, because these are, like, 40-year projects, and mine is two, and it was actually released a month ago. But yeah, I'm looking forward to replacing Bison.
Joke.
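To give a flavor of what that looks like, here is a small grammar sketch modeled on CTPG's README; treat the exact header path, names, and signatures as approximate rather than the library's verbatim API:

```cpp
#include <ctpg/ctpg.hpp>  // header path assumed

#include <iostream>
#include <string_view>

using namespace ctpg;
using namespace ctpg::buffers;

// C++17: the regex pattern needs a static-linkage object to be usable
// as a template argument (see the discussion later in the episode).
constexpr char number_pattern[] = "[1-9][0-9]*";
constexpr regex_term<number_pattern> number("number");
constexpr char_term plus('+');

constexpr nterm<int> expr("expr");

constexpr int to_int(std::string_view sv)
{
    int n = 0;
    for (char c : sv) n = n * 10 + (c - '0');
    return n;
}

// Where Bison has its own operator syntax, CTPG uses C++ operators:
// the functor after >= is the semantic action for each rule.
constexpr parser p(
    expr,
    terms(number, plus),
    nterms(expr),
    rules(
        expr(number)
            >= [](const auto& sv) { return to_int(sv); },
        expr(expr, plus, number)  // left recursion is fine for an LR parser
            >= [](int l, auto, const auto& sv) { return l + to_int(sv); }
    )
);

int main()
{
    // Parses a runtime buffer here; the same parse can also run in a
    // constexpr context on a compile-time buffer.
    if (auto res = p.parse(string_buffer("1+2+3"), std::cerr))
        std::cout << *res << '\n';  // prints 6
}
```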
You just said when you give it the grammar,
the result is a static array with a bunch of numbers and letters in it, basically.
So are you building a state machine?
Yeah, exactly.
Okay.
Exactly a state machine. For both lexical analyzer and the syntax analyzer,
these are just two separate state machines.
The lexer is actually the finite state automaton.
The parser is a bit more complicated.
It has a stack of states.
But it's still a state machine,
but it has an operation to push and pop from the stack.
Basically, you'd have to look at
the algorithm of the
LR parser.
I've actually used it from the book.
It was compilers, design principles, or something like that.
The Dragon book, that one?
I have it mentioned on my GitHub.
Okay.
So the full title is Compilers: Principles, Techniques, and Tools.
And the authors are Alfred Aho, Ravi Sethi, and Jeffrey Ullman.
It's a really old book.
Yeah, but I used it.
Yeah, that's the one that our listeners might know of as the dragon book,
because it has a dragon on the cover.
Yeah, okay. So the creation of the parsing table... the algorithm itself is actually very... it's complex in terms of algorithmic complexity. So that's why it was challenging to write it optimally and in a constexpr context at the same time.
So just to put it in perspective: let's say a JSON grammar. And it takes like eight seconds to compile.
I don't know if it's too much or not.
It depends what you want to achieve.
If you put that grammar in a separate compilation unit and don't change it much, then I think it's not a big deal. But I wouldn't use it for something like a C grammar, maybe. Just not yet. I would need to work on optimization a bit more first. But anyway...
Sorry, yeah, go on.
I was wondering, when you said you would work on optimization a bit more, do you mean work on the compile-time optimization or the runtime optimization?
Right. Runtime is, like, blazing fast.
So it's a state machine, and there is no backtracking. It just reads each character exactly once and does linear-time parsing, every time.
So it's no backtracking, nothing like that.
It cannot be actually faster, I think.
You would like to...
So it's just if I read this character,
then jump to this state.
If I read this character, jump to this state.
Basically, that's it, yeah.
So, like, the compilation time.
There are a couple of points I could improve.
Algorithmically,
so in terms of the algorithm,
not really.
That's like, you can't do much there,
but I have a couple of ideas.
For instance, if you look at, like, the page with the C language operator precedence, right, the precedence of operators in C: you look at this table and you see that the C language has, like, 11 assignment operators which are basically the same. They have the same precedence, the same associativity; they are all, I believe, right associative. So you could treat them as the same, basically. So as not to complicate grammars, not to add rules: just add one and do it in a smarter way.
So I've tried actually putting in the grammar of the expression syntax from the C language, and it was too big. The compiler ran out of resources.
The compilation time was not the big issue; I mean, it went something like a minute. I think it would compile it, but I'm not using a really powerful machine. I'm using a virtual machine for Linux work.
And I'm developing CTPG on Linux. So I would have to increase the maximum number of constexpr operations.
There is an appropriate switch in both GCC and Clang. I don't know why this is the maximum; it looks arbitrary.
It does look arbitrary, yeah.
It's not like a special number for me; nothing really comes to mind. It's something like 33 million something. Not sure why it's that number.
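For reference, the switches being alluded to are real compiler flags, and GCC's documented default of 33554432 (2^25) matches the "33 million something" mentioned; a hedged summary, with illustrative values:

```cpp
// Raising the constant-evaluation limits (flag names are real; the
// values shown are illustrative, not recommendations):
//   GCC:   -fconstexpr-ops-limit=N   (default 33554432, i.e. 2^25)
//   Clang: -fconstexpr-steps=N
//   MSVC:  /constexpr:stepsN
```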
Anyway, when I increased it, it still ran out of RAM, the compiler, that is. And MSVC, like, this thing is just eating RAM like crazy. So even though it is supported in CTPG, just expect it to take at least five times more RAM than GCC and Clang. I don't know why. Constexpr evaluation in the Microsoft compiler is much more RAM-intensive. In compilation time, they're pretty much similar.
So going back to the C grammar: I just put in just the expressions, right, just the operators, just this grammar, so I can do a calculator that's similar in features to what the C operators have. And it was a bit too much. So I thought that I could add a feature that will group similar operators in one group, so the grammar gets less complicated.
So the next challenge would be to actually reduce the size of the final parser.
Right now, it can go up into the megabytes if you are using too many rules.
That actual static data table?
Yeah, it's not a problem
really, but the binary gets
big.
And these arrays are really sparsely
populated, so that's
a way to optimize it, I think.
Because most of the states are actually unreachable; it would just be an error.
I didn't do the optimization
step on this
thing yet.
There is room for improvement,
basically.
The sponsor of this episode of CppCast is
the PVS Studio development team.
PVS Studio is a static code analysis solution that helps enhance code quality, security, and safety.
The analyzer detects bugs and potential vulnerabilities in C, C++, C Sharp, and Java code on Windows, Linux, and macOS.
CppCast listeners can use the CppCast hashtag to get the analyzer's one-month trial version.
To request the trial, use the link in the podcast description.
C++ projects are getting increasingly complex,
too complex for people to analyze them thoroughly during code reviews.
That's where code analyzers come in.
They notice everything the human eye misses,
thus making code reviews more productive and enhancing code quality.
Want to know more about the problem?
Take a look at the recent article from the PVS Studio team,
C++ Tools Evolution, Static Code Analyzers.
The link is in the podcast description.
So right now we're talking about, like, you know,
kind of crazy edge cases, like putting in the C grammar.
What are some of the more realistic things
you've done with CTPG?
Yeah, I've done JSON parsing, both runtime and compile time.
So the talk you gave, Jason, about "constexpr ALL the things", that was actually the thing that inspired me to move in this direction. Because originally I tried to do this using metaprogramming, and it was just not usable. The waits were, like, as far as the eye can see.
Yeah, it's too much RAM consumption by the compiler and too much time.
So this tool was not usable really for me.
So then I tried to move into the constexpr realm,
and it worked.
So I think the biggest thing I have in my examples is the JSON parser. Yeah, so it's really cool. It compiles in like 8 seconds, like I said, on my machine. And also there is this, like, compile-time JSON version, which you can do with CTPG using just 200 lines of code. So you can have, like, JSON constants in your C++ code. And I mean, the most tricky part was actually the representation of the JSON that can be constexpr, which is also what you faced, right? To solve it, I'm doing some trick.
So if you think of a JSON, right, and you measure the number of characters it has, that's the maximum depth it can have, right? It's an overshoot, really, but you can't have more depth than characters in the JSON, right? And also you can't have more array elements than characters. And each of the objects cannot have more elements than there are characters in this JSON string. So I'm using this cap to statically allocate... I mean, at compile time, "allocate" meaning just a static array, right, of this size, and just operate in those terms. So I don't have to, like, parse the JSON string twice just to see what the dimensions and depths and such are.
Right.
Yeah.
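A minimal sketch of the capacity trick just described (my own illustration, not CTPG's example code): since an input of length N can never have more nodes, nesting levels, or array elements than characters, a fixed-capacity container sized by the input length works in a C++17 constexpr context without a pre-pass.

```cpp
#include <cstddef>

// Fixed-capacity "vector" usable at compile time in C++17.
template <class T, std::size_t Cap>
struct cvector {
    T data[Cap]{};
    std::size_t count = 0;
    constexpr void push_back(const T& v) { data[count++] = v; }
    constexpr const T& operator[](std::size_t i) const { return data[i]; }
    constexpr std::size_t size() const { return count; }
};

// For an input of length N, N slots is always an upper bound:
// you cannot have more JSON nodes than characters.
template <std::size_t N>
constexpr auto count_commas(const char (&json)[N])
{
    cvector<std::size_t, N> positions;  // capacity from the length bound
    for (std::size_t i = 0; i + 1 < N; ++i)
        if (json[i] == ',') positions.push_back(i);
    return positions;
}

constexpr auto commas = count_commas("[1,2,3]");
static_assert(commas.size() == 2);
```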
So, okay, that was, beside CTPG itself, the challenge that I faced, so I can do the JSON parsing at compile time. And then there are, like, a couple of minor examples just showing off the features of CTPG itself.
It has like error recovery, for instance,
so you can put a special token in your grammar
saying that, so you don't have to parse
until the first error and then just stop.
You can have like expected errors.
I think like the C++ parsers have it,
so when it reaches like the semicolon,
it disregards all the syntax errors before and starts parsing again.
So you can have multiple errors in the same compilation run.
It doesn't stop on the first error.
You can have that in CTPG-generated parsers too.
You can have operator precedence specified and a couple of other features.
So just to compare it to Bison, let's say what Bison did was it took your grammar
and then you wrote the C++ code inside Bison grammar, and it was just copied into the generated
code, right?
Okay.
So how I deal with this is, because this is a C++ code, I'm using the actual functors,
right?
The function objects and lambdas.
So the syntax is constructed in such a way that after you write the rule for the grammar, you can
associate with it
some executable code, right?
In the form of a function object, whatever function object that is. You can put a function there, or a lambda, or some std::bind or whatever.
Yeah, so...
So if you wanted to use a bind expression, then you probably can't execute that grammar at compile time. I'm guessing there are some limitations if you want it to be a compile-time-executable grammar versus one that's runtime?
Yes. But if you use, like, a lambda with a capture... so actually, the capture can be a reference that is not a const reference at all. You can modify things.
Okay, so the parser object needs to be a constexpr object. So you can give it a lambda, for instance. But if this lambda, like, modifies something at runtime, then, of course, it cannot be used for constexpr parsing.
Yes.
Okay.
Exactly.
And also, so the result of this parser is some kind of value, because all of the non-terminal symbols, just using the terminology from Bison, all of the non-terminal symbols in the grammar have a value type. So it's whatever C++ type. And so to use the parser in a constexpr context, all the value types need to be literal types.
Right.
Yeah, so that's it, basically.
I mean, the literal type, I don't know if it's in the standard,
like if the term is actually standardized.
I think it's no longer, but I may be wrong.
So the limitation is basically that the types need to be default constructible and, I think, trivially destructible.
Right.
I don't know if the standard actually defines something like literal type.
Yeah, I think they actually did remove that definition in C++20
because of the weirdness of constexpr destructors in C++20.
Yeah, I think so too.
Yeah, so the limitation for the CTPG types
is default constructible and trivially destructible.
Right.
So if you just adhere to this concept,
then you're fine just parsing in compile time.
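A quick illustration of a value type meeting that constraint (my own sketch, not CTPG example code):

```cpp
// OK as a value type for compile-time parsing: implicitly default
// constructible and trivially destructible.
struct ident_value {
    char name[32]{};  // fixed storage instead of std::string
    int length = 0;
};

// Not OK under this rule: std::string has a non-trivial destructor,
// so a value type holding one couldn't be used for constexpr parsing.
```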
So you said that the JSON parser that you've written is blazingly fast,
but have you actually done any kind of comparison to existing JSON parsers?
Not really.
And I mean the runtime parser, the usual one, not the constexpr parser. So the runtime parser: it is fast because it's a state machine, but I wasn't focused on making it really fast. So I'm using standard containers like maps and vectors, and that is actually where it spends most of its time. It's just allocating things, right? First, I would need to optimize that to actually make any comparisons.
And, well, the obvious solution for that is using PMR, right? So that will just... So actually, it's a good idea. I think I could try to compare it. I don't suppose it will be faster than something that's really done just to be the fastest JSON parser, because that would be for sure handcrafted and written in C. I cannot beat that.
But does it have a hand-rolled, highly optimized state machine? That would be the question. And I'm guessing people don't do that.
That's an interesting question.
I need to check it.
How fast would it be?
But let's say it's only 20% slower; what you get in return is a JSON parser that you basically don't have to maintain, because it's generated from the grammar, right? So for sure it's less error-prone. It's going to have just fewer bugs. I mean, it's better to define it as a grammar than to just handcraft the parser, I suppose. And the code base will be really small. I mean, I don't know how many lines it has right now; I can check on my GitHub page. So...
the JSON parser has 292 lines, and it just works.
That's pretty cool.
And most of the code is actually Unicode handling,
so you can store Unicode strings in std strings.
And I think it's like two-thirds of this code.
So it's like 100 lines to have a JSON parser
that's not dealing with Unicode.
It's really cool.
Yeah.
I think that's an interesting thing, that I so often have difficulty getting people to understand that there's so much that you can do at compile time, and they think of compile-time programming as metaprogramming. And you said you started this project as a metaprogramming project and said, this is pointless, and instead now you're building a state machine at compile time using just regular constexpr.
Right. I just want... yeah, it wasn't pointless. I had a great, great time doing it. A few times, maybe, when I'm looking at complicated templates in C++, it warms my heart. But it was unusable. It worked, it actually worked, but anything like 10 grammar rules and the compiler is out of heap.
I just think this concept of building a state machine or a jump table or whatever
using normal programming techniques and then being able to replay that at runtime,
generate it at compile time, replay it at runtime,
it could have a huge application
for our listeners who maybe just haven't considered that before.
Yes.
And with the C++20 vectors and strings being constexpr,
it's a huge step, I think.
I wouldn't actually benefit from it, just because... you can have a constexpr std::vector, but you have to basically deallocate it before you go out of the constant evaluation. It has to stay in the constexpr context, right?
Yeah.
And I need to have the state machine there. It's not like I can just calculate something and give back, like, a result and disregard the vector. I need to actually keep the vector around. So I don't think I would benefit in CTPG from std::vector being constexpr.
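A small illustration of the restriction being described: C++20 allows allocation during constant evaluation, but anything that escapes the evaluation must not still own heap memory, which is exactly why a persistent compile-time state machine can't live in a std::vector.

```cpp
#include <vector>

// Fine: the vector is created and destroyed inside one constant evaluation.
constexpr int sum_first_three()
{
    std::vector<int> v{1, 2, 3};
    return v[0] + v[1] + v[2];  // only the int escapes, not the allocation
}
static_assert(sum_first_three() == 6);

// Not fine: the allocation would have to outlive constant evaluation.
// constexpr std::vector<int> table{1, 2, 3};   // error in C++20
```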
So are you still on C++17 then?
Yeah, I don't intend to change it, just to make, like, the audience a bit bigger, because not everyone can use bleeding-edge compilers. It runs on GCC 10.
Okay.
I think 10.2 and 10.3 are working. The Clang is Clang 12, I think.
Just because constexpr, even though it was in C++17, right? It wasn't fully supported until the most recent versions.
And also the performance was lacking.
There were some compiler bugs
if you tried something in a constexpr context.
So, yeah.
That's, yeah,
a lot has changed.
I'm sorry, did you say which version of Visual Studio
is supported?
I think it's 19.30.
I mean the compiler.
So I think it's the newest one, 2022.
Okay.
Before that, the constexpr in Microsoft's compiler was just not working properly.
There were just too many bugs.
Yeah.
So when I
was using the Visual Studio 2019,
it wasn't compiling.
Now it does.
We need to now push for as much as we can push.
I hope, I guess, that our compiler vendors start making compile-time constexpr evaluation faster, right? Like, this would be a big win for a lot of projects if we could do more things at compile time with less pain.
Exactly. So, just to improve the performance of the compiler.
So you mentioned that you're sticking with C++17 to make your audience bigger, and then you said you don't think you would gain anything from constexpr string, constexpr vector. And I've been spending time with that myself lately, so I'm not entirely sure either at the moment. But are there any other features from 20 or 23 that you would make use of if you could?
I would benefit from std variant going full constexpr for sure.
Okay.
std pair.
Oh, okay.
In 20, it's full constexpr, also the variant, right?
Right.
Because now I'm doing some strange hacks.
Because for some reason, the constructor is constexpr
and the assignment operator is not.
Right.
Yeah.
So I'm using some strange hacks to just make it constexpr.
Also, I believe std::optional is the same story: more constexpr also, yeah.
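For context, a small illustration of the C++17 versus C++20 gap being described (P2231 backfilled the missing constexpr on std::variant and std::optional assignment into C++20):

```cpp
#include <variant>

constexpr int ctor_only()
{
    std::variant<int, char> v{42};  // converting ctor: constexpr in C++17
    return std::get<int>(v);
}
static_assert(ctor_only() == 42);

constexpr int with_assignment()
{
    std::variant<int, char> v{42};
    v = 'c';  // assignment: constexpr only since C++20 (P2231)
    return std::get<char>(v);
}
// static_assert(with_assignment() == 'c');  // ill-formed in C++17 mode
```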
Yeah, so it will just make the code a bit cleaner, I think,
but nothing really breaking.
Yeah, so...
And the one thing I would need...
Yes, and I'm actually doing the regex, right, in CTPG, for the lexical analyzer. So I think it would benefit because... so the regex term gets its regular expression as a template parameter, right? So it's a string. So in C++20, you can have it as a string constant, right? In C++17 you need to have a static-linkage object, so I have this limitation, right? This would go away in C++20: you could use an actual string constant there; you don't have to define it elsewhere. That's the same limitation that CTRE has, right? So if you're using C++17, then you need this static linkage, so the string can be used as a template parameter.
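Roughly what that limitation looks like in code (a sketch; the exact CTPG spelling of regex_term may differ):

```cpp
// C++17: the pattern must be an object with static linkage to appear
// as a template argument.
constexpr char number_pattern[] = "[1-9][0-9]*";
constexpr ctpg::regex_term<number_pattern> number("number");

// C++20 class-type non-type template parameters would allow writing the
// literal inline instead, something like:
//   constexpr ctpg::regex_term<"[1-9][0-9]*"> number("number");
```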
So do you also
implement your own regular
expression library as part
of CTPG?
Yes. I mean, I didn't implement it... I have CTPG, so I could write a regular expression parser using CTPG.
And then use that inside of your CTPG?
Yeah.
It's sort of an inception thing, yeah.
Okay.
So that's exactly what I do, actually.
There is a grammar for the regular expressions inside the CTPG header. It has a custom lexical analyzer, because you cannot use regex for that; that would just be circular logic or something. So I have just a parser for regular expressions, but the lexical analyzer for regular expressions is a custom-written one. But it's not complicated, because it's just character by character. There are just a couple of special characters in regexes, and the rest is treated literally.
We've mentioned Hana's CTRE library a few times.
Have you compared your regex to CTRE?
Well, certainly it doesn't have the amount of features. This one is really basic. This is just the subset of regex features that you would usually need for your custom domain-specific language, right? All the obscure regex features that are rarely used are just not implemented. I just didn't focus on them; they would just unnecessarily complicate the grammar and increase the compilation times for your actual grammars. So I think it's fine this way.
And also, well, I think that Hana's library is using an LL parser, and mine is LR. So the LL is a top-down parser, right? And this LL stands for, like, left-to-right and leftmost derivation, while an LR parser is, like, a bottom-up parser, and it uses the rightmost derivation. So it basically tries, from the leaves, like the terminal symbols, to build up a bigger concept, like, for instance... so the bottom-up parser goes from the most specific language terms, like the identifiers, operators and so on, into the higher-level concepts, whereas the top-down parser does the opposite. So, let's say, "parse me a class": that's what the top-down parser would do. And then it tries... okay, so it sees a header, let's say, for a class, and okay, so I'm parsing a header now. And then I see the identifier, okay. So this one is the opposite. That's why it's a state machine, right?
So the LR parser... the downside of using such a parser is the step of actually generating the state machine, which is time-consuming. It's an algorithm that has some complexity to it. But the final result is just a faster parser. And also the set of grammars that you can give it is much bigger.
You can have left recursion, for instance, which you cannot have in an LL parser. So you can define an expression as: an expression, a plus sign, and then another expression, right? You don't care that this is recursive; the LR parser deals with it just fine. So, yeah. Plus, the CTRE one is actually a handcrafted parser for regular expressions, right? And CTPG is just a generator: you can have any kind of grammar, right? The regular expression grammar would be just an example.
So I don't know a lot about parsers and how grammars are defined.
It's been a long time since I did any of that.
But I'm curious, you mentioned the Flex and Bison that came up earlier.
You said you, of course, don't have nearly as many features as those.
But does it work in approximately the same way?
Do they also generate a state machine internally, and are they also LR parsers or whatever, like you're doing?
Okay. With Bison, you can specify what kind of algorithm you want at the output, the LR being actually the default, I think. And then you can specify several, I think. Mine is just LR, so it's the most powerful one, exactly.
So, yeah, and it basically does exactly the same thing. It generates a state machine. But, yeah, Bison does it by actually just spitting out C or C++ code with a huge switch statement or something. And the code of Bison is actually unreadable. Or it was when I was using it, like, 15 years ago. But that's not the point of generated code, right? I was using Bison and Flex when I was starting my career as a SystemVerilog compiler developer. So we were dealing with Flex and Bison. And because it was generated code, and it was the early days of version control, it was a pain, because the generated code ended up in the repository.
So if two people wanted to do something else inside the grammar,
change it somehow, it was really hard to maintain.
So eventually we ended up having the grammar files in a repository
and the generated code outside of the repository.
But that has a drawback that you need a tool,
like the Bison tool needs to be a part of your build system.
And it can be a pain too because it was running on
Linux fine, on Windows not
so fine.
So we ended up porting
Bison for Windows
for our purposes. I believe it runs
on Windows fine today.
So actually, each automated build started by building Bison.
So you can imagine.
With CTPG, you just have C++ compiler, right?
And it just works.
Of course, I wouldn't do, like I said, C grammar with CTPG,
but it's just a different approach.
Have you made use of CTPG in any
production projects
or just the examples
you have? Not yet.
I'm just judging by
GitHub issues.
People are using it in places.
That's cool.
It has 270
stars on GitHub right now.
Wow.
And you said you just put it out like a month or two ago, right?
Yeah.
Wow.
Some people at some universities use it just to teach parsers and compilers,
because it has a really nice feature of diagnostic messages.
So you can see your parser
as a state machine
and what is done on what
terminal symbol and where are
the conflicts
And people are just trying to adopt it somehow into their projects, usually. Probably... no one is reaching out to me and saying that they're using it in their commercial projects or so. It's MIT licensed, so they can just do it.
It has like nine forks already.
Wow.
It's the first open source project of mine, and I'm not sure if it's a big number or not.
I don't have any comparison.
I mean, Catch 2 has several
thousands.
It sounds like it's
doing pretty well for something that's only been
out for a short period of time.
It's been out since the end of November, so, like, a month.
Well, it definitely sounds like, listeners, if you're interested in learning more about parsers and how parsers work, this is a good project to go look at and play around with.
Yeah, I would love to hear from someone if they're trying to use it in their commercial project.
In my experience, you'll never learn.
I've had people come up to me at conferences and say, oh, yeah, I've been using ChaiScript in my commercial project for the last five years.
And I'm like, you could have told me sooner.
Would have been nice to know.
It's not like I would charge you. I can't; it's a BSD license.
But if you want to be a sponsor on GitHub or something, I could use some money.
That's right, GitHub sponsorship does exist.
If you want some feature done, let's say,
I wouldn't mind doing it.
It's a project that I'm doing in my spare time.
So, of course, yeah.
Well, I definitely encourage all our listeners
to go check it out.
And Peter, it was great talking to you.
And thanks for telling us all about CTPG.
Thanks for coming on.
Yeah, thank you.
Thank you.
Thanks so much for listening in as we chat about C++.
We'd love to hear what you think of the podcast.
Please let us know if we're discussing the stuff you're interested in,
or if you have a suggestion for a topic, we'd love to hear about that too.
You can email all your thoughts to feedback at cppcast.com.
We'd also appreciate it if you can like CppCast on Facebook and follow CppCast on Twitter. And of course, you can find all that info and the show notes on the podcast website at cppcast.com.
Theme music for this episode was provided by podcastthemes.com.