CppCast - C++ Compile Time Parser Generator
Episode Date: January 13, 2022
Rob and Jason are joined by Piotr Winter. They first talk about include guards vs pragma once, testing for constexpr, and the preview of Catch v3. Then they talk to Piotr Winter about CTPG, the C++ Compile Time Parser Generator.
News: Include guards or #pragma once; Test an expression for constexpr friendliness; Catch v3 Preview 4
Links: C++ Compile Time Parser Generator; Peter Winter's Blog; Deadline24 2013 | Future Processing
Sponsors: Use the #cppcast hashtag and request your PVS-Studio one-month trial license here: https://pvs-studio.com/try_free; C++ tools evolution: static code analyzers: https://pvs-studio.com/0873
Transcript
Episode 332 of CppCast with guest Peter Winter, recorded January 11th, 2022.
The sponsor of this episode of CppCast is the PVS Studio team.
The team promotes regular usage of static code analysis and the PVS Studio static analysis tool. In this episode, we talk about guarding your includes and testing for constexpr.
Then we talk to Peter Winter.
Peter talks to us about CTPG, the Compile Time Parser Generator. Welcome to episode 332 of CppCast, the first podcast for C++ developers by C++ developers.
My host, Rob Irving, joined by my co-host, Jason Turner.
Jason, how are you doing today?
I'm all right, Rob. How are you doing?
I'm doing okay. I don't think I have anything to share this morning.
How about you? Anything going on?
No, maybe just a quick reminder, though, that we have calls for papers out for C++ on Sea and C++Now at the moment. And, yeah, if you're interested in hoping that you'll be able to go to a conference in 2022, then I suggest you submit a talk to one of the open conferences.
Yeah, hopefully all those conferences can operate safely this year.
Yeah.
Things are a little crazy right now, but hopefully.
All right.
Well, at the top of every episode, I'd like to read a piece of feedback. We got this tweet from Matt Fernandez referring to last episode, saying, this was a great episode.
Slobodan is one of the first C++ practitioners I've heard from in a long time
who gets that modern C is a different beast from the 80s C and actually kind of good.
And, yeah, that was great talking to Slobodan.
And it's interesting learning about some of the new modern C stuff
because a lot of that was news to me.
Right.
Yeah.
Well, we'd love to hear your thoughts about the show.
You can always reach out to us on Facebook, Twitter,
or email us at feedback at cppcast.com.
And don't forget to leave us a review on iTunes
or subscribe on YouTube.
Joining us today is Peter Winter.
Peter started working as a C++ dev in 2005,
working on a SystemVerilog compiler.
Since 2010, he's been working for Future Processing in Poland. He's worked on many commercial projects since then, not always as a C++ developer; he had a minor detour as a full-stack .NET web developer, which he did not like. He was also an organizer of a 24-hour competitive programming marathon, Deadline 24, which was last held in 2018. His main interest as a C++ developer was always compilers and parsers, which eventually led to the creation of CTPG.
Peter, welcome to the show.
Hello. Glad to be here.
What is the Deadline 24 about? What was that?
Yeah, that was really cool.
So, Future Processing started it, I think, in 2009, a year before I even joined Future Processing.
It was a local, like, competitive programming marathon. It was always 24 hours. So it was teams of three members, and three tasks.
And you had 24 hours to do these tasks and no internet access.
Yeah.
Is it possible to program without internet access?
Yes, apparently it is. So first, it was a local thing, so like local colleges and participants.
And it turned out to be something really big with people like from Google and Yandex winning the competition.
Oh, wow.
The best algorithm experts from all over the world actually joining.
And so it was kind of, like, a victim of its own success, let's say, because the costs became almost bigger than the benefits. At first it was supposed to allow us to hire people that win these competitions, but then when the Google guys win it, they won't be working in Poland, right?
And it was really expensive for the company to actually prepare it, mainly because of our time, actually, because the venues weren't, like, the big issue. But it was offline. It wasn't the most interesting location... the venue was actually an underground mine, 300 meters underground.
Oh, wow.
So, no internet access.
Was the no-internet thing actually, like, chosen just to make sure you weren't secretly looking at some code, some Stack Overflow post on your phone or something like that?
I think this was just a cool location.
But yeah, that's the fact.
There just was no internet access, period.
That's funny.
That sounds like fun.
Yeah, and the tasks were actually, like... you were supposed to code a program. We were writing servers, game servers, and the programs were fighting with each other on an arena. Something like that. The tasks were heavily focused on algorithms.
But this is discontinued at the moment.
So was everyone actually on site in that mine location?
It was 90 people: 30 teams, 3 people each.
wow
Yeah, that sounds like it could be cool. All right, well, Peter, we've got a couple of news articles to discuss. Feel free to comment on any of these, and then we'll start talking about your CTPG project.
Okay, okay.
All right. So this first one is a poll on Reddit which I thought was interesting: do you prefer include guards or using pragma once? And I did get the Beautiful C++ book, and I haven't read through the whole thing yet, but this was in the first chapter, where Guy and Kate recommend using the include guards, because it's standard, and even though pragma once might be available in all the modern compilers, it's not standard.
What do you two think about this, though?
I'll let Piotr go first.
Sure. I've never used pragma once myself. I've seen it. I don't know why, really. Probably because it's not standard, so I just didn't feel like it.
I just thought that both do the same thing.
So I would
never use it intentionally,
but I didn't give it much thought, actually.
Maybe now I will.
The nice thing about pragma once is you can't make a mistake. You can't accidentally copy-paste an include guard and not get the name quite right.
Yeah. And I suppose the compiler can do a better job in terms of performance, because the preprocessor needs to actually go through the file and seek for the endif, right? So it takes time, maybe.
Yeah, that's an interesting question. I got the impression that it used to save time but maybe doesn't anymore, as our preprocessors have gotten better. But now I'm curious.
Yeah, well, for sure the include guards can do at most as good as pragma once. There's no way you can do better than pragma once.
Yeah, absolutely. I'd have to agree with that.
I've always been on the side of: it's not standard, therefore I don't use it. Plus, I've read the arguments for why it's not standard, because while it saves you from making a mistake, the compiler could still make a mistake, like if you include the same file two different ways. The common example is if you've got two different sub-libraries that each include zlib, but they have their own embedded zlib; then you might get zlib included twice if they're both using pragma once without include guards. But I mean, you probably have other problems in your code in that case as well. But there's a comment down here from Tom Swirly, at least half of the way through. And Tom says, after reading this page, he's concluded that the only option is to actually use both. Really? To save yourself from yourself and to save you from the compiler, you have to use both. And that's after reading all of everyone else's arguments. So this is why we can't have nice things, I guess.
Yeah.
And which order?
That's a good question.
Would that make a difference?
It would definitely have to make a difference, right?
Because if the pragma gets #ifdef'd out, then... I don't know. I think you would want to put the pragma first.
Yeah, anyhow.
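For readers following along, a minimal sketch of the belt-and-suspenders header being described, with the pragma placed first as suggested (the guard macro name here is hypothetical):

```cpp
// my_header.hpp - uses both mechanisms, pragma first
#pragma once
#ifndef MY_PROJECT_MY_HEADER_HPP  // guard name must be unique per header
#define MY_PROJECT_MY_HEADER_HPP

// ... declarations ...

#endif  // MY_PROJECT_MY_HEADER_HPP
```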
Okay.
Next thing we have is a post on Arthur O'Dwyer's blog,
and this is test an expression for constexpr friendliness.
You picked a meaty one here, Rob.
Oh, and I thought this one would be good
because of what we're going to be talking about
in a few minutes with Peter,
testing for constexpr.
But yeah, he's talking about this test you can write to see if a function is constexpr, but then also going into how you can't do that for consteval functions, which was
interesting. Yes. I think, for the sake of our listeners who, you know, can't see this code: it's effectively using SFINAE to rule out an expression that cannot be evaluated at compile time. But the much more readable version of it is... well, Ernesto Guiero Pena provides an alternative idiom using requires instead of SFINAE, a requires expression. And so that one is more readable. If you scroll down to that one, you can say: oh, requires that this thing results in... you know, that this thing is compilable. And the only way for it to be compilable is for the function that's passed in to be something that can be executed in a constexpr context. And if it can't be, then the requires fails and you just get false back.
Right.
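A rough sketch of the two idioms being described; the names here are illustrative, not Arthur O'Dwyer's exact code, and the lambda tricks need C++20 (stateless lambdas became default constructible there):

```cpp
#include <type_traits>

// SFINAE version: the default template argument forces F{}() to be
// constant-evaluated; if that fails, overload resolution falls through
// to the variadic fallback instead of producing a hard error.
template <class F, int = (F{}(), 0)>
constexpr bool is_constexpr_friendly(F) { return true; }
constexpr bool is_constexpr_friendly(...) { return false; }

// requires-based version: same idea, easier to read. The array bound
// must be a constant expression; if it isn't, the type-requirement
// fails and the whole requires-expression yields false.
template <class F>
constexpr bool is_constexpr_friendly2(F) {
    return requires { typename std::type_identity<int[(F{}(), 1)]>; };
}

int not_constexpr();  // no definition needed for the negative test

static_assert(is_constexpr_friendly([] { return 42; }));
static_assert(!is_constexpr_friendly([] { return not_constexpr(); }));
```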
But that version of it also still does not work in consteval.
Right.
Yeah.
And the thing that he then goes on to talk about is that that seems to be intentional, that you can't evaluate consteval this way. It's just always going to be compile time, and you're going to get errors if you try to use it otherwise.
Right.
It makes sense, actually.
If we are forcing that this function should be compile time,
then it should just be an error, right?
Not just, like, SFINAE'd away.
It makes sense, I think.
I think so, yeah.
Although I still sometimes wish that we had more ways of doing
introspection into the
code in a clean way of just asking
is this something that could be run at compile
time or whatever, you know, with
reflection
kind of mechanisms that we don't currently have.
And it doesn't look like we're getting that
in C++23, but yeah.
C++26?
29? 32? Yeah, depends on when they can start doing meetings again, I guess.
All right. And then the last thing we have is an update on Catch2. And is it actually Catch2 v3, or is it going to be Catch 3? I don't know, but either way, it's the v3 preview, the fourth preview.
And a big list of what's changing, what some of the breaking changes are.
I didn't have a chance to read through this too much, actually,
so I was hoping one of you could tell us a little bit more about Catch
and what's coming in V3.
I use it in CTPG.
In version 3, specifically?
I tried.
Okay.
Yeah, but I had struggles, and I just reverted to version 2.
I had struggles because of how I was compiling it.
So Catch2, this version 3, is a static library now. And it's not distributed as a binary artifact. So I built it using GCC, and I was testing a Clang build of CTPG, like the unit tests, and it just segfaulted right away.
And that's because Catch 2 is using C++ interfaces under the hood.
So you have to build both Catch and whatever test
that you're writing with the same compiler and the same compilation options. And that's something the maintainer told me on Discord, I believe. So I just went back to using version 2. But I was like: the whole point of it being a static library now is shorter compilation time.
But am I supposed to now compile it for each compiler
and each compilation option separately,
which sort of defeats the purpose?
But then I thought that maybe not really,
so maybe I'll give it another try.
Because you are really building it once for each compiler,
not for each file where you write the tests.
So if you have some sort of unit
test project, and you have like 10
CPP files with tests,
you don't have to necessarily
compile Catch-2 10 times, right?
For each file, you just do
it once for the static library, and then
multiply it by the number of
compiler options. So
maybe it will actually save me some
compilation time in my
GitHub action, because I'm using
it on the GitHub actions.
Yeah, anyway,
it's a cool idea.
I was just going to say, Jason, it sounds like this kind of
relates to the recent discussion
we had with the two Bloomberg guys
about distributing modules, right?
Yeah. It also makes me think... it's similar to how I use Catch2 in my own projects, because you can do the #define CATCH_CONFIG_MAIN or whatever. And I actually build a static internal library once
that has the catch main in it.
And then I link to that in each of my catch tests.
So I do save some compile time there.
So it might be a similar kind of setup with Catch2 version 3, that you would have to do that.
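For listeners, a minimal sketch of that Catch2 v2 pattern: one translation unit that only provides main, compiled once into an internal static library that every test target links against (file names are illustrative):

```cpp
// catch_main.cpp - compiled once into a small static library; each test
// executable links against it instead of recompiling Catch's main().
#define CATCH_CONFIG_MAIN  // tells Catch2 v2 to generate main() here
#include <catch2/catch.hpp>
```

```cpp
// my_test.cpp - an ordinary test file that links against that library
#include <catch2/catch.hpp>

TEST_CASE("addition works") {
    REQUIRE(1 + 1 == 2);
}
```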
All right. So, Peter, could you start off by telling us about the CTPG project that you're working on?
Yeah. The unfortunate name... the acronym is just stolen from CTRE. So, guilty.
Okay.
The naming convention is totally stolen. So it's Compile Time Parser Generator.
I mean, it's self-explanatory, right?
So you want to have a parser.
You define it using a grammar.
Sort of the same way you would do this using Bison and Flex. In theory. Because we are using C++, instead of the operators that are in the Bison syntax, you use C++ operators for this purpose.
Yeah, and the compiler does the job for you and generates a bunch of static arrays. So the parser is a constexpr object with a ton of numbers and C arrays, basically, in it. And then it has a parse method, which you use on some buffer, whether it's runtime or compile time. You can actually invoke the parse method of that parser in a constexpr context. Yeah, so that's basically it. So you have, like, Flex plus Bison inside the C++ compiler.
That's what it is, actually. Of course, it's nowhere near Flex plus Bison feature-wise, because these are, like, 40-year projects, and mine is two, and it was actually released a month ago. But yeah, I'm looking forward to replacing Bison.
Joke.
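To give a flavor of what that looks like, here is a small grammar sketch modeled on CTPG's README; treat the exact header path, names, and signatures as approximate rather than the library's verbatim API:

```cpp
#include <ctpg/ctpg.hpp>  // header path assumed

#include <iostream>
#include <string_view>

using namespace ctpg;
using namespace ctpg::buffers;

// C++17: the regex pattern needs a static-linkage object to be usable
// as a template argument (see the discussion later in the episode).
constexpr char number_pattern[] = "[1-9][0-9]*";
constexpr regex_term<number_pattern> number("number");
constexpr char_term plus('+');

constexpr nterm<int> expr("expr");

constexpr int to_int(std::string_view sv)
{
    int n = 0;
    for (char c : sv) n = n * 10 + (c - '0');
    return n;
}

// Where Bison has its own operator syntax, CTPG uses C++ operators:
// the functor after >= is the semantic action for each rule.
constexpr parser p(
    expr,
    terms(number, plus),
    nterms(expr),
    rules(
        expr(number)
            >= [](const auto& sv) { return to_int(sv); },
        expr(expr, plus, number)  // left recursion is fine for an LR parser
            >= [](int l, auto, const auto& sv) { return l + to_int(sv); }
    )
);

int main()
{
    // Parses a runtime buffer here; the same parse can also run in a
    // constexpr context on a compile-time buffer.
    if (auto res = p.parse(string_buffer("1+2+3"), std::cerr))
        std::cout << *res << '\n';  // prints 6
}
```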
You just said when you give it the grammar,
the result is a static array with a bunch of numbers and letters in it, basically.
So are you building a state machine?
Yeah, exactly.
Okay.
Exactly a state machine. For both lexical analyzer and the syntax analyzer,
these are just two separate state machines.
The lexer is actually the finite state automaton.
The parser is a bit more complicated.
It has a stack of states.
But it's still a state machine,
but it has an operation to push and pop from the stack.
Basically, you'd have to look at
the algorithm of the
LR parser.
I've actually used it from the book.
It was compilers, design principles, or something like that.
The Dragon book, that one?
I have it mentioned on my GitHub.
Okay.
So the full title is Compilers: Principles, Techniques, and Tools.
And the authors are Alfred Aho, Ravi Sethi, and Jeffrey Ullman.
It's a really old book.
Yeah, but I used it.
Yeah, that's the one that our listeners might know of as the dragon book,
because it has a dragon on the cover.
Yeah, okay. So the creation of the parsing table... the algorithm itself is actually very... it's complex in terms of algorithmic complexity. So that's why it was challenging to write it optimally and in a constexpr context at the same time.
So just to put it in perspective: let's say a JSON grammar. And it takes like eight seconds to compile.
I don't know if it's too much or not.
It depends what you want to achieve.
If you put that grammar in a separate compilation unit and don't change it much, then I think it's not a big deal. But I wouldn't use it for something like a C grammar, maybe. Just not yet. I would need to work on optimization a bit more first. But anyway...
Sorry, yeah, go on.
I was wondering, when you said you would work on optimization a bit more, do you mean work on the compile-time optimization or the runtime optimization?
Right. Runtime is, like, blazing fast.
So it's a state machine, and there is no backtracking. It just reads each character exactly once and does linear-time parsing, every time.
So it's no backtracking, nothing like that.
It cannot be actually faster, I think.
You would like to...
So it's just if I read this character,
then jump to this state.
If I read this character, jump to this state.
Basically, that's it, yeah.
So, like, the compilation time.
There are a couple of points I could improve.
Algorithmically,
so in terms of the algorithm,
not really.
That's like, you can't do much there,
but I have a couple of ideas.
For instance, if you look at, like, the page with the C language operator precedence, right, the precedence of operators in C: you look at this table and you see that the C language has, like, 11 assignment operators which are basically the same. They have the same precedence, the same associativity; they are all, I believe, right associative. So you could treat them as the same, basically. So as not to complicate grammars, not to add rules: just add one and do it in a smarter way.
So I've tried actually putting in the grammar of the expression syntax from the C language, and it was too big. The compiler ran out of resources.
The compilation time was not the big issue; I mean, it went something like a minute. I think it would compile it, but I'm not using a really powerful machine. I'm using a virtual machine for Linux work.
And I'm developing CTPG on Linux. So I would have to increase the maximum number of constexpr operations.
There is an appropriate switch in both GCC and Clang. I don't know why this is the maximum; it looks arbitrary.
It does look arbitrary, yeah.
It's not like a special number for me; nothing really comes to mind. It's something like 33 million something. Not sure why it's that number.
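For reference, the switches being alluded to are real compiler flags, and GCC's documented default of 33554432 (2^25) matches the "33 million something" mentioned; a hedged summary, with illustrative values:

```cpp
// Raising the constant-evaluation limits (flag names are real; the
// values shown are illustrative, not recommendations):
//   GCC:   -fconstexpr-ops-limit=N   (default 33554432, i.e. 2^25)
//   Clang: -fconstexpr-steps=N
//   MSVC:  /constexpr:stepsN
```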
Anyway, when I increased it, it still ran out of RAM, the compiler, that is. And MSVC, like, this thing is just eating RAM like crazy. So even though it is supported in CTPG, just expect it to take at least five times more RAM than GCC and Clang. I don't know why. Constexpr evaluation in the Microsoft compiler is much more RAM-intensive. In compilation time, they're pretty much similar.
So going back to the C grammar: I just put in just the expressions, right, just the operators, just this grammar, so I can do a calculator that's similar in features to what the C operators have. And it was a bit too much. So I thought that I could add a feature that will group similar operators in one group, so the grammar gets less complicated.
So the next challenge would be to actually reduce the size of the final parser.
Right now, it can go up into the megabytes if you are using too many rules.
That actual static data table?
Yeah, it's not a problem
really, but the binary gets
big.
And these arrays are really sparsely
populated, so that's
a way to optimize it, I think.
Because most of the states are actually unreachable; it would just be an error.
I didn't do the optimization
step on this
thing yet.
There is room for improvement,
basically.
The sponsor of this episode of CppCast is
the PVS Studio development team.
PVS Studio is a static code analysis solution that helps enhance code quality, security, and safety.
The analyzer detects bugs and potential vulnerabilities in C, C++, C Sharp, and Java code on Windows, Linux, and macOS.
CppCast listeners can use the CppCast hashtag to get the analyzer's one-month trial version.
To request the trial, use the link in the podcast description.
C++ projects are getting increasingly complex,
too complex for people to analyze them thoroughly during code reviews.
That's where code analyzers come in.
They notice everything the human eye misses,
thus making code reviews more productive and enhancing code quality.
Want to know more about the problem?
Take a look at the recent article from the PVS Studio team,
C++ Tools Evolution, Static Code Analyzers.
The link is in the podcast description.
So right now we're talking about, like, you know,
kind of crazy edge cases, like putting in the C grammar.
What are some of the more realistic things
you've done with CTPG?
Yeah, I've done JSON parsing, both runtime and compile time.
So the talk you gave, Jason, about "constexpr ALL the things", that was actually the thing that inspired me to move in this direction. Because originally I tried to do this using metaprogramming, and it was just not usable. The waits were, like, as far as the eye can see.
Yeah, it's too much RAM consumption by the compiler and too much time.
So this tool was not usable really for me.
So then I tried to move into the constexpr realm,
and it worked.
So I think the biggest thing I have in my examples is the JSON parser. Yeah, so it's really cool. It compiles in like 8 seconds, like I said, on my machine. And also there is this, like, compile-time JSON version, which you can do with CTPG using just 200 lines of code. So you can have, like, JSON constants in your C++ code. And I mean, the most tricky part was actually the representation of the JSON that can be constexpr, which is also what you faced, right? To solve it, I'm doing some trick.
So if you think of a JSON, right, and you measure the number of characters it has, that's the maximum depth it can have, right? It's an overshoot, really, but you can't have more depth than characters in the JSON, right? And also you can't have more array elements than characters. And each of the objects cannot have more elements than there are characters in this JSON string. So I'm using this cap to statically allocate... I mean, at compile time, "allocate" meaning just a static array, right, of this size, and just operate in those terms. So I don't have to, like, parse the JSON string twice just to see what the dimensions and depths and such are.
Right.
Yeah.
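A minimal sketch of the capacity trick just described (my own illustration, not CTPG's example code): since an input of length N can never have more nodes, nesting levels, or array elements than characters, a fixed-capacity container sized by the input length works in a C++17 constexpr context without a pre-pass.

```cpp
#include <cstddef>

// Fixed-capacity "vector" usable at compile time in C++17.
template <class T, std::size_t Cap>
struct cvector {
    T data[Cap]{};
    std::size_t count = 0;
    constexpr void push_back(const T& v) { data[count++] = v; }
    constexpr const T& operator[](std::size_t i) const { return data[i]; }
    constexpr std::size_t size() const { return count; }
};

// For an input of length N, N slots is always an upper bound:
// you cannot have more JSON nodes than characters.
template <std::size_t N>
constexpr auto count_commas(const char (&json)[N])
{
    cvector<std::size_t, N> positions;  // capacity from the length bound
    for (std::size_t i = 0; i + 1 < N; ++i)
        if (json[i] == ',') positions.push_back(i);
    return positions;
}

constexpr auto commas = count_commas("[1,2,3]");
static_assert(commas.size() == 2);
```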
So, okay, that was, beside CTPG itself, the challenge that I faced, so I can do the JSON parsing at compile time. And then there are, like, a couple of minor examples just showing off the features of CTPG itself.
It has like error recovery, for instance,
so you can put a special token in your grammar
saying that, so you don't have to parse
until the first error and then just stop.
You can have like expected errors.
I think like the C++ parsers have it,
so when it reaches like the semicolon,
it disregards all the syntax errors before and starts parsing again.
So you can have multiple errors in the same compilation run.
It doesn't stop on the first error.
You can have that in CTPG-generated parsers too.
You can have operator precedence specified and a couple of other features.
So just to compare it to Bison, let's say what Bison did was it took your grammar
and then you wrote the C++ code inside Bison grammar, and it was just copied into the generated
code, right?
Okay.
So how I deal with this is, because this is a C++ code, I'm using the actual functors,
right?
The function objects and lambdas.
So the syntax is constructed in such a way that after you write the rule for the grammar, you can
associate with it
some executable code, right?
In the form of a function object, whatever function object that is. You can put a function there, or a lambda, or some std::bind or whatever.
Yeah, so...
So if you wanted to use a bind expression, then you probably can't execute that grammar at compile time. I'm guessing there are some limitations if you want it to be a compile-time-executable grammar versus one that's runtime?
Yes. But if you use, like, a lambda with a capture... so actually, the capture can be a reference that is not a const reference at all. You can modify things.
Okay, so the parser object needs to be a constexpr object. So you can give it a lambda, for instance. But if this lambda, like, modifies something at runtime, then, of course, it cannot be used for constexpr parsing.
Yes.
Okay.
Exactly.
And also, so the result of this parser is some kind of value, because all of the non-terminal symbols, just using the terminology from Bison, all of the non-terminal symbols in the grammar have a value type. So it's whatever C++ type. And so to use the parser in a constexpr context, all the value types need to be literal types.
Right.
Yeah, so that's it, basically.
I mean, the literal type, I don't know if it's in the standard,
like if the term is actually standardized.
I think it's no longer, but I may be wrong.
So the limitation is basically that the types need to be default constructible and, I think, trivially destructible.
Right.
I don't know if the standard actually defines something like literal type.
Yeah, I think they actually did remove that definition in C++20
because of the weirdness of constexpr destructors in C++20.
Yeah, I think so too.
Yeah, so the limitation for the CTPG types
is default constructible and trivially destructible.
Right.
So if you just adhere to this concept,
then you're fine just parsing in compile time.
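A quick illustration of a value type meeting that constraint (my own sketch, not CTPG example code):

```cpp
// OK as a value type for compile-time parsing: implicitly default
// constructible and trivially destructible.
struct ident_value {
    char name[32]{};  // fixed storage instead of std::string
    int length = 0;
};

// Not OK under this rule: std::string has a non-trivial destructor,
// so a value type holding one couldn't be used for constexpr parsing.
```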
So you said that the JSON parser that you've written is blazingly fast,
but have you actually done any kind of comparison to existing JSON parsers?
Not really.
And I mean the runtime parser, the usual one, not the constexpr parser. So the runtime parser: it is fast because it's a state machine, but I wasn't focused on making it really fast. So I'm using standard containers like maps and vectors, and that is actually where it spends most of its time. It's just allocating things, right? First, I would need to optimize that to actually make any comparisons.
And, well, the obvious solution for that is using PMR, right? So that will just... So actually, it's a good idea. I think I could try to compare it. I don't suppose it will be faster than something that's really done just to be the fastest JSON parser, because that would be for sure handcrafted and written in C. I cannot beat that.
But does it have a hand-rolled, highly optimized state machine? That would be the question. And I'm guessing people don't do that.
That's an interesting question.
I need to check it.
How fast would it be?
But let's say it's only 20% slower; what you get in return is a JSON parser that you basically don't have to maintain, because it's generated from the grammar, right? So for sure it's less error-prone. It's going to have just fewer bugs. I mean, it's better to define it as a grammar than to just handcraft the parser, I suppose. And the code base will be really small. I mean, I don't know how many lines it has right now; I can check on my GitHub page. So...
the JSON parser has 292 lines, and it just works.
That's pretty cool.
And most of the code is actually Unicode handling,
so you can store Unicode strings in std strings.
And I think it's like two-thirds of this code.
So it's like 100 lines to have a JSON parser
that's not dealing with Unicode.
It's really cool.
Yeah.
I think that's an interesting thing, that I so often have difficulty getting people to understand that there's so much that you can do at compile time, and they think of compile-time programming as metaprogramming. And you said you started this project as a metaprogramming project and said, this is pointless, and instead now you're building a state machine at compile time using just regular constexpr.
Right. I just want... yeah, it wasn't pointless. I had a great, great time doing it. A few times, maybe, when I'm looking at complicated templates in C++, it warms my heart. But it was unusable. It worked, it actually worked, but anything like 10 grammar rules and the compiler is out of heap.
I just think this concept of building a state machine or a jump table or whatever
using normal programming techniques and then being able to replay that at runtime,
generate it at compile time, replay it at runtime,
it could have a huge application
for our listeners who maybe just haven't considered that before.
Yes.
And with the C++20 vectors and strings being constexpr,
it's a huge step, I think.
I wouldn't actually benefit from it, just because... you can have a constexpr std::vector, but you have to basically deallocate it before you go out of the constant evaluation. It has to stay in the constexpr context, right?
Yeah.
And I need to have the state machine there. It's not like I can just calculate something and give back, like, a result and disregard the vector. I need to actually keep the vector around. So I don't think I would benefit in CTPG from std::vector being constexpr.
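A small illustration of the restriction being described: C++20 allows allocation during constant evaluation, but anything that escapes the evaluation must not still own heap memory, which is exactly why a persistent compile-time state machine can't live in a std::vector.

```cpp
#include <vector>

// Fine: the vector is created and destroyed inside one constant evaluation.
constexpr int sum_first_three()
{
    std::vector<int> v{1, 2, 3};
    return v[0] + v[1] + v[2];  // only the int escapes, not the allocation
}
static_assert(sum_first_three() == 6);

// Not fine: the allocation would have to outlive constant evaluation.
// constexpr std::vector<int> table{1, 2, 3};   // error in C++20
```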
So are you still on C++17 then?
Yeah, I don't intend to change it, just to make, like, the audience a bit bigger, because not everyone can use bleeding-edge compilers. It runs on GCC 10.
Okay.
I think 10.2 and 10.3 are working. The Clang is Clang 12, I think.
Just because constexpr, even though it was in C++17, right? It wasn't fully supported until the most recent versions.
And also the performance was lacking.
There were some compiler bugs
if you tried something in a constexpr context.
So, yeah.
That's, yeah,
a lot has changed.
I'm sorry, did you say which version of Visual Studio
is supported?
I think it's 19.30.
I mean the compiler.
So I think it's the newest one, 2022.
Okay.
Before that, the constexpr in Microsoft's compiler was just not working properly.
There were just too many bugs.
Yeah.
So when I
was using the Visual Studio 2019,
it wasn't compiling.
Now it does.
We need to now push for as much as we can push.
I hope, I guess, that our compiler vendors start making compile-time constexpr evaluation faster, right? Like, this would be a big win for a lot of projects if we could do more things at compile time with less pain.
Exactly. So, just to improve the performance of the compiler.
So you mentioned that you're sticking with C++17 to make your audience bigger, and then you said you don't think you would gain anything from constexpr string, constexpr vector. And I've been spending time with that myself lately, so I'm not entirely sure either at the moment. But are there any other features from 20 or 23 that you would make use of if you could?
I would benefit from std variant going full constexpr for sure.
Okay.
std pair.
Oh, okay.
In 20, it's full constexpr, also the variant, right?
Right.
Because now I'm doing some strange hacks.
Because for some reason, the constructor is constexpr
and the assignment operator is not.
Right.
Yeah.
So I'm using some strange hacks to just make it constexpr.
Also, I believe std::optional is the same story: more constexpr also, yeah.
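For context, a small illustration of the C++17 versus C++20 gap being described (P2231 backfilled the missing constexpr on std::variant and std::optional assignment into C++20):

```cpp
#include <variant>

constexpr int ctor_only()
{
    std::variant<int, char> v{42};  // converting ctor: constexpr in C++17
    return std::get<int>(v);
}
static_assert(ctor_only() == 42);

constexpr int with_assignment()
{
    std::variant<int, char> v{42};
    v = 'c';  // assignment: constexpr only since C++20 (P2231)
    return std::get<char>(v);
}
// static_assert(with_assignment() == 'c');  // ill-formed in C++17 mode
```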
Yeah, so it will just make the code a bit cleaner, I think,
but nothing really breaking.
Yeah, so...
And the one thing I would need...
Yes, and I'm actually doing the regex, right, in CTPG, for the lexical analyzer. So I think it would benefit because... so the regex term gets its regular expression as a template parameter, right? So it's a string. So in C++20, you can have it as a string constant, right? In C++17 you need to have a static-linkage object, so I have this limitation, right? This would go away in C++20: you could use an actual string constant there; you don't have to define it elsewhere. That's the same limitation that CTRE has, right? So if you're using C++17, then you need this static linkage, so the string can be used as a template parameter.
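Roughly what that limitation looks like in code (a sketch; the exact CTPG spelling of regex_term may differ):

```cpp
// C++17: the pattern must be an object with static linkage to appear
// as a template argument.
constexpr char number_pattern[] = "[1-9][0-9]*";
constexpr ctpg::regex_term<number_pattern> number("number");

// C++20 class-type non-type template parameters would allow writing the
// literal inline instead, something like:
//   constexpr ctpg::regex_term<"[1-9][0-9]*"> number("number");
```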
So do you also
implement your own regular
expression library as part
of CTPG?
Yes. I mean, I didn't implement it... I have CTPG, so I could write a regular expression parser using CTPG.
And then use that inside of your CTPG?
Yeah.
It's sort of an inception thing, yeah.
Okay.
So that's exactly what I do, actually.
There is a grammar for the regular expressions inside the CTPG header. It has a custom lexical analyzer, because you cannot use regex for that; that would just be circular logic or something. So I have just a parser for regular expressions, but the lexical analyzer for regular expressions is a custom-written one. But it's not complicated, because it's just character by character. There are just a couple of special characters in regexes, and the rest is treated literally.
We've mentioned Hana's CTRE library a few times.
Have you compared your regex to CTRE?
Well, certainly it doesn't have the amount of features. This one is really basic. This is just the subset of regex features that you would usually need for your custom domain-specific language, right? All the obscure regex features that are rarely used are just not implemented. I just didn't focus on them; they would just unnecessarily complicate the grammar and increase the compilation times for your actual grammars. So I think it's fine this way.
And also, well, I think that Hana's library is using an LL parser, and mine is LR. So the LL is a top-down parser, right? And this LL stands for, like, left-to-right and leftmost derivation, while an LR parser is, like, a bottom-up parser, and it uses the rightmost derivation. So it basically tries, from the leaves, like the terminal symbols, to build up a bigger concept, like, for instance... so the bottom-up parser goes from the most specific language terms, like the identifiers, operators and so on, into the higher-level concepts, whereas the top-down parser does the opposite. So, let's say, "parse me a class": that's what the top-down parser would do. And then it tries... okay, so it sees a header, let's say, for a class, and okay, so I'm parsing a header now. And then I see the identifier, okay. So this one is the opposite. That's why it's a state machine, right?
So the LR parser... the downside of using such a parser is the step of actually generating the state machine, which is time-consuming. It's an algorithm that has some complexity to it. But the final result is just a faster parser. And also the set of grammars that you can give it is much bigger.
You can have left recursion, for instance, which you cannot have in an LL parser. So you can define an expression as: an expression, a plus sign, and then another expression, right? You don't care that this is recursive; the LR parser deals with it just fine. So, yeah. Plus, the CTRE one is actually a handcrafted parser for regular expressions, right? And CTPG is just a generator: you can have any kind of grammar, right? The regular expression grammar would be just an example.
So I don't know a lot about parsers and how grammars are defined.
It's been a long time since I did any of that.
But I'm curious, you mentioned the Flex and Bison that came up earlier.
You said you, of course, don't have nearly as many features as those.
But does it work in approximately the same way?
Do they also generate a state machine internally, and are they also LR parsers or whatever, like you're doing?
Okay. With Bison, you can specify what kind of algorithm you want at the output, the LR being actually the default, I think. And then you can specify several, I think. Mine is just LR, so it's the most powerful one, exactly.
So, yeah, and it basically does exactly the same thing. It generates a state machine. But, yeah, Bison does it by actually just spitting out C or C++ code with a huge switch statement or something. And the code of Bison is actually unreadable. Or it was when I was using it, like, 15 years ago. But that's not the point of generated code, right? I was using Bison and Flex when I was starting my career as a SystemVerilog compiler developer. So we were dealing with Flex and Bison. And because it was generated code, and it was the early days of version control, it was a pain, because the generated code ended up in the repository.
So if two people wanted to do something else inside the grammar,
change it somehow, it was really hard to maintain.
So eventually we ended up having the grammar files in a repository
and the generated code outside of the repository.
But that has a drawback that you need a tool,
like the Bison tool needs to be a part of your build system.
And it can be a pain too because it was running on
Linux fine, on Windows not
so fine.
So we ended up porting
Bison for Windows
for our purposes. I believe it runs
on Windows fine today.
So actually, each automated build started by building Bison.
So you can imagine.
With CTPG, you just have C++ compiler, right?
And it just works.
Of course, I wouldn't do, like I said, C grammar with CTPG,
but it's just a different approach.
Have you made use of CTPG in any
production projects
or just the examples
you have? Not yet.
I'm just judging by
GitHub issues.
People are using it in places.
That's cool.
It has 270
stars on GitHub right now.
Wow.
And you said you just put it out like a month or two ago, right?
Yeah.
Wow.
Some people at some universities use it just to teach parsers and compilers,
because it has a really nice feature of diagnostic messages.
So you can see your parser
as a state machine
and what is done on what
terminal symbol and where are
the conflicts
And people are just trying to adopt it somehow into their projects, usually. Probably... no one is reaching out to me and saying that they're using it in their commercial projects or so. It's MIT licensed, so they can just do it.
It has like nine forks already.
Wow.
It's the first open source project of mine, and I'm not sure if it's a big number or not.
I don't have any comparison.
I mean, Catch 2 has several
thousands.
It sounds like it's
doing pretty well for something that's only been
out for a short period of time.
It's been out since the end of November, so, like, a month.
Well, it definitely sounds like, listeners, if you're interested in learning more about parsers and how parsers work, this is a good project to go look at and play around with.
Yeah, I would love to hear from someone if they're trying to use it in their commercial project.
In my experience, you'll never learn.
I've had people come up to me at conferences and say, oh, yeah, I've been using ChaiScript in my commercial project for the last five years.
And I'm like, you could have told me sooner.
Would have been nice to know.
It's not like I would charge you. I can't; it's a BSD license.
But if you want to be a sponsor on GitHub or something, I could use some money.
That's right, GitHub sponsorship does exist.
If you want some feature done, let's say,
I wouldn't mind doing it.
It's a project that I'm doing in my spare time.
So, of course, yeah.
Well, I definitely encourage all our listeners
to go check it out.
And Peter, it was great talking to you.
And thanks for telling us all about CTPG.
Thanks for coming on.
Yeah, thank you.
Thank you.
Thanks so much for listening in as we chat about C++.
We'd love to hear what you think of the podcast.
Please let us know if we're discussing the stuff you're interested in,
or if you have a suggestion for a topic, we'd love to hear about that too.
You can email all your thoughts to feedback at cppcast.com.
We'd also appreciate it if you can like CppCast on Facebook and follow CppCast on Twitter. And of course, you can find all that info and the show notes on the podcast website at cppcast.com.
Theme music for this episode was provided by podcastthemes.com.