CppCast - LLVM Hacking And CPU Instruction Sets
Episode Date: July 16, 2020

Rob and Jason are joined by Bruno Cardoso Lopes. They first discuss an update to Meson and CppCon going virtual. Then they talk about Bruno's work on Clang, including Clang modules and work on a pattern matching implementation.

News:
- Meson Release Notes 0.55
- Writing an LLVM Optimization
- CppCon Going Virtual

Links:
- SHRINK: Reducing the ISA complexity via instruction recycling
- SPARC16: A New Compression Approach for the SPARC Architecture
- P1247R0 - Disabling static destructors

Sponsors: Clang Power Tools
Transcript
Discussion (0)
Episode 256 of CppCast with guest Bruno Cardoso Lopes, recorded July 16th, 2020.
This episode of CppCast is sponsored by Clang Power Tools, the open-source Visual Studio extension on a mission to bring Clang LLVM magic to C++ developers on Windows.
Increase your productivity and automatically modernize your code now.
Get ahead of bugs by using LLVM static analyzers and C++ Core Guidelines checks from the comfort of your IDE.
Start today at clangpowertools.com.
In this episode, we discuss CppCon going virtual this year.
Then we talk to Bruno Cardoso Lopes from Facebook.
He tells us about his work on Clang and much more. Welcome to episode 256 of CppCast, the first podcast for C++ developers by C++ developers.
I'm your host, Rob Irving, joined by my co-host, Jason Turner.
Jason, how are you doing today?
I'm okay, Rob. How are you doing?
Doing okay.
You know, a little pandemic update.
We are planning on sending our kids to the online school option for next year.
So I say send, but they'll be staying here for the next year. There are options to either, you know, do part-time schooling or do fully virtual online. So we just think the virtual online is going to be safer with the way things are going.
Interesting that your school district is giving you all those options. I honestly don't know what's happening here, because I don't have kids.
Sure. It seems like that could still change. Some of the other counties nearby have announced they're going to go fully virtual, but as of now our school is still giving you the option.
Interesting. Yeah. Well, at the top of the episode, I'd like to read a piece of feedback. This week we got a comment on Reddit about last week's episode from Spirited Being saying,
Really interesting episode.
Makes me want to complete the other books in the Ray Tracing in One Weekend series.
Have we talked about that before?
Ray Tracing in One Weekend?
That sounds interesting.
I believe some guests from time to time may have mentioned it.
Okay.
But I do know that it's come up several times in my meetup.
So honestly, I don't remember from where I've gotten the most exposure to it.
Okay, we'll have to find a link for it and put that in the show notes.
All right. Well, we'd love to hear your thoughts about the show. You can always reach out to us
on Facebook, Twitter, or email us at feedback@cppcast.com. And don't forget to leave us a review
on iTunes or subscribe on YouTube. Joining us today is Bruno Cardoso Lopes.
Bruno has been contributing to Clang and LLVM-related technologies for the past decade,
spending the last four years on the Clang front end.
He's passionate about C++ and joined the C++ Standards Committee in 2017.
Bruno currently works for Facebook.
Welcome to the show.
Thank you.
Thank you guys for inviting me.
Very nice to be here.
Thanks for coming on.
All right, so Bruno, we got a couple news articles to discuss.
Feel free to comment on any of these,
and then we'll start talking more about your work on Clang and compilers, okay?
Okay.
All right, so this first one we have is an update to Meson build,
and we had its creator on earlier this year, right,
to talk about the latest with Meson.
So what's new in this update, Jason?
Well, there's, I mean, mostly I just wanted to call out the fact that it is still in active development. There's still a lot of stuff going on. But gtest-specific support, yeah, that was the main thing that jumped out at me. But lots and lots of small fixes and updates, so clearly this build system is still in active development.
Yeah, it's a pretty long changelog. The gtest support definitely sticks out. He's also added the ability to specify targets. If you want to just compile a single target, I guess you can do that now, as opposed to building the entire project.
So I must admit, I still have not actually used Meson myself. Any chance you have, Bruno?
I haven't, but I found it really cool that now you can specify targets, because at least for Clang development and things like that, it's pretty helpful not to have to build everything all the time and to be specific with the targets. And I also found it cool that now they can use llvm-cov to generate coverage information, kind of automatically, it sounds like.
It's pretty cool.
That sounds really helpful, yeah.
Yeah.
Okay, next thing we have is this YouTube video
that I saw on Reddit,
and it's writing an LLVM optimization.
And it's, I think, like an hour, hour and a half long video.
I did not watch the whole thing, but I don't see much about, you know, tutorials on how to get started in LLVM.
So I thought this seemed like it could be a valuable resource.
From my perspective, it definitely seemed very, what am I trying to say, it was done like a course: you know, an introduction, how to build LLVM, all of the details, all of the function calls and the types and the stuff that you're going to interact with. And it's very rigorously done, I thought.
But I don't have a lot of in-depth experience with LLVM, so it's possible that some of it's
wrong and I just didn't, you know, see it.
Yeah, do you have any thoughts on this one, Bruno? As someone who does work on LLVM?
Right. First thing, I thought he was wearing a pretty cool hat.
And I like the references for IR instructions and other tables he put on the slides. It was
pretty cool as well. And I think the example that he chose, which was about floating-point comparison, to kind of apply his optimization on top of, I think that was an interesting choice. Because usually, whatever guide it is, it's exploring, I don't know, some more kind of canonical or simple things to do. Not that this is complex, but I especially liked the choice. I thought it was a cool choice for a video.
Awesome. Yeah.
Okay, and then the last news thing to discuss is CppCon's announcement that CppCon 2020 will be going virtual, just like the C++ on Sea conference, which I think is this week, right, Jason?
That sounds right. Yeah, I lost track. I've had a lot going on this past year. So yeah, definitely. Oh yeah, right now.
Yeah, CppCon will be virtual. Tickets are now on sale, and at a much reduced price because it will be virtual. I think it's $200 for early bird tickets.
And that's available until August 5th.
They're going to be keeping the same time of day based on Colorado time
is when they'll be doing the virtual sessions.
And it will still be multi-track.
They haven't announced what they're going to be using
as far as software to host the conference
or what website it will be on or anything like that.
But they're investigating different tools.
Maybe they'll use the same thing that C++ on Sea uses.
Yeah, anything else to say about this, Jason?
Well, I think for just our listeners
who haven't been necessarily following along with all this,
CppCon was waiting as late as possible
to see what their options were
to try to be still at least partially in-person.
So the big news, and this is just of three days ago,
that there will be no in-person portion at all.
It's 100% online.
I've seen some interesting feedback
from people who are at C++ on C right now
about how this kind of virtual conference
environment like we were talking about last week is actually pretty cool, and you can wander around and chat with people and stuff in it. So yeah, we'll see how it goes.
I am wondering, you know, is there going to be a lot more room for people to join the conference who maybe otherwise wouldn't have, just because that price point is so much lower? The barrier to entry is so much lower. I'm not sure if they're putting any type of limit on how many people can sign up.
Yeah, I don't know either, obviously. I had one person tell me, how can I get my company to pay for me to go to a conference that's just virtual, when we know a week later all the videos will be on YouTube anyhow?
Yeah, that's an interesting point. I mean, they're going to have this online software to try to capture the hallway track, and you'll be able to ask questions directly of the speakers. So certainly there's still going to be benefit to buying a ticket.
But I understand that concern with trying to get your company to pay for it.
Yeah, and I have no idea.
I am planning to buy a ticket, which might sound funny.
But at the moment, I'm not giving any talks or teaching any classes at the conference.
So it'll be a different experience for me.
Yeah, I plan to do so as well.
How about you, Bruno?
Are you planning on virtually attending this one?
So I haven't thought about that yet,
but it sounds like it's going to be.
I mean, I like the trend of keeping things virtual.
That's a good thing.
The session hours were pretty reasonably scheduled
across the different time zones,
and that was very interesting as well.
But let's see.
Yeah, I'm definitely interested in that.
Yeah, that's a good point.
Even for Central European time, it's 4 p.m. to 10 p.m.,
which isn't maybe ideal, but also not terrible.
Of course, our Oceania friends will, as always, be the ones that get left out when we try to schedule things for as much of the world as we can.
But sorry, New Zealand and Australia, you're just going to have to watch it online.
Yeah.
All right.
So, Bruno, could we start and talk a little bit more about
how you got involved with working on compilers in the first place?
Sure.
So I had this colleague back in school who kept talking about LLVM. This was like 2007: there's this new awesome compiler technology, etc. And I was still an undergrad student, and I was looking for things I could do on LLVM that were not already taken at the time. Luckily, since this was over a decade ago, there were a few things available.
In this specific case, I was very interested in backends.
And I noticed that the MIPS backend was something that was not yet done for LLVM. So I started looking around, cooked up a skeleton patch, and sent it out to the mailing list, and also sent a proposal. The timing was good because they were doing Google Summer of Code at that point already. So I think that kind of helped me get through the process, because I already had a patch, and then I started working on it as part of Google Summer of Code.
That's how I started my compiler journey.
And it was actually my first meaningful use of C++.
That's when I saw a bunch of things I've never seen before.
And I was pretty amazed with all the things you could do, etc.
I'm curious, why MIPS? Why did that catch your attention?
Because there was already an ARM backend, and x86 already existed.
But SPARC was already a thing there too. So I guess MIPS was the natural next choice, since at the time there was already QEMU support and other things I could use for developing and testing it pretty easily. And it was cool: I was also able to get a MIPS dev board that someone from Sony gave me at the time. They shipped it to me, and then I used some of that as well to test that the backend was working and everything. And also because it was an example of a RISC architecture that is pretty simple, and that sounded like a good scope for starting.
Once you get into really developing for MIPS, things are a bit more complex than they seem, because MIPS has a bunch of different ABIs and a bunch of different architecture revisions. But yeah, it was a nice way to start.
I've only ever worked with MIPS
when I was writing some software
that the organization I was working for
wanted it to
be ported to run on Linksys routers, you know, Linksys routers that had already been hacked to
run Linux. And so I just kind of was curious if that's related at all, because it's about the
same timeframe you're talking about.
Right. One thing for sure that I remember, when I looked over the same sort of Linksys stuff you're mentioning: I remember seeing how they exploited it so they could update the BIOS or something like that. And I remember seeing MIPS assembly. And it was convenient at the time, because it was like, oh, I can actually read this.
That's awesome.
So that's how you got started.
What have you been working on in Clang more recently?
Right, more recently I've been working on Clang modules. There's a difference between Clang modules and C++ standard modules. Okay. So just for the people who don't know much about it, I'm going to do a short explanation.
So Clang modules is basically what I call header modules, because it's basically the capability of building modules out of collections of headers. So the nice thing is about packaging headers in a way, packaging might not be the right word here, but like putting headers together, building that, putting them on disk, having that in a cache, and being able to reuse that.
This is basically how Clang modules work. It's an AST. You get the AST of all the headers that comprise the module and put them in a cache. The good thing there is that you get very interesting build-time speedups, because you don't have to reparse headers or anything like that.
So it's very efficient for that kind of thing.
You can see an average speedup from 15% to 20%.
That's on build time.
That's what I've usually seen when projects adopt that.
Clang itself can compile with modules on.
This is accomplished by an extra file called a module map. So in the module map, you put the name of the module you want to have, and list inside that description all the headers that are part of it. If you compare it to C++ modules, you still have some of that: when you see people talking about header units in C++ modules, that is basically what Clang modules is. Yeah, so I've been working on that piece of technology for the past three or four years.
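To make that concrete, a module map is a small sidecar file. A minimal sketch might look like this (the MyLib name and header paths are invented for illustration, not from the episode):

```
// module.modulemap
module MyLib {
  header "mylib/core.h"   // the headers that make up the module
  header "mylib/util.h"
  export *                // re-export what these headers themselves include
}
```

With -fmodules enabled, a #include of either header can then be satisfied from the cached module instead of reparsing the header text.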
And it's a very interesting process when people go adopt Clang modules, because it's not only about writing a module description and the headers that are part of it. There's also the part where you have to fix your headers so that they become standalone in a way, which means that they include whatever they use, that they follow good practices, that you're not doing X macros and things like that. Basically, they need to be somewhat well-behaved. And when that's the case, you can easily transform your set of headers into a module. And yeah, that's the thing I've been working on latest.
It sounds very similar to how precompiled headers work. Is that a fair comparison?
I would say yes, especially when you think about the disk representation.
So because, like, at least in Clang,
the technology for PCHs and modules is basically the same.
It's like you're serializing ASTs.
The only difference here is that there's a bit more semantics involved with modules, because the way that Clang modules work is that when you do a #include of a header name, Clang has extra logic to go look around that header and see if there's a module map file that responds for that header. Whereas when you're using PCHs, it's usually like you have to pass a -include on the command line pointing to the PCH you want to use. So at the beginning of your translation unit, you're going to have all the content from the PCH coming in, whereas with modules you have fine-grained control, because you do a #include of the name of the header, and you're going to get visibility of the symbols of that specific set of things in that header or module.
And if that specific header include or import other modules,
you might be able to restrict what's coming in from that import.
It doesn't mean that you're always transitively getting everything.
So there's more of a fine-grained control there.
PCHs are more brittle in that sense.
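As a rough sketch of the workflow difference being described here (file names are made up, and exact flags vary by Clang version):

```
# Precompiled header: the whole PCH is injected at the start of the TU.
clang++ -x c++-header common.h -o common.h.pch
clang++ -include-pch common.h.pch main.cpp -o main

# Clang modules: a plain #include is resolved through a module map, and
# the compiled module is cached and reused across translation units.
clang++ -fmodules -fmodules-cache-path=./module-cache main.cpp -o main
```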
Right, yeah.
For people who haven't used PCH,
it's easy to break your build
between non-PCH and PCH versions of the build
if you have a header or don't have a header
that you expect to have in place, right?
Right.
That's what I've observed.
And the same could happen,
like you can even mix modules and PCHs,
and that could also be a big source of problems.
Okay.
So is the plan moving forward for the modules that you've been working on
to morph into the C++20 modules,
or do you expect to maintain both sets of functionality?
I think we're going to have to maintain both functionalities for a long time, especially because Clang modules work for C and for Objective-C. It's not specific to C++. And there have been different vendors and folks around that are relying on Clang modules as well. So I expect it to live for a long time.
Ideally, the core technology is the same if you think about how it's implemented in the compiler.
There are differences, but it shares a lot of the things.
So I expect it to still be there
because it won't be a huge burden also to maintain.
Did your work on the Clang modules
have any impact on the ISO modules?
I know you're a member of the ISO committee.
Did you do any work on ISO modules?
Right, yeah.
I wrote a few papers back in 2018, I think.
They were more towards having C++ standard modules take things from the Clang modules world, things that would be nice to have so that vendors that were relying on Clang modules could have a path of adoption and things like that: they could start with their headers maybe, and then slowly migrate to actually having module implementation units and other things that are more C++-modules-like.
So yeah, I try to contribute in that direction to have some of the things that Clang modules had.
Okay. So when we were getting ready for this interview, I was looking at your LinkedIn page, and I saw you've got some patents and stuff on there, and one of them really caught my eye. I cannot even pretend to pronounce the Portuguese, but the "Method and system for emulating instructions and executing legacy code." Is that a proper translation of this patent?
Okay. It caught my eye. I have no idea what it's about. Is it something that you want to talk about?
Right. Sure. So this is the result of the work I did during my PhD, which is basically around
the areas of like computer architecture and code compression.
But it actually has the roots in the compiler work. So one of the things I did in LLVM was in
2010, 2011, I implemented support for AVX. AVX is a multimedia instruction set that x86 has that came after SSE.
Nowadays, we have AVX2, AVX512, and other things like that.
And while writing that, like doing that support,
I've noticed that instructions on AVX,
they were bigger than regular x86 instructions.
They introduced some new prefix headers and other things so that they could encode more information. And then I started thinking about, well, what's going to happen in the future
with the size of these instructions, right? Because it seems that it's growing and there's
announcements of other things coming in the future. So I got curious about, really curious about that. And then I joined
other researchers back in Brazil in the university I was studying. And we started looking at
things around that, because that sounded compelling. And at the time, x86 had Atom, and they were talking about Quark as well. And those were basically what x86 would be for low-end embedded. And that kind of goes hand-in-hand with the research we had back then, which was about code compression and how you could compress code so it runs faster.
The idea basically here is: if you have smaller code for the same amount of cache, you're going to have fewer cache misses. And especially if you're on a low-end embedded device, that's pretty good for performance. So we started looking into that, right? And then we did a bunch of static and dynamic analysis. We ran, in virtual machines, Windows 95, Windows 98, old versions of Linux, a bunch of different OSes, doing basic tasks like opening a spreadsheet or navigating the browser, a bunch of different things, just to capture the usage profile of those instructions, to see more or less how things behave. And then we found out that there were a lot of instructions in x86, like a byte or two, that were never used, and things like that. So, well, backward compatibility is very important for x86, but what if we re-encode this?
You know, like, we can't. It's CISC. It's more complex to do something like RISC architectures do, like ARM has Thumb and MIPS had microMIPS, etc. So we started exploring: well, what if we re-encode some of this? What would be the fallout of that?
So we ended up with this really complex page-table-based annotation of versions for backward compatibility and OS-based emulation.
And the idea was that we would shrink code, re-encoding the most used instructions using smaller opcodes, especially picking opcodes that were not usually used over the years, as we found in our analysis. So that whenever you would encounter those things, depending on the mode you were in and how your executable had been annotated, you would then fall back to emulation, in case you needed to emulate those instructions for a previous version of the software or anything like that. So in general it was a mechanism to emulate whenever you need to emulate, but shrink code and have faster execution when that was not the case.
I want to, if you don't mind, go back to, you said earlier, compressing the code. And to be clear, to make sure I understand: you're not saying compress like zlib. You're saying compress as in just make smaller code, like compile to a smaller representation of the same code?
Right, exactly. It's not zlib or anything. It's more like the intrinsic representation of the encoding and all of that kind of stuff.
And so is this directly related to the shrink research that you also did?
Exactly. Shrink was the funny name for the academic purposes for the paper title.
I never stopped to think about it, but having you just explain that, it does sound highly likely that there are older x86 instructions that just simply aren't used anymore.
Right.
And out of curiosity, like, do you have some estimate,
like what percentage of the possible x86 instructions
are actually still in use or something like that?
That's a good question. I mean, this dates back to like 2010, 2011, so my memory is a bit fuzzy on that. But yeah, certainly, trying to remember here. I mean, I don't know about current usage, but at the time it was like all those instructions for converting, what was the name again?
BCD?
BCD, exactly. There's a bunch of instructions for that thing, and even on old machines or old OSes, it was not something that triggered a lot. Maybe we didn't try with the right software, so it didn't ever show up, but we didn't see much of those, for example.
And they're very short.
That's a funny one for you to bring up
because I actually just did a Twitter poll
asking if anyone in my Twitter network
had used BCD instructions for anything.
And for our listeners, that is binary coded decimal.
And it was a very low percentage of people who
had said that they had actually used it. And almost every single one who responded said that
they had used it when working on a 6502, either emulating it or writing software for it. And it
came down to displaying the score in the game, basically. That's what everyone had to say. Now, I know that on big iron, like IBM stuff, it's still used in some financial systems, but otherwise it seems, yeah, it was not very much used.
Yeah, I for sure never used it. How about you, Jason?
No, I've never used it. And for just a little bit of history here, Nintendo removed that portion from the 6502, and they put their sound code in the same place on the die. I think that's right. I'm pretty sure Nintendo's 6502 didn't have BCD instructions at all.
Wow. So the 6502, is that the Motorola series of processors?
No, not Motorola. That's MOS.
It was the chip that was used in the Nintendo, the BBC Micro, the Commodore 64.
And then there's some, what's it, Ben Eater, I think is his name, who's doing a series of YouTube videos on programming 6502 from scratch right now,
like building a computer from scratch at the moment.
Oh, that's pretty cool.
I was looking recently into Genesis and Mega Drive,
and I saw that they used the M68K.
So I thought the numbers were similar.
They're similar.
Yeah, it was the 6502 was originally binary compatible
with the 6800, I believe.
Yes, because just like, yeah.
And then the Z80 and the x86 share a very similar instruction set as well
from back in the day.
Anyhow.
This is so over my head.
Did the shrink research get put into use by any CPU vendors
or anything like that?
So not to my knowledge, actually.
We presented that in several venues at the time,
got feedback from some CISC vendors.
That was pretty cool.
But we don't see a lot of CISC machines around
where this kind of technique would be profitable to be applied.
And the ones that are currently in the market,
like I don't see them following up with low-end embedded stuff as they used to.
So maybe in the future, the paradigm will shift again.
Might have an opportunity, but no, not to my knowledge.
So you've mentioned the AVX instruction set a few times in your work in LLVM.
It looks like you've done work specifically on implementing AVX stuff in LLVM.
Right. That's true.
And I see this.
I saw a mention on your work to the table gen thing
and bringing that back to the video that we were just talking about
on implementing your own optimizer.
That talks about TableGen as well.
I don't have any idea what that's about, though.
So basically, TableGen is a DSL, and it's used in the compiler for a bunch of different things. For example, if you're writing backends, you use TableGen to describe your instructions. And not only the encoding of those instructions; you can actually create several layers of abstraction. You can define an instruction format, then write an instruction group based on that instruction format, and then actually go and describe the instruction. So it's very convenient for doing that kind of work. It's also capable of more: there are TableGen representations for some of the LLVM-IR-ish kinds of things, so you can even describe a match between a target instruction and an LLVM instruction. So some of the code generation you can also write in TableGen.
So it's very, very powerful in that sense.
And also, when you're trying to match things between the intermediate representation and the target, you can write custom patterns in C++ and attach those on the instruction you're declaring, so it can match conditionally if it meets some of the criteria you're writing code for.
And you still have all the C++ logic and everything for writing your backend, but it's very handy to have TableGen around for facilitating some of that description.
And not only that: for example, if you move to the frontend, you're going to see that TableGen is also used for describing language options, warnings, errors, and all things like that.
So it's pretty flexible. For basically everything where you would have to write a lot of code, like generating tables and doing all the table search-and-match things, it's pretty convenient to have all that code generated. And of course, TableGen is this DSL, and the, quote, compiler for it is inside LLVM itself, so it's well targeted for LLVM or Clang things. But you could use it for anything like that. If you build LLVM and you go looking for .inc files in your build directory, you're going to find a bunch of tables and a bunch of code using those tables, and those are basically C++ code that gets generated by TableGen.
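To give a flavor of what that looks like, here is a simplified sketch in the spirit of LLVM's .td files. These records are invented for illustration and far terser than the real X86 definitions:

```tablegen
// A class captures a shared instruction format...
class I<bits<8> opc, string asm> {
  bits<8> Opcode = opc;
  string AsmString = asm;
}

// ...and concrete instruction records reuse it. llvm-tblgen expands
// records like these into C++ tables (.inc files) consumed by the
// assembler, disassembler, and instruction selector.
def ADD32rr : I<0x01, "addl $src, $dst">;
def SUB32rr : I<0x29, "subl $src, $dst">;
```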
Okay.
So you said like the instruction sets are defined in TableGen as well, right?
Okay.
Now, I just feel like I have to bring this full circle now.
So is the entire x86 instruction set defined in TableGen?
It is.
Every possible instruction?
I mean, ideally, yes. I don't know how complete it is, in terms of whether AVX-512 or some other x86 thing hasn't been described there yet, but potentially you can. Because in the worst case, you define a very shallow instruction, and then with real C++ code in the backend, you go and fill that gap. So let's say you're trying to represent something on x86 that is really complex: you're just going to describe the easy part in TableGen and do the heavy lifting later in C++.
But you're going to at least have an entry for an instruction there.
Okay.
Because that's also...
Sorry.
No, please, please go ahead.
That's also...
Because that is also going to be used
for auto-generating parts of the disassembler and the assembler.
So even if you're not code-generating, it's pretty useful to have around, because it can also be used for all the other, even lower-level, things.
I just found myself thinking, like, you know,
bringing this, like, all the way around
and trying to imagine, like, how does the compiler pick
what instruction to use in a given situation.
And I'm like, well, if you were doing a BCD-like operation,
would the compiler be able to find a BCD instruction that matched that?
Right. This is a good question.
So it won't generate code using those BCD things.
But if you open
the, if you go look into
TableGen, the instruction is probably there
because it was useful
for someone at some point while like
trying to disassemble or assemble
things, like if you're using the Clang
integrated assembler
to write assembly, you might want to use that instruction, and it should be there because of that kind of thing.
So it won't be used for code generation,
but it's going to be used for the other lower level tools.
So there's a very high chance that it's there just because of that.
I feel like I should take just a second,
since we just mentioned BCD like 12 times to mention what it actually is.
So instead of, you know, a typical byte that can hold 0 through 255... there's packed BCD and not, but anyhow. Okay, so it uses four bits to represent the values zero through nine. You can't go up to 15 in those four bits. So it's literally just encoding decimal in a binary representation. Then when you go to read those values back out, you just have the decimal digits, and it's easy to convert them to ASCII to display, which is why it was used for scoring in video games back in the day.
That's a very terrible explanation, but it's there for our listeners in the future.
That's pretty good, actually.
Want to interrupt the discussion for just a moment
to bring you a word from our sponsors.
Clang Power Tools is the open-source Visual Studio extension
on a mission to bring LLVM tools like Clang++, ClangTidy, and ClangFormat
to C++ developers on Windows.
Increase your productivity and modernize your C++ code
with automatic code transformations powered by ClangTidy.
Find subtle latent bugs by using LLVM Static Analyzer
and CppCore Guidelines checks.
Don't fiddle with command line switches
or learn hundreds of compiler flags.
Clang Power Tools comes with a powerful user interface
to help you manage the complexity
directly from the comfort of your IDE.
Never miss a potential issue by integrating Clang Power Tools into your workflows with a
configurable PowerShell script for CI/CD automation. Start experimenting on your code today.
Check it out at clangpowertools.com. I wanted to talk about some of these other papers you
worked on recently at the ISO committee. There's one about disabling static destructors, which I
don't think we've mentioned before on the show. Can you tell us a little bit about that?
Yeah. So we've been hearing from game developers especially, for a long time,
that they're annoyed as hell about having to deal with the sort of static initialization order fiasco that we have in C++.
And they've been wanting to have a way to say like,
my program is never going to end, theoretically, right?
So why do I need to care about those things? I mean, there's a lot of philosophical discussion behind that kind of stuff,
but there are basically users that want that,
and that was the idea of the paper.
So the way we were going to convey that is through an attribute,
like a no destroy attribute,
that basically specifies that a variable,
let's say with static or thread storage duration,
will not have its destructor run.
This is a very contentious topic to discuss, because
people have very strong feelings about those kinds of things,
at least thinking standard- or
committee-wise. But in practice, it's something
I got really good feedback on from people who
wanted it at the time we proposed it. So yeah, it's that simple, in the sense of just having an
attribute that would allow you to do those kinds of things. One question, though:
if you think about attributes in the C++ standard,
there is this expectation that if you're not using the attribute,
you're not changing the semantics of the program too much, or basically at all.
But let's say that that's not exactly true here. So yeah, to get that into the standard,
I think it's going to require a bit more convincing,
because that's one of the concerns, right?
The idea is that if you put
this kind of attribute on, let's say,
something that had a destructor that could assert or have any other side effect, your program won't have that anymore, because you're not required to emit it.
So that's definitely a change in semantics.
But yeah, the core idea is simple:
don't even generate the destructor
if no one else is using it.
Yeah, that's interesting.
Yeah, there could be some code savings there as well.
But I guess it's also about not messing up
the order in which those things run.
I would definitely fall in the camp of people
that says that sounds like a terrible idea
because well-defined object lifetime is C++. But
I can also appreciate the arguments where people say, no, no, really, this object, I don't care what
happens to it on destruction, just let it leak when the program exits. Right. Yeah, I can
definitely see the point about tear-down ordering, that kind of thing.
Yeah, especially if you're maintaining a large piece of software,
changing that kind of thing under the hood
might have interesting side effects. But yeah.
It seems like it should not be allowed on thread local objects.
That's also a very interesting perspective.
So what's the status of the paper currently, then?
It's kind of on hold.
One of the authors needs to move it forward.
Let's see.
We haven't discussed it since.
Okay.
Maybe we need to push it a bit more for the next cycle.
At least get more feedback on it.
Yeah.
You were also working with the pattern matching paper writers?
Right.
So, yeah, I haven't been there from the beginning, but three or four months ago, I started a new branch and started putting support into Clang for the current proposal. I think it's going to be really helpful and useful for, you know, kind of bulletproofing all the ideas
and all the changes that are going to go through that paper. Especially because,
with pattern matching, you have this inspect thing,
which will be comparable to switch, right? Like a switch on steroids, that's how I like to look at it.
And there's a desire that that thing can be like an expression,
but it could also be used in a statement.
And there's multiple discussions around that.
So having a prototype and being able to evaluate this early,
I think it's going to be really useful for at least ruling out a bunch of things that we hear or concerns that we hear that might not be true.
Or even things where we find out that, okay, we can't actually parse this as we thought we would be able to,
because there's this or that caveat from some other feature that's going
to give us a bit of trouble.
So yeah, I'm really excited to see pattern matching happen.
And I think it can help somewhat with the compiler
part of the work.
What kinds of things would pattern matching
allow that right now are either impossible or extremely difficult in C++?
Right. The way I see it, at least, maybe the original authors would have more rationale to put here.
But my way to see that is that it's not about performance, but about allowing the C++ programmer to actually express more
things easily. So for example, since we were talking about embedded things, if you're writing
an emulator, or if you're trying to write a decoder for a processor or anything like that,
it's usually, let's say, a complex machine, and when you switch on an opcode,
you can't just catch the sub-opcodes or things like that.
You're not just getting a byte.
Let's say you need some bits from this part of a byte and another bit from another part of the byte.
In a four-byte instruction,
you're getting
different bits from different parts so you can match the right thing. You cannot write a
flat switch with all the things and have it look nice. You know, sometimes
you want to say, well, if you see this opcode,
if you matched this number here, you're fine, but then you can do extra testing on top of that to actually get into that case.
So with pattern matching, let's say you do inspect.
And then once you're inspecting that value, you can do additional things just to say,
well, turns out that I'm not a really good fit.
Just look into the next one now.
Because, comparing to
a switch, right, on the case part you can actually have a pattern guard that does an extra check
on top of what you're trying to match. And if that doesn't match, you can go to the next one,
but if it matches, then you can bring in a new scope and actually write things in that scope.
So I think for expressiveness, in things like that where you want to be able to say more,
with the example of the decoder, for instance,
yeah, you can definitely be more expressive.
I was playing with the decoder for the Motorola 68K,
as we were previously discussing,
and I was like, oh, we need to see pattern matching happen soon
because I would really like to use this here.
You know, like...
But yeah, that's the way I see it.
I think it's pretty handy for being really clear on what you mean
while you're going through a switch, especially if it's a complicated one.
And of course, you have all the...
This is just my low-level take,
but it's going to be way more powerful than a switch, right?
Because you could inspect on a type,
you could also inspect on tuples,
and all the things that are more recent to C++.
And the idea is that the pattern match will help you out express using all those new things.
And I think that's pretty cool. Also, it's all constexpr-proof, so you could theoretically write
a very simple switch-like thing
that is completely constexpr, for instance, if everything is constexpr. We like constexpr. I love it too.
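To illustrate, here's roughly what the decoder-with-guards idea could look like under the proposal-stage `inspect` syntax (from the P1371 pattern matching direction; this is not compilable C++ today, the names here are hypothetical, and the syntax details were still in flux at the time):

```cpp
// Hypothetical sketch of the proposed `inspect` expression.
// `__` is the wildcard pattern; `if (...)` after a pattern is a guard
// that, when false, lets matching fall through to the next case.
const char* decode(std::uint8_t opcode, std::uint8_t modrm) {
    return inspect (opcode) {
        0x90 => "nop";
        // Bind the value, then test extra bits with a guard --
        // the "extra testing on top of the match" described above.
        op if ((op & 0xF0) == 0x50 && (modrm >> 6) == 0b11) => "push reg";
        op if ((op & 0xF0) == 0x50) => "push mem";
        __ => "unknown";
    };
}
```

Compare that with a flat switch, which would force the bit tests into nested ifs inside each case.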
Okay, I'm kind of curious, since we've just been talking so much about
hardware and instruction sets and all these things,
is there anything that you think C++ programmers should be thinking about when programming
that relates to how the compiler is going to generate our code?
Or do you think we should just mostly just let the compiler do its job?
Or do you have any advice from the work that you've done?
I think if you're doing, like,
more close-to-bare-metal work, I don't think there's a lot of extra things
you should care about. Of course, you should know that there's O2 and Oz,
and that Oz, at least in Clang, right, takes a deeper care about your code size
and things like that.
But what I would advise, though,
is to try to check what code gets generated
for the things you really care about.
You know, I'm not a big believer
in early optimization.
I think you write your things,
and if you're writing code
thinking about testing it and doing all the due diligence around it, it should become obvious which parts need more care.
And then you go to those parts, and you might want to look into using specific attributes to kind of force some behaviors and things like that.
Like, you're probably going to want to pack a few structs, right?
And depending on how much memory you have,
you might want to be able to align some things. And some of those things the compiler
might not be able to figure out itself, so you've got to give it extra clues.
But yeah, I would look first into
just making sure you're writing good C++ code that other people can
understand, and then, if that's not enough, you go to the specific parts
and look at things that people usually use in embedded,
like packing structs and using other attributes to guarantee some of the semantics.
Be really careful with volatile and all those things.
Yeah.
Awesome.
Okay.
Well, it's been great having you on the show today, Bruno.
Is there anything else you want to share with our listeners
before we let you go?
No, it was a great pleasure to talk with you, Jason, and you, Rob.
I'm a big fan of the podcast, and I think you guys are doing amazing work.
Please keep doing it.
Don't stop doing it.
It's pretty cool.
Thank you very much.
Awesome.
Thank you.
Thanks so much for listening in as we chat about C++.
We'd love to hear what you think of the podcast.
Please let us know if we're discussing the stuff you're interested in,
or if you have a suggestion for a topic, we'd love to hear about that too. You can email all your thoughts to feedback@cppcast.com. We'd also appreciate it if you can like CppCast on Facebook and follow CppCast
on Twitter. You can also follow me, @robwirving, and Jason, @lefticus, on Twitter. We'd also like to
thank all our patrons who help support the show through Patreon. If you'd like to support us on
Patreon, you can do so at patreon.com
slash cppcast.
And of course, you can find all that info and the
show notes on the podcast website
at cppcast.com.
Theme music for this
episode was provided by podcastthemes.com.