C++ Club - 151. JWST, Dogbolt, May-June ‘22 mailings, developer survey, errors, Unicode, mold
Episode Date: July 16, 2022
With Gianluca Delfino, Ivor Hewitt, and other colleagues
Notes: https://cppclub.uk/meetings/2022/151/
Video: https://youtu.be/mPxivXzT9Y4...
Transcript
Welcome to the C++ Club. This is meeting number 151 that took place on the 14th of July 2022.
The first images from James Webb Space Telescope have arrived
and are absolutely mind-blowing. That gravitational lensing is something else.
I'm mentioning this not just because it's an amazing achievement and I like science, but also because JWST runs on C++. We've discussed this before, and there was a video interview with the team in which they confirmed that it does indeed run on C++. I think the operating system is VxWorks, as is usual with these things, and the CPU is a hardened version of the PowerPC that was in the Nintendo GameCube. And in this thread, the first comment is:
"Struggling to see why that's special, so do millions of other things."
And I mean, technically yes, but can you imagine being this person looking at the
JWST and going, meh, my microwave is also powered by C++.
We have lost the sense of awe and wonder.
Right.
Everyone knows Godbolt, the Compiler Explorer, by now. Enter Dogbolt, the Decompiler Explorer.
Somebody sent me that this morning. I was actually having a little play with it, because I normally use Ghidra. I used to use Hex-Rays, so IDA Pro, and then I switched to Ghidra, and it's just awesome. But yeah, it's just brilliant that I can run them side by side. I'm not doing any reverse engineering at the moment, but next time I do, I'll certainly fire that up and see how they compare and which ones are really getting it.
So yeah, that is just amazing.
Pretty amazing.
And the naming is just perfect.
Dogbolt uses several open source decompilers to try and produce C-like source code for a binary program uploaded by the user, which has to be under 2 MB in size. It's not a trivial task, and the success and readability of the output of these tools vary wildly. But some results could actually be useful. The website offers a few example binaries as a showcase.
Dogbolt is open source and is available on GitHub. I had a look: lots of Docker stuff, and the main script, I think, is written in Python.
Right, we skipped two mailings of the committee. The May mailing and the June mailing. I collated the interesting papers in one set. Let's look at some of them.
First of all, a procedural paper: "2022-11 Kona hybrid meeting information". The November 2022 meeting in Kona will be the first in-person committee meeting since the pandemic started. It will also be the first hybrid meeting with remote participation via Zoom, which is probably going to be challenging, according to some feedback I heard.
Also, COVID is still here, despite what some authorities may say, and I hope that in-person participants will all be vaccinated and boosted, that there will be good ventilation in the venue, and that face masks will be used sensibly so that the meeting doesn't turn into a superspreader event.
The next paper is by Jens Maurer, and it's called "Saturation arithmetic". In order to implement some algorithms, the use of saturation arithmetic is necessary, where an operation yielding a result whose absolute value is too large instead returns the smallest or largest representable number.
For example, when determining the color of a pixel, it would not make sense that brightening a white pixel suddenly turns it black or dark gray. Instead, brightening a white pixel should simply yield a white pixel.
The paper proposes to add simple free functions for basic saturating operations on all signed and unsigned integer types. Further, a saturate_cast is provided that can convert from any of those types to any other, saturating the value as needed. Is it an interesting new way to make signed integer overflow a defined operation? The author mentions that a lot of SIMD instruction sets already have special instructions for saturating arithmetic. The paper proposes that the new saturating functions have short names like add_sat and sub_sat, as these are basic low-level operations. Wouldn't it be better to have a set of special operators instead? One can dream. It's probably not happening.
So this is suggested as a standard feature for which standard? Would that be for 23 or for 26, the saturation arithmetic?
It's probably going into 26 at the earliest.
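To make the pixel example concrete, here is a minimal sketch of a saturating addition. The names add_sat_sketch and brighten are made up for illustration and are not the paper's wording; the paper's own functions would be add_sat, sub_sat and friends.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper (not the proposed library function): saturating
// addition for a signed or unsigned integer type T.
template <class T>
constexpr T add_sat_sketch(T a, T b) {
    constexpr T lo = std::numeric_limits<T>::min();
    constexpr T hi = std::numeric_limits<T>::max();
    if (b > 0 && a > hi - b) return hi;  // result would exceed the maximum
    if (b < 0 && a < lo - b) return lo;  // result would go below the minimum
    return static_cast<T>(a + b);
}

// Brightening one colour channel: white stays white instead of wrapping.
constexpr std::uint8_t brighten(std::uint8_t channel, std::uint8_t amount) {
    return add_sat_sketch<std::uint8_t>(channel, amount);
}

// Small runtime check of the behaviour described above.
inline void saturation_examples() {
    assert(brighten(250, 10) == 255);  // plain uint8_t arithmetic would wrap to 4
    assert(add_sat_sketch<int>(std::numeric_limits<int>::max(), 1) ==
           std::numeric_limits<int>::max());
}
```

The overflow checks are written so that the intermediate expressions hi - b and lo - b cannot themselves overflow, which is the part that is easy to get wrong by hand.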
By the way, there were several mails on the GitHub issues regarding things that are not getting into 23 that were previously tagged to go there, because of the committee's lack of time, and one of those unfortunately is function_ref. It's not getting into 23, but it's definitely going to be in 26, I think. It should be pretty much ready, it's just that they didn't have time to review the wording or whatever.
I guess it's not a major loss, it was an HD maybe
No, probably not, it was something else that I forgot. Yeah, I think there might be some other things that were destined for 23 that are not going to get into 23, because the backlog is huge.
This next paper is "Tuple protocol for C-style arrays" by Paolo Di Giglio.
It would be "Di Giglio".
Thank you. This paper proposes to make C-style arrays of known size behave like tuples, which should improve their usability in cases where C-style arrays can't be avoided, like when using C-style interfaces. That would mean that you could split C-style arrays into a tuple or use structured bindings, I suppose.
Right, the next paper is "Specifying the interoperability of binary module interface files".
This paper by Daniel Ruoso of Bloomberg specifies a mechanism to allow build systems to identify if a binary module interface shipped with a pre-built library can be used directly, or if the build system needs to produce its own version of the binary module interface file. Binary modules need to have some sort of metadata included, so that the build system can determine if the pre-built binary module interface files are compatible with the currently used toolchain.
I can see how this could work in an enterprise setting like Bloomberg, where compilers are upgraded across the board, but the upgrade doesn't happen very often, and so projects that depend on other libraries could often reuse pre-built binary module interface files shipped with their internal dependencies.
OK, the next one is a paper called "static operator[]". This paper proposes to enable operator[] to be static, in line with an existing proposal that enables static operator().
The next one is "Explicit lifetime management". This paper by Timur Doumler and Richard Smith is about starting the lifetime of objects manually. Since C++20 you can use certain "blessed" standard library functions like malloc, bit_cast and memcpy to start an object's lifetime. In the example code you allocate memory on the heap of the size of a struct, and then, because that starts the lifetime of that object, you can immediately access its members and treat it like a normal structure.
Was this meant to replace the idea of std::bless?
No, not really. They just use this term for sort of "blessed" functions that are special in some way. It's not std::bless. It's confusing.
So std::bless was basically to allow for avoiding the undefined behavior that you would get with just reinterpret_casting what would come out of a socket.
Was that std::launder?
I think there was an overlap, but I don't remember, to be honest. But I know that std::bless was then renamed to something else, which may have been start_lifetime_as, as that sounds like more or less the same thing.
Yeah. So for memory allocated using any other function, including a user-defined allocator, for example a memory pool, the above code snippet is undefined behavior. So this paper proposes a set of library functions that would start an object's lifetime given an arbitrary memory block.
That was it, yeah.
I think it was undefined behavior because we never called the constructor of the object. And therefore, you know, technically that object never existed. So calling bless, or now I guess start_lifetime_as, would allow it to be used without undefined behavior. In a way, this std::start_lifetime_as would kind of call the constructor. I don't know what it would actually do under the hood, but I think that was the idea.
Yeah, this is only being proposed for implicit-lifetime types like aggregates, as no constructor is actually being called.
Interesting.
Yeah, the proposed functions are, like you said, start_lifetime_as and start_lifetime_as_array, although in my humble opinion they could have been called something like std::create or indeed std::bless, or maybe even std::evolve_from_a_lowly_flat_memory_buffer_into_a_real_actual_object. That would have been less controversial, I guess. On the other hand, I guess being explicit about your intent is probably better.
Also naming is hard.
I do wonder how many people are going to actually start using this, because the practice nowadays is that you just get a piece of memory that is supposed to be some sort of aggregate and you just do a reinterpret_cast, especially when you need to do it fast and you have something that comes out of a socket or something.
Yeah, well, this is the approved way of doing that. No random reinterpret_casts.
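As a rough illustration of the two situations discussed here, the sketch below shows the malloc case that is already fine since C++20, and a memcpy-based way of reading a struct out of a byte buffer without reinterpret_cast. The Header type and both function names are hypothetical; the proposed start_lifetime_as is mentioned only in a comment because library support for it is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical wire format. It is an aggregate, so an implicit-lifetime type.
struct Header {
    std::uint32_t id;
    std::uint32_t length;
};

// Since C++20, std::malloc implicitly creates the Header object, so the
// member accesses below are well defined without a constructor call.
inline Header* make_header(std::uint32_t id, std::uint32_t length) {
    auto* h = static_cast<Header*>(std::malloc(sizeof(Header)));
    if (h == nullptr) return nullptr;
    h->id = id;
    h->length = length;
    return h;  // the caller releases it with std::free
}

// Bytes that came from somewhere else (a socket, a memory pool): copying
// them into a real object with std::memcpy avoids the reinterpret_cast.
// The paper's start_lifetime_as<Header>(bytes) would instead make the
// buffer itself usable as a Header, without the copy.
inline Header read_header(const unsigned char* bytes) {
    Header h;
    std::memcpy(&h, bytes, sizeof h);
    return h;
}
```

The memcpy version costs a copy in the abstract machine, although compilers commonly optimise it away for small trivially copyable types.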
The std::hive paper got updated again. It's now at revision 20. It addresses quite a few issues raised by the reviewers, including improvements to the technical specification, addition of C++20 ranges overloads, and API extensions and clarifications.
We are bound to get it at some point, I guess. The author is insisting, so eventually it will work.
Yeah, eventually the committee will run out of issues to report to the author and will have to accept it.
That's a marketing issue at this point.
I don't even know. It's not a priority, I guess. Some people are not convinced that it should be in the standard even, and I understand. You know, we still don't have important things like reflection. I know that it's different groups that should talk about this, but if we cannot get those things in...
From what I heard, reflection is actually... no, that was pattern matching. Someone's working on it, financed by the committee, I think. That would probably be Michael Park, the original author, I guess. But I haven't heard much about reflection.
Right, so this paper by Jeff Garland
proposes to add the monadic functions available for std::optional to std::expected. The proposed functions are and_then, which composes a chain of functions returning std::expected; or_else, which returns the std::expected if it has a value, or otherwise calls a function with the error; and transform, which applies a function to change the value or type. In normal languages this would be called map, but this is C++, so transform it is.
Additional functions have been proposed: transform_error, which applies a function to change the error value or type if there is an error, and error_or, which returns the error if there is one, or a supplied value when there is no error.
There are several before-and-after snippets. The "before" snippet calls a hypothetical function from_string that parses a string and returns an expected holding either a time or a string containing the error message. In the "before" code you would check that the expected converts to true, which means it has a value, and then act accordingly. After this proposal, with the monadic interface available, you would chain those functions: from_string, then .or_else with printError, and .transform to, for example, do something with the date.
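To show the before/after shape without depending on a C++23 standard library, here is a sketch built on a tiny hand-rolled stand-in. The Expected class, parse_number and both doubled_or_zero functions are hypothetical and much simpler than the real std::expected; only the member names transform, and_then and or_else follow the paper.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <variant>

// Hypothetical, heavily simplified stand-in for std::expected<T, E>.
template <class T, class E>
class Expected {
    std::variant<T, E> state_;

public:
    struct ErrorTag {};

    Expected(T value) : state_(std::in_place_index<0>, std::move(value)) {}
    Expected(ErrorTag, E error) : state_(std::in_place_index<1>, std::move(error)) {}

    bool has_value() const { return state_.index() == 0; }
    const T& value() const { return std::get<0>(state_); }
    const E& error() const { return std::get<1>(state_); }

    // transform: map the value, pass an error through unchanged.
    template <class F>
    auto transform(F f) const {
        using Result = Expected<decltype(f(value())), E>;
        if (has_value()) return Result(f(value()));
        return Result(typename Result::ErrorTag{}, error());
    }

    // and_then: like transform, but f itself returns an Expected.
    template <class F>
    auto and_then(F f) const {
        using Result = decltype(f(value()));
        if (has_value()) return f(value());
        return Result(typename Result::ErrorTag{}, error());
    }

    // or_else: keep a value as it is, let f recover from an error.
    template <class F>
    Expected or_else(F f) const {
        if (has_value()) return *this;
        return f(error());
    }
};

using ParseResult = Expected<int, std::string>;

// Hypothetical parser: accepts short, non-empty strings of decimal digits.
inline ParseResult parse_number(const std::string& text) {
    if (text.empty() || text.size() > 9)
        return ParseResult(ParseResult::ErrorTag{}, "bad length");
    int n = 0;
    for (char c : text) {
        if (c < '0' || c > '9')
            return ParseResult(ParseResult::ErrorTag{}, "not a number: " + text);
        n = n * 10 + (c - '0');
    }
    return n;
}

// "Before": explicit check at the call site.
inline int doubled_or_zero_before(const std::string& text) {
    auto r = parse_number(text);
    if (r.has_value()) return r.value() * 2;
    return 0;
}

// "After": the same logic written as a chain.
inline int doubled_or_zero_after(const std::string& text) {
    return parse_number(text)
        .transform([](int n) { return n * 2; })
        .or_else([](const std::string&) { return ParseResult(0); })
        .value();
}
```

Both functions behave the same; the chained version just moves the has_value check into the library type.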
I don't dislike it, but I think it's going to take some getting used to.
I think this is what is already getting into 23 for std::optional, the monadic interface.
Interesting.
So probably when we get std::expected, we also get this automatically, just for parity with std::optional.
I'm not sure if expected is getting into 23. I don't remember if it's one of those things that they're still figuring out. Hmm. So yeah, we should get it with this monadic interface in 23, probably.
I don't know if Boost.Outcome has this kind of interface as well these days. It's kind of a similar object.
Boost.Outcome is vastly more capable and supports all kinds of special error codes, and yeah, I think it also has a monadic interface.
This next paper is "Allow multiple init-statements". This is just revision zero, so no idea how it'll fare. Justin Cook proposes to allow multiple init-statements wherever an init-statement is currently allowed, specifically in for, if and switch statements. Currently you can only declare more than one variable there if all declared variables are of the same type.
So, as you can see in this example of use, the first line declares two variables of type int, one after another, separated by a comma, and this is legal in C++20. The proposal is to allow declarations like int k = 0; double s = 0; and then the condition clause, and then the increment of the index.
Many redditors have a problem with this. They say that it makes the statement change its meaning depending on the number of semicolons in it, and I kind of agree. This is like pushing it too far, maybe making it less readable than with just the init block. You can always create a scope outside the loop if you need to declare lots of stuff.
Definitely. It's a little bit strange for the trained eye to parse it now, because you have an extra bit. Frankly, I also don't think I would use it very much. I think this is one of those where people think, oh, there is a missing corner that may be added: because we have this here, we may have this there as well.
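For reference, this is roughly what the "create a scope outside the loop" workaround looks like today. The function mean is a made-up example, and the proposed syntax appears only in a comment because it is not valid C++ at the moment.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical example: averaging a vector with loop-local state of two types.
inline double mean(const std::vector<int>& values) {
    double result = 0.0;
    {
        // Extra scope: `sum` has a different type from the loop counters,
        // so today it cannot live in the for statement's init-statement.
        double sum = 0.0;
        // Several variables of the same type are already legal here.
        for (std::size_t i = 0, n = values.size(); i < n; ++i) {
            sum += values[i];
        }
        if (!values.empty()) {
            result = sum / static_cast<double>(values.size());
        }
    }
    return result;
}

// With the proposal, the loop header could, as far as I understand it,
// carry both declarations (not valid C++ today):
//   for (std::size_t i = 0; double sum = 0.0; i < values.size(); ++i) { ... }
```

Whether the extra semicolon in the header reads better than the extra pair of braces is exactly the readability question raised in the thread.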
Yeah, maybe. I'm not convinced. By the way, tools like CLion have already fixed it for the normal init-statement: if you have a variable declared just outside a loop or an if, they suggest that it should be moved to the inner scope.
Nice.
Right, the next one is "Format for describing dependencies of source files". This is also related to modules. It describes a format for discovery of source file and module dependencies to be generated or consumed by build systems. The proposed format is JSON, and I just can't. I wish developers would stop being so obsessed with JSON and trying to use it for anything remotely related to structured text or configuration information. I'm so old I remember when the same thing was happening to XML. It was being used everywhere.
I guess it's OK. I mean, JSON as an intermediate data exchange format is better than XML for sure, but yes, it's not very human-readable.
It's not very human-readable, and it doesn't even support things like dates (they say, just use a string in ISO format) or comments.
Indeed. I would honestly prefer TOML over JSON, but not YAML. I had my share of working with YAML. Didn't like it one bit. Significant whitespace.
Right, that's it for the papers. And now there was a C++
annual developer survey, which closed on the 7th of June, and the results are now available. I wanted to say that it's probably the first time in several years that I didn't finish filling in a developer survey.
Not only were the questions subjective, they also seemed biased towards a particular understanding of the development world by the survey authors. As an example, when asking about IDEs and compilers, the only choices for usage were primary, secondary, and occasional. I often use more than three IDEs, and they take different priority on different platforms. Another problem was the multiple-checkbox question at the beginning asking where I use C++, with the following choices: at work, at school, and in personal time. It should be clear that these settings may be significantly different, to the point that the subsequent questions should be separate for each of the ticked settings. But instead, the authors just joined everything together.
It appeared to me at the time that extracting any meaningful results from such a survey would be impossible. And I think I was right, looking at these results.
The corresponding thread on Reddit started with this.
I think an important missing question is, how much do you care about ABI stability of
C++? The answer of that should guide many decisions of the standard committee.
Yes, let's use surveys to guide the committee, because as we all know, especially my UK colleagues,
referendums work really well for making important decisions.
Imagine if they wanted to turn the committee process into some sort of a democracy where
the person that screams the loudest wins, as usually is in this kind of stuff.
Yeah. We would have a dictatorship of the minority in one minute.
Lots of papers approved or not approved.
And now the results are in, and they confirm my fears. The results themselves are in PDF format and, oh my word, clipped text. If you look at the more complex charts, you would think that the choices would be visible because there's plenty of space, but they decided to limit them to two lines and then just cut them off.
And that mysterious hundred percent scale for all the questions, all the charts. This makes them useless.
And the data also wasn't sanitized. "What organizations come to your mind the most when you think about C++ and why?" And they decided to do a word cloud without sanitizing it. So you get things like "seam", "main", et cetera, mostly.
"Since", given the question.
"Microsoft, Google" and "Google, Microsoft".
I mean, given the questions, I had low expectations, but damn.
Right, next one is
how to handle errors. This old chestnut again.
Redditor wants to get an idea what people use for error handling in C++ these days.
One of the responses says, quote: "For me, it's exceptions alone until I can see through measurement and with a profiler that they are too costly. Then it's still exceptions alone, except for the parts that show hot in the measurement. I understand that some domains cannot use exceptions, but I rather think they are few and far between. Too many people think they are special when they are not. High-frequency trading people working with exceptions tell me I have some leeway."
The response to the statement exceptions break control flow was, quote, early return also
breaks control flow and is considered a correct way to do it.
So someone said, my problem with exceptions
is less about performance and more about being very anxious
of a random exception from a random function
that I didn't think can throw exceptions.
And to that, the same redditor replied,
"Oh, I absolutely don't worry about that. What you say is quite common, and in my not so humble opinion just needs a small bit of understanding. Code must be designed with exception safety guarantees in mind. Note that early return and goto are similar to exceptions, the point being that this kind of thinking is far from specific to exceptions. And the no-throw guarantees exist for an exceedingly small number of functions and other code artifacts, notably extern "C" calls, non-throwing swap, plain old data type assignment, things like that. They are easily visible."
And another reply to the same post goes, quote,
that's the wrong mindset with exceptions.
You should assume every line can throw and write your code
so that the cleanup is done automatically by default.
So you have the basic exception guarantee always and
only when it matters for you to have stronger guarantees you use constructs that you know
can't throw in order to build the guarantees you need at that point it should never be a matter of
being worried that a random function may throw you either know for sure or don't care."
End quote.
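The "assume every line can throw and let cleanup happen automatically" advice can be sketched with a small RAII example. Connection, send_all and closed_after_failure are hypothetical names used only to make the basic exception guarantee visible.

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

// Hypothetical resource: records through a flag when it has been released.
class Connection {
    bool& closed_;

public:
    explicit Connection(bool& closed) : closed_(closed) { closed_ = false; }
    ~Connection() { closed_ = true; }  // runs on every exit path, including a throw
    Connection(const Connection&) = delete;
    Connection& operator=(const Connection&) = delete;
};

// No try/catch in here: any statement may throw, and the destructor of
// `conn` still releases the resource during stack unwinding.
inline void send_all(const std::vector<int>& messages, bool& closed) {
    Connection conn(closed);
    for (int m : messages) {
        if (m < 0) throw std::invalid_argument("negative message");
    }
}

// Reports whether the connection was closed by the time the exception
// reached the caller.
inline bool closed_after_failure() {
    bool closed = false;
    try {
        send_all({1, -1, 2}, closed);
    } catch (const std::invalid_argument&) {
        return closed;
    }
    return false;
}
```

With cleanup tied to scope like this, the caller does not need to know which call inside send_all can throw, which is the point the redditor was making.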
The general vibe of the thread seems to be just use exceptions
and not worry too much about their cost, which is perhaps
a bit surprising given the number of new error handling
mechanisms proposed recently and widely published
perceived problems with exceptions.
Some posters in the thread state, quite correctly, that there is no universal error handling solution that is going to suit every need and use case, and in some cases you may want to use std::expected or a similar class as a function result. However, the problem with this is that you'll have to either handle the errors locally or propagate them manually, which exceptions give you automatically.
While waiting for C++23, you can use Sy Brand's tl::expected, which is std::expected with functional extensions. It's available on GitHub, where the author put it in the public domain. It's also available via Conan or the vcpkg package manager. It has nice documentation. It works with C++11, 14 and 17 and compiles with GCC, Clang and MSVC.
There was another implementation of std::expected just announced on Reddit. This one also supports monadic extensions, and it requires C++20. It's under the MIT license, and yeah, use it. So yeah, that's settled then. Error handling is solved.
Right? Actually, there was another library proposed, and I think that's what is going to settle error handling once and for all. A redditor created a library called inline try. Quote: "I decided to do a thing and solve this issue once and for all. With inline try you can turn any exception-based function into an expected-based function." End quote.
The library wraps function calls in a try-catch block and returns an expected, thus reducing exceptions down to mere return codes that you check after each function call.
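The general idea of wrapping a throwing call and handing back a value-or-error object can be sketched as follows. This is not the library's actual interface: Outcome, try_call and checked_divide are invented names, and std::variant stands in for an expected type.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>
#include <utility>
#include <variant>

// Hypothetical result type: index 0 holds the value, index 1 the message
// of the exception that was caught.
template <class T>
using Outcome = std::variant<T, std::string>;

// Hypothetical wrapper: calls f(args...) inside try/catch and converts a
// thrown std::exception into an error result. Assumes f returns non-void.
template <class F, class... Args>
auto try_call(F&& f, Args&&... args)
    -> Outcome<decltype(std::forward<F>(f)(std::forward<Args>(args)...))> {
    using T = decltype(std::forward<F>(f)(std::forward<Args>(args)...));
    try {
        return Outcome<T>(std::in_place_index<0>,
                          std::forward<F>(f)(std::forward<Args>(args)...));
    } catch (const std::exception& e) {
        return Outcome<T>(std::in_place_index<1>, e.what());
    }
}

// Example of an exception-based function to wrap.
inline int checked_divide(int a, int b) {
    if (b == 0) throw std::domain_error("division by zero");
    return a / b;
}
```

After try_call(checked_divide, 1, 0) the caller has to inspect the result by hand, which is the "mere return codes" effect described above.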
And the funniest thing about it is that the author clearly meant this as a joke, but the redditors in the thread seem to have completely missed it. As expected (see what I did there), the comments have descended into the usual discussions of exceptions versus no exceptions, Herbceptions, how this is similar to Boost.LEAF, and the efficiency of the proposed code. So I guess stay tuned for more error handling discussions.
And by the way, the library is under the GPL, so now you can't wrap exceptions and return expected without open-sourcing your entire program.
Next one is a link to an old (2016) but still useful 10-lecture course on C++11 by Stephan T. Lavavej on YouTube. It's called Core C++ and it's good. There are 10 episodes, around one hour each.
If you have someone learning C++, this is a good resource.
So this is a weird one. Tom Honermann is the chair of SG16, the Unicode and Text Processing Study Group, and he posted a quiz on Twitter.
There is a function that takes two parameters of type int. The first one is x and the second one is this weird symbol. The function has this body: return x minus 321, with each digit in that number separated by an apostrophe, minus that weird symbol that's the other parameter. This function is called in main with the arguments 321 and 123, and the result is printed. And the question is: without checking, what output is produced?
The majority of people, including myself, said minus 123. And that was wrong. But why?
See that weird character? It's a Hebrew character used for the parameter name, and it's called tav. It's pronounced as a voiceless t. But more importantly, its Unicode bidirectional class is right-to-left, and its mere presence causes nearby characters to be rendered in right-to-left order. So the expression that is displayed as x - 3'2'1 - tav is seen by the compiler as x - 1'2'3 - tav, and so the correct answer to the quiz is 75.
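The arithmetic is easier to follow with the Hebrew parameter name replaced by an ASCII one, so that the text renders in the same order the compiler reads it. The function names below are made up for this sketch.

```cpp
#include <cassert>

// What the compiler actually sees (logical order), with the parameter
// renamed to `tav` so nothing gets reordered on screen.
constexpr int quiz_as_compiled(int x, int tav) {
    return x - 1'2'3 - tav;
}

// What most readers think they see when the line is rendered.
constexpr int quiz_as_displayed(int x, int tav) {
    return x - 3'2'1 - tav;
}

static_assert(quiz_as_compiled(321, 123) == 75);     // 321 - 123 - 123
static_assert(quiz_as_displayed(321, 123) == -123);  // 321 - 321 - 123
```

The two bodies differ only in the order of the digits between the apostrophes, which is exactly what the right-to-left rendering hides.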
Some text editors, like VS Code, try to mitigate this by inserting a special Unicode character called left-to-right mark after each token. By the way, trying to paste this code snippet and then editing it in VS Code for the meeting notes was an exercise in frustration, as the cursor was moving all over the place on the line containing this character.
Tom writes: "SG16 plans to propose allowances for implicit directional marks to appear in conjunction with other whitespace characters in a future C++ standard." Probably to mitigate this situation, which is not ideal.
In the meantime, if you value your sanity, try not to use non-left-to-right characters in your source code.
And don't use that as an interview question.
Nicolai Josuttis is writing a book on C++20. It's 95% complete, or maybe now it's already fully complete. You can buy it on Leanpub for a suggested price of 44.90, minimum price 22.90, plus VAT. Updates are free, so you'll be able to download new versions of the book as it's being completed. The table of contents suggests that the book is very detailed and thorough. I'm currently finishing up Nicolai's book on C++17 and it's really very good. He tends to go into minuscule details and explain things very thoroughly.
The next one is an article about the mold linker. Martin Richtarski, a developer from Germany, wrote a blog post on his Productive C++ blog called "Using the mold linker for fun and 3-8x link time speedups".
It's a very interesting article.
It's quite long, quite thorough.
It starts with a quick and very high-level introduction to the C++ build process.
Quote: "Best practices for writing C++ code and a distributed build system can go a long way in reducing compile times. But in this post, we want to focus on speeding up the linking step, which comes after building the object files of a library or executable." End quote.
One tip I intend to try right away is a switch I didn't know about, and that is -gsplit-dwarf. I think it's mentioned somewhere towards the end. The author says this outsources the debugging information from the object file into an adjacent file and therefore reduces the work the linker has to perform.
Yeah, it makes a bit of a difference overall. But when you've got decent machines with a decent amount of memory, it's not that much of a difference these days. The thing that makes the biggest difference is simply adding visibility attributes to explicitly reduce the actual number of exported symbols, whereas obviously the standard practice is that everybody just exports everything.
Yeah.
The most effective way of speeding up link times is clearly defining your APIs and reducing the number of symbols, but clearly that is a lot of work.
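A common way to do this is an export macro combined with hidden default visibility. The sketch below assumes the library is built with -fvisibility=hidden on GCC or Clang; MYLIB_API, the mylib namespace and both functions are hypothetical.

```cpp
#include <cassert>

// Assumed build setup: compiled with -fvisibility=hidden (GCC/Clang), so
// only declarations marked with MYLIB_API end up as exported symbols.
#if defined(_WIN32)
#  define MYLIB_API __declspec(dllexport)
#elif defined(__GNUC__)
#  define MYLIB_API __attribute__((visibility("default")))
#else
#  define MYLIB_API
#endif

namespace mylib {
namespace detail {
// Internal helper: not marked, so it stays hidden under the assumed flags
// and never reaches the dynamic symbol table.
int clamp_percent(int value) {
    if (value < 0) return 0;
    if (value > 100) return 100;
    return value;
}
}  // namespace detail

// Public API: the only symbol this translation unit exports.
MYLIB_API int progress_percent(int done, int total) {
    if (total <= 0) return 0;
    return detail::clamp_percent(done * 100 / total);
}
}  // namespace mylib
```

Fewer exported symbols means less work for the dynamic linker and for the link step itself, but every public entry point has to be marked deliberately, which is the "lot of work" part.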
Yeah. What's most interesting, though, is the author's real-world experience using mold, which is going to be very useful real soon, I hope. There is even a solution for using mold with ICC-compiled objects. The provided benchmarks show marked improvement in link times when using mold.
There was an interesting related tweet by Rui Ueyama, the creator of mold. Quote: "Leaving Google and starting working on mold was a bet, and it's going well so far. The idea of trying to replicate the success of mold in C++ is growing on me. It feels like we might be able to write a 10 times faster C++ compiler if we really focus on speed. Just thinking."
And now some amusing tweets.
Jonah Miller writes: "Why would I want a programming crash course? I can make my programs crash without help, thanks."
A quote by Kevin Farzad: "Sure, I made mistakes when I was younger. But now that I'm older, I've learned how to make different, often far more serious mistakes."
And finally, this is from Reddit. This person wins Reddit for this answer on how to mock databases: "I usually start by saying, 'Oh, look at me, I'm a database. I could be replaced with a text file, but I'm also important,' in a really sarcastic way."
Right, that's it for today. Thank you for joining me. Until next time.
Thank you, bye.