C++ Club - 151. JWST, Dogbolt, May-June ‘22 mailings, developer survey, errors, Unicode, mold

Episode Date: July 16, 2022

With Gianluca Delfino, Ivor Hewitt, and other colleagues
Notes: https://cppclub.uk/meetings/2022/151/
Video: https://youtu.be/mPxivXzT9Y4...

Transcript
Starting point is 00:00:00 Welcome to the C++ Club. This is meeting number 151, which took place on the 14th of July 2022. The first images from the James Webb Space Telescope have arrived and are absolutely mind-blowing. That gravitational lensing is something else. I'm mentioning this not just because this is an amazing achievement and I like science, but also because JWST runs on C++. We've discussed this before, and there was a video interview with the team in which they confirmed that it does indeed run on C++. I think the operating system is VxWorks, as is usual with these things, and the CPU is a hardened version of the PowerPC that was in the Nintendo GameCube. And in this thread, the first comment is
Starting point is 00:01:07 Struggling to see why that's special, so do millions of other things. And I mean, technically yes, but can you imagine being this person looking at the JWST and going, meh, my microwave is also powered by C++. We have lost the sense of awe and wonder. Right. Everyone knows Godbolt, Compiler Explorer by now. Enter Dogbolt,
Starting point is 00:01:44 the Decompiler Explorer. Somebody sent me that this morning. I was actually having a little play with it, because I normally use Ghidra. I used to use Hex-Rays, so IDA Pro, and then I switched to Ghidra, and it's just awesome, but it is just brilliant that I can run them side by side. I'm not doing any reverse engineering right now, but next time I do, I'll certainly fire that up and see how they compare and which ones are really getting it. So yeah, that is just amazing. Pretty amazing.
Starting point is 00:02:11 And the naming is just perfect. Dogbolt uses several open source decompilers to try and produce C-like source code for a binary program uploaded by the user, which has to be under 2 MB in size. It's not a trivial task, and the success rate and readability of the output of these tools vary wildly. But some results could actually be useful. The website offers a few example binaries as a showcase. Dogbolt is open source and is available on GitHub. I had a look: lots of Docker stuff,
Starting point is 00:02:46 and the main script, I think, is written in Python. Right, we skipped two mailings of the committee: the May mailing and the June mailing. I collated the interesting papers in one set. Let's look at some of them. First of all, a procedural paper: 2022-11 Kona hybrid meeting information. The November 2022 meeting in Kona will be the first in-person committee meeting since the pandemic started. It will also be the first hybrid meeting with remote participation via Zoom, which is probably going to be challenging, according to some feedback I heard. Also, COVID is still here, despite what some authorities may say, and I hope that in-person participants will all be vaccinated and boosted.
Starting point is 00:04:00 There will be good ventilation in the venue, and face masks will be used sensibly so that the meeting doesn't turn into a superspreader event. The next paper is by Jens Maurer and it's called Saturation arithmetic. To implement some algorithms, the use of saturation arithmetic is necessary, where an operation yielding a result whose absolute value is too large instead returns the smallest or largest representable number. For example, when determining the color of a pixel, it would not make sense that brightening a white pixel suddenly turns it black or dark gray. Instead, brightening a white pixel should simply yield a white pixel.
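The pixel example can be sketched in a few lines. This is only an illustration under stated assumptions: the helper names add_saturate_u8 and add_wrap_u8 are made up for this sketch and are not the functions proposed in the paper.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper (not the proposed standard function): adds two 8-bit
// channel values and clamps to the largest representable value.
constexpr std::uint8_t add_saturate_u8(std::uint8_t a, std::uint8_t b) {
    const unsigned sum = static_cast<unsigned>(a) + static_cast<unsigned>(b);
    const unsigned max = std::numeric_limits<std::uint8_t>::max();
    return static_cast<std::uint8_t>(sum > max ? max : sum);
}

// Ordinary modular arithmetic for comparison: the result wraps around.
constexpr std::uint8_t add_wrap_u8(std::uint8_t a, std::uint8_t b) {
    return static_cast<std::uint8_t>(a + b);
}

// Brightening a white channel (255) by 10.
static_assert(add_saturate_u8(255, 10) == 255, "white stays white");
static_assert(add_wrap_u8(255, 10) == 9, "wrapping makes it almost black");
static_assert(add_saturate_u8(100, 20) == 120, "in-range sums are unchanged");
```

The sum is computed in a wider type first, so the clamp can be decided before narrowing back to 8 bits.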
Starting point is 00:04:55 The paper proposes to add simple free functions for basic saturating operations on all signed and unsigned integer types. Further, a saturate_cast is provided that can convert from any of those types to any other, saturating the value as needed. Is it an interesting new way to make signed integer overflow a defined operation? The author mentions that a lot of SIMD instruction sets already have special instructions for saturating arithmetic. The paper proposes that
Starting point is 00:05:32 the new saturating functions have short names like add_sat and sub_sat, as these are basic low-level operations. Wouldn't it be better to have a set of special operators instead? One can dream. It's probably not happening. So this is suggested as a standard feature. For which standard would that be, 23 or 26, the saturation arithmetic?
Starting point is 00:06:05 It's probably going to 26 at the earliest. By the way, there were several mails on the GitHub issues regarding things that are not getting into 23 that were previously
Starting point is 00:06:23 tagged to go there, because of the committee's lack of time, and one of those, unfortunately, is function_ref. It's not getting into 23, but it's definitely going to be in 26, I think. It should be pretty much ready; it's just that they didn't have time to review the wording or whatever. I guess it's not a major loss, it was an HD maybe. No, probably not, it was something else that I forgot. Yeah, I think there might be some other things that were destined for 23 that
Starting point is 00:07:06 are not going to get into 23 because the backlog is huge. The next paper is Tuple protocol for C-style arrays by Paolo Di Giglio. It would be "Di Giglio". Thank you. This paper proposes to make C-style arrays of known size behave like tuples, which should improve their usability in cases where C-style arrays can't be avoided, like when using C-style interfaces. That would mean that you could split C-style arrays into a tuple or use structured bindings, I suppose. Right, the next paper is Specifying the interoperability of binary module interface files. This paper by Daniel Ruoso of Bloomberg specifies the mechanism to allow build systems to identify
Starting point is 00:08:20 if a binary module interface shipped with a pre-built library can be used directly, or if the build system needs to produce its own version of the binary module interface file. Binary modules need to have some sort of metadata included, so that the build system can determine if the pre-built binary module interface files are compatible with the currently used toolchain. I can see how this could work in an enterprise setting like Bloomberg, where compilers are upgraded across the board, but the upgrade doesn't happen very often, and so projects that depend on other libraries could often reuse pre-built module information files shipped with their internal
Starting point is 00:09:05 dependencies. OK, the next one is a paper called static operator[]. This paper proposes to allow the subscript operator to be static, in line with an existing proposal that enables a static function call operator. The next one is Explicit lifetime management. This paper by Timur Doumler and Richard Smith is about starting the lifetime of objects manually. Since C++20 you can use certain blessed standard library functions like std::malloc, std::bit_cast and std::memcpy to start object lifetime. In the example code you allocate memory on the heap
Starting point is 00:10:09 of the size of a struct, and then, because that starts the lifetime of that object, you can immediately access its members and treat it like a normal structure. Was this meant to replace the idea of std::bless? No, not really. They just use this term, sort of "blessed functions", for functions that are special in some way. It's not the
Starting point is 00:10:38 std::bless. It's confusing. So std::bless was basically meant to avoid the undefined behavior that you would get by just reinterpret-casting what comes out of a socket. Or was that std::launder? I think there was an overlap, but I don't remember, to be honest. But I know that std::bless was then renamed to something else, which may have been start_lifetime_as; that sounds like more or less the same thing. Yeah, so for memory allocated using any other function, including a user-defined allocator, for example a memory pool, the above code snippet is undefined behavior. So this paper proposes a set of library functions that would start object lifetime
Starting point is 00:11:32 given an arbitrary memory block. That was it, yeah. I think it was undefined behavior because we never called the constructor of the object, and therefore, technically, that object never existed. So calling std::bless, or now, I guess, std::start_lifetime_as, would allow it to be used without undefined behavior. In a way, std::start_lifetime_as would kind of call the constructor. I don't know what it would actually do under the hood, but I think that was the idea. Yeah, this is only being proposed for implicit-lifetime types like aggregates, as no constructor is actually being called.
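A minimal sketch of the situation discussed above, assuming a C++20 compiler. The struct and function names are invented for illustration, and the sketch deliberately does not call the proposed std::start_lifetime_as, since its availability depends on the standard library in use.

```cpp
#include <cassert>
#include <cstdlib>
#include <new>

struct Point {  // an aggregate, so an implicit-lifetime type
    int x;
    int y;
};

// std::malloc is one of the "blessed" functions: since C++20 it implicitly
// creates a Point in the returned storage, so member access is well defined.
int sum_via_malloc() {
    auto* p = static_cast<Point*>(std::malloc(sizeof(Point)));
    if (p == nullptr) return -1;
    p->x = 1;
    p->y = 2;
    const int result = p->x + p->y;
    std::free(p);
    return result;
}

// For storage obtained some other way (a local buffer stands in for a
// memory pool here), placement new starts the lifetime explicitly.
int sum_via_pool() {
    alignas(Point) unsigned char pool[sizeof(Point)];
    Point* p = ::new (static_cast<void*>(pool)) Point{3, 4};
    return p->x + p->y;
}
```

Placement new always works but creates a fresh object rather than adopting bytes that are already in the buffer. As far as I understand the paper, adopting existing bytes (for example, data read from a socket) is the case the proposed functions are meant to cover.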
Starting point is 00:12:15 Interesting. Yeah, the proposed functions are, like you said, start_lifetime_as and start_lifetime_as_array, although in my humble opinion they could have been called something like std::create, or indeed std::bless, or maybe even std::evolve_from_a_lowly_flat_memory_buffer_into_a_real_actual_object. That would have been less controversial, I guess. On the other hand, I guess being explicit about your intent is probably better. Also, naming is hard. I do wonder how many people are going to actually start using this, because the practice nowadays is that you just get a piece of memory that is supposed to be some sort of aggregate and you just do a reinterpret_cast, especially when you need to do it fast and you have something
Starting point is 00:13:03 that comes out of a socket or something. Yeah, well, this is the approved way of doing that. No random reinterpret_casts. The std::hive paper got updated again. It's now at revision 20. It addresses quite a few issues raised by the reviewers, including improvements to the technical specification, the addition of C++20 ranges overloads, and API extensions and clarifications. We are bound to get it at some point, I guess. The author is insisting, so eventually it will work. Yeah, eventually the committee will run out of issues to report to the author
Starting point is 00:13:58 and will have to accept it. That's a marketing issue at this point. I don't even know. It's not a priority, I guess. Some people are not convinced that it should be in the standard at all, and I understand. You know, we still don't have important things like reflection. I know that it's different groups that should talk about this, but if we cannot get those things in... From what I heard, reflection is actually... no, that was pattern matching.
Starting point is 00:14:36 Someone's working on it, financed by the committee, I think. That would probably be Michael Park, the original author, I guess. But I haven't heard much about reflection here. Right, so this paper by Jeff Garland proposes to add the monadic functions available for std::optional to std::expected. The proposed functions are and_then, which composes a chain of functions returning std::expected; or_else, which returns if std::expected has a value or calls a function with the error; and transform, which applies a function to change the value or type. In normal languages this would be called map, but this is C++, so transform it is. Additional functions have been proposed: transform_error, which applies a function to change value or type.
Starting point is 00:15:40 Or if there's an error, it calls a function with the error type. And error_or, which returns a value when there is no error. There are several snippets, before and after. The before snippet calls a hypothetical function from_string that parses a string and returns an expected of a time or a string containing the error message. In the before code you would check that the expected value is true, which means it has a value, and then act accordingly. And after this proposal, with the monadic interface available, you would chain those calls: from_string, then .or_else with print_error, and then .transform to, for example, do something with the date.
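To make the before and after shape concrete without depending on a C++23 standard library, here is a small sketch built on std::variant. Everything in it is an assumption for illustration: Expected is a toy stand-in rather than std::expected, and parse_day plays the role of the from_string function mentioned in the paper's snippet.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>
#include <variant>

// Toy stand-in for std::expected<T, E>; index 0 is the value, index 1 the error.
template <class T, class E>
struct Expected {
    std::variant<T, E> data;

    static Expected ok(T v) { return Expected{std::variant<T, E>(std::in_place_index<0>, std::move(v))}; }
    static Expected fail(E e) { return Expected{std::variant<T, E>(std::in_place_index<1>, std::move(e))}; }

    bool has_value() const { return data.index() == 0; }
    const T& value() const { return std::get<0>(data); }
    const E& error() const { return std::get<1>(data); }

    // transform: map the value through f, pass an error through untouched.
    template <class F>
    auto transform(F f) const {
        using U = std::decay_t<decltype(f(value()))>;
        if (has_value()) return Expected<U, E>::ok(f(value()));
        return Expected<U, E>::fail(error());
    }

    // or_else: keep a value as it is, otherwise let f produce a replacement.
    template <class F>
    Expected or_else(F f) const {
        if (has_value()) return *this;
        return f(error());
    }
};

using ParseResult = Expected<int, std::string>;

// Hypothetical parser: accepts non-empty strings of decimal digits only.
ParseResult parse_day(const std::string& text) {
    if (text.empty()) return ParseResult::fail("empty input");
    int day = 0;
    for (char c : text) {
        if (c < '0' || c > '9') return ParseResult::fail("not a number: " + text);
        day = day * 10 + (c - '0');
    }
    return ParseResult::ok(day);
}

// "Before": explicit check of the result.
int next_day_before(const std::string& text) {
    const ParseResult r = parse_day(text);
    const int day = r.has_value() ? r.value() : 0;
    return day + 1;
}

// "After": the same logic as a chain of monadic calls.
int next_day_after(const std::string& text) {
    return parse_day(text)
        .or_else([](const std::string&) { return ParseResult::ok(0); })
        .transform([](int day) { return day + 1; })
        .value();
}
```

Both functions compute the same thing; the chained version just moves the branching into the wrapper type.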
Starting point is 00:16:40 I don't dislike it, but I think it's going to take some getting used to. I think this is what is already getting into 23 as the std::optional monadic interface. Interesting. So probably, when we get std::expected, we also get this automatically, just for parity with std::optional. I'm not sure if expected is getting into 23. I don't remember, but it's one of those things that they're still figuring out. Hmm. So yeah, we should get it with this monadic interface in 23,
Starting point is 00:17:20 probably. I also don't know if Boost.Outcome has this kind of interface as well these days, and it's kind of a similar object. Boost.Outcome is vastly more capable and supports all kinds of special error codes, and yeah, I think it also has a monadic interface. The next paper is Allow multiple init-statements. This is just revision zero, so no idea how it'll fare. Justin Cook proposes to allow multiple init-statements wherever an init-statement is currently allowed, specifically in for, if and switch statements. Currently you can only declare
Starting point is 00:18:11 more than one variable there if all declared variables are of the same type. So, as you can see in this example of use, the first line declares two variables of type int, one after another, separated by a comma, and this is legal in C++20. The proposal is to make it legal to write something like int k = 0; double s = 0; followed by the condition clause and then the increment of the index. Many redditors have a problem with this. They say that it makes the statement change its meaning depending on the number of semicolons in it, and I kind of agree. This is pushing it too far, maybe making it less readable than with just the init block. You can always create a scope outside the loop if you need to declare lots of stuff. Definitely. It's a little bit strange for the trained eye to parse
Starting point is 00:19:34 it now, because you have an extra bit. Frankly, I also don't think I would use it very much. I think this is one of those where people think, oh, there is a missing corner that may be added: because we have this here, we may have this there as well. Yeah, maybe. I'm not convinced. By the way, tools like CLion have already fixed it for the normal init-statement: if you have a variable declared just outside a loop or an if, they suggest that it should be moved to the inner scope. Nice. Right, the next one is A format for describing dependencies of source files. This is also related to modules. It describes a format for discovery of source file and module dependencies, to be generated or consumed by build systems. The proposed format is JSON and I just can't.
Starting point is 00:20:46 I wish developers would stop being so obsessed with JSON and trying to use it for anything remotely related to structured text or configuration information. I'm so old I remember when the same thing was happening to XML. It was being used everywhere. I guess it's OK. I mean, JSON as an intermediate data exchange format is better than XML for sure, but yes, it's not very human-readable, and it doesn't even support things like dates. They say just use a string in ISO format. Or comments. Indeed. I would honestly prefer TOML over JSON, but not YAML. I had my share of working with YAML and didn't like it one bit. Significant white space. Right, that's it for the papers. And now, there was a C++
Starting point is 00:21:54 annual developer survey, which closed on the 7th of June, and the results are now available. I wanted to say that it's probably the first time in several years that I didn't finish filling in a developer survey. The questions were subjective and seemed biased towards a particular understanding of the development world by the survey authors. As an example, when asking about IDEs and compilers, the only choices for usage were primary, secondary, and occasional. I often use more than three IDEs, and they take different priority on different platforms. Another problem was the multiple-checkbox question at the beginning asking where I use C++, with the following choices: at work, at school and in personal time. It should be clear that these settings may be significantly different, to the point that the subsequent questions should be separate for each of the ticked settings. But instead, the authors just joined everything together.
Starting point is 00:23:11 It appeared to me at the time that extracting any meaningful results from such a survey would be impossible. And I think I was right, looking at these results. The corresponding thread on Reddit started with this. I think an important missing question is, how much do you care about ABI stability of C++? The answer of that should guide many decisions of the standard committee. Yes, let's use surveys to guide the committee, because as we all know, especially my UK colleagues, referendums work really well for making important decisions. Imagine if they wanted to turn the committee process into some sort of a democracy where
Starting point is 00:23:59 the person who screams the loudest wins, as is usual with this kind of stuff. Yeah. We would have a dictatorship of the minority in one minute. Lots of papers approved or not approved. And now the results are in, and they confirm my fears. The results themselves are in PDF format and, oh my word, clipped text. If you look at the more complex charts, you would think that the choices would be visible because there's plenty of space, but they decided to limit them to two lines and then just cut them off. And that mysterious hundred percent scale for all the questions, all the charts. This makes them useless.
Starting point is 00:25:09 And the data also wasn't sanitized. What organizations come to your mind the most when you think about C++ and why? And they decided to do a word cloud without sanitizing it. So you get things like seam, main, et cetera, mostly. Microsoft, Google, and Google, Microsoft. I mean, given the questions, I had low expectations, but damn. Right, next one is how to handle errors. This old chestnut again.
Starting point is 00:25:55 A redditor wants to get an idea of what people use for error handling in C++ these days. One of the responses says, quote, for me, it's exceptions alone until I can see through measurement and with a profiler that they are too costly. Then it's still exceptions alone, except for the parts that show hot in the measurement. I understand that some domains cannot use exceptions, but I rather think they are few and far between. Too many people think they are special when they are not. High frequency trading people working with exceptions tell me I have some leeway. The response to the statement "exceptions break control flow" was, quote, early return also
Starting point is 00:26:42 breaks control flow and is considered a correct way to do it. So someone said, my problem with exceptions is less about performance and more about being very anxious of a random exception from a random function that I didn't think can throw exceptions. And to that, the same redditor replied, oh, I absolutely don't worry about that. What you say is quite common. And in my not so humble opinion just needs a small bit of understanding. Code must be designed with exception safety
Starting point is 00:27:20 guarantees in mind. Note that early return and goto are similar to exceptions, the point being that this kind of thinking is far from specific to exceptions. And the no-throw guarantees exist for an exceedingly small number of functions and other code artifacts, notably extern "C" calls, non-throwing swap, plain old data type assignment, things like that. They are easily visible. And another reply to the same post goes, quote, that's the wrong mindset with exceptions.
Starting point is 00:27:55 You should assume every line can throw and write your code so that the cleanup is done automatically by default. So you have the basic exception guarantee always and only when it matters for you to have stronger guarantees you use constructs that you know can't throw in order to build the guarantees you need at that point it should never be a matter of being worried that a random function may throw you either know for sure or don't care." End quote. The general vibe of the thread seems to be just use exceptions
Starting point is 00:28:31 and not worry too much about their cost, which is perhaps a bit surprising given the number of new error handling mechanisms proposed recently and the widely published perceived problems with exceptions. Some posters in the thread state quite correctly that there is no universal error handling solution that is going to suit every need and use case, and in some cases you may want to use std::expected or a similar class as a function result. However, the problem with this is that you'll have to either handle the errors locally or propagate them manually, which exceptions give you automatically. While waiting for C++23, you can use Sy Brand's tl::expected,
Starting point is 00:29:19 which is std::expected with functional extensions. It's available on GitHub, where the author put it in the public domain. It's also available via Conan or the vcpkg package manager. It has nice documentation. It works with C++11, 14 and 17 and compiles with GCC, Clang and MSVC. There was another implementation of std::expected just announced on Reddit. This one also supports monadic extensions, and it requires C++20. It's under the MIT license and, yeah, use it. So that's settled then. Error handling is solved. Right? Actually, there was another library proposed, and I think that's what is going to settle error handling once and for all. A redditor created a library called inline try. Quote, I decided to go do a thing and solve this issue once and for all. With inline try you can turn any exception-based function into an expected-based function.
Starting point is 00:30:46 End quote. The library wraps function calls in a try-catch block and returns an expected, thus reducing exceptions down to mere return codes that you check after each function call. And the funniest thing about it is that the author clearly meant this as a joke, but the redditors in the thread seem to have completely missed it. As expected (see what I did there?), the comments have descended into the usual discussions of exceptions versus no exceptions, Herbceptions, how this is similar to Boost.LEAF, and the efficiency of the proposed code. So I guess stay tuned for more error handling discussions.
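The wrapping idea described above is easy to sketch. This is not the library's actual interface, which the transcript does not show; try_call and parse_positive are hypothetical names, and std::variant stands in for an expected type so that the sketch compiles as C++17.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>
#include <variant>

// Hypothetical helper: calls f() and returns either its result (index 0)
// or the message of the std::exception it threw (index 1).
template <class F>
auto try_call(F f) -> std::variant<decltype(f()), std::string> {
    using Result = std::variant<decltype(f()), std::string>;
    try {
        return Result(std::in_place_index<0>, f());
    } catch (const std::exception& e) {
        return Result(std::in_place_index<1>, e.what());
    }
}

// An exception-based function to wrap.
int parse_positive(int raw) {
    if (raw <= 0) throw std::invalid_argument("value must be positive");
    return raw;
}
```

A caller would then write something like auto r = try_call([] { return parse_positive(-1); }); and inspect r.index() after every call, which is exactly the return-code style the joke is about.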
Starting point is 00:31:31 And by the way, the library is under GPL, so now you can't wrap exceptions and return expected without open-sourcing your entire program. The next one is a link to an old (C++11, 2016) but still useful 10-lecture course by Stephan T. Lavavej on YouTube. It's called Core C++ and it's good. There are 10 episodes, around one hour each. If you have someone learning C++, this is a good resource. So this is a weird one. Tom Honermann is the chair of SG16, the Unicode and Text Processing Study Group, and he posted a quiz on Twitter. There is a function that takes two parameters of type int. The first one is x and the second one
Starting point is 00:32:33 is this weird symbol. The function has this body: return x minus 321, with each digit in that number separated by an apostrophe, minus that weird symbol that's the other parameter. This function is called in main with the arguments 321 and 123, and the result is printed. And the question is: without checking, what output is produced? The majority of people, including myself, said minus 123. And that was wrong.
Starting point is 00:33:20 But why? See that weird character? It's a Hebrew character used as the parameter name, and it's called tav. It's pronounced as a voiceless t. But more importantly, its Unicode bidirectional class is right-to-left, and its mere presence causes nearby characters to be displayed in right-to-left order. So the expression displayed as x - 3'2'1 - tav is seen by the compiler as x - 1'2'3 - tav, and so the correct answer to the quiz is 75.
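The arithmetic behind the two answers can be checked directly. In this sketch the Hebrew letter is replaced by the ASCII name tav so that the source looks the same in every editor, and the two function names are made up to separate what a reader sees from what the compiler sees.

```cpp
#include <cassert>

// What the line looks like on screen: x - 3'2'1 - tav.
constexpr int as_displayed(int x, int tav) { return x - 3'2'1 - tav; }

// What the compiler actually receives: x - 1'2'3 - tav.
constexpr int as_compiled(int x, int tav) { return x - 1'2'3 - tav; }

// The quiz calls the function with 321 and 123.
static_assert(as_displayed(321, 123) == -123, "the answer most people gave");
static_assert(as_compiled(321, 123) == 75, "the answer the program prints");
```

The digit separators matter here: they are neutral characters between the digits, which is what lets the display order of the number flip.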
Starting point is 00:34:17 Some text editors, like VS Code, try to mitigate this by inserting a special Unicode character called left-to-right mark after each token. By the way, trying to paste this code snippet and then edit it in VS Code for the meeting notes was an exercise in frustration, as the cursor was moving all over the place on the line containing this character. Tom writes: SG16 plans to propose allowances for implicit directional marks to appear in conjunction with other whitespace characters in a future C++ standard. Probably to mitigate this situation, which is not ideal. In the meantime, if you value your sanity, try not to use non-left-to-right characters in your source code. And don't use that as an interview question. Nicolai Josuttis is writing a book on C++20.
Starting point is 00:35:14 It's 95% complete, or maybe it's already fully complete by now. You can buy it on Leanpub for a suggested price of 44.90, minimum price 22.90, plus VAT. Updates are free, so you'll be able to download new versions of the book as it's being completed. The table of contents suggests that the book is very detailed and thorough. I'm currently finishing Nicolai's book on C++17 and it's really very good. He tends to go into minuscule details and explain things very thoroughly. The next one is an article about the mold linker.
Starting point is 00:36:06 Martin Richtarsky, a developer from Germany, wrote a post on his Productive C++ blog called Using the mold linker for fun and 3x-8x link time speedups. It's a very interesting article. It's quite long and quite thorough. It starts with a quick and very high-level introduction to the C++ build process. Quote, best practices for writing C++ code and a distributed build system can go a long way in reducing compile times.
Starting point is 00:36:40 But in this post, we want to focus on speeding up the linking step, which comes after building the object files of a library or executable. End quote. One tip I intend to try right away is a compiler switch I didn't know about, and that is -gsplit-dwarf. I think it's mentioned somewhere towards the end. The author says this moves the debugging information out of the object file into an adjacent file and therefore reduces the work the linker has to perform. Yeah, it makes a bit of a difference overall.
Starting point is 00:37:15 But when you've got decent machines with a decent amount of memory, it's not that much of a difference these days. The main thing that makes the biggest difference is simply putting attributes to reduce the actual number of symbols explicitly, whereas obviously the standard practice is everybody just exports everything.
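A sketch of what "putting attributes" can look like in practice, assuming GCC or Clang on an ELF platform and a library built with -fvisibility=hidden. The macro name MYLIB_API and the namespace mylib are placeholders invented for this example.

```cpp
#include <cassert>

// With -fvisibility=hidden, only symbols marked like this are exported.
#if defined(__GNUC__)
#  define MYLIB_API __attribute__((visibility("default")))
#else
#  define MYLIB_API
#endif

namespace mylib {
namespace detail {
// Not annotated: under -fvisibility=hidden this stays out of the dynamic
// symbol table, so there is one symbol less to export and resolve.
int scale(int value) { return value * 2; }
}  // namespace detail

// Annotated: part of the deliberately small public API.
MYLIB_API int public_entry(int value) { return detail::scale(value) + 1; }
}  // namespace mylib
```

The annotation does not change what the code computes; it only controls which symbols leave the shared library, which is why it takes deliberate API design rather than a single switch.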
Starting point is 00:37:32 Yeah. The most efficient and most effective way of speeding up link times is clearly defining your APIs and reducing the amount of symbols, but clearly that is a lot of work. Yeah. What's most interesting, though, is the author's real world experience using Mold, which is going to be very useful real soon, I hope. There is even a solution for using Mold with ICC compiled objects. The provided benchmarks show marked improvement in link times when using
Starting point is 00:38:01 mold. There was an interesting related tweet by Rui Ueyama, the creator of mold. Quote, leaving Google and starting to work on mold was a bet, and it's going well so far. The idea to try to replicate the success of mold in a C++ compiler is growing on me. It feels like we might be able to write a 10 times faster C++ compiler if we really focus on speed. Just thinking. And now some amusing tweets. Jonah Miller writes: why would I want a programming crash course? I can make my programs crash without help, thanks. A quote by Kevin Farzad: sure, I made mistakes when I was younger. But now that I'm older,
Starting point is 00:38:56 I've learned how to make different, often far more serious mistakes. And finally, this is from Reddit. This person wins Reddit for this answer on how to mock databases: I usually start by saying, "Oh, look at me, I'm a database. I could be replaced with a text file, but I'm also important," in a really sarcastic way. Right, that's it for today. Thank you for joining me. Until next time. Thank you. Bye.
