C++ Club - Meeting 148

Starting point is 00:00:00 Welcome to C++ Club. This is meeting 148, which took place on the 5th of May 2022. We'll start with the WG21 April mailing. There are several updates to papers that we talked about previously and a few new proposals. Let's quickly go through some of them. Proxy: A polymorphic programming library. This paper by Mingxin Wang of Microsoft introduces a template-based notion of a proxy that combines object-oriented and functional programming to provide an efficient implementation of polymorphism. It is supposedly capable of replacing the virtual function mechanism entirely while being more

Starting point is 00:00:50 efficient. There is a header-only test implementation for C++20. The author illustrates the proposal by comparing it against a standard polymorphic class hierarchy with the following abstract base class defining an interface: class IDrawable with a pure virtual function void Draw(). Then we have class Rectangle, deriving publicly from IDrawable and overriding Draw, and another class Circle, also deriving publicly from IDrawable and overriding Draw for itself. Then we have a caller function, DoSomethingWithDrawable, that takes a pointer to IDrawable. The usual stuff. Now, architecting with the proxy, you define a dispatch

Starting point is 00:01:53 called Draw as a type. So it's a struct Draw that inherits from std::dispatch<void()>, and inside it there is a function call operator template that takes a self parameter of the template type by const reference, and in the function body it calls self.Draw(). After that you define a so-called facade, struct FDrawable, derived from std::facade<Draw>. The template argument is Draw and the struct itself is empty. With that, the actual implementation classes can then be defined without using any virtual functions. So we have a class Circle with its own void Draw() function, and so on.
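The proposed std::proxy, std::dispatch and std::facade don't exist in any shipping standard library, so the paper's code can't be compiled as-is. As a rough illustration of the underlying idea only, here is a hand-rolled sketch of non-owning (pointer-semantics) type erasure that dispatches through a per-type table of function pointers instead of a vtable. DrawableRef, DrawableTable, draw_thunk and drawable_table are hypothetical names invented for this sketch. It is not the paper's API and is much simpler than its reference implementation: one dispatch, no ownership, and string results instead of actual drawing.

```cpp
#include <cassert>
#include <memory>
#include <string>

// One function-pointer slot per "dispatch". Adding an operation (say, an
// area computation) means adding a slot here and a thunk below, without
// touching the shape classes themselves.
struct DrawableTable {
    std::string (*draw)(const void* self);
};

// Thunk that restores the concrete type and calls its member function.
template <class T>
std::string draw_thunk(const void* self) {
    return static_cast<const T*>(self)->draw();
}

// One statically allocated table per concrete type, built at compile time.
template <class T>
inline constexpr DrawableTable drawable_table{&draw_thunk<T>};

// Non-owning handle: the caller is responsible for the object's lifetime.
class DrawableRef {
public:
    template <class T>
    DrawableRef(const T& obj)
        : self_(std::addressof(obj)), table_(&drawable_table<T>) {}

    std::string draw() const { return table_->draw(self_); }

private:
    const void* self_;
    const DrawableTable* table_;
};

// Concrete shapes: no common base class, no virtual functions.
struct Rectangle {
    std::string draw() const { return "rectangle"; }
};
struct Circle {
    std::string draw() const { return "circle"; }
};

// Analogue of DoSomethingWithDrawable: takes the handle by value.
std::string do_something_with_drawable(DrawableRef p) { return p.draw(); }
```

The handle is just two pointers and never allocates; adding ownership (and therefore lifetime management) is deliberately left out to keep the sketch short.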

Starting point is 00:03:11 A function that uses those would be defined as DoSomethingWithDrawable, taking a std::proxy<FDrawable> by value. Inside, it would call invoke on p, which is the proxy for your facade, and specify as the template argument of invoke the dispatch you need to call. So that would be p.invoke<Draw>(). Now let's say we need to add another function to the so-called drawable interface. Instead of adding a new virtual function to the base class and overriding it in all the derived classes, with proxy we only need to add another dispatch. For example, if we wanted to compute the area of a shape, we would add another dispatch, a struct Area derived from std::dispatch<double()>, that is, a function returning double and taking no parameters, and inside

Starting point is 00:04:38 we would have a function call operator template that calls Area() on the parameter self of template type T. Then we would need to update our FDrawable facade by adding another template argument, Area, after the existing Draw. After that, in the caller function, we would be able to just invoke the Area dispatch on the proxy. The author also describes a factory function use case. Regarding his motivation, the author writes, quote, Currently, the standard polymorphic wrapper types, including std::function and std::any,

Starting point is 00:05:37 are based on value semantics. Polymorphic wrappers based on value semantics have certain limitations in lifetime management comparing to pointer semantics. Designing the proxy library based on pointer semantics decouples the responsibility of lifetime management from the proxy, which provides more flexibility and helps consistency in API design without reducing runtime performance." End quote. It's an interesting technique, and according to the author, the generated assembly is also better than that of virtual functions. The implementation uses type erasure and stores function pointers in std::unique_ptr, which

Starting point is 00:06:23 unfortunately means there is a cost of heap allocation. The author compares proxy to other similar libraries, like Dyno by Louis Dionne, which uses value semantics, and Dynamic Generic Programming with Virtual Concepts by Andrea Proli. So even if it's not accepted into C++, it could be implemented as a library, since it doesn't require any new language features. The test implementation doesn't currently build in Clang, as Clang lacks support for conditionally trivial special member functions. The next paper is called Structured Bindings

Starting point is 00:07:07 Can Introduce a Pack. This paper by Barry Revzin and Jonathan Wakely proposes to allow a new syntax. Today you can write auto [x, y, z] = f();, which is fine. The proposed new syntax would be auto [...xs] = f();, where xs is a pack of length 3 containing the x, y and z elements of the tuple returned from that function call. There is an implementation difficulty in supporting the following usage. Suppose we have a non-template function, auto sum_non_template(SomeConcreteType tuple), and inside the function we want to introduce a pack with auto [...elems] = tuple;. And then we use

Starting point is 00:08:30 a fold expression to sum the elements and return the result. The authors write, quote, we have not yet in the history of C++ had this notion of packs outside of dependent contexts. This is completely novel and imposes a burden on implementations to have to track packs outside of templates where they previously had not." There is a test implementation in Clang, available on Compiler Explorer. Next up is #embed - a scannable, tooling-friendly binary resource inclusion mechanism. This proposal by JeanHeyd Meneide has resurfaced after a pause, raising hopes of being able to easily include binary data in programs. The syntax would be very simple.

Starting point is 00:09:26 You would declare and define an array of unsigned char, and inside the curly braces, instead of listing initializers, you would write #embed followed by a file name. The file would be read at compile time and its bytes used to initialize the array. To remind you of the history of this proposal, it's an evolution of the initial std::embed feature that was supposed to be a library function, but the author decided to go with a preprocessor-directive-based feature as a start. Hopefully we'll get it in C++26. std::execution gets another update addressing some of the issues and review comments. Just look at this example code implementing recursive file copying. Isn't it beautiful?

Starting point is 00:10:32 I'm still scrolling. Still scrolling. Still scrolling. Yeah. Still scrolling. Still scrolling. Yep. Still scrolling. Around 400 lines. Next one is

Starting point is 00:10:58 = delete("should have a reason");. This proposal by Yihe Li proposes to add a message to = delete, so that when a deleted function is used, the error message is more meaningful. Yeah, I guess it could be better than just a comment. Another paper, by Daniel Ruoso of Bloomberg, proposes a mechanism that allows a prebuilt library to specify which modules it provides to clients, by distributing metadata files alongside the library binaries for use in client linker commands. Quote, this specification may become obsolete by a wider scale convergence in the area of package management in the C++ ecosystem.

Starting point is 00:11:52 I'm not holding my breath. Next we have a comment in the Reddit thread for the mailing. Quote, Man, reflection really fell off the map. There was a lot of activity for a while. End quote. To which Niall Douglas replies, quote, There should be

Starting point is 00:12:15 a whole bunch more activity soon. The Standard C++ Foundation have been funding a dedicated developer in this area since last year, I think. Just like how the development of ranges was funded. It just takes time to bake the cookie, that's all." End quote. Of all the future C++ features, reflection is my most anticipated one. No, actually, pattern matching is my most anticipated one. Reflection is probably

Starting point is 00:12:49 second most. For me, anyway. Shuai Mu from New York created a Rust-style borrow checker for C++. The author says, quote, Initially, all the checks are at runtime, which already eases some debugging issues for me. I also tried many static analysis tools, including Cppcheck, Clang, clang-tidy,

Starting point is 00:13:20 and MSVC, the most recent one with lifetime support. I had high hopes for them, but then I found they mainly support single-function or file-level checks. Or in other cases, like MSVC, the checker would flag everything, giving false positives. The other day I came across Facebook's Infer, and it seems to have implemented a Rust-like lifetime checker. So I tested it with my borrow-cpp and it seems to work well. It can accurately tell which line of code violates the rule." When a runtime borrow check fails in this library, the library triggers a null pointer dereference to cause a runtime error.
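The episode doesn't show the library's API, so the following is only a hypothetical sketch of the general idea of runtime borrow checking: a cell that counts shared and exclusive borrows and, when a rule is broken, terminates with std::abort() (which has defined behavior) rather than dereferencing a null pointer. BorrowCell and its member functions are names invented for this illustration, not borrow-cpp's interface.

```cpp
#include <cassert>
#include <cstdlib>
#include <utility>

// Hypothetical runtime borrow checker: at any moment a value may have either
// any number of shared borrows or exactly one exclusive borrow, never both.
template <class T>
class BorrowCell {
public:
    explicit BorrowCell(T value) : value_(std::move(value)) {}

    // Non-failing variants: report whether the borrow was granted.
    bool try_borrow() {
        if (exclusive_) return false;
        ++shared_;
        return true;
    }
    bool try_borrow_mut() {
        if (exclusive_ || shared_ > 0) return false;
        exclusive_ = true;
        return true;
    }

    // Borrows are released manually here; a real library would hand out
    // RAII guard objects that release the borrow in their destructors.
    void release() { --shared_; }
    void release_mut() { exclusive_ = false; }

    // Failing variants end the program with std::abort(), which is well
    // defined, instead of deliberately dereferencing a null pointer.
    const T& borrow() {
        if (!try_borrow()) std::abort();
        return value_;
    }
    T& borrow_mut() {
        if (!try_borrow_mut()) std::abort();
        return value_;
    }

private:
    T value_;
    int shared_ = 0;
    bool exclusive_ = false;
};
```

With this design, two shared borrows coexist, an exclusive borrow is refused while they are outstanding, and only an actual violation through borrow() or borrow_mut() terminates the program.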

Starting point is 00:14:05 Unfortunately, as a Redditor pointed out, this is a problem. Null pointer dereference is undefined behavior in C++, which means compilers are free to interpret it, and to optimize away code that causes it, as they please. When this issue was raised, the author promised to use abort instead. The question of how compilers handle potential UB spawned quite a discussion in the Reddit thread. The author suggested that compilers should warn about the UB they find in code. Quote, So I think just from the case of memory management,

Starting point is 00:14:49 many UB are not intentional and maybe they are just bugs. The right thing the compilers should do is to warn about them instead of optimizing on them. End quote. The following reply addresses this idea and explains why it's impossible. Quote, There are switches in compilers to try and do that. Search for mentions of hardening or for sanitizers. Some checks are relatively cheap; most are not, however. Warning about UB, on the other hand, is nigh impossible. In the middle layer

Starting point is 00:15:28 of a compiler, UB is normal. There is an assumption that the front-end will have created an intermediate representation where UB is only in paths that cannot be reached during execution, which the front-end knows from higher-level semantics. Optimizers are incredibly dumb. They are composed of hundreds of very simple, very focused analysis and transformation passes. Faced with the emergent behavior of the pipeline, it may look like they are smart or annoying, but really each pass is fairly dumb, and so is the whole."

Starting point is 00:16:12 The commenter suggested reading the articles by Chris Lattner, the main author of LLVM, on how UB helps optimization and what a minefield it creates. This is part 1 of the series, What Every C Programmer Should Know About Undefined Behavior. Quote, Undefined behavior exists in C-based languages because the designers of C wanted it to be an extremely efficient low-level programming language. In contrast, languages like Java and many other safe languages have eschewed undefined behavior because they want safe and reproducible behavior across implementations

Starting point is 00:16:55 and are willing to sacrifice performance to get it. While neither is the right goal to aim for, if you're a C programmer, you really should understand what undefined behavior is." Chris provides the following examples of UB. Use of an uninitialized variable is UB, which helps optimization, as a Java-like zero-initialization guarantee would be too costly. Signed integer overflow being UB also helps optimization, for example of loops. It can be made defined by using the -fwrapv switch in Clang and GCC.
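To make the signed-overflow point concrete, here is a small sketch: an overflow check written after the fact can be optimized away, whereas testing the bounds before the addition never triggers UB in the first place. add_would_overflow is a hypothetical helper name; GCC and Clang also provide the __builtin_add_overflow intrinsic for the same job.

```cpp
#include <cassert>
#include <climits>

// Unreliable pattern (shown only as a comment): because signed overflow is UB,
// the compiler may assume `x + 1` never wraps and fold this check to false.
//     bool increment_overflows(int x) { return x + 1 < x; }

// Portable alternative: compare against the limits *before* adding, so the
// addition that could overflow is never performed.
bool add_would_overflow(int a, int b) {
    if (b > 0) return a > INT_MAX - b;  // a + b would exceed INT_MAX
    if (b < 0) return a < INT_MIN - b;  // a + b would go below INT_MIN
    return false;                       // adding zero never overflows
}
```

Both subtractions are safe: INT_MAX - b with b > 0 stays in range, and INT_MIN - b with b < 0 moves towards zero.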

Starting point is 00:17:40 Oversized shift amounts are UB because shifts behave differently on various CPUs. Dereferencing wild pointers and out-of-bounds array accesses are other examples of UB that can't be avoided cheaply: to prevent them, each array access would have to be checked, and each pointer would have to carry size information alongside it, thus breaking the C ABI. Dereferencing a null pointer is UB and not necessarily a crash. If you want a crash, dereference a volatile null pointer when using Clang. Violating type rules is also UB, for example type punning through any pointer type other than char*. Chris illustrates this with the following example optimization. This is the code before.
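For reference, the function as described in the next paragraph looks roughly like this. Chris's original is C; this is a C++ rendering, and the permitted transformation is noted in a comment rather than shown as actual compiler output.

```cpp
#include <cassert>

float* P;  // global pointer to float

void zero_array() {
    int i;
    for (i = 0; i < 10000; ++i)
        P[i] = 0.0f;
}

// Under the type-based aliasing rules, a store to a float cannot modify an
// object of type float*, so the compiler may load P once, before the loop,
// and replace the whole loop with the equivalent of
//     std::memset(P, 0, 40000);  // 10000 floats * 4 bytes each
```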

Starting point is 00:18:39 We have a function zero_array and a global float pointer P. Inside the function we have an int i and then a for loop, which goes from 0 up to 10,000 and assigns 0 to each element of P, which is just a pointer to float. That's before optimization. The optimizer converts this to memset(P, 0, 40000): the type rules let it assume that storing a float through P cannot change the pointer P itself, so it can load P once and turn 10,000 four-byte stores into a single memset. Let's look at part 2, or as Chris calls it, why undefined behavior is often a scary and terrible thing for C programmers. Reordering different optimizations can produce baffling results

Starting point is 00:20:07 when you want to rely on, say, a null check, and the compiler decides, nah, I'm good, don't need it. So this is the code before: we have a function called contains_null_check, which takes an int pointer. The first line of the function assigns the dereferenced pointer to an integer called dead. The next line checks whether the pointer is null, and if it is, returns from the function. The last line of the function assigns 4 to the memory pointed to by the pointer parameter. So in this example, the code clearly checks for the null pointer. If the compiler happens to run dead code elimination before redundant null check elimination, then we see the code evolve over two steps. In the first step, the assignment of the dereferenced pointer to int dead would be deleted by the optimizer,

Starting point is 00:21:28 and then the null check would not be redundant and would be kept. However, if the optimizer happens to be structured differently, it could run those two passes in the reverse order. It would then see that the pointer was dereferenced on the first line, so it must not be null, and therefore the check for a null pointer is always false and can be eliminated. Then, because the value read on the first line is never used, that line is dead and will also be removed. So the function is reduced to just assigning 4 to the dereferenced pointer, which, if the pointer is null, will likely crash.
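A C++ rendering of the function as described (the original example is in C), with the two pass orderings summarized as comments. It should only ever be called with a valid pointer: passing nullptr is exactly the UB being discussed.

```cpp
#include <cassert>

void contains_null_check(int* P) {
    [[maybe_unused]] int dead = *P;  // value is never used afterwards
    if (P == nullptr)
        return;
    *P = 4;
}

// Pass order A: dead code elimination first. The unused load into `dead`
// is deleted, so the null check is no longer redundant and survives.
//
// Pass order B: redundant null check elimination first. Since *P was already
// evaluated, P is assumed non-null and the `if` is deleted; dead code
// elimination then removes `dead`, leaving only `*P = 4;`, which has
// undefined behavior (typically a crash) when P is null.
```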

Starting point is 00:22:35 So UB-dependent optimizations can open the door to security exploits such as buffer overflows, when safety checks are optimized out because the compiler assumes the UB they guard against can never happen. Some hardcore developers debug optimized builds, which often doesn't make sense. In that case, it's advisable to disable optimizations with -O0 while debugging. Then there is the worrisome aspect of UB that changing or upgrading compilers can expose latent bugs because of changes in memory layout

Starting point is 00:23:26 or different compiler behavior. Even worse, there is no reliable way to determine whether a codebase contains UB. But there are some tools that can help with that. First, Valgrind's Memcheck tool (Valgrind is pronounced "val-grinned", not "val-grind"). It's limited: it's quite slow, it can only find bugs that still exist in the generated machine code, so it can't find things the optimizer removes, and it doesn't know that the source language is C, so it can't find shift-out-of-range

Starting point is 00:24:06 or signed integer overflow bugs. Clang has an experimental switch that I didn't know about. It's called... let's see if I can find it... -fcatch-undefined-behavior. It inserts runtime checks for certain types of UB, but slows down execution. In current Clang it has been superseded by -fsanitize=undefined. Clang also has the switch -ftrapv, which makes signed integer overflows trap at runtime. The Clang Static Analyzer can detect many bugs and is built into Apple's Xcode. It is also available as a separate tool.

Starting point is 00:24:53 An experimental LLVM subproject called KLEE can produce test cases for a piece of code. It's a symbolic execution engine which, as I understand it, explores the possible paths through your code using symbolic rather than concrete inputs, which is really magical. And there is also the C Semantics tool, which can detect some UB at runtime. In part 3, Chris explains why warning about UB at compile time is impossible. Quote, The challenges with this approach are that it is, one, likely to generate far too many warnings to be useful, because these optimizations kick in all the

Starting point is 00:25:46 time where there's no bug. Two, it is really tricky to generate these warnings only when people want them. And three, we have no good way to express to the user how a series of optimizations combined to expose the opportunity being optimized. He presents a hypothetical example UB warning. Quote, warning: after three levels of inlining (potentially across files with link-time optimization), some common subexpression elimination, after hoisting this thing out of a loop and proving that these 13 pointers don't alias, we found a case where

Starting point is 00:26:32 you are doing something undefined. This could either be because there is a bug in your code, or because you have macros and inlining and the invalid code is dynamically unreachable but we can't prove that it is dead. End quote. And then Chris says,

Starting point is 00:26:50 quote, unfortunately, we simply don't have the internal tracking infrastructure to produce this, and even if we did, the compiler doesn't have a user interface good enough to express this to the programmer. End quote. So, given this sad state of things, Chris suggests using the warning flags -Wall and -Wextra as a way to detect more bugs at compile time. But his conclusion is not very uplifting.

Starting point is 00:27:18 Ultimately, the real problem here is that C just isn't a safe language, and that despite its success and popularity, many people do not really understand how the language works." And this is the Facebook tool called Infer, which was mentioned earlier. It's a static analyzer for C, Objective-C, C++, and Java. There is a short introduction video. Infer supports many build systems and can be included in the build process. For C++ it requires that your code compiles with Clang, but it will also work with GCC as

Starting point is 00:28:02 its front-end. It doesn't support Windows at this time. It's open-source on GitHub under an MIT license, and it's written in OCaml. That was the last thing for today. Now I'll leave you with this tweet. I'm doing a project about elderly programmers. If you are a programmer and over 25, please DM.

Starting point is 00:28:29 Alright, that's it. Thanks for joining me. Until next time. Bye!


Notes: https://cppclub.uk/meetings/2022/148/
Video: https://youtu.be/thD1mN_aqq8...
