C++ Club - Meeting 148
Episode Date: June 27, 2022
Notes: https://cppclub.uk/meetings/2022/148/
Video: https://youtu.be/thD1mN_aqq8...
Transcript
Welcome to C++ Club. This is meeting 148 that took place on the 5th of May 2022.
We'll start with the WG21 April mailing.
There are several updates for papers that we talked about previously and a few new proposals.
Let's quickly go through some of them.
Proxy: A Polymorphic Programming Library.
This paper by Mingxin Wang of Microsoft introduces a template-based notion of a proxy that combines object-oriented programming and functional programming to provide an efficient implementation
of polymorphism. It is supposedly capable of replacing the virtual function mechanism entirely while being more
efficient. There is a header-only test implementation for C++20. The author illustrates
the proposal by comparing it against a standard polymorphic class hierarchy, with the following abstract base class defining
an interface: class IDrawable with a pure virtual function Draw returning void. Then we have class
Rectangle, derived publicly from IDrawable and overriding the function Draw, and another class, Circle, also derived
publicly from IDrawable and overriding Draw for itself. Then we have a caller function,
DoSomethingWithDrawable, that takes a pointer to IDrawable.
The usual stuff.
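As a rough reconstruction of the hierarchy described above (a sketch, not the paper's exact listing): the log parameter is an assumption added here so that the calls have an observable effect, and the function bodies are placeholders.

```cpp
#include <string>
#include <vector>

// Abstract interface: every drawable type has to derive from it.
class IDrawable {
public:
    virtual ~IDrawable() = default;
    // The log parameter is an assumption of this sketch, used instead of real drawing.
    virtual void Draw(std::vector<std::string>& log) const = 0;
};

class Rectangle : public IDrawable {
public:
    void Draw(std::vector<std::string>& log) const override { log.push_back("rectangle"); }
};

class Circle : public IDrawable {
public:
    void Draw(std::vector<std::string>& log) const override { log.push_back("circle"); }
};

// The caller only knows the base class; the call is dispatched through the vtable.
void DoSomethingWithDrawable(const IDrawable* p, std::vector<std::string>& log) {
    p->Draw(log);
}
```

The point to notice is that Rectangle and Circle are tied to IDrawable by inheritance, which is exactly the coupling the proposal tries to remove.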
Now, architecting with the proxy, you define a dispatch called Draw as a type.
So it's a struct Draw that inherits from std::dispatch<void()>,
and it has a function call operator template, operator(),
which takes a parameter self of the template parameter type by const reference
and in the function body calls Draw on self. After that you define a so-called facade:
struct FDrawable, derived from std::facade<Draw>. The template argument is Draw, and the struct itself is empty. With that, the actual implementation classes
can be defined without using any virtual functions. So we have a class Circle with its own void Draw function, and so on.
A function that uses those would be defined as DoSomethingWithDrawable,
and it would take std::proxy<FDrawable> by value. Inside, it would call invoke on p, which
is the proxy for your facade, and as a template argument to invoke it would specify the dispatch for the function that you need to call.
So that would be p.invoke<Draw>().
Now let's say we need to add another function to the drawable so-called interface. Instead of adding a new virtual function to the base class
and overriding it in all the derived classes, with proxy we only need to add another dispatch.
For example, if we wanted to compute the area of a shape, we would add another dispatch, a struct Area
derived from std::dispatch<double()>, that is, a function without parameters returning double. Inside,
we would have a function call operator template that calls Area on the parameter self, whose type
is the template parameter T. Then we would need to update our FDrawable facade
by adding another template argument, Area, after the existing Draw.
And after that, in the caller function,
we would be able to just invoke the area function through the proxy.
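The proposed std::proxy, std::facade and std::dispatch are not part of any shipping standard library, so the following is only a hand-written sketch of the underlying idea: unrelated classes, no virtual functions, and a small non-owning wrapper that stores one function pointer per operation. The name DrawableRef and the string return value of Draw are assumptions made for this illustration and are not the paper's API.

```cpp
#include <string>

// Implementation classes: no common base class, no virtual functions.
struct Rectangle {
    double width;
    double height;
    std::string Draw() const { return "rectangle"; }
    double Area() const { return width * height; }
};

struct Circle {
    double radius;
    std::string Draw() const { return "circle"; }
    double Area() const { return 3.141592653589793 * radius * radius; }
};

// Hypothetical stand-in for a proxy over a "drawable" facade.
// It does not own the object (pointer semantics), so the object must outlive it.
class DrawableRef {
public:
    template <class T>
    explicit DrawableRef(const T& object)
        : object_(&object),
          // Captureless lambdas convert to plain function pointers;
          // each one remembers the concrete type T.
          draw_([](const void* p) -> std::string { return static_cast<const T*>(p)->Draw(); }),
          area_([](const void* p) -> double { return static_cast<const T*>(p)->Area(); }) {}

    std::string Draw() const { return draw_(object_); }
    double Area() const { return area_(object_); }

private:
    const void* object_;
    std::string (*draw_)(const void*);
    double (*area_)(const void*);
};
```

Adding an operation here means adding one function pointer to the wrapper, while Rectangle and Circle stay untouched apart from providing the member function. That is roughly the property the paper is after with its dispatches and facades.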
The author also describes a factory function use case.
Regarding his motivation, the author writes, quote,
"Currently, the standard polymorphic wrapper types, including std::function and std::any,
are based on value semantics. Polymorphic wrappers based on value semantics have certain limitations in lifetime management
comparing to pointer semantics.
Designing the proxy library based on pointer semantics decouples the responsibility of
lifetime management from the proxy, which provides more flexibility and helps consistency
in API design without reducing runtime performance."
Interesting technique, and according to the author, the generated assembly is also better
than that of virtual functions.
The implementation uses type erasure and stores function pointers in std::unique_ptr, which
unfortunately means there is a cost of heap allocation.
The author compares proxy to other similar libraries like Dyno by Louis Dionne,
which uses value semantics, and Dynamic Generic Programming with Virtual Concepts by Andrea Proli.
So even if it's not accepted into C++, it could be implemented as a library, since it
doesn't require any new language features.
The test implementation doesn't currently build in Clang, as Clang lacks support for conditionally
trivial special member functions.
The next paper is called Structured Bindings
Can Introduce a Pack. This paper by Barry Revzin and Jonathan
Wakely proposes to allow the following syntax. Writing auto [x, y, z] = f() is fine, it's okay today.
But the new proposed syntax would be auto [...xs] = f(),
where xs is a pack of length 3 containing the x, y and z of a tuple returned from that function call.
There is a difficulty with the implementation needed to support the following usage.
Say we have a function called sum_non_template taking some concrete type, tuple,
and inside the function we want to introduce a pack: auto [...elems] = tuple. And then we use
a fold expression to sum the elements and return the result. The authors write, quote,
"We have not yet in the history of C++ had this notion of packs outside of dependent contexts. This is completely
novel, and imposes a burden on implementations to have to track packs
outside of templates where they previously had not."
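For comparison, here is a sketch of how the elements of a tuple can be summed in C++17 without the proposed syntax. The function name sum_elements is made up for this example. The pack exists only inside the generic lambda, which is itself a template, and that is precisely the restriction the paper wants to lift.

```cpp
#include <tuple>

// C++17: std::apply unpacks the tuple into the lambda's parameter pack,
// and a fold expression adds the elements together.
// Note: an empty tuple does not compile, because a fold over + needs at least one operand.
template <class Tuple>
constexpr auto sum_elements(const Tuple& t) {
    return std::apply([](const auto&... xs) { return (... + xs); }, t);
}
```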
There is a test implementation in Clang and Compiler Explorer.
#embed: a scannable, tooling-friendly binary resource inclusion mechanism.
This proposal by JeanHeyd Meneide resurfaced after a pause, raising hope that we'll
be able to easily include binary data in programs. The syntax would be very simple.
You would declare and define an array of unsigned char, and in the curly braces, instead of
initializing that array by hand, you would write #embed followed by a file name. That file would be read at
compile time and converted to the contents of the array. To remind you of the history of this proposal,
it's an evolution of the initial std::embed feature that was supposed to be a library function,
but the author decided to go with a preprocessor directive-based feature as a start.
Hopefully we get it now, in C++26 most likely.
std::execution gets another update addressing some of the issues and review comments. Just look at this example code implementing recursive file copying.
Isn't it beautiful?
I'm still scrolling.
Still scrolling.
Still scrolling.
Yeah. Still scrolling. Still scrolling.
Yep.
Still scrolling.
Around 400 lines.
The next one is = delete should have a reason.
This proposal by Yihe Li
proposes to add a message to = delete so that when a deleted function is used, the error message is more meaningful. Yeah, I guess it could be better than
just a comment.
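To make the "just a comment" situation concrete, here is a small sketch with a made-up Connection type. Today the reason for the deletion lives only in the source comment. The proposal would let the author move it into the declaration itself, along the lines of = delete("reason"), so that the compiler can print it.

```cpp
#include <type_traits>

// Hypothetical type, used only for illustration.
struct Connection {
    Connection() = default;
    // Copying is disabled because each Connection owns a unique handle.
    // Today this explanation is visible only to people reading the header;
    // the compiler just reports that a deleted function was used.
    Connection(const Connection&) = delete;
    Connection& operator=(const Connection&) = delete;
};

// The deletion itself is visible to type traits, but the reason is not.
static_assert(!std::is_copy_constructible_v<Connection>, "Connection must not be copyable");
```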
The next paper, by Daniel Ruoso of Bloomberg, proposes a mechanism to allow a pre-built library to specify which modules it provides to clients
by distributing metadata files alongside library binaries, to be used in client linker commands. Quote, "This specification may become obsolete by a wider scale convergence in the area of
package management in the C++ ecosystem."
I'm not holding my breath.
Next we have a comment in the Reddit thread for the mailing.
Quote, "Man, reflection really fell off the map.
There was a lot of activity for a while."
To which Niall Douglas replies, quote,
"There should be a whole bunch more activity soon.
The Standard C++ Foundation have been funding a dedicated developer in this area
since last year, I think. Just like how the development of ranges was funded. It just takes time to
bake the cookie, that's all." End quote.
Of all the future C++ features, reflection is my most anticipated one.
No, actually, pattern matching is my most anticipated one. Reflection is probably
second most. For me anyway.
Shuai Mu from New York created a Rust-style borrow checker for C++.
The author says, quote,
"Initially, all the checks are at runtime,
which already eases some debugging issues for me.
I also tried many static analysis tools,
including Cppcheck, Clang, Clang-Tidy,
and MSVC, the most recent one with lifetime support.
I had high hopes for them, but then I found they mainly support single function or file level check.
Or in other cases like MSVC, the checker would mark everything as false positives.
The other day I came across Facebook's Infer, and it seems to have implemented a Rust-like lifetime checker.
So I tested it with my borrow-cpp and it seems to work well.
It can accurately tell which line of code violates the rule."
When a runtime borrow check fails in this library,
the library triggers a null pointer dereference to cause a runtime error.
Unfortunately, as a Redditor pointed out, this is a problem.
A null pointer dereference is undefined behavior in C++,
which means compilers are free to interpret and optimize away the code that causes it as they please.
When this issue was raised, the author promised to use abort instead.
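To illustrate the difference, here is a minimal sketch of a runtime borrow check that terminates through std::abort, which has well-defined behavior, rather than through a null pointer dereference. The class BorrowState and its member names are hypothetical and are not taken from the library under discussion. The rule it enforces is the Rust-style one: any number of shared borrows, or exactly one exclusive borrow.

```cpp
#include <cstdlib>

// Hypothetical runtime borrow tracker for a single object.
class BorrowState {
public:
    // A shared borrow is allowed unless an exclusive borrow is active.
    bool try_borrow_shared() {
        if (exclusive_) return false;
        ++shared_;
        return true;
    }
    // An exclusive borrow is allowed only when nothing else is borrowed.
    bool try_borrow_exclusive() {
        if (exclusive_ || shared_ > 0) return false;
        exclusive_ = true;
        return true;
    }
    void release_shared() { if (shared_ > 0) --shared_; }
    void release_exclusive() { exclusive_ = false; }

private:
    int shared_ = 0;
    bool exclusive_ = false;
};

// On a violation, stop the program in a well-defined way.
// The optimizer cannot reason this call away the way it can with UB.
inline void borrow_exclusive_or_abort(BorrowState& state) {
    if (!state.try_borrow_exclusive()) std::abort();
}
```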
The issue of handling potential UB by compilers spawned quite a discussion in the Reddit thread.
The author suggested that compilers warn about UB they find in code.
Quote
So I think just from the case of memory management,
many UB are not intentional and maybe they are just bugs.
The right thing the compilers should do is to warn them instead of optimize on them.
End quote.
And the following reply addresses this idea and explains why it's impossible.
Quote, "There are switches in compilers to try and do that.
Search for mention of hardening or for sanitizers.
Some checks are relatively cheap, most are not, however.
Warning about UB, however, is otherwise nigh impossible. In the middle layer
of a compiler UB is normal. There is an assumption that the front-end will have created an intermediate
representation where UB is only in paths that cannot be reached during execution, which the front-end knows from higher-level semantics.
Optimizers are incredibly dumb.
They are composed of hundreds of very simple,
very focused analysis and transformation passes.
And faced by the emergent behavior of the pipeline,
it may look like they are smart or annoying,
but really each pass is fairly dumb and so is the whole."
The commenter suggested reading articles by Chris Lattner, the main author of LLVM, on
how UB helps optimization and how big of a minefield it creates.
This is part 1 of the series.
Quote,
"Undefined behavior exists in C-based languages because the designers of C wanted it to be
an extremely efficient low-level programming language.
In contrast, languages like Java and many other safe languages have eschewed
undefined behavior because they want safe and reproducible behavior across implementations
and willing to sacrifice performance to get it. While neither is the right goal to aim
for, if you're a C programmer, you really should understand what undefined behavior is."
Chris provides the following examples of UB.
Use of an uninitialized variable being UB helps optimization,
as a Java-like zero-initialization guarantee would be too costly.
Signed integer overflow being UB helps optimization, for example of loops.
It can be treated as defined by using the -fwrapv switch in Clang and GCC.
Oversized shift amounts are UB, as they behave differently on various CPUs.
Dereferencing bad pointers and out-of-bounds array accesses are other examples of unavoidable UB.
To prevent this, each array access would have to be checked,
and each pointer would have to carry size information alongside it, thus breaking the C ABI.
Dereferencing a null pointer is UB and not necessarily a crash.
If you want a crash, dereference a volatile null pointer when using Clang.
Violating type rules is UB, like type punning through a pointer of any type other than char pointer. Chris illustrates this with the following example optimization.
This is the code before optimization.
We have a function zero_array and a global float pointer P. Inside the function we have an int i and then a for loop
which goes from i equals 0 up to 10,000 and assigns 0 to each element of the array that P points to, although P is just a pointer to float.
The optimizer converts this loop into a single memset of zeros of size 40,000 bytes through the pointer P, replacing the
10,000 four-byte stores, because the type rules let it assume that storing a float cannot change the pointer P itself.
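Written out, the example looks roughly like this. The second function is a hand-written approximation of what the optimizer is allowed to produce, not actual compiler output, and its name is made up for this sketch.

```cpp
#include <cstring>

// Global pointer, as in the example. The caller must point it at 10,000 floats.
float* P = nullptr;

// What the programmer writes: 10,000 separate four-byte stores.
void zero_array() {
    for (int i = 0; i < 10000; ++i)
        P[i] = 0.0f;
}

// Roughly what the optimizer may emit instead: one 40,000-byte memset
// (assuming a four-byte float whose zero value is all-zero bytes).
// This is valid only because a store to a float is assumed not to modify P,
// which has a different type (float*).
void zero_array_as_memset() {
    std::memset(P, 0, 10000 * sizeof(float));
}
```

Without the type rule, the compiler would have to reload P after every store, in case one of the stores had overwritten the pointer itself.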
Let's look at part 2,
or as Chris calls it,
why undefined behavior is often a scary and terrible thing for C programmers.
Running optimizations in a different order can produce baffling results
when you want to rely on, say, a null check and the compiler decides, nah, I'm good, don't need it.
So this is the code before. We have a function called contains_null_check which takes an int pointer. The first line of the function assigns
the dereferenced pointer to an integer called dead. The next line checks whether the pointer is null,
and if it is null, it returns from the function. The last line of the function assigns 4 to the memory pointed to by the pointer,
that is, by the parameter.
So in this example, the code clearly checks for the null pointer. If the compiler happens to run dead code elimination before the redundant
null check elimination pass, then we see the code evolve over two steps. First,
the assignment of the dereferenced pointer to the int dead is deleted by the optimizer, because dead is never used,
and then the null check is not redundant and is kept.
However, if the optimizer happens to be structured differently,
it could run those two passes in the reverse order, which would mean that
it would see that the pointer was dereferenced on the first line, and so it must not be null.
So the check for whether the pointer is null will always be false, and it can be eliminated. After that,
the first line is dead code and is also
removed. So the function is reduced to just assigning 4 to the
dereferenced pointer, which, if the pointer is null, will lead to a crash.
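The two outcomes can be spelled out as source code. The first function is the example as described. The other two are hand-written equivalents of what each pass order leaves behind, with made-up names, and are not real compiler output.

```cpp
// The function as written.
void contains_null_check(int* p) {
    int dead = *p;   // the value is never used, but the dereference is UB if p is null
    (void)dead;      // only silences unused-variable warnings in this sketch
    if (p == nullptr)
        return;
    *p = 4;
}

// Pass order 1: dead code elimination runs first and deletes the unused load,
// so the null check is no longer redundant and survives.
void after_dce_first(int* p) {
    if (p == nullptr)
        return;
    *p = 4;
}

// Pass order 2: redundant null check elimination runs first. The pointer was
// already dereferenced, so the check is assumed false and removed; the unused
// load is then deleted as dead code.
void after_null_check_removal_first(int* p) {
    *p = 4;
}
```

Both results are valid translations of the original function, because calling it with a null pointer was undefined from the first line.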
So optimizations that depend on undefined behavior can open the door to security
exploits such as buffer overflows, in code
where various checks are optimized out because the compiler assumes that UB can never
happen. Some hardcore developers debug optimized builds, which often doesn't make sense.
In this case, it's advisable to disable optimizations with
-O0 to still be able to debug release builds.
Then there is the worrisome aspect of UB that changing or upgrading compilers can expose
new latent bugs because of a changed memory layout
or different compiler behavior. Even worse, there is no reliable way to determine if a codebase
contains UB. But there are some tools that can help with that.
Valgrind, pronounced "val-grinned", not "val-grind", with its Memcheck tool.
It's limited because it's quite slow.
It can only find bugs that still exist in the generated machine code,
so it can't find things the optimizer removes.
And it doesn't know that the source language is C,
so it can't find shift out of range
or signed integer overflow bugs.
Clang has an experimental switch that I didn't know about.
It's called... let's see if I can find it... -fcatch-undefined-behavior.
It inserts runtime checks for certain types of UB, but slows down execution.
Clang also has the switch -ftrapv, which makes signed integer overflows trap at runtime.
The Clang Static Analyzer can detect many bugs and is built into
Apple Xcode. It is also available as a separate tool.
An experimental project called KLEE from LLVM can produce a test case for a piece
of code. It's a symbolic execution engine which, I guess, analyzes your code but doesn't actually
execute it, which is really magical. And there is also the C-Semantics tool, which can detect some UB at runtime.
In part 3, Chris explains why warning about UB at compile time is impossible.
Quote,
"The challenges with this approach are that it is
1. likely to generate far too many warnings to be useful, because these optimizations kick in all the time where there's no bug,
2. it is really tricky to generate these warnings only when people want them, and
3. we have no good way to express to the user how a series of optimizations combined to expose the opportunity being optimized."
He presents a hypothetical example UB warning.
Quote, "Warning: after three levels of inlining, potentially across files with link-time optimization,
some common subexpression elimination, after hoisting this thing out of a loop
and proving that these 13 pointers don't alias, we found a case where you are doing something undefined.
This could either be because there is a bug in your code, or because you have macros and inlining and the
invalid code is dynamically unreachable but we can't prove that it is dead."
And then Chris says,
"Unfortunately, we simply don't have the internal tracking infrastructure
to produce this, and even if we did,
the compiler doesn't have a user interface good enough
to express this to the programmer."
End quote.
So given this sad state of things,
Chris suggests using the warning flags -Wall and -Wextra as a way to detect more bugs at compile time.
But his conclusion is not very uplifting.
"Ultimately, the real problem here is that C just isn't a safe language,
and that despite its success
and popularity, many people do not really understand how the language works."
And this is the Facebook tool called Infer, which was mentioned.
And it's a static analyzer for C, Objective-C, C++, and Java.
There is a short introduction video.
Infer supports many build systems and can be included in the build process.
For C++ it requires that your code compiles with Clang, but will also work with GCC as
its front-end.
It doesn't support Windows at this time.
It's open-source on GitHub under an MIT license,
and it's written in OCaml.
That was the last thing for today.
Now I'll leave you with this tweet.
I'm doing a project about elderly programmers.
If you are a programmer and over 25, please DM.
Alright, that's it.
Thanks for joining me.
Until next time.
Bye!