CppCast - SonarSource Analysis Tools
Episode Date: May 7, 2021
Rob and Jason are joined by Loïc Joly from SonarSource. They first discuss compiler updates in GCC and MSVC as well as survey results of most used C++ features. Then they talk to Loïc about the SonarSource static analysis tools for C++, what sorts of bugs they discover, and what goes into creating a new analysis rule.
News
VS 2019 STL is C++20 feature complete
GCC 11.1 Released
Meeting C++ survey results: the most popular C++ standard features
Links
SonarSource
The NeverEnding Story of writing a rule for argument passing in C++
Sponsors
C++ Builder
Transcript
Episode 298 of CppCast with guest Loïc Joly, recorded May 5th, 2021.
This episode is sponsored by C++ Builder, a full-featured C++ IDE for building Windows
apps five times faster than with other IDEs.
That's because of the rich visual frameworks and expansive libraries.
Prototyping, developing, and shipping are easy with C++ Builder.
Start for free at Embarcadero.com.
In this episode, we discuss compiler updates and survey results.
And we talk to Loïc Joly.
Loïc talks to us about the SonarSource suite of static analysis tools.
Welcome to episode 298 of CppCast,
the first podcast for C++ developers by C++ developers.
I'm your host, Rob Irving, joined by my co-host, Jason Turner.
Jason, how are you doing today?
All right, Rob, how are you doing?
Doing all right.
Do you want to put in a plug for your latest C++ Weekly video?
I know it's been getting a lot of views, right?
Yeah, I just thought I'd pull up the stats just out of curiosity.
So I did an episode called Break the ABI to Save C++,
and I haven't gotten any hate mail directed at me yet.
It's been up since... what's today?
Today's Wednesday.
So it's been up for 48 hours.
I've got 11,000 views so far,
which is a little over two times my normal view rate.
Okay.
So yeah, we'll see what happens with that.
Yeah, and it's obviously something we've been talking
about a lot on this show for a while now, and
we were talking right before the show about how we might try to have someone with
the opposing viewpoint on soon.
Right. Yeah, I mean,
our regular listeners may have noticed that I have been conducting an informal poll
for the last several years.
I ask pretty much every guest, and we haven't had anyone say, no, we need
ABI stability for our project, or whatever.
Right.
Okay.
Well, at the top of every episode I like to read a piece of feedback, and on that subject, this
tweet is from Nick Gully, and he wrote: "CppCast, how about some new gear? A ceramic mug with
a giant ABI on the side for us to mull over breaking." And I kind of like that idea. I might
put that together.
Yeah. Oh, that's terrible. You'd have, like, a party where everyone smashes their ABI.
Wasn't there a book that was titled, like, Steal This Book or
something like that? We could have a mug that says Break This Mug.
Right, maybe on the other side.
Something like that, maybe. I think Weird Al, one of his more recent releases was also like Steal This
Song or something like that.
Alright.
Well, we'd love to hear your thoughts about the show.
You can always reach out to us on Facebook, Twitter, or email us at
feedback at cppcast.com. And don't forget
to leave us a review on iTunes or subscribe
on YouTube. Joining us today
is Loïc Joly. Loïc is
a C++ coder, speaker, teacher, and
expert. He represents France on the ISO
C++ Standardization Committee
and is also a member of the committee drafting the next version
of the MISRA C++ Standard for Safety Critical Systems.
In addition to developing in C++, he has a special interest in teaching it
and spreading good practices across the community.
He's a frequent speaker at meetups and conferences
and teaches at Télécom SudParis.
Since he joined SonarSource in 2018,
he has worked on static analysis for C++,
both specifying rules to help other developers
and having the fun of implementing them.
Loïc, welcome to the show.
Hello, welcome, and thank you for having me.
And so do you think that we should be willing
to break the ABI for C++ in the near future?
Well, I must say that all the concerns about the ABI
are a little bit foreign to me
because I always rebuild my code from source.
And I'm mostly a Windows user,
where it's typical to package the libraries
at the same time as you package your product.
I don't work with shared libraries
across different products.
So for me, ABI is almost a non-existent concern.
And yet I have to pay for
the things that I don't really use.
Oh, that's an ironic way of looking at it, with the, yeah,
"don't pay for what you don't use" mentality of C++.
Well, and it's interesting too, since you
mentioned that you're a Windows developer, because when I released this video, I got a lot of
feedback from people that are like, what are you even talking about? We don't have ABI stability.
I saw some of those comments.
I was very surprised by that, since it's been such an ongoing discussion.
There are a lot of people who are still very unaware of what it is and why it's important.
And I think that now in Windows, it happens.
Because previously, when you switched versions of Visual Studio,
you just had to recompile everything.
And now they try to
introduce stability
between different versions of Visual Studio.
So I think
it's a little bit moving into
the direction of stable ABI, even
if it's not totally there
yet. But it still seems like almost
every Windows developer just assumes, if I upgrade
my compiler, I have to rebuild the world.
Like, they still assume that.
So, I mean, if everyone's already in a position to rebuild the world, then let's just go ahead and rebuild the world.
That's what I say.
I know that I may not have to do it, but I will feel more confident if I do it anyway.
Right.
It's like the very first thing you do, right?
You have a weird bug that you can't diagnose.
What do you do?
Rebuild.
Right. Because you just assume there's a link problem somewhere at some point.
All right, well, we have a couple of news articles to discuss. Feel free to comment on any of these,
and we'll start talking more about the work you're doing at SonarSource.
Okay, sure.
All right. So this first one, speaking of Visual Studio, the 2019 16.10 preview is, I believe, out now.
And this is their changelog announcing that the STL is going to be C++20 feature complete, which is pretty exciting: we already have all the library features implemented in one of the major
compilers.
And there's an ironic problem here.
What is that?
I've been following this along on Twitter,
and I got some clarification from Victor on it. There's basically a bug in std::format where one
part of it accidentally relies on the locale, and format is supposed to be locale independent.
They want to change that as a real quick fix to C++20, theoretically.
But Visual Studio has already promised ABI compatibility with their standard library implementation.
So if they were to fix that in the standard, Visual Studio can't fix it,
and it would still be stuck with locale dependence.
Hmm.
Interesting.
Yeah.
Anything else either of you want to highlight?
Anything you're looking forward to being able to use
in the new version?
Well, just one point for me.
It's a little bit strange to focus on full library support
and full language support separately, because now I think that
both of them are entangled in some way. And what really matters for me is full support, period.
But I know it's a good step. I'm pretty happy. I'm excited by it. But I'm really
not sure to what extent full library support totally works without full language support.
They're pretty close, though.
If you look on cppreference, Visual Studio's language support is really, really close for C++20 also.
What are they still waiting on?
Do you have that?
I don't recall.
I'd have to look back.
All right.
Well, in other news, we have an update for another one of the compilers.
And this is GCC 11.1 being released.
And they're also announcing a lot of great support for C++20.
They're changing their default language version up to C++17, which is great.
And do they specify exactly how far along they are with 20,
or are they just saying that they're making good progress?
They say go to their status page for some of these things.
I went to cppreference
to look at their version of the compiler support chart
for C++20 GCC.
And it's interesting because GCC is, let's see,
they're in the green for every single language feature
except modules.
Oh, okay.
Yeah, that's very good.
Yeah.
Which is interesting, because Clang used to have the tendency
to be ahead on these things.
But if you look at this comparison chart right now,
GCC and Visual Studio are way ahead of Clang for C++20 support.
Yeah.
It's a pattern that I've noticed also, and I'm even a little bit worried about it,
because it seems to show that there is a kind of slowdown in Clang.
It used to be first for language features, and now it's...
I really rely on Clang for my daily work, not only
as a compiler but also for the libraries that we use as a basis to do static code analysis.
So I mean, it's a wonderful tool for that, but the fact that they are lagging behind Visual Studio and behind GCC is not a good sign. I hope it's just for some
temporary reasons and not a global pattern. Because right now, for example, for modules,
I don't see much in Clang at all. Even GCC isn't complete with modules, but they have the
basics done, and there are a few corner cases to handle, but it's already pretty good.
Yeah.
Yeah, I'm sorry.
I'm just scrolling through this list of comparisons again.
Yeah.
And some of the things, like lambdas in unevaluated contexts, sound small
but are huge to library developers.
It doesn't even list partial support in Clang yet.
Hopefully they'll be catching up soon. I do want to comment on one other thing before we move past
GCC, one of these notes in the GCC 11 release. So first of all, I was so confused: apparently 11.1
is the first official release of the GCC 11 series.
Oh, there wasn't an 11.0?
Not that I can find. Maybe I'm nuts.
But I think it's the first official release in the GCC 11 series.
But they mention here hardware-assisted AddressSanitizer support.
Apparently, the 64-bit ARM target has the ability somehow to do hardware-assisted address sanitizing.
So it uses considerably less RAM.
It's in a testing phase and can only be used for the Linux kernel right now,
but I found that very interesting.
Yeah, sounds interesting.
Okay, and then the last thing we have
is Meeting C++ survey results,
and this was a survey on the most popular features of C++.
And what's interesting is he shows last year's survey results
side by side with this year's results.
So you can see how some features
might get used more over the past year
as developers have had more time
to get accustomed to using these features.
And I think actually there were a couple
that looked like they had gone down
just a little bit.
But most of the usage of various features seems to be pretty consistent from year to year.
So I thought that was interesting.
Yeah, that's one of the main things that stood out to me:
the histograms are almost identical from year to year, which I just really didn't expect.
But yeah, just a couple of little things here. There seem to be
more people reporting that they're not using C++14, which is maybe just a different group
of people that responded to that question, I don't know.
Or they're upgrading to 17, right?
I guess maybe they misunderstood the question and they're like, no, I'm not using 14, I'm using 17. I don't know.
Right. Okay.
So Loic, we've talked a lot about different static analysis tools
on the podcast before,
but I don't think we've discussed SonarQube before.
So do you want to start off by maybe just telling us a little bit
about the SonarQube analysis tool?
Well, I'm not going to talk only about SonarQube,
but about all the tools that we do at SonarSource, which are basically three tools.
But they work together.
And if you want to have a full story of what we propose, we have to talk about all of them.
So we have SonarLint, SonarQube, and SonarCloud.
SonarLint is the tool that runs in the IDE.
So it's your first line of defense. And then we have SonarQube and SonarCloud,
which work more in the continuous integration phase. So SonarQube is something that is self-hosted,
while SonarCloud runs in the cloud. But basically, it's the same set of features,
and it allows you to analyze your code during pull requests, for example, and to block a pull request if it's not meeting some quality criteria.
And you can also see the history of the code,
how many new issues you have introduced, and this kind of stuff.
I'm sorry, please go ahead.
So, yes, our goal is to be tightly integrated into the development cycle
to make a tool for developers.
So that's why we are in the inner loop with the IDE,
and we are in the slightly outer loop with pull requests.
And that's where we are positioned.
With SonarLint, you said that's an IDE tool.
Is that available for different IDEs as a plugin?
Which ones does it support?
Well, it supports different IDEs depending on the language.
I'm going to focus on C++ here because that's why we're there.
So for C++, we support Visual Studio.
And since very recently, maybe a week or two ago,
we also support CLion.
Okay.
So it's brand new.
And basically, it's the same rules that run inside of the IDE
and inside of SonarQube, SonarCloud.
The main difference is that some rules that require full project knowledge
are not going to work in the IDE,
because in the IDE, we work on a translation-unit basis.
Okay.
I'm kind of curious, like, what kind of rules
require full visibility of the project?
Like unused functions.
If you don't have the full project, detecting unused functions is obviously not doable.
That's interesting, because I just spent a week or so with one of my clients removing about 100,000 lines of code that were unused.
Considerable performance and compile-time savings for
us on that project.
And it's not so easy to detect that something is unused in C++. It's
very tricky, because you have some functions that are not used, but you still don't want to remove
them. For example, if you have a class which is a container,
maybe you want to have begin and end.
And even if the user is not using them right now,
you still want to provide begin and end,
because it just makes sense to make your class compatible
with other similar classes.
So this notion of an unused function is not clear-cut,
I would say.
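To make the begin/end point concrete, here is a hypothetical fixed-capacity container, SmallVec (not a real library type). Its begin() and end() may have no callers in a given code base, yet they are exactly what make range-for loops and standard algorithms such as std::accumulate work, which is why a naive "unused function" check would be wrong to flag them.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <numeric>

// Hypothetical fixed-capacity container used only for illustration.
template <typename T, std::size_t N>
class SmallVec {
public:
    void push_back(const T& v) {
        assert(size_ < N && "capacity exceeded");
        data_[size_++] = v;
    }
    std::size_t size() const { return size_; }

    // The conventional iterator interface: possibly "unused" today,
    // but required for range-for and <algorithm>/<numeric> compatibility.
    T* begin() { return data_.data(); }
    T* end() { return data_.data() + size_; }
    const T* begin() const { return data_.data(); }
    const T* end() const { return data_.data() + size_; }

private:
    std::array<T, N> data_{};
    std::size_t size_ = 0;
};

// Compiles only because SmallVec provides begin()/end().
int sum(const SmallVec<int, 8>& v) {
    return std::accumulate(v.begin(), v.end(), 0);
}
```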
So that is an interesting one that came up in the project I was working on
because some of the functions were used, but only by the test suite.
So I had to disable the test suite for some of my detection
because if it's only used in the test suite,
for our particular case, I didn't care.
I wanted to remove it.
Makes sense.
So does your tool actually say it is completely unused, or can I
get a report that says it is used only in this one place or something like that?
No, it's only going to tell you if it's totally unused. We are not providing a tool to help you explore your code.
We are just focusing on raising issues and trying to explain the issues. So for
example, if we have a rule about the rule of five, we are going to point to
the class and say, okay, look, here you have defined a destructor,
here you have defined a copy constructor, but you forgot something. So we are going to
show you all the interesting locations that are useful to understand what's
the problem, but we are not providing a tool to manually explore your code base.
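A minimal sketch, using a hypothetical Buffer class, of the situation such a rule-of-five check examines. If the class declared only the destructor and the copy constructor, the compiler-generated copy assignment would copy the raw pointer, and both objects would later delete the same array. Declaring all five special members, as below, avoids that. This is illustrative code, not SonarSource output.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical class owning a raw array; all five special members declared.
class Buffer {
public:
    explicit Buffer(std::size_t n) : size_(n), data_(new int[n]{}) {}
    ~Buffer() { delete[] data_; }                         // 1. destructor

    Buffer(const Buffer& other)                           // 2. copy constructor
        : size_(other.size_), data_(new int[other.size_]) {
        std::copy(other.data_, other.data_ + size_, data_);
    }
    Buffer& operator=(const Buffer& other) {              // 3. copy assignment
        if (this != &other) {
            Buffer tmp(other);  // copy-and-swap: deep copy, then exchange
            swap(tmp);
        }
        return *this;
    }
    Buffer(Buffer&& other) noexcept                       // 4. move constructor
        : size_(std::exchange(other.size_, 0)),
          data_(std::exchange(other.data_, nullptr)) {}
    Buffer& operator=(Buffer&& other) noexcept {          // 5. move assignment
        Buffer tmp(std::move(other));
        swap(tmp);
        return *this;
    }

    void swap(Buffer& other) noexcept {
        std::swap(size_, other.size_);
        std::swap(data_, other.data_);
    }
    std::size_t size() const { return size_; }
    int& operator[](std::size_t i) { return data_[i]; }
    const int& operator[](std::size_t i) const { return data_[i]; }

private:
    std::size_t size_;
    int* data_;
};
```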
Okay.
So as you said, we're obviously on a C++ podcast.
We're going to focus on C++.
But do you want to tell us about what other languages SonarSource analysis works for?
I think we have 27 languages,
but the main language we focus on is Java. We started as a Java company mostly, so we are very
well known in the Java ecosystem. And then we also have tools for C#, Python, JavaScript, Apps Script. And I forgot many of them.
That many, okay.
I think I mentioned the most important ones.
And if I forgot some of them,
some of my colleagues are going to kill me tomorrow.
So we'll see.
I actually, coincidentally, as we were discussing this interview,
one of my friends shared with me an issue that SonarQube had found in their C# code base.
That was, I don't know, I can't remember the details of it, but it was particularly insidious.
Oh, it was something =- value instead of something -= value.
And it was, yeah, particularly insidious in their code.
And SonarQube found it.
So that brought it to my awareness recently.
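The typo described here was found in C#, but it compiles just as silently in C++, because x =- y parses as x = -y. The two function names below are hypothetical; the sketch only shows how the two spellings diverge.

```cpp
// `balance =- amount` is `balance = -amount`: an assignment of the negated
// value, not a compound subtraction. Both versions compile without error.
constexpr int withdraw_buggy(int balance, int amount) {
    balance =- amount;   // typo: overwrites balance with -amount
    return balance;
}

constexpr int withdraw_fixed(int balance, int amount) {
    balance -= amount;   // intended: subtract amount from balance
    return balance;
}
```

With a balance of 100 and an amount of 30, the buggy version yields -30 while the fixed one yields 70.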
And I know that, for example,
since we are very well known in the world of Java,
some people who work on the JVM
also use our tool on C++
and discover interesting stuff in the JVM.
Oh, that kind of brings it full circle.
Yeah.
I'm curious. I see here that it says,
because I brought up the SonarLint page after you mentioned that and I'm looking at the CLion plugin, it says SonarLint is open source.
It's an open source project?
Well, SonarLint, which is the integration inside of the IDE, is open source, but the core of the
C++ analyzer is closed source.
Okay.
However, it's available for free in SonarLint.
So if you analyze code in IDE, it's available for free.
It's also available for free
if you use SonarCloud on an open source project.
Okay.
But if you want to use SonarCloud on a closed source project
or if you want to do C++ analysis in SonarQube,
you have to pay for it.
And the price is based on the
number of lines of code in the project.
Oh, that's an interesting
model. Okay. So are there
specific types
of bugs that the SonarSource
analysis tools are particularly
good at finding?
Well, we have a
full range of issues in
many different directions.
And so I think that, at least from my point of view,
where we bring a lot of value compared to other tools that exist,
at least where we try to invest,
is in rules that are not only about detecting a bug,
but rules that push forward good ways of programming. So let me give you a
classic example. We can have a rule about double delete or about a missing delete and memory leaks.
It's pretty hard to have this kind of rule. Everybody has it, but to have a good rule for
that, it's pretty difficult because you have to follow all the paths inside of your code. And then
as soon as you have a function call, you don't know in which conditions it could be called, so you have to do
very complex stuff. But we are developing in C++. What we should do is not do any raw memory
allocation. We should use unique pointers. And it's pretty easy to write a rule that detects
just that you allocated memory without storing it immediately into a unique
pointer or a shared pointer or a class like that. And this is almost so simple that you
don't need a static analysis to do it, but there are some other rules of the same category where
by pushing good practices forward, you prevent bugs indirectly. And at the same time,
we also try to have rules that detect
the bug when you didn't follow those
good practices. But I really think that
this is one differentiating
point between our tool and, for example,
compiler warnings, because
compilers now emit lots
of warnings, very interesting ones,
but they really try to warn only
when the code is
almost certainly broken.
We try to raise issues when the code is badly written.
So that's a difference.
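A sketch of the two styles contrasted above, with a hypothetical Widget type. The first function allocates with a raw new whose result is not immediately handed to a smart pointer (the easy-to-detect pattern described); the second captures ownership at the point of allocation with std::make_unique, so the object is released on every exit path.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

struct Widget {  // hypothetical type for illustration
    std::string name;
};

// Pattern such a rule targets: raw `new` not immediately owned.
// Any early return or exception before `delete` would leak.
std::size_t name_length_raw(const std::string& n) {
    Widget* w = new Widget{n};
    std::size_t len = w->name.size();
    delete w;
    return len;
}

// Preferred: ownership is established at the allocation itself.
std::size_t name_length_owned(const std::string& n) {
    auto w = std::make_unique<Widget>(Widget{n});
    return w->name.size();  // the Widget is freed when w goes out of scope
}
```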
It's really valuable, yeah.
And as I mentioned, for example, we have some rules about rule of five or rule of zero.
The code could be perfectly correct without following those rules, but it's so much
simpler to read code that follows them that we do that.
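For contrast with the rule of five, here is a minimal rule-of-zero sketch with a hypothetical Playlist class. Because every member manages its own resources, the class declares no destructor, copy, or move operations, and the compiler-generated ones already behave correctly.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Rule of zero: no special member functions are declared; std::string and
// std::vector handle copying, moving, and cleanup on their own.
class Playlist {  // hypothetical example type
public:
    explicit Playlist(std::string title) : title_(std::move(title)) {}
    void add(std::string song) { songs_.push_back(std::move(song)); }
    std::size_t size() const { return songs_.size(); }
    const std::string& title() const { return title_; }

private:
    std::string title_;
    std::vector<std::string> songs_;
};
```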
Okay.
That's interesting.
I think you said you aim towards a rule like
don't allocate memory without immediately putting it
into a unique pointer or something like that.
And just out of curiosity,
do you also have rules that try to detect misuse
or abuse of smart pointers,
like manually calling delete on the pointer
returned from a unique pointer
or unintentional copies of a shared pointer
or something like that?
I don't think we have the one you mentioned,
but for example, we have one rule,
which is you should not pass a unique pointer
by const reference.
Oh, okay.
Because if you do that, you should just pass the raw pointer;
your function will have a broader interface
and it will be just the same.
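A sketch of the parameter-passing rule just described, with hypothetical Shape and Square types. A const reference to a std::unique_ptr can neither take nor release ownership, so it only restricts callers to objects that happen to live in a unique_ptr. Taking the object itself by reference (or by raw pointer when null must be allowed) accepts more arguments with the same behavior.

```cpp
#include <cassert>
#include <memory>

struct Shape {  // hypothetical polymorphic base
    virtual ~Shape() = default;
    virtual double area() const = 0;
};

struct Square : Shape {
    double side;
    explicit Square(double s) : side(s) {}
    double area() const override { return side * side; }
};

// Style such a rule flags: ownership is not touched, yet callers must
// own the Shape through a std::unique_ptr<Shape>.
double area_flagged(const std::unique_ptr<Shape>& s) { return s->area(); }

// Broader interface, same behavior: works for stack objects, objects held
// by any smart pointer, or members of other objects.
double area_of(const Shape& s) { return s.area(); }
```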
Right, yeah, yeah, exactly.
So we have a few rules like that.
Okay, interesting.
I already asked about the other languages that SonarSource supports.
Does it run on other platforms as well?
I know you mentioned the Visual Studio plugin.
Does it run on Windows?
Does it run on Linux and Mac as well?
Yes, the core of SonarSource runs on Windows,
Linux and Mac OS.
And then for SonarLint, it really depends.
For example, I know that for Java,
it runs on Eclipse and VS Code also.
And I don't have the full matrix
of which language is supported in which IDE.
But yes, basically we try to support
the major ones, obviously.
Okay.
And a point I forgot to mention,
an important difference between SonarLint
and SonarQube or SonarCloud is that SonarLint
is basically more for the developer himself,
while SonarCloud and SonarQube are more for a team. So for SonarQube
and SonarCloud, you have some rules that are going to be followed by the full team, and so you have
one set of rules that is going to be shared on each pull request. And there is also in
SonarLint what we call a connected mode: if you also run SonarQube or SonarCloud,
you can synchronize with it,
so that, for example, if in
SonarQube you say that a rule
is a false positive for you,
it's not an issue, you decided that, okay,
in this case, I don't want this rule,
we will stop reporting it in the IDE
also.
Oh, nice.
That is interesting.
So those tools are not totally separate. They really
work together.
So you said SonarCloud is available for open source projects.
Is that right?
Well, it's available for open or closed source,
and it's free for open source.
Free for open source, okay. Are there any limitations on the open source analysis?
It's exactly the same version.
Okay.
I'm just thinking about how to go about enabling it
on this other project that I've been working on.
Yeah, how easy is it to set it up for an open source project if you're interested?
Well, if you want to set up SonarCloud or SonarQube, you have to add some steps into the build system because, as you know, C++ build systems are awfully complex. So what we have is a program that eavesdrops on what happens during a build
and save all information about which file was compiled with which option, which include path and everything.
So it just detects each call to the compiler itself and saves the details in a file.
So now we have all this information and we can use it to correctly
analyze the code with the right parameters. So the way to configure this analysis is first to
slightly modify your build system so that your build runs wrapped inside this additional
program that saves the data. It's like Clang's compilation database, if you know it,
except that it can be created with any kind of build system.
And then we use this kind of compilation database
to do the analysis in the second step
and to upload the results to SonarQube or SonarCloud.
It's a little bit of work to integrate it into your CI,
but once you are used to it, it's just a few lines of script.
Do you have a quick start, "if you're using a CMake project, then this is what you need to do"
kind of documentation?
We have some quick start guides, and we have some simple
projects that you can look at.
I want to interrupt the discussion for just a moment to bring you a word from our sponsor, C++ Builder.
The IDE of choice to build Windows applications five times faster while writing less code.
It supports you through the full development lifecycle to deliver a single-source codebase
that you simply recompile and redeploy. Featuring an enhanced Clang-based compiler,
Dinkumware STL, and packages like Boost and SDL2 in C++ Builder's Package Manager,
and many more. Integrate with continuous build configurations quickly with MSBuild,
CMake, and Ninja Support, either as a lone developer or as part of a team.
Connect natively to almost 20 databases like MariaDB, Oracle, SQL Server, Postgres, and
more with FireDAC high-speed direct access.
The key value is C++ Builder's frameworks, powerful libraries that do more than other
C++ tools.
This includes the award-winning VCL framework for high-performance native Windows apps and
the powerful FireMonkey framework for cross-platform UIs.
Smart developers and agile software teams write better code faster
using modern OOP practices and C++ Builder's robust frameworks and feature-rich IDE.
Test drive the latest version at Embarcadero.com.
So you mentioned in your bio that while working on static analysis, you specify
rules to help other developers and also get to implement them.
What kind of work goes into specifying new rules for the SonarSource analysis tools?
How do you go about that?
Well, sometimes it's very complex and there are some rules which took us more time to specify
than to develop.
It can seem surprising, but I even wrote a blog article on one of them because it took
so much effort that I said, okay, I'm not going to waste this effort just to write a
rule.
I'm also going to write a blog article to explain the different steps in the rule. It's a rule which is simple: should you pass by
copy or by const reference? It's very simple. You probably have your own algorithm in your head
about what should you do. But if you try to implement this kind of algorithm automatically,
then it becomes much more complex. For example,
how do you decide if a type is expensive to copy or not? What's the criterion? Is it just the size
of the type? If you look at the C++ Core Guidelines, for example, you will see that they suggest having
a rule like that, only focusing on the size of the type. But you can have small types which are
expensive to copy because they own external resources.
But if you say, for example, that you want to look
at the copy constructor to see what it does,
it means that you have to have it available.
So it goes against the...
What we like to do is to be able to analyze
the translation unit in isolation.
We try to avoid
full project analysis if we can because it's much more expensive, it's more complex to set up. So,
if you are just looking at one translation unit, you may not have the body of the copy
constructor, so you may not know if it's doing some expensive operations or not.
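To make the difficulty concrete, here is a compile-time sketch of two possible heuristics. The two-pointer threshold and both trait names are assumptions for illustration, not SonarSource's rule. The size-only check classifies std::shared_ptr as cheap even though copying it updates a reference count. Adding std::is_trivially_copyable needs only the class definition, not the body of an out-of-line copy constructor, but it then treats every small type with a user-provided copy constructor as expensive, so it trades one kind of misclassification for another.

```cpp
#include <memory>
#include <string>
#include <type_traits>

// Size-only heuristic: "cheap to copy" if no larger than two pointers
// (threshold chosen here as an assumption).
template <typename T>
inline constexpr bool small_by_size_v = sizeof(T) <= 2 * sizeof(void*);

// Refinement (also an assumption): additionally require trivial copying,
// so copying cannot run user code that manages external resources.
template <typename T>
inline constexpr bool prefer_by_value_v =
    small_by_size_v<T> && std::is_trivially_copyable_v<T>;

struct Point { int x, y; };  // hypothetical small value type
```

Under these definitions, int and Point would be passed by value, while std::shared_ptr<int> and std::string would be passed by const reference.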
So, we try to develop this rule and we developed it maybe
three or four times and right now we are in the process of upgrading it once again. Because
it's very simple, it's already difficult in fact to have guidelines that are for human
beings. But when you try to apply the guidelines by computer, it's every, every dark corner,
you have to tackle it and to decide,
okay, am I going to raise a violation here or not?
And we try very hard not to raise false positives
because false positive are a pain for the developer.
It's just going to waste their time.
But at the same time, at some times we need to, to do something because it's very going to waste their time. But at the same time, sometimes we need to do something
because it's very easy to do a static analysis that raises no false positives. It's obviously
simple. Just raise nothing. But we want to raise some stuff and it's hard to find the
good balance between, in some cases, raising some false positives, but not raising too many of them.
Yeah, well, I mean, on the upside, if you raised no errors at all,
you'd also have the fastest static analysis in the world, I think.
And I could deliver it with the fastest
fully conformant C++ compiler, which just outputs:
sorry, not enough resources to compile this file.
Right.
Which is very much true.
You include a header file, but anyhow.
Otherwise, one interesting point
about rules is where we get the
inspiration from the rules.
Because there are two parts.
One, you
need to know, okay, I want to do a rule for this
problem. So first you need to discover the problem and then the second part is, okay, now that I know
I want to do this rule, what its limits should be, the special cases, and how to tune it.
So I spoke about the second part, but the first part is quite interesting too. So we
are using our knowledge of the language, of course. We are C++ developers.
So when we do something that doesn't work great,
we try to remember it so that maybe we write a rule about it in the future.
And it also works the other way.
So when we have a rule that is triggered too often in our own code,
and we say, okay, trying to follow this rule
would make the code less clear,
then okay, maybe we screwed up with the rule.
We have to change it.
So that's the good point of dogfooding our own product.
So for example, we had one rule
that tries to detect
when a lambda is called immediately,
because if it's called immediately,
we can capture everything by reference.
It's not going to dangle, so it's safe to capture everything by reference.
And there were some cases, for example, in the first implementation we had
when we were calling an algorithm from the library.
We said, okay, we are passing the lambda to a function.
We don't know if it's going to be called immediately.
But in fact, most of the algorithms of the standard library are just going to call the
lambda and not store it anywhere.
So we added a special case for those algorithms, for example.
And it's through experience on our own code that we discovered this.
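A minimal sketch of the two cases mentioned: an immediately invoked lambda, and a lambda passed to a standard algorithm that calls it during the call without storing it. In both, capturing by reference with [&] cannot leave a dangling reference. The function names join_with and count_at_least are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Immediately invoked lambda: it runs and finishes within this statement,
// so capturing `parts` and `sep` by reference cannot dangle.
std::string join_with(const std::vector<std::string>& parts, const std::string& sep) {
    const std::string joined = [&] {
        std::string out;
        for (std::size_t i = 0; i < parts.size(); ++i) {
            if (i != 0) out += sep;
            out += parts[i];
        }
        return out;
    }();  // the trailing () invokes the lambda right here
    return joined;
}

// std::count_if calls the predicate during the call and does not keep it
// afterwards, so capturing `min_len` by reference is safe as well.
std::ptrdiff_t count_at_least(const std::vector<std::string>& words, std::size_t min_len) {
    return std::count_if(words.begin(), words.end(),
                         [&](const std::string& w) { return w.size() >= min_len; });
}
```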
So there is this part and the part of inspiration.
So inspiration comes from our experience too.
It also comes from new versions of the standards.
So recently we spent some time reading what was new in C++20
and tried to think if it would deserve some new rules.
And usually those rules come in two categories.
One is, okay, there is this new stuff in C++20,
but it's dangerous because of that and that, so we should
warn against it, only in the dangerous cases, of course. And there is also the other direction,
which is, okay, there is this new stuff in C++20, it allows people to write code differently than
before, so maybe we should detect a pattern in old C++ to suggest upgrading to the
new feature in C++20.
So this is one source of inspiration.
Of course, the C++ Core Guidelines are also another source
of inspiration, but those guidelines are very much written
for human beings.
Even if they give some tips about how they could be
automatically analyzed, it's just a first step.
So we cannot directly implement the C++ Core Guidelines most of the time.
We have to use it as a source of inspiration to make the rule,
but it's not a direct translation.
And also, a big part of the C++ Core Guidelines requires the user
to write the code in a very specific way,
which is not the typical way of writing C++. Everything which is related to the GSL and lifetime safety,
with decorating the types that you pass everywhere,
we don't consider that this is a typical way of writing the code.
And we are not in the business of telling people to write the code
in a specific way
so that we are going to be able to analyze it correctly.
There are tools that do that and they're pretty useful
if they work in a very narrow domain,
when people have the capability to pay the price,
to do extra work, to have the tool give better results. But this is not what we try to be.
We try to be a tool for all C++ developers, working on existing code. So we cannot do that.
It would not be well accepted by our users. So that's why, with the C++ Core Guidelines, we cannot do all
of it. And for the part that we do,
we have to do some transformation
between the way it's described and the way we implement it.
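To make "decorating the types" concrete: the Core Guidelines rely on annotations such as `gsl::owner<T*>` from the Guidelines Support Library. The sketch below defines a simplified local `owner` alias as a stand-in for the real GSL header (an assumption, so the example is self-contained) to show the style: the annotation changes nothing for the compiler and only pays off when a checker reads it and the whole codebase is written that way. `Widget`, `make_widget`, and `destroy_widget` are hypothetical.

```cpp
#include <type_traits>

// Simplified stand-in for gsl::owner: an alias that is just T, restricted to
// pointer types, which tells a checker "this raw pointer owns its pointee".
template <class T, class = std::enable_if_t<std::is_pointer_v<T>>>
using owner = T;

struct Widget { int id = 0; };

// The return type documents that the caller must eventually delete the result.
owner<Widget*> make_widget(int id) {
    return new Widget{id};
}

// The parameter type documents that this function takes over ownership.
void destroy_widget(owner<Widget*> w) {
    delete w;
}
```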
And then we have also other sources.
For example, there is MISRA,
a set of coding standards from the automotive industry,
basically for safety-critical software.
The current official version is MISRA C++ 2008, and it was a different
C++ at the time. We are now working on upgrading it to C++17. I don't know exactly when it's
going to be out, but we are trying to modernize the rules.
And I think we have a very good balance now
in the people working on MISRA
between people working in the safety critical industry
and people having a very good knowledge of the language
to try to make them match, because safety-critical
and modern C++ are not in opposition.
They can work hand in hand.
And I think that's what we're trying to do.
Yet those rules are not going to be rules applicable in every situation
because safety critical code requires extra caution.
And some of the rules could be generic.
Some of them are probably going to be specific to safety-critical software.
So if you're using any of the SonarSource analysis tools against your codebase, do you have options for which ruleset you want to use?
Like, I want to use the MISRA or I want to use the core guidelines or whatever else?
Well, we have some flags on the rules that explain where they come from, and we have the possibility to create rulesets, but for now, for
MISRA, we don't have full coverage of MISRA C++ 2008, and we didn't think it was worthwhile to
develop that full coverage. So right now we don't have a profile for safety-critical code. When we
work on the next version of MISRA, that's probably something that's going to happen. So
we basically have one profile, Sonar way, which is the set of rules
that we believe should be applicable in all circumstances.
And then we have other rules that could make sense in some contexts; they are not enabled
by default, but users can enable them if it makes sense for them.
Let's say, for example, I don't think we have it now, but we are discussing adding
a rule about west const or east const, or just consistent const.
You see, it's typically the kind of rule for which there is no right or wrong.
Right.
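For readers who haven't met the terms: "west const" and "east const" differ only in where `const` is written, and the compiler sees the same type either way, which is why such a rule can only be about consistency. A small compile-time check (C++17); the alias names are made up for the example.

```cpp
#include <type_traits>

// West const (const on the left) and east const (const on the right)
// spell exactly the same type: pointer to const char.
using west_ptr = const char*;
using east_ptr = char const*;
static_assert(std::is_same_v<west_ptr, east_ptr>);

// What matters is which side of the '*' the const is on:
// this one is a const pointer to mutable char, a different type.
using const_ptr = char* const;
static_assert(!std::is_same_v<west_ptr, const_ptr>);

// The same holds for plain values: both are 'const int'.
constexpr int west_value = 42;
int const east_value = 42;
static_assert(std::is_same_v<decltype(west_value), decltype(east_value)>);
```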
There are a couple of things you mentioned about rules that you have or struggled with that caught my attention, like deciding when to copy something:
pass by copy or pass by const reference. So in your analysis, do you actually go and look at what the copy constructor does? You don't just rely on is_trivially_copyable or something like
that? Well, currently, for this specific rule, we don't look at what the copy constructor does.
Okay. For some other rules, we do look at what it does.
For example, we have a series of rules about constness.
For example, we have some rules that say, okay, this variable is not modified.
It could be const.
But to know if it's modified or not, we don't only rely on the prototype of functions.
So even if you pass by non-const reference, if we can
see the body of the function and detect that it's not modifying
the object, we are still going to raise a violation telling you
that, first, you should pass it
as a const reference, and then it should be a const variable.
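A sketch of the pattern being described, with hypothetical function names: a parameter taken by non-const reference that the body only reads, and a local that is never modified after initialization. Seeing the body is what lets a checker propose both fixes, in the order given above.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Before: 'names' is passed by non-const reference although it is only read,
// and 'limit' is never modified after initialization.
std::size_t count_short_before(std::vector<std::string>& names) {
    std::size_t limit = 8;
    std::size_t count = 0;
    for (auto& n : names)
        if (n.size() < limit) ++count;
    return count;
}

// After: the parameter becomes a const reference and 'limit' becomes const.
// Behavior is unchanged; the signature now also accepts const vectors.
std::size_t count_short_after(const std::vector<std::string>& names) {
    const std::size_t limit = 8;
    std::size_t count = 0;
    for (const auto& n : names)
        if (n.size() < limit) ++count;
    return count;
}
```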
It just caught my attention because
I noticed
one project I'm working on that
some tag types
which are just used for overload
resolution,
right, like just passed as a parameter to a function
just to say choose this version of the overload
or the constructor
were in some cases accidentally being
passed by const reference
instead of by value, even though they were entirely empty types, and in a templated type
used thousands of times across hundreds of C++ files, just making them pass by value actually had a notable impact on the
codebase, I think because otherwise it had to push pointers onto the stack.
Actually, for this rule, in the first version, in some cases we said, okay, you're passing by
reference, but you could pass it by copy. But we had lots of issues. When we do a rule like that,
we test it on, I don't know, maybe 30 or 50 open source projects. We run it on LibreOffice, we run it on Clang,
we run it on Linux Kernel,
we run it on many big open source projects,
and we try to see the results.
And for this rule, there were so many cases
when people passed by reference,
they could have passed by copy,
and we said, okay, there is value in passing by copy, as you said, but the
value is not as great as in the other direction, so if we just raised an issue in these
cases, it's going to be too noisy, so we are not going to do it. Well, maybe you could upgrade the
rule to say if it's a completely empty type, then always pass it by value. Maybe that could be an option, yes.
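The tag types Jason describes look roughly like this sketch (the `parse` overloads and tag names are hypothetical). The tags carry no data and exist only to pick an overload, so taking them by value is typically free, whereas a const reference usually means passing an address on every call unless the call is inlined.

```cpp
#include <string>

// Empty tag types whose only job is to select an overload.
struct from_hex_t {};
struct from_dec_t {};
inline constexpr from_hex_t from_hex{};
inline constexpr from_dec_t from_dec{};

// Tags taken by value, as suggested for completely empty types.
int parse(const std::string& s, from_hex_t) { return std::stoi(s, nullptr, 16); }
int parse(const std::string& s, from_dec_t) { return std::stoi(s, nullptr, 10); }
```

Usage: `parse("ff", from_hex)` selects the base-16 overload purely from the tag's type.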
The other thing you mentioned of when to alert
on a default capture by reference for a lambda
with immediately invoked lambdas or passing them,
that's one of the rules from Effective Modern C++
that is just too strict, because I believe Scott says there
to never do a default capture by reference,
but that basically eliminates all the interesting use cases for lambdas,
like immediately invoking them or passing them to algorithms.
So I found that interesting that you had to put in
like a special rule to check for algorithm usage.
Yeah, the first version of the rules about lambdas
didn't care whether the lambda was immediately called or not.
And the feedback we saw on different projects
and on our own project told us that we really needed
to make this important distinction.
Yeah.
So when users set
a rule to be ignored on SonarCloud,
do you keep those statistics and say,
hey, there's like 100,000 users
that are all ignoring this rule.
Maybe we need to take a closer look at it.
Well, we are starting to do it.
We are starting to get this kind of feedback
from the users.
Right now, I think my colleagues
who work on other languages are more advanced than we are
on this point. SonarCloud is more recent than SonarQube. And at first, we also had to make
sure that by doing that, we were not leaking customers' secrets. So we had to be careful about what information we collect. So now we do it; I know
that we do it, I don't know the details, and we clearly want to use this information. Especially
this year, we put a lot of effort into rules around security, mostly for other
languages, a little bit for C++ also,
but mostly for other languages. And we wanted to know what the feedback on those was. So I know
that people took a good look at whether an issue was marked as a false positive, or whether a rule
was deactivated, trying to understand what happened. It just feels like that could be so fascinating
to look at. Because on one hand, you want to be like, a thousand people are ignoring this.
They clearly can't all be wrong.
But then sometimes I'm guessing you're going to look at it
and be like, wow, a thousand programmers are dead wrong.
I know of an experiment
for some very safety-sensitive embedded software.
They had the requirement that the software had to be
developed twice and run twice on two different, parallel pieces of hardware in the real system,
just to make sure that it was rock solid, and the teams that developed the program had a
contractual obligation not to talk with each other. Yet, in some cases, they discovered that there was the same bug at the same place.
Wow.
Because both of them were developed by human beings
and had almost the same education.
So it's easy to... I mean, some places are naturally error-prone.
So they had the same bug in the same places in some cases.
That's really interesting.
Because it's like the Apollo guidance computer, or whatever, was also written...
Anyhow, some of the NASA mission software was written twice by two different contractors.
And then the systems voted to see what was the correct thing to do at that moment.
I'm curious.
And the problem with twice is,
if they disagree, what do you do? You should do it three times, so there can be a majority.
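The voting idea in this exchange is usually called N-version programming. A minimal 2-out-of-3 voter is sketched below; it only illustrates the concept and is not a description of how NASA or any embedded system actually implemented it.

```cpp
#include <optional>

// Accept a result when at least two of three independently computed values
// agree; with no majority, report failure instead of guessing. (With only two
// versions, every disagreement would end up in the failure case.)
template <class T>
std::optional<T> majority_vote(const T& a, const T& b, const T& c) {
    if (a == b || a == c) return a;
    if (b == c) return b;
    return std::nullopt;
}
```

As the anecdote above shows, voting only helps when the versions fail independently; two teams making the same mistake outvote the correct one.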
That's a Minority Report plot right there, right?
A whole Tom Cruise movie for those who didn't catch that. But anyhow.
Gosh, that movie is old now.
It is old now, yes. I'll just make you feel old thinking about how old that movie is.
You did mention some newer security rules, I think you
said for C++. What are some of those newer rules?
Well, basically, like always, we go in several directions and we see what we can do,
where we can bring some value to the user. So for security, one rule that was not so complex to develop,
but I think quite interesting, was about when people, in C,
use memset to try to erase some memory so that even if someone gets access to the memory of the
process, they will not see a password that could be stored there.
But of course, memset could be optimized away.
If you never read back that memory.
If you never read back the memory, which is what you do usually when you do this kind of stuff.
You just want to scrap it away before doing something else.
So you are not going to read it back.
So we have a rule that detects that and says you should use the special,
it's memset_s, I think,
a special function that cannot be optimized away,
so it does the same work.
So we have this rule.
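A C++ sketch of the problem: a final `memset` on a local buffer that is never read again is a dead store the optimizer may remove. `memset_s` comes from the optional Annex K of C11 and is not provided by every C library, so the fallback below (an assumption, not SonarSource's recommendation) zeroes through a `volatile` pointer; `explicit_bzero` on glibc/BSD or `SecureZeroMemory` on Windows are platform alternatives. `check_password` and its `expected` parameter are hypothetical.

```cpp
#include <cstddef>
#include <cstring>

// Zeroes 'n' bytes through a volatile pointer. Accesses through volatile
// glvalues are observable behavior in C++, so the compiler may not drop
// them the way it can drop a memset whose result is never read.
void secure_zero(void* p, std::size_t n) {
    volatile unsigned char* bytes = static_cast<volatile unsigned char*>(p);
    while (n--) *bytes++ = 0;
}

// Hypothetical check: 'expected' stands in for a real credential lookup.
bool check_password(const char* input, const char* expected) {
    char buf[64];
    std::strncpy(buf, input, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';
    const bool ok = std::strcmp(buf, expected) == 0;
    // std::memset(buf, 0, sizeof buf);  // dead store: may be optimized away
    secure_zero(buf, sizeof buf);        // must be kept by the compiler
    return ok;
}
```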
We have a bunch of rules around the POSIX functions,
which are, it's mostly C because, you know,
it's so much easier to make a buffer overflow
in C than in C++. I mean, char*, why char*? So, there are lots of POSIX functions that
work with char* and take two arguments, a buffer and a size, and we try to detect if the buffer size is consistent
with the size which is passed to the function.
And this is clearly one domain where we still have room to improve
because it's in the category of rules which are hard to do
because you have to follow the data across the execution path. And it's never going to be... it's been proven to be impossible to
be perfect in those cases. It would be like solving the halting
problem to do that perfectly. So we know we're never going to be
perfect, but we try to improve. And we are also going to try
to improve the way we report the issue to the user.
Because we are not totally great on this point right now.
But still, we have these rules that try to detect
when you are using some POSIX functions
and you are not doing it well and you might have buffer overflows.
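A small sketch of the buffer/size mismatch such rules look for, using `snprintf` (standard C, also specified by POSIX) as a stand-in for the buffer-plus-size functions mentioned; `greeting` is a hypothetical helper. The buggy form is kept as a comment because it would overflow.

```cpp
#include <cstdio>
#include <string>

// The pattern a buffer/size rule looks for (comment only, it overflows):
//   char buf[16];
//   std::snprintf(buf, 32, "Hello, %s", name);  // size says 32, buffer holds 16

// Fix: derive the size from the buffer itself so the two cannot disagree.
// snprintf then truncates long input instead of writing past the end.
std::string greeting(const char* name) {
    char buf[16];
    std::snprintf(buf, sizeof buf, "Hello, %s", name);
    return buf;
}
```

The hard part for an analyzer is the general case, where the buffer and the size come from different places along the execution path, which is the data-flow problem described above.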
And then we have also some security rules
around some uses of cryptography libraries
or hashing libraries
that compute safety-critical hashes.
And so for those rules,
it's mostly that those libraries
have lots of APIs,
and some APIs are safe,
some aren't.
And we detect
if people use the wrong API
or don't use it correctly. For example, you create a
cryptographic context, then you should set some option before encrypting something and you forgot
to set those options. So that's the kind of stuff we detect. Wow. Yeah. Some of those sound
really advanced. Like you said, trying to make sure that the size matches the buffer that was created,
you have to do all kinds of code analysis to figure out when this buffer was created,
how big it was, is it still alive, whatever. Yeah, I mean, the day we solve this
problem perfectly... well, it's probably never going to happen. I will retire before it happens.
We have some things that give some good results.
We have lots of ways to improve.
But yes, it's a really hard problem.
So where do you see SonarSource, or Sonar the tool suite,
fitting in with things like AddressSanitizer and Clang-Tidy
and some of the other tools that are available out there right now?
Well, actually, to do this kind of analysis,
we rely on the Clang Static Analyzer,
which is one of the engines behind Clang-Tidy.
Okay.
We extended it.
We might tune it a little bit differently,
but it's the same basis. And if you compare to AddressSanitizer,
it's totally different, because we don't work at runtime. And I think one of the biggest advantages
of not running at runtime is that we detect problems even if you don't run into them right now
in your test suite. So even if your test suite is not very good and it's not
going to trigger these specific conditions, we can detect that in some corner-case
situation that could happen on the customer side, but that's not happening in your unit tests,
you could have a memory allocation issue. So I really think that AddressSanitizer
and what we do are complementary.
You should use both on your programs.
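A sketch of the kind of corner case meant here (`copy_length_leaky` is hypothetical): the leak only happens on an early-return path. A runtime tool such as AddressSanitizer's leak checker reports it only if some test actually passes an empty string, while a path-sensitive static analyzer can flag the path without executing it.

```cpp
#include <cstddef>
#include <cstring>

// Returns the length of a heap copy of 'src', or -1 for an empty string.
// Bug on purpose: the early return skips delete[], leaking 'copy'.
int copy_length_leaky(const char* src) {
    const std::size_t n = std::strlen(src);
    char* copy = new char[n + 1];
    std::memcpy(copy, src, n + 1);
    if (n == 0) return -1;  // corner case: 'copy' is never freed on this path
    const int len = static_cast<int>(std::strlen(copy));
    delete[] copy;
    return len;
}
```

A test suite that only calls `copy_length_leaky("abc")` passes cleanly under the sanitizer; holding the buffer in `std::string` or `std::unique_ptr<char[]>` would remove the leak on every path.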
But you said you use the Clang static analyzer
for the backend.
So if I run SonarCloud,
if I run my project on SonarCloud,
am I going to get all the Clang-Tidy rules
as well as your rules? Or are you going to run
just your rules? So we don't directly integrate Clang-Tidy. We are based on Clang, but we don't
integrate everything from Clang-Tidy. Some of the rules we have are inspired by Clang-Tidy
because we just had the same idea, but we didn't try to integrate all of Clang-Tidy. So we designed the rules
ourselves. We looked at every source possible, of course, but we didn't just copy paste Clang-Tidy.
Oh sure. No, I mean, I just didn't know how much of it was inherited, not necessarily that you,
you know, copy and pasted it in there or whatever. And we also have some rules that are just based
on Clang warnings, because some of those warnings could be very useful.
And also, our analyzer is capable of analyzing code written for Visual Studio, for example.
And for those users especially, getting some of the warnings from Clang might not be easy.
So we also integrated some of them.
Okay. Do you want to tell us more about the process of implementing some of these rules?
You talked a little bit about how it's all based on Clang.
Do you want to go into any more detail?
Well, we basically have three strategies to implement a rule.
One is there is already a warning in Clang which does the work.
So we just reuse it.
So it's the easiest case.
Unless you want to do something 99% like Clang, but not 100%.
So you have to give up this easy process.
Then there are the rules that need to follow the flow of execution.
So typically, knowing that a specific variable must have a specific value
at a given moment in the code.
Those ones are based on the Clang Static Analyzer. So we developed Static Analyzer checkers
using the framework provided by Clang to implement those rules. And then the third type of rule is
rules that are based on the structure of the code. So typically, rules about the rule of five:
you don't need to simulate the execution to be able to detect a violation of the rule of five
or the rule of zero.
And so those ones are also based on Clang,
but on AST matchers.
So it's the same technology that is used in Clang-Tidy,
just that we developed other rules.
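What a structural rule sees can be shown with plain C++ (the AST matcher itself is not shown, since that would need the Clang libraries). `Buffer` and `Record` are hypothetical classes: the first violates the rule of five in a way that is visible from its declaration alone, the second follows the rule of zero.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Violation: a user-declared destructor but implicitly generated copy
// operations. Copying a Buffer copies the raw pointer, so both copies would
// delete[] the same memory. No execution has to be simulated to see this.
class Buffer {
public:
    explicit Buffer(std::size_t n) : data_(new char[n]), size_(n) {}
    ~Buffer() { delete[] data_; }
    std::size_t size() const { return size_; }
private:
    char* data_;
    std::size_t size_;
};

// Rule of zero: members manage their own resources, so no special member
// functions are declared and copying is correct by construction.
class Record {
public:
    Record(std::string name, std::vector<int> values)
        : name_(std::move(name)), values_(std::move(values)) {}
    const std::string& name() const { return name_; }
    std::size_t count() const { return values_.size(); }
private:
    std::string name_;
    std::vector<int> values_;
};
```

Either declaring (or deleting) all five special members in `Buffer`, or replacing its raw pointer with `std::vector<char>`, would satisfy such a rule.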
If I get a rule violation from one of your tools, how much information am I going to
get now?
What I'm specifically thinking of, as you said, a lot of the rules are inspired by the
core guidelines.
I'm just curious, like if I get a rule violation, am I going to get documentation that says
this is a bad idea because such and such, and you might want to read this core guideline,
for example?
Yeah, basically for every rule we
have documentation where we try to explain why we have the rule. We also try to explain what
pattern we detect, which might not be exactly the same as why we have the rule, because we could have
a why which is generic, and inside of this generic purpose we detect only a few specific patterns.
Then we almost always have an example of bad code, and an example of how
to rewrite the code so that it's better.
And if the rule is inspired by some external sources like the C++ Core Guidelines, MISRA, or
CERT, we reference them.
And in the issue itself, we try to have what we call secondary locations.
So the issue is located at one place in the code, but we also try to reference other places in the code that are interesting to understand why the
rule is triggered. So for example, let me try to find a real-world example.
I'm not 100% sure, but for example, for the rule about the fact that you could pass an argument by
const reference, maybe we are going to highlight where the argument is used
so that you can easily check that you really want to pass it by const reference.
It's not that we could have made a mistake in the rule.
It's more like maybe you didn't pass it by const reference
because right now you're not modifying it,
but in your mind you plan to modify it in the future
or something like that.
So it's always better to have information
about more context when you read the rule
to really understand if it's something
that you want to act on or not.
Okay.
Well, Loic, it's been great having you on the show today.
Is there anything else you want to talk about
before we let you go?
Obviously, listeners can go and check everything out
at sonarsource.com,
right? Yes.
No, I don't have any
specific ideas right now.
Okay. Where can listeners find you online?
Sorry? Where can listeners
find you online? Are you on Twitter, blog, or anything
like that? Well, I'm old
school. I'm not on
Twitter.
I have a few blog articles into the blog of Sonar Source.
I'm mostly reachable by email.
That's the main way to do it.
I mean, that's the way it worked in the 90s.
Why shouldn't it work again today?
All right.
Well, we'll be sure to include those links to your blog and everything on the show notes.
Yeah.
And if you speak French, you might also find me at a few events.
I quite often participate in French meetups, or in the CPPP conference, which started in 2019
and was canceled in 2020 for some reason.
And we plan to start it again
in 2021.
I was a speaker
there, but that was also in French.
I like to speak
in French because it allows me to reach
a different kind of people than when
I speak in English.
Sure. All right, well, it's great having you
on the show today, Loic.
Thanks for coming on.
Thank you.
Thanks so much for listening in as we chat about C++. We'd love to hear what you think of the podcast. Please let us know if we're discussing the stuff you're interested in, or if you have
a suggestion for a topic, we'd love to hear about that too. You can email all your thoughts to
feedback at cppcast.com. We'd also appreciate if you can like CppCast on Facebook and follow CppCast on Twitter.
You can also follow me at Rob W. Irving and Jason at Lefticus on Twitter.
We'd also like to thank all our patrons who help support the show through Patreon.
If you'd like to support us on Patreon, you can do so at patreon.com slash cppcast.
And of course, you can find all that info and the show notes on the podcast website at cppcast.com.
Theme music for this episode is provided by podcastthemes.com.