CppCast - SonarSource Analysis Tools

Episode Date: May 7, 2021

Rob and Jason are joined by Loïc Joly from SonarSource. They first discuss compiler updates in GCC and MSVC as well as survey results of most used C++ features. Then they talk to Loïc about the SonarSource static analysis tools for C++, what sorts of bugs they discover, and what goes into creating a new analysis rule.

News
VS 2019 STL is C++20 feature complete
GCC 11.1 Released
Meeting C++ survey results: the most popular C++ standard features

Links
SonarSource
The NeverEnding Story of writing a rule for argument passing in C++

Sponsors
C++ Builder

Transcript
Starting point is 00:00:00 Episode 298 of CppCast with guest Loïc Joly, recorded May 5th, 2021. This episode is sponsored by C++ Builder, a full-featured C++ IDE for building Windows apps five times faster than with other IDEs. That's because of the rich visual frameworks and expansive libraries. Prototyping, developing, and shipping are easy with C++ Builder. Start for free at Embarcadero.com. In this episode, we discuss compiler updates and survey results.
Starting point is 00:00:49 And we talk to Loïc Joly. Loïc talks to us about the SonarSource suite of static analysis tools. Welcome to episode 298 of CppCast, the first podcast for C++ developers by C++ developers. I'm your host, Rob Irving, joined by my co-host, Jason Turner. Jason, how are you doing today? All right, Rob, how are you doing? Doing all right. Do you want to put in a plug for your latest C++ Weekly video?
Starting point is 00:01:36 I know it's been getting a lot of views, right? Yeah, I just thought I'd pull up the stats out of curiosity. So I did an episode called Break the ABI to save C++, and I haven't gotten any hate mail directed at me yet. It's been up since... what's today? Today's Wednesday. So it's been up for 48 hours, and I've got 11,000 views so far,
Starting point is 00:02:01 which is a little over two times my normal view rate. Okay. So yeah, we'll see what happens with that. Yeah, and it's obviously something we've been talking about a lot on this show for a while now, and we were talking right before the show how we might try to have someone with the opposing viewpoint on soon. Right. Yeah, I mean, our regular listeners may have noticed that I have been conducting an informal poll for the last several years.
Starting point is 00:02:31 Pretty much every guest I ask, they all say... we haven't had anyone say, no, we need ABI stability for our project or whatever. Right. Okay. Well, at the top of every episode I read a piece of feedback, and on that subject, this tweet is from Nick Gully, and he wrote: "CppCast, how about some new gear? A ceramic mug with a giant ABI on the side for us to mull over breaking." And I kind of like that idea. I might put that together. Yeah. Oh, that's terrible. You'd have, like, a party where everyone smashes their ABI mug.
Starting point is 00:03:07 Wasn't there a book titled something like Steal This Book? We could have a mug that says Break This Mug. Right, maybe on the other side, something like that. Maybe. I think Weird Al, one of his more recent releases, was also something like Steal This Song. All right. Well, we'd love to hear your thoughts about the show. You can always reach out to us on Facebook, Twitter, or email us at feedback at cppcast.com. And don't forget
Starting point is 00:03:34 to leave us a review on iTunes or subscribe on YouTube. Joining us today is Loïc Joly. Loïc is a C++ coder, speaker, teacher, and expert. He represents France on the ISO C++ Standardization Committee and is also a member of the committee drafting the next version of the MISRA C++ Standard for Safety Critical Systems.
Starting point is 00:03:52 In addition to developing in C++, he has a special interest in teaching it and spreading good practices across the community. He's a frequent speaker at meetups and conferences and teaches at Télécom SudParis. Since he joined SonarSource in 2018, he has worked on static analysis for C++, both specifying rules to help other developers and having the fun of implementing them.
Starting point is 00:04:11 Loïc, welcome to the show. Hello, welcome, and thank you for having me. And so, do you think that we should be willing to break the ABI for C++ in the near future? Well, I must say that every concern about the ABI is a little bit foreign to me, because I always rebuild my code from source. And I'm mostly a Windows user,
Starting point is 00:04:32 where it's typical to package the libraries at the same time as you package your product. I don't work with shared libraries across different products. So for me, ABI is almost a non-existent concern. And yet I have to pay for things that I don't really use. Oh, that's an ironic way of looking at it, with the "don't pay for what you don't use" mentality of C++. Well, and it's interesting too, since you
Starting point is 00:04:58 mentioned that you're a Windows developer, because when I released this video, I got a lot of feedback from people like, what are you even talking about? We don't have ABI stability. I saw some of those comments. I was very surprised by that, since it's been such an ongoing discussion. There are a lot of people who are still very unaware of what it is and why it's important. And I think that now in Windows, it happens. Because previously, when you switched to a new version of Visual Studio, you just had to recompile everything.
Starting point is 00:05:24 And now they try to introduce stability between different versions of Visual Studio. So I think it's moving a little bit in the direction of a stable ABI, even if it's not totally there yet. But it still seems like almost
Starting point is 00:05:39 every Windows developer just assumes, if I upgrade my compiler, I have to rebuild the world. Like, they still assume that. So, I mean, if everyone's already in the place to rebuild the world, then let's just go ahead and rebuild the world. That's what I say. I know that I may not have to do it, but I will feel more confident if I do it anyway. Right. It's like the very first thing you do, right?
Starting point is 00:06:01 You have a weird bug that you can't diagnose. What do you do? Rebuild. Right. Because you just assume there's a link problem somewhere. All right, well, we have a couple of news articles to discuss. Feel free to comment on any of these, and then we'll start talking more about the work you're doing at SonarSource. Okay, sure. All right. So this first one, speaking of Visual Studio: the Visual Studio 2019 16.10 preview is, I believe, out now. And this is their changelog announcing that the STL is going to be C++20 feature complete, which is pretty exciting: we already have all the library features implemented in at least one of the major compilers. And there's an ironic problem here. What is that? I've been following this along on Twitter, and I got some clarification from Victor on it. There's basically a bug in std::format where one
Starting point is 00:07:00 part of it accidentally relies on the locale, when std::format is supposed to be locale-independent. They want to change that as a real quick fix to C++20, theoretically. But Visual Studio has already promised ABI compatibility for their standard library implementation. So if the standard were fixed, Visual Studio couldn't apply the fix and would still be stuck with the locale dependence. Hmm. Interesting. Yeah.
Starting point is 00:07:32 Anything else either of you want to highlight? Anything you're looking forward to being able to use in the new version? Well, just one point for me. It's a little bit strange to focus on full library support and full language support separately, because now I think that both of them are entangled in some way. And what really matters for me is full support, period. But I know it's a good step. I'm pretty happy. I'm excited by it. But I really think that
Starting point is 00:08:01 I'm not sure to what extent full library support can totally work without full language support. They're pretty close, though. If you look on cppreference, Visual Studio's language support is really, really close for C++20 also. What are they still waiting on? Do you have that? I don't recall. I'd have to look back. All right.
Starting point is 00:08:21 Well, in other news, we have an update for another one of the compilers: GCC 11.1 has been released. And they're also announcing a lot of great support for C++20. They're changing their default language version to C++17, which is great. Do they specify exactly how far along they are with 20, or are they just saying that they're making good progress? They say go to their status page for some of these things. I went to cppreference
Starting point is 00:08:54 to look at their version of the compiler support chart for C++20 in GCC. And it's interesting because GCC is, let's see, in the green for every single language feature except modules. Oh, okay. Yeah, that's very good. Yeah.
Starting point is 00:09:14 Which is interesting because Clang had the tendency to be ahead on these things. But if you look at this comparison chart right now, GCC and Visual Studio are way ahead of Clang for C++20 support. Yeah. It's a pattern that I've noticed also, and I'm even a little bit worried about it, because it seems to show that there is a kind of slowdown in Clang. It used to be first for language features, and now it's...
Starting point is 00:09:43 I really rely on Clang for my daily work, not only as a compiler but also as the libraries that we use as a basis to do static code analysis. So I mean, it's a wonderful tool for that, but the fact that they are lagging behind Visual Studio and GCC is not a good sign. I hope it's just for some temporary reasons and not a global pattern. Because right now, for example, for modules, I don't see much in Clang at all. Even GCC is not complete with modules, but they have the basics done, and there are a few corner cases left to handle; they are already pretty good. Yeah. Yeah, I'm sorry.
Starting point is 00:10:30 I'm just scrolling through this list of comparisons again. Yeah. And there are some things, like lambdas in unevaluated contexts, which sounds small but is huge to library developers. It isn't even listed as partial support in Clang yet. Hopefully they'll be catching up soon. I do want to comment on one other thing before we move past GCC, one of these notes in GCC 11. So first of all, I was so confused: apparently 11.1 is the first official release of the GCC 11 series. Oh, there wasn't an 11.0? Not that I can find. Maybe I'm nuts.
Starting point is 00:11:08 But I think it's the first official release in the GCC 11 series. But they mention here hardware-assisted AddressSanitizer support. Apparently, the 64-bit ARM target can somehow do hardware-assisted AddressSanitizer, so it uses considerably less RAM. It's in a testing phase and can only be used for the Linux kernel right now, but I found that very interesting. Yeah, sounds interesting. Okay, and then the last thing we have
Starting point is 00:11:33 is the Meeting C++ survey results, and this was a survey on the most popular features of C++. And what's interesting is he shows last year's survey results side by side with this year's results. So you can see how some features might get used more over the past year as developers have had more time to get accustomed to using these features.
Starting point is 00:12:00 And I think actually there were a couple that looked like they had gone down just a little bit. But most of the usage of various features seems to be pretty consistent from year to year. So I thought that was interesting. Yeah, that's one of the main things that stood out to me, that the histograms are almost identical from year to year, which I really didn't expect. But yeah, just a couple of little things here: there were more people reporting that they're not using C++14, which is maybe just a different group
Starting point is 00:12:32 of people that responded to that question, I don't know. Or they're upgrading to 17, right? I guess maybe they misunderstood the question and they're like, no, I'm not using 14, I'm using 17. I don't know. Right. Okay. So Loïc, we've talked a lot about different static analysis tools on the podcast before, but I don't think we've discussed SonarQube. So do you want to start off by telling us a little bit about the SonarQube analysis tool? Well, I'm not going to talk only about SonarQube,
Starting point is 00:13:03 but about all the tools that we do at SonarSource, which are basically three tools. But they work together. And if you want to have a full story of what we propose, we have to talk about all of them. So we have SonarLint, SonarQube, and SonarCloud. SonarLint is the tool that runs in the IDE. So it's your first line of defense. And then we have SonarQube and SonarCloud, which work more in the continuous integration phase. So SonarQube is something that is self-hosted, while SonarCloud is something on the cloud. But basically, it's the same set of features,
Starting point is 00:13:40 and it allows you to analyze your code during pull requests, for example, and to block a pull request if it's not meeting some quality criteria. And you can also see the history of the code, how many new issues you have introduced, and this kind of stuff. I'm sorry, please go ahead. So, yes, our goal is to be tightly integrated into the development cycle to make a tool for developers. So that's why we are in the inner loop with the IDE, and we are in the slightly outer loop with pull requests.
Starting point is 00:14:14 And that's where we are positioned. With SonarLint, you said that's an IDE tool. Is that available for different IDEs as a plugin? Which ones does it support? Well, it supports different IDEs depending on the language. I'm going to focus on C++ here because that's why we're here. So for C++, we support Visual Studio. And since very recently, maybe a week or two,
Starting point is 00:14:38 we also support CLion. Okay. So it's brand new. And basically, it's the same rules that run inside of the IDE and inside of SonarQube and SonarCloud. The main difference is that some rules that require full project knowledge are not going to work in the IDE, because in the IDE, we work on a translation-unit basis.
Starting point is 00:14:59 Okay. I'm kind of curious, what kind of rules require full visibility of the project? Like unused functions. If you don't have the full project, detecting unused functions is obviously not doable. That's interesting, because I just spent a week or so with one of my clients removing about 100,000 lines of code that were unused. Considerable compile-time savings for us on that project. And it's not so easy to detect that something is unused in C++. It's
Starting point is 00:15:34 very tricky because you have some functions that are not used, but you still don't want to remove them. For example, if you have a class which is a container, maybe you want to remove them. For example, if you have a class which is a container, maybe you want to have begin and end. And even if the user is not using them right now, you still want to provide begin and end, because it just makes sense to make your class compatible with other similar classes. So this notion of a new function is not a clear cut,
Starting point is 00:16:04 I would say. So that is an interesting one that came up in the project I was working on because some of the functions were used, but only by the test suite. So I had to disable the test suite for some of my detection because if it's only used in the test suite, for our particular case, I didn't care. I wanted to remove it. Makes sense. So does your tool actually say it is completely unused or can i
Starting point is 00:16:30 get a report that says it is used only in this one place or something like that? No, it's only going to tell you if it's totally unused. We are not providing a tool to help you explore your code. We are just focusing on raising issues and trying to explain the issues. So for example, if we have a rule about the rule of five, we are going to point to the class and say, okay, look, here you have defined a destructor, here you have defined a copy constructor, but you forgot something. So we are going to show you all the interesting locations that are useful to understand what the problem is, but we are not providing a tool to manually explore your codebase.
Starting point is 00:17:14 Okay. So as you said, we're obviously on a C++ podcast, so we're going to focus on C++. But do you want to tell us what other languages SonarSource analysis works with? I think we have 27 languages, but the main languages we focus on are... Java: we started mostly as a Java company, so we are very well known in the Java ecosystem. And then we also have tools for C#, Python, JavaScript, Apps Script. And I forgot many of them. That many, okay.
Starting point is 00:17:49 I think I mentioned the most important ones. And if I forgot some of them, some of my colleagues are going to kill me tomorrow. So we'll see. Actually, coincidentally, as we were discussing this interview, one of my friends shared with me an issue that SonarQube had found in their C# codebase. I can't remember the details of it, but it was particularly insidious. Oh, it was something equals minus something instead of something minus equals the other value.
Starting point is 00:18:22 And it was, yeah, particularly insidious in their code. And SonarQube found it. So that brought it to my awareness recently. And I know that, for example, since we are very well known in the world of Java, some people who work on the JVM also use our tool on C++ and discover interesting stuff in the JVM.
Starting point is 00:18:41 Oh, that kind of brings it full circle. Yeah. I'm curious: I brought up the SonarLint page after you mentioned that, and looking at the CLion plugin, it says SonarLint is open source. It's an open source project? Well, SonarLint, which is the integration inside of the IDE, is open source, but the core of the C++ analyzer is closed source. Okay. However, it's available for free in SonarLint.
Starting point is 00:19:09 So if you analyze code in the IDE, it's available for free. It's also available for free if you use SonarCloud on an open source project. Okay. But if you want to use SonarCloud on a closed source project, or if you want to do C++ analysis in SonarQube, you have to pay for it. And it's based on the
Starting point is 00:19:26 number of lines of code in the project. Oh, that's an interesting model. Okay. So are there specific types of bugs that the SonarSource analysis tools are particularly good at finding? Well, we have a
Starting point is 00:19:41 full range of issues in many different directions. And I think that, at least from my point of view, where we have a lot of value compared to other existing tools, at least where we try to invest, is rules that are not only about detecting a bug, but rules that push forward good ways of programming. So let me give you a classical example. We can have a rule about double delete or about a missing delete and a memory leak.
Starting point is 00:20:15 It's pretty hard to have this kind of rule. Everybody has it, but having a good rule for that is pretty difficult, because you have to follow all the paths inside of your code. And as soon as you have a function call, you don't know in which conditions it could be called, so you have to do very complex stuff. But we are developing in C++. What we should do is not do any raw memory allocation. We should use unique pointers. And it's pretty easy to write a rule that just detects that you allocated memory without storing it immediately into a unique pointer or a shared pointer or a class like that. This one is almost so simple that you don't need static analysis to do it, but there are some other rules of the same category where
Starting point is 00:20:57 by pushing forward the good practices, you prevent bugs indirectly. And at the same time, we also try to have rules that detect the bug when you didn't follow those good practices. But I really think that this is one differentiating point between our tools and, for example, compiler warnings, because compilers now are doing lots
Starting point is 00:21:18 of warnings, very interesting ones, but they really try to warn only when the code is almost certainly broken. We try to raise issues when the code is badly written. So that's a difference. It's really valuable, yeah. And as I mentioned, for example, we have some rules about the rule of five or the rule of zero.
Starting point is 00:21:38 The code could be perfectly correct without following those rules, but it's so much simpler to read code that follows them that we do that. Okay. That's interesting. I think you said you aim towards a rule like, don't allocate memory without immediately putting it into a unique pointer, or something like that. And just out of curiosity,
Starting point is 00:21:57 do you also have rules that try to detect misuse or abuse of smart pointers, like manually calling delete on the pointer returned from a unique pointer or unintentional copies of a shared pointer or something like that? I don't think we have the one you mentioned, but for example, we have one rule,
Starting point is 00:22:13 which is that you should not pass a unique pointer by const reference. Oh, okay. Because if you do, you should just pass the pointer: your function will have a broader interface, and it will be just the same. Right, yeah, yeah, exactly. So we have a few rules like that.
Starting point is 00:22:28 Okay, interesting. I already asked about the other languages that SonarSource supports. Does it run on other platforms as well? I know you mentioned the Visual Studio plugin. Does it run on Windows? Does it run on Linux and Mac as well? Yes, the core of SonarSource runs on Windows, Linux, and macOS.
Starting point is 00:22:51 And then for SonarLint, it really depends. For example, I know that for Java, it runs on Eclipse and VS Code also. And I don't have the full matrix of which language is supported in which IDE. But yes, basically we try to support the major ones, obviously. Okay.
Starting point is 00:23:12 And a point I forgot to mention, an important difference between SonarLint and SonarQube or SonarCloud, is that SonarLint is basically more for the individual developer, while SonarCloud and SonarQube are more for a team. So for SonarQube and SonarCloud, you have some rules that are going to be followed by the full team. And so you have one set of rules that is going to be shared on each pull request. And there is also in SonarLint what we call a connected mode, which is, if you also run SonarQube or SonarCloud,
Starting point is 00:23:43 you can synchronize with it, so that, for example, if in SonarQube you say that a rule raises a false positive for you, that it's not an issue and you decided that in this case you don't want this rule, we will stop reporting it in the IDE also. Oh, nice.
Starting point is 00:24:00 That is interesting. So those tools are not totally separated, they really work together. So you said SonarCloud is available for open source projects. Is that right? Well, it's available for open or closed source, and it's free for open source. Free for open source, okay. Are there any limitations on the open source analysis?
Starting point is 00:24:18 It's exactly the same version. Okay. I'm just thinking about how to go about enabling it on this other project that I've been working on. Yeah, how easy is it to set it up for an open source project if you're interested? Well, if you want to set up SonarCloud or SonarQube, you have to add some steps into the build system because, as you know, C++ build systems are awfully complex. So what we do is we have a program that eavesdrops on what happens during a build and saves all the information about which file was compiled with which options, which include paths, and everything. It just detects when the compiler itself is called and saves that in a file.
Starting point is 00:25:01 So now we have all this information, and we can use it to correctly analyze the code with the right parameters. So the way to configure this analysis is first to modify your build system slightly, so that when you build, you are wrapped inside of this additional program that will save the data. It's like a Clang compilation database, if you know it, except that it can be created with any kind of build system. And then we use this kind of compilation database to do the analysis in a second step and to upload the results to SonarQube or SonarCloud.
Starting point is 00:25:35 It's a little bit of work to integrate it into your CI, but once you are used to it, it's just a few lines of script. Do you have quick-start documentation, like, if you're using a CMake project, then this is what you need to do? We have some quick start guides, and we have some simple projects that you can look at. I want to interrupt the discussion for just a moment to bring you a word from our sponsor, C++ Builder. The IDE of choice to build Windows applications five times faster while writing less code.
Starting point is 00:26:04 It supports you through the full development lifecycle to deliver a single-source codebase that you simply recompile and redeploy. Featuring an enhanced Clang-based compiler, Dinkumware STL, and packages like Boost and SDL2 in C++ Builder's Package Manager, and many more. Integrate with continuous build configurations quickly with MSBuild, CMake, and Ninja support, either as a lone developer or as part of a team. Connect natively to almost 20 databases like MariaDB, Oracle, SQL Server, Postgres, and more with FireDAC high-speed direct access. The key value is C++ Builder's frameworks, powerful libraries that do more than other
Starting point is 00:26:38 C++ tools. This includes the award-winning VCL framework for high-performance native Windows apps and the powerful FireMonkey framework for cross-platform UIs. Smart developers and agile software teams write better code faster using modern OOP practices and C++ Builder's robust frameworks and feature-rich IDE. Test drive the latest version at Embarcadero.com. So you mentioned in your bio that while working on the static analysis, you specify rules and helping other developers implement them.
Starting point is 00:27:11 What kind of work goes into specifying new rules for the SoonerSource analysis tools? How do you go about that? Well, sometimes it's very complex and there are some rules which took us more time to specify than to develop. It can seem surprising, but I even wrote a blog article on one of them because it took so much effort that I said, okay, I'm not going to waste this effort just to write a rule. I'm also going to write a blog article to explain what are the different steps in the rule. It's a rule which is simple. Should you pass by
Starting point is 00:27:50 copy or by const reference? It's very simple. You probably have your own algorithm in your head about what you should do. But if you try to implement this kind of algorithm automatically, then it becomes much more complex. For example, how do you decide if a type is expensive to copy or not? What's the criterion? Is it just the size of the type? If you look at the C++ Core Guidelines, for example, you will see that they suggest a rule like that, focusing only on the size of the type. But you can have small types which are expensive to copy because they own external resources.
Starting point is 00:28:27 But if you say, for example, that you want to look at the copy constructor to see what it does, it means that you have to have it available. So it goes against... What we like to do is to be able to analyze each translation unit in isolation. We try to avoid full project analysis if we can, because it's much more expensive and more complex to set up. So,
Starting point is 00:28:51 if you are just looking at one translation unit, you may not have the body of the copy constructor, so you may not know if it's doing some expensive operations or not. So we tried to develop this rule, and we developed it maybe three or four times, and right now we are in the process of upgrading it once again. Because even for something this simple, it's already difficult to write guidelines for human beings. But when you try to apply the guidelines by computer, every dark corner, you have to tackle it and decide, okay, am I going to raise a violation here or not?
Starting point is 00:29:32 And we try very hard not to raise false positives, because false positives are a pain for the developer. They just waste their time. But at the same time, sometimes we need to do something, because it's very easy to write a static analysis that raises no false positives. It's obviously simple: just raise nothing. But we want to raise some stuff, and it's hard to find the right balance between, in some cases, raising some false positives, but not raising too many of them. Yeah, well, I mean, on the upside, if you did raise no errors at all,
Starting point is 00:30:11 you'd also have the fastest static analysis in the world, I think. And I could deliver it with the fastest fully conformant C++ compiler, which just outputs: sorry, not enough resources to compile this file. Which is very much true as soon as you include a header file, but anyhow. Otherwise, one interesting point about rules is where we get the
Starting point is 00:30:37 inspiration for the rules. Because there are two parts. One, you need to know, okay, I want to write a rule for this problem. So first you need to discover the problem, and then the second part is, okay, now that I know I want to write this rule, what should its limits and special cases be, and how should I tune it? So I spoke about the second part, but the first part is quite interesting too. So we are using our knowledge of the language, of course. We are C++ developers.
Starting point is 00:31:05 So when we do something that doesn't work great, we try to remember it so that maybe we write a rule about it in the future. And it also works the other way. When we have a rule that is triggered too often in our own code, and we say, okay, trying to follow this rule would make the code less clear, then maybe we screwed up with the rule. We have to change it.
Starting point is 00:31:27 So that's the good point of dogfooding our own product. So for example, we had one rule about trying to detect when a lambda is called immediately, because if it's called immediately, we can capture everything by reference. It's not going to dangle, so it's safe to capture everything by reference. And there were some cases, for example, in the first implementation we had,
Starting point is 00:31:52 when we were calling an algorithm from the library. We said, okay, we are passing the lambda to a function; we don't know if it's going to be called immediately. But in fact, most of the algorithms of the standard library are just going to call the lambda and not store it anywhere. So we added a special case for those algorithms, for example. And it's through experience on our own code that we discovered this. So there is this part, and there is the part about inspiration.
Starting point is 00:32:21 So inspiration comes from our experience too. It also comes from new versions of the standard. Recently we spent some time reading what was new in C++20 and tried to think about whether it deserved some new rules. And usually those rules come in two categories. One is: okay, there is this new stuff in C++20, but it's dangerous because of this and that, so we should
Starting point is 00:32:45 warn against it, only in the dangerous cases, of course. And there is also the other direction, which is: okay, there is this new stuff in C++20 that allows people to write code differently than before, so maybe we should detect a pattern in old C++ and suggest upgrading to the new C++20 feature. So this is one source of inspiration. Of course, the C++ Core Guidelines are another source of inspiration, but those guidelines are very much written for human beings.
Starting point is 00:33:15 Even if they give some tips about how they could be automatically analyzed, that's just a first step. So most of the time we cannot directly implement the C++ Core Guidelines. We have to use them as a source of inspiration to write the rule, but it's not a direct translation. And also, a big part of the C++ Core Guidelines requires the user to write code in a very specific way, which is not the typical way of writing C++. Everything that is related to the GSL and the lifetime safety
Starting point is 00:33:52 with decorating the types that you pass everywhere, we don't consider that to be a typical way of writing code. And we are not in the business of telling people to write their code in a specific way so that we are able to analyze it correctly. There are tools that do that, and they're pretty useful when they work in a very narrow domain, where people have the capability to pay the price,
Starting point is 00:34:21 to do extra work, to have the tool give better results. But this is not what we try to be. We try to be a tool for all C++ developers, working on existing code. So we cannot do that; it would not be well accepted by our users. That's why, for the C++ Core Guidelines, we cannot do all of it. And for the part that we do, we have to do some transformation between the way it's described and the way we implement it. And then we also have other sources. For example, there is MISRA,
Starting point is 00:34:55 which is a coding standard from the automotive industry, mostly, basically, for safety-critical software. The current official version of MISRA is MISRA C++ 2008. C++ was a different language at the time. We are now working on upgrading it to C++17. I don't know exactly when it's going to be out, but we are trying to modernize the rules. And I think we now have a very good balance, among the people working on MISRA, between people working in the safety-critical industry
Starting point is 00:35:34 and people with a very good knowledge of the language, to try to make them match. Safety-critical and modern C++ are not in opposition; they can work hand in hand. And I think that's what we're trying to do. Yet those rules are not going to be applicable in every situation, because safety-critical code requires extra caution. Some of the rules could be generic; some of them are probably going to be specific to safety-critical software.
Starting point is 00:36:00 Some of them are probably going to be specific for safety critical software. So if you're using any of the SOS analysis tools against your codebase, do you have options with what ruleset you want to use? Like, I want to use the MISRA or I want to use the core guidelines or whatever else? While we have some flags on the rules that explain where they come from, we have the possibility to create some ruleset, but now for MISRA, we don't have the full coverage of MISRA C++ 2008 and we didn't think it was interesting to develop this full coverage. So right now we don't have a profile for safety critical code. When we work on the next version of MISRA, that's probably something that's going to happen. And then we have so we have basically one profile which goes on our way, which is the set of rules
Starting point is 00:36:47 that we believe should be applicable in all circumstances. And then we have other rules that could make sense in some contexts, which are not enabled by default, but users can enable them if they make sense for them. Let's say, for example, I don't think we have it now, but we are discussing adding a rule about west const or east const, or just using the same const style everywhere. You see, it's typically the kind of rule for which there is no right or wrong. Right. There's a couple of things you mentioned about rules that you do have, or struggled with, that caught my attention, like deciding when to copy something,
Starting point is 00:37:32 pass by copy or by const reference. So in your analysis, do you actually go and look at what the copy constructor does? You don't just rely on is_trivially_copyable or something like that? Well, currently, for this specific rule, we don't look at what the copy constructor does. Okay. For some other rules, we do look at what it does. For example, we have a series of rules about constness. We have some rules that say: okay, this variable is not modified, it could be const. But to know whether it's modified or not, we don't only rely on the prototypes of functions.
Starting point is 00:38:04 So even if you pass it by non-const reference, if we can see the body of the function and we can detect that it's not modifying the object, we are still going to raise the violation, telling you that first you should pass it as a const reference, and then it should be a const variable. It just caught my attention because I noticed, on one project I'm working on, that
Starting point is 00:38:29 some tag types, which are just used for overload resolution, right, like just passed as a parameter to a function to say "choose this version of the overload" or the constructor, were in some cases accidentally being passed by const reference
Starting point is 00:38:45 instead of by value, even though they were entirely empty types. And in a templated type used thousands of times across hundreds of C++ files, just making them pass by value actually had a notable impact on the code base. In that case, I think it was having to push pointers onto the stack. Actually, for this rule, in the first version, in some cases we said: okay, you're passing by reference, but you could pass it by copy. But we had lots of issues. When we write a rule like that, we test it on, I don't know, maybe 30 or 50 open source projects. We run it on LibreOffice, we run it on Clang, we run it on the Linux kernel, we run it on many big open source projects, and we look at the results.
Starting point is 00:39:34 And for this rule, there were so many cases where people passed by reference when they could have passed by copy. And we said, okay, there is value in passing by copy, as you said, but the value is not as great as in the other direction. So if we just raise an issue in these cases, it's going to be too noisy, so we are not going to do it. Well, maybe you could upgrade the rule to say: if it's a completely empty type, then always pass it by value. Maybe that could be an option, yes. The other thing you mentioned, of when to alert
Starting point is 00:40:12 on a default capture by reference for a lambda, with immediately invoked lambdas or passing them to algorithms: that's one of the rules from Effective Modern C++ that is just too strict. I believe Scott says there to never use a default capture by reference, but that basically eliminates all the useful use cases, all the interesting use cases for lambdas, like immediately invoking them or passing them to algorithms.
Starting point is 00:40:38 So I found it interesting that you had to put in a special rule to check for algorithm usage. Yeah, the first version of the rules about lambdas didn't care whether the lambda was immediately called or not. And the feedback we saw, on different projects and on our own project, told us that we really needed to make this important distinction. Yeah.
Starting point is 00:40:57 So, when users set a rule to be ignored on SonarCloud, do you keep those statistics and say, hey, there's like 100,000 users that are all ignoring this rule, maybe we need to take a closer look at it? Well, we are starting to do it. We are starting to get this kind of feedback
Starting point is 00:41:19 from the users. Right now, I think my colleagues who work on other languages are more advanced than we are on this point. SonarCloud is more recent than SonarQube. And at first, we also had to make sure that by doing that, we were not leaking our customers' secrets. So we had to be careful about what information we collect. So now, I know that we do it, I don't know the details, and we clearly want to use this information. In particular, this year we put a lot of effort into rules around security, mostly for other languages, a little bit for C++ also,
Starting point is 00:42:06 but mostly for other languages. And we wanted to know what the feedback on those was. So I know that people took a good look at whether an issue was marked as a false positive, or whether a rule was deactivated, trying to understand what happened. It just feels like that would be so fascinating to look at. Because on one hand, you want to be like, a thousand people are ignoring this, they clearly can't all be wrong. But then sometimes, I'm guessing, you're going to look at it and be like, wow, a thousand programmers are dead wrong. I know they did an experiment
Starting point is 00:42:37 for some very safety-sensitive embedded software. They had the requirement that the software had to be developed twice and run twice, on two different parallel pieces of hardware in the real system, just to make sure that it was rock solid. And the teams that developed the program had a contractual obligation not to talk with each other. Yet, in some cases, they discovered that there was the same bug at the same place. Wow. Because both programs were developed by human beings with almost the same education.
Starting point is 00:43:15 So it's easy to... I mean, some places are naturally error prone. So they had the same bug in the same places in some cases. That's really interesting. Because it's like the Apollo guidance computer, or whatever, was also written... Anyhow, some of the NASA mission stuff was written twice by two different contractors, and then the systems voted to see what was the correct thing to do at that moment. I'm curious.
Starting point is 00:43:44 And the problem with twice is: if they disagree, what do you do? So you should do it three times, so there can be a majority. That's a Minority Report plot right there, right? A whole Tom Cruise movie, for those who didn't catch that. But anyhow. Gosh, that movie is old now. It is old now, yes. And it'll just make you feel old to think about how old that movie is. You did mention some newer security rules, I think you said for C++. What are some of those newer rules? Well, basically we are focusing on...
Starting point is 00:44:20 There are several directions, like always. We go in several directions and see where we can bring some value to the user. So for security, one rule that's not so complex to develop, but I think quite interesting, is about when people, in C, use memset to try to erase some memory, so that even if someone gets access to the memory of the process, they will not see a password that could have been stored there. But of course, memset could be optimized away. If you never read back that memory. If you never read back the memory, which is usually the case when you do this kind of stuff.
Starting point is 00:44:56 You just want to scrub it away before doing something else, so you are not going to read it back. So we have a rule that detects that and says you should use the special function, it's memset_s, I think, a special function that cannot be optimized away but does the same work. So we have this rule. We have a bunch of rules around the POSIX functions,
Starting point is 00:45:21 which are... it's mostly C, because, you know, it's so much easier to make a buffer overflow in C than in C++. I mean, char*, why char*? So there are lots of POSIX functions that work with char* and take two arguments, a buffer and a size, and we try to detect whether the buffer size is consistent with the size that is passed to the function. And this is clearly one domain where we still have room to improve, because it's in the category of rules that are hard to write: you have to follow the data along the execution path. And it's never going to be... it's been proven to be impossible to
Starting point is 00:46:10 be perfect in those cases. Doing it perfectly would be like solving the halting problem. So we know we're never going to be perfect, but we try to improve. And we are also going to try to improve the way we report the issue to the user, because we are not totally great on this point right now. But still, we have these rules that try to detect when you are using some POSIX functions incorrectly and might have buffer overflows.
Starting point is 00:46:36 And then we also have some security rules around some uses of cryptography libraries, or hashing libraries that compute security-critical hashes. For those rules, it's mostly that those libraries have lots of APIs, and some APIs are safe,
Starting point is 00:46:59 some aren't safe. And we detect if people use the wrong API, or don't use it properly. For example, you create a cryptographic context, then you should set some options before encrypting something, and you forgot to set those options. So that's the kind of stuff we detect. Wow. Yeah. Some of those sound really advanced. Like you said, trying to make sure that the size matches the buffer that was created: you have to do all kinds of code analysis to figure out when this buffer was created,
Starting point is 00:47:30 how big it was, whether it's still alive, whatever. Yeah, I mean, the day we have solved this problem perfectly... it's probably never going to happen. I will retire before it happens. We have some things that give good results, and we have lots of ways to improve. But yes, it's a really hard problem. So where do you see SonarSource, or Sonar, the tool suite, fitting in with things like Address Sanitizer and Clang-Tidy and some of the other tools that are available out there right now?
Starting point is 00:48:07 Well, actually, to do this kind of analysis, we rely on the Clang Static Analyzer, which is one of the engines behind Clang-Tidy. Okay. We extended it, and we might tune it a little bit differently, but it's the same basis. And if you compare it to Address Sanitizer, it's totally different, because we don't work at runtime. And I think one of the biggest advantages
Starting point is 00:48:33 of not running at runtime is that we detect problems even if you don't run into them right now in your test suite. So even if your test suite is not very good and doesn't trigger those specific conditions, we could detect that in some corner-case situation, one that could happen on the customer side but isn't happening in your unit tests, you could have a memory allocation issue. So I really think that Address Sanitizer and what we do are complementary. You should use both on your programs. But you said you use the Clang Static Analyzer
Starting point is 00:49:15 for the backend. So if I run my project on SonarCloud, am I going to get all the Clang-Tidy rules run as well as your rules? Or are you going to run just your rules? So we don't directly integrate Clang-Tidy. We are based on Clang, but we don't integrate everything from Clang-Tidy. Some of the rules we have are inspired by Clang-Tidy because we just had the same idea, but we didn't try to integrate all of Clang-Tidy. So we designed the rules
Starting point is 00:49:46 ourselves. We looked at every possible source, of course, but we didn't just copy-paste Clang-Tidy. Oh, sure. No, I mean, I just didn't know how much of it was inherited, not necessarily that you, you know, copied and pasted it in there or whatever. And we also have some rules that are just based on Clang warnings, because some of those warnings can be very useful. And also, our analyzer is capable of analyzing code written for Visual Studio, for example, and for those users especially, getting some of the warnings from Clang might not be easy. So we integrated some of them too. Okay. Do you want to tell us more about the process of implementing some of these rules?
Starting point is 00:50:24 So you talked a little bit about how it's all based on Clang. Do you want to go into any more detail? Well, we basically have three strategies to implement a rule. The first is when there is already a warning in Clang that does the work, so we just reuse it. That's the easiest case. Unless you want to do something 99% like Clang, but not 100%; then you have to give up this easy process.
Starting point is 00:50:50 Then there are the rules that need to follow the flow of execution. Typically, knowing that a specific variable must have a specific value at a given point in the code. Those are based on the Clang Static Analyzer: we developed Static Analyzer checkers, using the framework provided by Clang, to implement those rules. And then the third type is rules that are based on the structure of the code. Typically, rules about the rule of five: you don't need to simulate the execution to be able to detect a violation of the rule of five or the rule of zero.
Starting point is 00:51:32 And those are also based on Clang, but on AST matchers. It's the same technology that is used in Clang-Tidy; we just developed other rules. If I get a rule violation from one of your tools, how much information am I going to get? What I'm specifically thinking of is, as you said, a lot of the rules are inspired by the Core Guidelines.
Starting point is 00:51:54 I'm just curious: if I get a rule violation, am I going to get documentation that says this is a bad idea because of such and such, and you might want to read this Core Guideline, for example? Yeah, basically, for every rule we have documentation where we try to explain why we have the rule. We also try to explain what pattern we detect, which might not be exactly the same as why we have the rule, because we could have a why that is generic, and inside this generic purpose we detect only a few specific patterns. Then we almost always have an example of bad code and an example of how
Starting point is 00:52:26 to rewrite the code so that it's better. And if the rule is inspired by some external source, like the C++ Core Guidelines or MISRA and so on, we reference it. And in the issue itself, we try to have what we call secondary locations. The issue is located at one place in the code, but we also try to reference other places in the code that are useful for understanding why the rule is triggered. So for example... let me try to find a real-world example, I'm not 100% sure, but for example, for the rule about the fact that you could pass an argument by const reference, maybe we are going to highlight where the argument is used
Starting point is 00:53:07 so that you can easily check that you really want to pass it by const reference. It's not that we could have made a mistake in the rule. It's more like maybe you didn't pass it by const reference because, even though right now you're not modifying it, in your mind you plan to modify it in the future, or something like that. So it's always better to have more context when you read the issue
Starting point is 00:53:30 to really understand if it's something that you want to act on or not. Okay. Well, Loic, it's been great having you on the show today. Is there anything else you want to talk about before we let you go? Obviously, listeners can go and check everything out at sonarsource.com,
Starting point is 00:53:46 right? Yes. No, I don't have any specific ideas right now. Okay. Where can listeners find you online? Sorry? Where can listeners find you online? Are you on Twitter, blog, or anything like that? Well, I'm old school. I'm not on
Starting point is 00:54:02 Twitter. I have a few articles on the SonarSource blog. I'm mostly reachable by email. That's the main way to do it. I mean, that's the way it worked in the 90s. Why shouldn't it work again today? All right. Well, we'll be sure to include those links to your blog and everything in the show notes.
Starting point is 00:54:26 Yeah. And if you speak French, you might also follow a few. I quite often participate in French meetups, or in the CPPP conference, which started in 2019 and was canceled in 2020 for some reason. And we plan to start it again in 2021. I was a speaker there, but it was also in French,
Starting point is 00:54:52 so I like to speak in French, because it allows me to reach a different kind of people than when I speak in English. Sure. All right. Well, it's great having you on the show today, Loic. Thank you. Thanks for coming on.
Starting point is 00:55:09 Thanks so much for listening in as we chat about C++. We'd love to hear what you think of the podcast. Please let us know if we're discussing the stuff you're interested in, or if you have a suggestion for a topic, we'd love to hear about that too. You can email all your thoughts to feedback at cppcast.com. We'd also appreciate it if you can like CppCast on Facebook and follow CppCast on Twitter. You can also follow me at Rob W. Irving and Jason at Lefticus on Twitter. We'd also like to thank all our patrons who help support the show through Patreon. If you'd like to support us on Patreon, you can do so at patreon.com slash cppcast. And of course, you can find all that info and the show notes on the podcast website at cppcast.com. Theme music for this episode is provided by podcastthemes.com.
