CppCast - Hippomocks and cpp-dependencies

Starting point is 00:00:00 This episode of CppCast is sponsored by JFrog, the universal artifact repository including C++ binaries thanks to the integration of Conan, the C and C++ package manager. Start today at jfrog.com and conan.io. JetBrains is offering a 25% discount for an individual license on the C++ tool of your choice: CLion, ReSharper C++, or AppCode. Use the coupon code JETBRAINS for CppCast during checkout at jetbrains.com. CppCast is also sponsored by Pacific++, the first major C++ conference in the Pacific region, providing great talks and opportunities for networking. Get your ticket now during early bird registration until June 1st. Episode 98 of CppCast with guest Peter Bindels, recorded April 19th, 2017. In this episode, we discuss reflection and new features in CLion. Then we talk to Peter Bindels from TomTom.

Starting point is 00:01:19 Peter talks to us about his HippoMocks library and the cpp-dependencies analyzer. Welcome to episode 98 of CppCast, the only podcast for C++ developers by C++ developers. I'm your host, Rob Irving, joined by my co-host, Jason Turner. Jason, how are you doing today? I'm doing good, Rob. I'm operating from a new laptop today. Here's hoping it doesn't mess up the podcast. That's both of us. We're both operating on new laptops. So yeah, fingers crossed for us.

Starting point is 00:02:10 And just a warning to our listeners, I think both of us are dealing with some allergies right now. I don't know how bad it is in Colorado, but here in North Carolina in April, everything is green. Like my car is normally blue, but right now it looks teal because it's covered in green pollen. It's so disgusting. We've got, I think they might be like some sort of choke cherry tree or something that has these little white blossoms all through our neighborhood. And there's enough of the flower petals around that I actually saw kids out

Starting point is 00:02:41 with snow shovels. I mean, now granted they only had maybe a few cubic feet worth of them, but it was still enough. Yeah, nature can be pretty gross sometimes. Anyway, at the top of our episode, I'd like to read a piece of feedback. This week we got a tweet from Jonathan Boccara, and we've talked about several of his blog posts recently. And

Starting point is 00:03:06 he said to us, CppCast is awesome. And your feedback on episode 88 led me to publish two more articles, which he linked to. And that was our episode with STL, of course. And I think we talked about his blog post where he was actually creating, I think he called it like an STL learning resource, right? Right. Yeah, that sounds right. Yeah. So he, based on feedback from us and probably most importantly from STL, he wrote a couple of new articles.

Starting point is 00:03:36 One is the design of the STL, and the other is inserting several elements into an STL container. I haven't had a chance to look at either of those articles in depth quite yet, but I'm sure they're quite good because he's been making some pretty good blog posts. I glanced at them, and if he's listening, I'll give another little comment here. I would like to see an article that covers move iterators also. That'd be a good one. Well, as always, we appreciate the feedback,

Starting point is 00:04:03 and Jonathan will be sending you the JetBrains raffle giveaway. And we'd love to hear your thoughts about the show as well. You can always reach out to us on Facebook, Twitter, or email us at feedback@cppcast.com. And don't forget to leave us a review on iTunes. So joining us today is Peter Bindels. Peter is a C++ software engineer who prides himself on writing code that is easy to use, easy to work with, and well readable to anybody familiar with the language. He's worked for a contractor for a few years and then made the switch to work at TomTom, where he's been working on various parts of the software chain, last of which was a major cleanup in the navigation

Starting point is 00:04:38 code base. In doing so, he developed a tool to determine, check, and improve dependencies between components, which allows quicker structural insight into complicated systems. He also created HippoMocks in 2008, one of the first full-fledged C++ mocking frameworks that is still a relevant choice today. He has given two talks at Meeting C++ 2016 and will be giving his third talk on mocking in C++ at C++Now 2017. Peter, welcome to the show. Hi. So I'm going to ask a question I haven't asked in a while now. How did you get started with C++?

Starting point is 00:05:13 That's a funny story. I was playing a computer game called Crusader No Remorse back in 1995, I think, or 1996. And as I was playing it, I ended up with an overheating computer. It was one of the first ones that needed a cooler on the CPU and it had physically fallen off because the mechanisms to attach it weren't as good back then as they are now. So as it was overheating,

Starting point is 00:05:36 the game gave an assert and it said there was an error in this and this line of C++ code. So I was looking at that and thinking I can program in BASIC and it's okay, but this is a full-fledged game running on my system. It's running 100 times as fast as the BASIC code I'm doing, and this is written in C++.

Starting point is 00:05:54 Therefore, I will be learning C++ now. As a 12-year-old. It's not the best language to learn as you're 12 years old. So you're saying as one of the first computers with a cooling fan, I'm thinking this had to be like early Pentium, late 486 era? That was medium Pentium. It's Cyrix M2.

Starting point is 00:06:14 So around the Pentium MMX era. Okay. So how did it work? I mean, did you end up learning enough C++ to solve the problem? Did you even have access to that source code? I didn't have any access to the source code. It was apparently a debug build they put on the CD-ROM.

Starting point is 00:06:31 I managed to open my case because I noticed that the amount of time I got to play kept shrinking by a few minutes every time until I let it alone for a few hours, and then it went back up to five minutes. So I unscrewed the case and figured out that it was actually just lying on the floor. Okay. So you started with a software problem, but ended up solving the hardware problem. Exactly. And the software took much, much longer to actually learn to do well.

Starting point is 00:06:58 That's fun. That's an interesting story, though, for getting you motivated for C++. Yeah. That's pretty cool. Okay, well, Peter, we got a couple articles to discuss. Feel free to jump in and comment on any of these, and then we'll start talking to you about some of your talks and the work you're doing at TomTom. Okay. So the first one is we have yet another keynote announcement from C++Now, which

Starting point is 00:07:21 peter and jason you will both be attending soon. And in the same theme where they had other talks and keynotes from the D and Rust community, this third one is going to be from Haskell. Yeah. Well, what do you guys think? This is a pretty unusual conference at this point. Yeah. What do you think, Peter? Are you looking forward to this? I have to say that I did kind of expect a Haskell announcement,

Starting point is 00:07:46 given that there was an announcement about D and Rust. It kind of fits the theme. It looks close to C++ as it is now, because it's one of the core functional languages. And if you look at the talks given by Phil Nash, for example, there's a lot of functional in C++ now, and it's getting more and more. So I was kind of expecting this one.

Starting point is 00:08:07 Yeah, that's a good point. It makes sense to bring in a functional language. I was expecting Swift or something would be the third one, because I wasn't thinking quite along those lines. I'm sure there are a lot of other languages that maybe they considered, but Haskell definitely does seem like a good fit. I'm really interested to see what you guys think of these keynotes after C++Now and whether they're well received and generate some good discussions. Definitely. I'm looking forward to it. I think it'll be a good fit for this conference.

Starting point is 00:08:37 I do think it's an interesting choice to have different languages present at a C++ conference because usually it's a case of very much introspection as in looking at other C++ developers among each other. And in this case, you get an outside view to join that, which should be good for the language. Yeah, I think so too. Okay, next we got an update

Starting point is 00:08:58 from CLion. This is their first major release of the new year, CLion 2017.1. And they got a pretty big feature list in here. Extended support for C++14, their first bit of support for C++17, which is nested

Starting point is 00:09:14 namespaces, support for precompiled headers, disassembly view, they added Catch as a unit test framework, which makes sense since Phil Nash is over there. And they also added experimental support for the Visual C++ compiler, which I thought was interesting. I'm looking over this list myself, and I see extended support for C++14.
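For readers who have not seen the C++17 feature mentioned in the release notes, here is a minimal sketch of a nested namespace definition next to the older spelling. The namespace and function names are invented for illustration and do not come from CLion or any real code base.

```cpp
// C++17 nested namespace definition: one line instead of three nested blocks.
// All names here are hypothetical.
namespace navigation::routing::detail {
    inline int clamp_speed(int kmh) { return kmh < 0 ? 0 : kmh; }
}

// The pre-C++17 spelling reopens exactly the same namespace.
namespace navigation { namespace routing { namespace detail {
    inline int double_speed(int kmh) { return clamp_speed(kmh) * 2; }
}}}
```

Both blocks contribute to the same namespace, which is why the second function can call the first without qualification.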

Starting point is 00:09:35 In brief, all except constexpr. And pretty much constexpr is the only programming I'm doing right now. Getting ready for my C++ Now talk. And when they say they don't support constexpr, I guess that just means they're not going to give you any code generation help when you're writing something with constexpr because they're just not going to recognize it. But you can still compile because you're still using GCC or Clang,

Starting point is 00:10:05 which is capable of constexpr code, right? Right, yes. It might be a code highlighting thing, where, like the problems you had in Visual Studio around the 2011-2012 timeframe, where if you wrote correct code, you got three different responses from the IntelliSense, from the code highlighter, and from the compiler.

Starting point is 00:10:24 That's not helpful. Right, because the IntelliSense compiler is not the same as the MSVC compiler. They used to have three complete frontends. They reduced it to two now, so it's better. But it might be the same thing for CLion because I think they have a Java parser, so they might be expecting...

Starting point is 00:10:41 They might be checking the code themselves. And in that case, I would have also postponed constexpr to the last because in C++17, you're getting if constexpr, which is a completely different way of parsing again. Right. Do you use an IDE, or CLion yourself, Peter? I have tried it, but I found it a bit too slow. And I know this is an unfair statement to make without quantifying it.
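To make the two constexpr flavors from this part of the conversation concrete, here is a small sketch. It assumes a C++17 compiler; the function names `factorial` and `describe` are made up for the example. The first function relies on the relaxed C++14 constexpr rules (loops and local variables), the second on C++17 `if constexpr`, where the branch that is not taken is discarded instead of being instantiated, which is part of why a separate IDE parser has extra work to do.

```cpp
#include <string>
#include <type_traits>

// C++14 relaxed constexpr: loops and mutable locals are allowed.
constexpr unsigned long long factorial(unsigned n) {
    unsigned long long result = 1;
    for (unsigned i = 2; i <= n; ++i) result *= i;
    return result;
}
static_assert(factorial(5) == 120, "evaluated at compile time");

// C++17 if constexpr: only the branch matching T is instantiated, so
// std::to_string is never compiled for string-like arguments.
template <typename T>
std::string describe(const T& value) {
    if constexpr (std::is_arithmetic_v<T>) {
        return "number: " + std::to_string(value);
    } else {
        return "text: " + std::string(value);
    }
}
```

With a plain `if`, `describe("hi")` would fail to compile because `std::to_string` has no overload for character arrays.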

Starting point is 00:11:07 I tried to use it on a really big project. Okay. And in that case, it was kind of slow. I'm still working with Phil Nash to find out why it's slow there, because it's supposed to be fairly fast. Is that one of the earlier builds, or something more recent? It was a relatively recent one, as in nine months ago, I think. Yeah, it wasn't that long ago

Starting point is 00:11:29 that they made some improvements on the speed, but I think it was within the last year. I'm losing track of that. I noticed one of the things in here, a comment from someone, actually, not in the list of coverage, was from Olaf, who says, you forgot to bang your drum for the feature

Starting point is 00:11:47 that in my opinion is by far the best improvement: the zero-latency typing. And I don't use IDEs enough to really know what that's in reference to. But it makes me wonder if that's one of the reasons why I've never really liked using IDEs. Yeah, I'm not really sure what that's in reference to either. But he also mentions in the same comment, which I did notice, he's asking, is there an ETA for when the Vim plugin will be released using this zero-latency API? So I guess the Vim plugin is not using zero-latency typing currently? I guess. We're all talking about stuff that we don't have direct experience with at the moment, I guess. I've definitely never noticed a problem where I'm typing in an IDE and I feel like there's a noticeable latency. Like, that's just never something that's occurred to me as an issue.

Starting point is 00:12:34 Personally, I feel like when the autocomplete is constantly popping stuff up when I'm typing, even if it's not actually slowing me down, I feel like it's slowing me down because my brain is wondering, is it going to autocomplete something I don't want to autocomplete? Or what? I don't know. Maybe it's just a mental issue for me. I'm in the same boat with you for that mental issue. I try to turn off any autocompletes

Starting point is 00:13:00 or adding things on when I'm typing as much as I can. So I'm not alone when it comes to that. You're not the only one. I feel a little better. Especially if you have an IDE that tries to help you in adding more brackets, for example. Oh, I hate that. And then doesn't delete them when you close them. And then I'm just typing my full sentence, look up at the screen, and there's five different

Starting point is 00:13:23 parentheses behind my line. Yeah. Your IDE is not actually helping out. It's just slowing you down. And I ended up at some point in some, I believe, some misconfiguration of Visual Studio, where when I would put the opening brace, and then I'd put the close one, and it would be smart enough to delete the one that it automatically added, the one that was left over would be indented incorrectly. And I'm like, okay, now I just have to go back in and delete a couple of spaces to re-indent

Starting point is 00:13:52 it to where I want it to be. That was, yeah. I do sometimes get frustrated with Visual Studio when it automatically adds quotes for strings. That bothers me sometimes. Oh, that's interesting. Yeah. Okay, anyway, next up we have another blog article from Jackie Kay, and I think this is actually only her second one, the first one generating a lot of controversy. This one I don't think should be as controversial, but it was a very long and thorough, well-researched article, An Introduction to Reflection in C++, where she goes on about kind of the use case for why we need reflection in C++ and what you can currently do with some of the well-known C++ reflection libraries. And I thought that was pretty interesting.

Starting point is 00:14:45 And she kind of did a deep dive into how these different reflection libraries work. Yeah, it's a very well-written article. Yeah. And she did hint that there will be a part two of this article where she'll actually go over what types of things you can do with C++ reflection. This one being just an overview of how reflection works. Now, considering the HippoMocks library that you've been working on, Peter, you probably have some insight into this world of reflection.

Starting point is 00:15:15 Yes, but right now SG7... Is it SG7? It is SG7. They are currently looking into doing all reflection except for functions, which means you can make libraries to serialize, deserialize. You can introspect into classes to find out what members they have and metaprogram with that. But you cannot actually look at a function

Starting point is 00:15:35 and use that to create new functionality or create a new implementation of an interface. That's the current proposal going through the standards committee, you're saying? It is. To not have method reflection? It is without method reflection. I think that is to try to keep the size of it down so you don't get a new Concepts, for example, which has been in the standards committee for 12 years by now. So I think it's a good choice on their part not to add it.

Starting point is 00:16:02 But on the other hand, for me, the biggest thing would be adding functionality that allows you to create a new instance of an interface or inherit an interface so that you can create proxies, you can create logging wrappers, you can do aspect-oriented programming with that. Yeah, this is the kind of thing that I need for ChaiScript. I need to not only be able to walk over what members a class has, but ideally be able to generate a new class that implements virtual members, for example. That's exactly the thing that I would like to do. Right. And just thinking,

Starting point is 00:16:37 for example, I'm thinking out loud, if you take Kris Jusiak's DI library, so dependency injection, it allows you to create an implementation of an interface. You can register which one you want. If you had method reflection, you could take that type and then create a new implementation that's a decorator around the same type and put in the actual implementation and create

Starting point is 00:16:58 logging around every function call automatically. And then we'd be doing it at compile time instead of having to do weird runtime hooks like other languages that allow that kind of thing. And then you do that at compile time and you can still make it a runtime switch

Starting point is 00:17:13 because you can insert the decorator. Either you can or you cannot insert the decorator. Then if you don't insert it, you don't get any overhead from it. Wow. Well, hopefully we get something that lets us do all these fun, crazy things soon. Hopefully.
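The logging decorator described in this exchange can be sketched by hand today; the point made in the conversation is that method reflection would let a library generate the forwarding class instead. Everything below (`Router`, `SimpleRouter`, `LoggingRouter`, `make_router`) is a hypothetical example written for this transcript, not code from any DI library.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical interface, invented for this sketch.
struct Router {
    virtual ~Router() = default;
    virtual int distance(int from, int to) = 0;
};

struct SimpleRouter : Router {
    int distance(int from, int to) override {
        return to > from ? to - from : from - to;
    }
};

// Hand-written decorator: records each call, then forwards it.
// With method reflection this class could be generated for any interface.
class LoggingRouter : public Router {
public:
    LoggingRouter(std::unique_ptr<Router> inner, std::vector<std::string>& log)
        : inner_(std::move(inner)), log_(log) {}

    int distance(int from, int to) override {
        log_.push_back("distance(" + std::to_string(from) + ", " +
                       std::to_string(to) + ")");
        return inner_->distance(from, to);
    }

private:
    std::unique_ptr<Router> inner_;
    std::vector<std::string>& log_;
};

// Runtime switch: the decorator is only inserted when asked for, so the
// undecorated path pays nothing beyond the normal virtual call.
std::unique_ptr<Router> make_router(bool with_logging,
                                    std::vector<std::string>& log) {
    std::unique_ptr<Router> router = std::make_unique<SimpleRouter>();
    if (with_logging) {
        router = std::make_unique<LoggingRouter>(std::move(router), log);
    }
    return router;
}
```

Callers only ever see a `Router`, so switching logging on or off does not change any code that uses the object.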

Starting point is 00:17:28 I am really going to be attending Jackie Kay's talk at C++ Now this year, because I think she's talking about reflection there. Is she? I didn't even... Yeah. I wonder if it conflicts with anything else I have going. Okay, well,

Starting point is 00:17:43 Peter, it says here you just got back from Revision 2017. What was that conference? That's actually not a conference. That's a demo party. Do you know what a demo party is? I don't think I do. So if you look back 20,

Starting point is 00:18:00 30 years, you get to the time when the Commodore 64 and the Commodore Amiga were really big computers. And at that time, people were still kind of exchanging software directly, because you could either buy it at high prices if you knew where to find it, or you could find it at people coming together and just

Starting point is 00:18:15 sharing software. And the people tried to share all the software they had, but sometimes people tried to add a copy protection, which is of course their good right, making it very hard to do that. So groups popped up that cracked a copy protection, which is, of course, their good right, making it very hard to do that. So groups popped up that cracked the copy protection and then released it. And in doing so, wanted to tag it with their own, sorry, call it an intro that says, this is our group and we did this. And over time, the second part of it became more important.

Starting point is 00:18:41 And the first part kind of dropped out because, well, copying software is not really nice to the software vendors. So they kept making new intros and full-scale demos, which are essentially a long version of an intro, which is a graphics performance with audio that looks really awesome. And they do that in a very, very small amount of size. So if you think about a bit of software, I think the latest Facebook on your Android phone would be 270 megabytes.

Starting point is 00:19:08 A demo would typically be 64 kilobytes. And a small demo would be only 4 kilobytes. And it would still be 3 minutes of full-screen performance with audio. That's really impressive. And that's still the targets they're reaching for, is these 4 and 64K demos, even on Windows systems or today?

Starting point is 00:19:27 They have a Windows system with a 1080 Ti, so a really high-end Windows system. And they play demos of 64 kilobytes on that. And you wouldn't be able to tell that it's only 64 kilobytes. So, I mean, they're still able to use system libraries, I assume, like DirectX and such. They do import DirectX and such. They do import DirectX or OpenGL,

Starting point is 00:19:50 but they do the importing in a second way so that you don't have the length of the name of the function you're calling. So they do everything they can to get the size down and then cram in as much as possible to get the longest demo with the best music you can within that size limit. Okay, so I was getting ready to say, you know, is this kind of cheating compared to what you had to do on a Commodore 64, but they still are really having to do a lot to cram it all in.
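The conversation does not spell out how the imports avoid storing function names. One well-known size-coding trick is to look functions up by a hash of their name, so the executable carries a 4-byte hash instead of the full string. The sketch below simulates that idea portably and is an assumption about the technique, not something confirmed in the episode: `kExports` stands in for a library's export table, and every name in it is invented.

```cpp
#include <cstdint>

// FNV-1a, 32-bit. constexpr, so hashes of literals can be computed at
// compile time and the literal itself need not end up in the binary.
constexpr std::uint32_t fnv1a(const char* s) {
    std::uint32_t h = 2166136261u;
    while (*s) {
        h ^= static_cast<unsigned char>(*s++);
        h *= 16777619u;
    }
    return h;
}

// Stand-in for the export table of a loaded system library (hypothetical).
// In a real demo these names live in the library, not in the demo itself.
using Fn = int (*)(int);
inline int clear_screen(int color) { return color; }
inline int swap_buffers(int frame) { return frame + 1; }
struct Export { const char* name; Fn fn; };
constexpr Export kExports[] = {
    {"ClearScreen", clear_screen},
    {"SwapBuffers", swap_buffers},
};

// The caller walks the table and compares hashes instead of strings.
inline Fn resolve(std::uint32_t wanted) {
    for (const Export& e : kExports) {
        if (fnv1a(e.name) == wanted) return e.fn;
    }
    return nullptr;
}

// What the caller stores: four bytes rather than the function's name.
constexpr std::uint32_t kSwapBuffersHash = fnv1a("SwapBuffers");
```

The saving comes from the caller's side only: a long API name costs the same four bytes as a short one.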

Starting point is 00:20:12 Well, the thing is, the more you cram in, the more stuff you can add. And if you can cram that in just a little bit tighter, you can add in just two more minutes of demo, or you can make the models just a little bit better, or you can add an extra soundtrack. So what was the most impressive stuff that you saw at Revision this year? There was an 8K demo, so that's 8 kilobytes, about the size of an average email, and that did voice synthesis.

Starting point is 00:20:37 Wow. As part of a demo. There was also a 4K demo that had essentially a replica of Star Wars. You know the final scene in Episode VI? Yeah. Where they fly over the Death Star, launch a rocket into it, and so on. Oh, okay, yeah. He basically recreated that, including an exploding star and five fighters and so on, within 4K.

Starting point is 00:20:59 That's amazing. It is. So do you do any demo coding yourself? I would love to answer yes, but the answer is actually no. Okay. So what do you do as an observer at Revision? I mean, are these just presented or are people kind of working on this live? I'm just trying to get a sense of what happens here.

Starting point is 00:21:20 Well, it's a really big room. There's a lot of people there who bring their laptops, desktops, their ancient Commodore Amigas, because there are still Amiga demos being made, even this year. And you go around, you talk to people, you find people who are writing demos, and sometimes you can join them to help them out a bit. Wow.

Starting point is 00:21:39 And there's also a bunch of creative people. For example, there's a compo that's about making executable music. So there's just making the music from an executable, or somebody making a drawing. That's stuff I've never really played with. I do, you know, follow the Commodore 64 scene enough to know that they're still releasing Commodore 64 demos also. And recently, even, it's like they're still discovering new video modes that they can coerce the hardware into generating and it's hardware that you would have thought would have been fully understood 35 years ago. It's amazing. Yeah, some of the things that they're

Starting point is 00:22:17 doing is truly crazy. For example, they have figured out that you can switch the amount of pixels per line in the middle of a frame. So it means you can make the top half of the frame a lesser resolution and the bottom half a bigger resolution and then display something at really high resolution in the bottom part. And they're doing that on these Windows demos? Or on the older hardware?

Starting point is 00:22:37 No, that's all on the older hardware. On the new machines you don't get that kind of access to hardware anymore. Yeah, I didn't think so. I was really wanting to know more about that if that's what I had understood. But on the new hardware, you don't really need to because you can just make a 4K resolution demo. Right.

Starting point is 00:22:55 That's amazing. Okay, well, you are going to be attending C++Now. It looks like you're going to be giving a talk called Mocking C++. What are you going to be covering? That's going to be actually, I think, at the same time as Jason's talk. Yes, one of them. I'm sorry, Jason, I will not

Starting point is 00:23:12 be attending your talk on that. Yeah, I won't be attending yours either. I was planning to. That's too bad. The talk will be about the same subject I gave a lightning talk about at Meeting C++ last year, which is basically,

Starting point is 00:23:28 given that you know what mocking libraries do, and given that you know that C++ basically doesn't allow it right now, because the reflection you would need, as we just discussed, is not going to happen, how do you do it anyway? Okay. So that will be going into the low-level bits

Starting point is 00:23:44 of how does linking work, how do functions work, what do member function pointers look like? How do you actually go into that level of detail, get out the information you need, and then use that to create a class that looks and acts like an actual class without making an actual class? So, I mean, this is a topic we've discussed a little bit on our podcast before, but just give us maybe some teaser, high level, like what this looks like and why people are going to want to come and hear your talk at C++Now. So at a high level, you have the basic idea of a mocking framework, which is to be able to indicate I have an instance of this class, and I would like it to behave like that. As in, for this test, it needs to return false or true or throw an exception, that kind of behavior. Okay. And regularly in C++, you would create a new subclass, implement all the methods,

Starting point is 00:24:38 add a lot of functionality to it yourself, and essentially that's a lot of don't-repeat-yourself being violated. Because you have an interface specification, then you have a mock class specification, which is an exact copy of it. Then you do a lot of calls on it saying, I expect this function to be called and then it should return false. And that's again a duplication of the same function name

Starting point is 00:24:58 with the same arguments. So the mock class doesn't actually add anything there. It's just a bit of busy work you have to do, and for frameworks that try to stay completely within the C++ language boundaries, there is no way around it. So that's for Trompeloeil from Björn Fahller and for Google Mock, for example. And I am going to show you that you can definitely do without

Starting point is 00:25:22 and that has a bunch of very, very interesting advantages. For example, if you delete a mock object and then call a function on it, so on a dangling pointer, I can just throw you an exception and say, hey, you did that. You call a function on the zombie mock, and now it returns. Okay, how do you do that? How would you hook into a deleted pointer? By making it not deleted.

Starting point is 00:25:48 Okay. One of the interesting things that I found out while doing this is that when you have a destructor in your Vtable, there are actually, in GCC and Clang, there are two entries, and both of those also do the delete. Okay. Which essentially means that if you hook them and hook in a different function, then you are also supposed to do the delete. Okay. Which essentially means that if you hook them and hook in a different function,

Starting point is 00:26:07 then you are also supposed to do the deleting. Which means that if you don't actually do the deleting, you're fine, you still have an object. Okay. So it means that somebody can do delete object X, and then you still have the object there, and you can still hook in functions. That's interesting, but you can still call the object there, and you can still hook in functions.

Starting point is 00:26:26 That's interesting, but you can still call the destructor and do the cleanup, you just don't free the memory. The user tries to call a destructor, and he does that by saying delete pointer to x. Right. And that invokes a function on your object

Starting point is 00:26:41 that should be doing the deletion and the destructing. Okay. And in this case, you hook in a mock function that does neither of those. It basically just says, this is now a zombie mock, check marks placed, and we're done. Okay. And then you return. Now, hypothetically, maybe I'm getting a little too far into the weeds, but for the sake of my test, I needed that destructor to actually... I needed the body of the destructor to get called. Is there any way to still have your mock execute the body of the destructor without freeing

Starting point is 00:27:14 and do this zombie functionality that you're talking about? Practically, that would be possible, except that in the case of the mock object, it's actually not the class that it claims to be. Right. Which means that it never called your constructor, it doesn't have your members initialized. So running the destructor wouldn't be a logical operation.
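The zombie-mock idea can be illustrated without touching a real vtable. The sketch below is a portable simulation, not HippoMocks source: it models the virtual table as an explicit struct of function pointers (`Slots`), with one slot playing the role of the deleting destructor. All names are hypothetical. Replacing that slot with a function that frees nothing leaves the object alive, so later calls can be detected and reported.

```cpp
#include <stdexcept>

// Simulation only: an explicit table of function pointers stands in for
// the compiler-generated vtable that a real mocking library would patch.
struct FakeObject;
struct Slots {
    void (*destroy)(FakeObject*);  // what `delete p` would dispatch to
    bool (*is_open)(FakeObject*);  // an ordinary virtual-style method
};
struct FakeObject {
    const Slots* slots;
    bool zombie;
};

// Slots used after "deletion": every method call reports the misuse.
bool zombie_is_open(FakeObject*) {
    throw std::logic_error("function called on zombie mock");
}
void zombie_destroy(FakeObject*) {}
const Slots kZombieSlots{zombie_destroy, zombie_is_open};

// Slots used while the mock is live. The destroy slot neither destructs
// nor frees; it only marks the object and swaps in the zombie table.
bool live_is_open(FakeObject*) { return true; }
void live_destroy(FakeObject* self) {
    self->zombie = true;
    self->slots = &kZombieSlots;
}
const Slots kLiveSlots{live_destroy, live_is_open};

// Stand-ins for `delete p` and `p->is_open()` on a real polymorphic type.
void fake_delete(FakeObject* p) { p->slots->destroy(p); }
bool call_is_open(FakeObject* p) { return p->slots->is_open(p); }
```

Because the memory is never released, the call on the dangling pointer turns into a clear exception instead of undefined behavior.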

Starting point is 00:27:32 Okay. It is a total mock object replacing the entire object you have, including base classes. Okay. Yeah, I would like to attend your talk. Okay. Before we get too deep in the weeds, so we're talking about your mocking library, HippoMocks, now, and you briefly mentioned Trompeloeil and Google Mock. What do you think sets HippoMocks apart from some of those other libraries?

Starting point is 00:27:54 Well, the biggest thing is that you don't have to define any mock objects. You can just use a class as a mock without defining anything in between. So the moment that you type the semicolon after your interface definition, you can start a test and use it as if you had implemented it without having any implementation of it at all. That's fascinating. I would really like... That's a pretty big difference.
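For contrast, this is roughly what the conventional approach described earlier in the conversation looks like: the interface is written once, and then every signature is repeated in a hand-written mock class. The sketch uses no mocking framework at all; `IStorage`, `MockStorage`, and `store_high_score` are names invented for the example.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical interface that the code under test depends on.
struct IStorage {
    virtual ~IStorage() = default;
    virtual bool save(const std::string& key, int value) = 0;
    virtual int load(const std::string& key) = 0;
};

// Hand-written mock: each signature from IStorage is repeated here, and
// the recording and canned return values are written by hand as well.
struct MockStorage : IStorage {
    bool save_result = true;
    int save_calls = 0;
    std::string last_key;

    bool save(const std::string& key, int) override {
        ++save_calls;
        last_key = key;
        return save_result;
    }
    int load(const std::string&) override {
        throw std::runtime_error("unexpected call to load");
    }
};

// Code under test: it only knows about the interface.
inline bool store_high_score(IStorage& storage, int score) {
    return storage.save("high_score", score);
}
```

Any change to `IStorage` has to be mirrored in `MockStorage`, which is the duplication the speaker is pointing at.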

Starting point is 00:28:17 I wish that we could show some sample code right now, actually. Yeah, that would be handy, but I don't think a podcast lends itself well to showing samples. It does not. And I'm afraid that reading it out loud is going to be confusing. Yes, but it's a great hook for people to pay attention and look forward to your talk.

Starting point is 00:28:36 And I'll just give one more hint. There's also the ability to hook a free function. Okay. So you can make a test that some object calls assert, and then check that it calls assert, and then say, well, that's fine.

Starting point is 00:28:50 The test has now succeeded. And the same goes for abort and exit. Huh, interesting. Now, that's something I've played with a little bit myself, actually trying to hook free functions and did some research on that. So that one I have slightly more knowledge about, but certainly not the replacing the entire class part. Replacing the class part is actually easier.
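Hooking a real free function such as `abort` means patching code at the binary level, which cannot be shown portably. A plainly different but portable technique that achieves a similar test is a function-pointer seam: production code calls through a replaceable pointer, and the test swaps in a recording function. The sketch below shows that substitute technique, not how HippoMocks does it; all names are hypothetical.

```cpp
#include <cstdlib>

// Hypothetical seam: production code reports fatal errors through this
// pointer instead of calling std::abort directly.
using FailureHandler = void (*)(const char* message);

inline void default_failure_handler(const char*) { std::abort(); }

inline FailureHandler& failure_handler() {
    static FailureHandler handler = default_failure_handler;
    return handler;
}

// Code under test.
inline int checked_divide(int a, int b) {
    if (b == 0) {
        failure_handler()("division by zero");
        return 0;  // only reached when the handler returns, as in a test
    }
    return a / b;
}

// Test double: counts failures instead of terminating the process.
inline int& recorded_failures() {
    static int count = 0;
    return count;
}
inline void recording_handler(const char*) { ++recorded_failures(); }
```

The cost of this approach is that the production code has to be written with the seam in mind, which is exactly what binary-level hooking avoids.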

Starting point is 00:29:12 Interesting. Because replacing free functions comes with a few corner cases that aren't exactly the case that you want, and it means that it doesn't always work 100%. So you do need to be aware of the corner cases there in order to avoid them. Right. Okay. Okay. So we mentioned that you're working at TomTom, and you're also working with them on an open-source tool called cpp-dependencies. Do you want to tell us a little bit about that? Yes.

Starting point is 00:29:41 So around three and a half years ago, I joined the navigation team, and they have a fairly big navigation code base. I think the total is at least 1.5 million lines of self-written code. And that adds on to a whole lot of third-party libraries that are used to not do everything yourself. But even then, you have one and a half million lines of code. So I did a bit of mental math in figuring out how does anybody understand all of this and why can't I figure out how everything hooks together? And I figured out that if you actually just read the code one line at a time and tried to finish, you would be busy for more than a year. So given that the average developer would be working on it for two years

Starting point is 00:30:21 and people don't actually read one line a second continually, it's kind of unrealistic to expect anybody to know the code base. And that makes it fundamentally different from a small project, just putting it in contraposition, where you can basically expect somebody to go read the code base fully and then come back when you understand it. Okay. So as we were doing that, I figured out, well, most of these dependencies are not as people have written down. So there's a bunch of definitions; we use CMake. So there's a target_link_libraries statement, which says this library depends on these other libraries. And we found out that actually a bunch of those aren't there anymore. There's no link there.

Starting point is 00:31:01 It doesn't use any of those headers. But there are a few that it does use, and it doesn't mention. Hmm. Okay. So that's confusing, as in there's a bit of lacking maintenance there. So, suppose that we try to fix that, and then I started adding

Starting point is 00:31:18 the dependencies that I knew were supposed to be there, and the build failed. Because if you add all the dependencies that should be there, without filtering out the ones that shouldn't be there, you actually create a giant circular dependency including everything. And CMake by default

Starting point is 00:31:32 only repeats everything twice. So that means your link breaks. You can say repeat it 20 times and then the link works again, but it takes forever. And given the size of the project, adding a couple of tens of minutes to the build time was not something people were happy with. So we basically said, well, let's figure out if we

Starting point is 00:31:50 can just figure out how the dependencies should be, and then make it that. Because when you look at source code, you think I have an include to standard and I have an include to my interface, I should be able to figure out where this comes from. And humans are pretty good at that. I mean, if you look at an include statement, you don't have to look up all the files in the libraries to figure out which one it goes to. You already know which one it's supposed to be. So I figured,

Starting point is 00:32:16 well, let's just write a tool that tries to do that, and then do it automatically. See how well it works. Just the first time and see what the results are. And the first one that I wrote was in a shell script and it took about two hours to run on the codebase.

Starting point is 00:32:31 So that's a terrible user interface. But it did result in getting dependency information about everything and everything we checked was right. So we figured, well, this is good. Let's develop it a bit further. And long story short, we now have it running in two seconds on the same code base.

Starting point is 00:32:48 Wow. And we can extract graphs from that. We can extract information from it. And we can watch everything that's in the code base. So we are actively preventing new cycles from being introduced. So you said the original version was a shell script. What's the final version look like? The final version was written in, great surprise, C++.

Starting point is 00:33:14 And it's not actually that big. I think the total is around 2,000 lines of code. Oh, wow. And with permission from my managers, I was able to open source it. So anybody can download it and run it on your own code base. So are you using libclang or anything like that to help you with this, or is it all hand-rolled? It's exactly the opposite, actually. The thing with parsing C++ is that it's a really heavy duty task to do, and it depends on all the headers you included before that point. Right.

Starting point is 00:33:42 Which means that if you include a header, you can't always use a precompiled header for that one because it might be different due to other things you included before that. Right. Okay. Which means that if you're actually trying to understand what the headers do, you have to use something like libclang, do a full parse of everything,

Starting point is 00:33:57 have, say, a tenth of a second per file, times 70,000 files is, well, wait for an hour or so. So that's an unrealistic approach if you want to do this, but the only information I need is, given this header file, what does it include? And that's a really simple parser to write. Right. You're just looking for...

Starting point is 00:34:16 Yeah, go ahead. At first I just looked for lines and parsed per line, but now I've actually implemented a per-character parser, which is more accurate and a little bit faster still. But that's even still only about 100 lines of code. Wow. So what's the end result of actually applying this to your code base?
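The include scan described here can be sketched in a few lines of C++. This is only a minimal line-based illustration, closer to the first version mentioned than to the per-character parser, and it is not taken from the cpp-dependencies source. The function name `extract_includes` is hypothetical, and the sketch assumes that comments, macros, and conditional compilation can be ignored.

```cpp
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not from cpp-dependencies): returns the targets of all
// #include directives in `source`, scanning one line at a time. Comments,
// macro-expanded includes, and #if blocks are deliberately not handled.
std::vector<std::string> extract_includes(const std::string& source) {
    std::vector<std::string> result;
    std::istringstream lines(source);
    std::string line;
    while (std::getline(lines, line)) {
        std::size_t i = 0;
        auto skip_blanks = [&] {
            while (i < line.size() &&
                   std::isspace(static_cast<unsigned char>(line[i]))) {
                ++i;
            }
        };
        skip_blanks();
        if (i >= line.size() || line[i] != '#') continue;
        ++i;  // step over '#'
        skip_blanks();
        if (line.compare(i, 7, "include") != 0) continue;
        i += 7;  // step over "include"
        skip_blanks();
        if (i >= line.size()) continue;
        // Accept <name> and "name"; anything else (such as a macro) is skipped.
        const char close = line[i] == '<' ? '>' : (line[i] == '"' ? '"' : '\0');
        if (close == '\0') continue;
        const std::size_t end = line.find(close, i + 1);
        if (end == std::string::npos) continue;
        result.push_back(line.substr(i + 1, end - i - 1));
    }
    return result;
}
```

Calling `extract_includes` on the text of a file gives the list of include targets without running a preprocessor or a full C++ parse, which is the trade-off described above: far less work per file, at the cost of not understanding conditional compilation.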

Starting point is 00:34:34 I know you said it's keeping your dependencies clean, but have you noticed an increase in compile time, or excuse me, a decrease in compile time, or anything tangible like that? Well, the biggest thing it does is introspection. It allows you to see what's in your code base, what dependencies you actually currently have, and essentially get some numbers from that. The biggest direct result you can get from that is which file is the biggest impact on my compile time and which files do we have that

Starting point is 00:35:00 do nothing at all. For example, if you have a component that nobody uses, it will just tell you, hey, there's a component here and nobody's linking to that. You probably just want to delete that. There's a bunch of headers here that you have in your project. And I've tried running that on multiple projects. I think every project I tried had at least 10.

Starting point is 00:35:20 So 10 headers that were not used by anybody at all. So that's from the smallest hobby project from somebody to the biggest, LLVM, that I ran it on, and there's a bunch of headers there that nobody uses. That's funny. And the thing is, if you delete everything, recompile it, it works. They actually are not used.

Starting point is 00:35:37 Right. So does it work well with header-only libraries also? It will work completely with any kind of structure of library you have. It doesn't actually look at headers or source code files to see which one is which. It looks just at whether somebody from outside your project includes it to see if it's a header file. Okay. If somebody has an include statement pointing to it, then it must be a header file, even if it's called .cpp.
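The rule described here, that a file counts as a header when something else includes it, combines naturally with the earlier point about headers nobody uses. The following is a rough sketch under stated assumptions, not the tool's actual data model: `IncludeMap`, `find_headers`, and `find_unused` are hypothetical names, include targets are assumed to be already resolved to project paths, and the set of compiled files is assumed to come from the build system.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical input: for each file, the project files it includes
// (already resolved to paths). Not the tool's real data model.
using IncludeMap = std::map<std::string, std::vector<std::string>>;

// A file is treated as a header if at least one other file includes it,
// regardless of its extension.
std::set<std::string> find_headers(const IncludeMap& includes) {
    std::set<std::string> headers;
    for (const auto& entry : includes) {
        for (const auto& target : entry.second) {
            if (target != entry.first) headers.insert(target);
        }
    }
    return headers;
}

// Files that nobody includes and that are not compiled on their own.
// `compiled` is an assumed list of translation units from the build system.
std::vector<std::string> find_unused(const IncludeMap& includes,
                                     const std::set<std::string>& compiled) {
    const std::set<std::string> headers = find_headers(includes);
    std::vector<std::string> unused;
    for (const auto& entry : includes) {
        const std::string& file = entry.first;
        if (headers.count(file) == 0 && compiled.count(file) == 0) {
            unused.push_back(file);
        }
    }
    return unused;
}
```

With this shape, a file named `table.cpp` that is pulled in through an include statement is reported as a header, and a `.h` file that appears in no include statement and no build target is reported as unused.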

Starting point is 00:36:03 Right. Yes, I had this conversation actually at my meetup the other day that the C++ compiler doesn't care what the file is called, for the most part. It doesn't even check whether the thing you're compiling is actually meant to be compiled. It just tries. But typically, if you compile a header file, there's no actual thing being instantiated,

Starting point is 00:36:22 so the output file will be empty. Right. Right. Right. Okay. But that does kind of leave you with the interesting result that you don't have headers and source files. You have headers, you have source files, you have files that are both, and you have files that are neither. So then,

Starting point is 00:36:38 yeah, how do you produce that into an output file? Right. I wanted to interrupt this discussion for just a moment to bring you a word from our sponsors. JFrog is the leading DevOps solution provider that gives engineers the freedom of choice. Manage and securely host your packages for any programming language with Artifactory.

Starting point is 00:36:56 With highly available registries based on on-prem or in the cloud and integrations with all major build and continuous integration tools, Artifactory provides the only universal, automated end-to-end solution from development to production. Artifactory now provides full support

Starting point is 00:37:11 for Conan C and C++ Package Manager, the free open source tool for developers that works in any OS or platform and integrates with all build systems, providing the best support for binary package creation and reuse across platforms. Validated in the field to provide a flexible, production-ready solution, and with a growing community,

Starting point is 00:37:29 Conan is the leading multi-platform C++ package manager. Together, JFrog and Conan provide an effective, modern, and powerful solution for DevOps for the C++ ecosystem. If you want to learn more, join us at the JFrog User Conference SwampUp 2017, running the 24th to 26th of May in Napa Valley, California, or visit Conan.io. So if you want to run CPP dependencies against your own code base, how easy is it to do that? Do you just download from GitHub and run it, or do you need to kind of configure your repository to work with it? Well, right now it uses the CMake files that should be present in a project to figure out where components are,

Starting point is 00:38:09 but it only looks at it to figure out where they start, because essentially the only assumption it makes is if it's a source file, it must be compiled, and any grouping of source files that I can find will be a project or a component. So if you're not using CMake and you run it, it will tell you I have one component and it's your entire thing, everything.

Starting point is 00:38:30 Okay. Which is kind of pointless. You can ask it to do the opposite, which is to assume that any folder that contains a compilable file will be a component. And that works if you put your headers and source in the same folder, but it typically gives you too many results

Starting point is 00:38:45 because it takes your well-structured big component that is actually in four folders and makes it into five components that are then, of course, cyclically dependent. Oh, okay. So both of the extremes are not the most easy to use, but if you use CMake, it will typically find you the right projects. Okay. So what is the... You said that you use this to ensure that you're not introducing cyclical dependencies

Starting point is 00:39:13 or new dependencies in your code base at TomTom. How do you actually apply that? Do you do this on check-in with a Git hook or something like this? We're not using Git yet, so it's not a Git hook per se, but we do do this in the continuous integration system. So in every commit

Starting point is 00:39:34 that somebody tries to make to the mainline, we run the tool first to see how many dependencies it generates, and if it's more than the amount of cyclic dependencies that we had before, so just to count, then we refuse to build it. Oh. Well, you're very serious about it then.

Starting point is 00:39:49 It's not just an... Yeah, we do try to actually get it down. And we've managed to get it down from a total of 120-ish to around, I think we're around 20 now. Wow. That's total components that are in any way cyclically dependent. That's pretty cool. Yeah. So you want to tell us a little bit more about working at TomTom?
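The gate described here, counting the components involved in cycles and refusing a commit that raises the count, can be illustrated with a short C++ sketch. It uses Tarjan's strongly connected components algorithm, which is an assumption on my part; the transcript does not say how cpp-dependencies finds cycles. The names `Graph`, `count_cyclic_components`, and `change_is_acceptable` are hypothetical, and components are assumed to be numbered from 0.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical model: components are numbered 0..n-1 and deps[i] lists the
// components that component i depends on (all values must be < n).
using Graph = std::vector<std::vector<std::size_t>>;

// Counts the components that take part in at least one dependency cycle,
// using Tarjan's strongly connected components algorithm.
std::size_t count_cyclic_components(const Graph& deps) {
    const std::size_t n = deps.size();
    const std::size_t unvisited = n;  // real indices are always < n
    std::vector<std::size_t> index(n, unvisited), low(n, 0);
    std::vector<bool> on_stack(n, false);
    std::vector<std::size_t> stack;
    std::size_t next_index = 0, cyclic = 0;

    std::function<void(std::size_t)> visit = [&](std::size_t v) {
        index[v] = low[v] = next_index++;
        stack.push_back(v);
        on_stack[v] = true;
        bool self_loop = false;
        for (std::size_t w : deps[v]) {
            if (w == v) self_loop = true;
            if (index[w] == unvisited) {
                visit(w);
                low[v] = std::min(low[v], low[w]);
            } else if (on_stack[w]) {
                low[v] = std::min(low[v], index[w]);
            }
        }
        if (low[v] == index[v]) {  // v is the root of a strongly connected component
            std::size_t size = 0;
            std::size_t w = v;
            do {
                w = stack.back();
                stack.pop_back();
                on_stack[w] = false;
                ++size;
            } while (w != v);
            // A group of one only counts if the component depends on itself.
            if (size > 1 || self_loop) cyclic += size;
        }
    };

    for (std::size_t v = 0; v < n; ++v) {
        if (index[v] == unvisited) visit(v);
    }
    return cyclic;
}

// The gate: accept a change only if it does not raise the count.
bool change_is_acceptable(const Graph& before, const Graph& after) {
    return count_cyclic_components(after) <= count_cyclic_components(before);
}
```

Comparing only the counts keeps the check cheap and lets an existing code base adopt it without first fixing every cycle: the number can only stay level or go down.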

Starting point is 00:40:10 You have a pretty large code base. Are there any other special considerations you have while working on such a large code base? You said a million and a half lines of code, right? Yep. Well, it's like many other large code bases, actually. If you work at Microsoft on Windows or on Visual Studio, or on Chromium if you're at Google, those codebases are, as far as I know, even bigger. And they will have the same problems, the same situations that you have in our codebase,

Starting point is 00:40:38 which is it's really big. The people that originally set it up did a good job, according to the 1995 thing of doing a good job, which may be different from now. And you have so much code and so many things that have to keep working that that's going to be the main determinant in how fast you can develop anything. So what do your compile times currently look like, if you can share that? I'm not sure if I can share that. Okay. Do you have a large monolithic code base or do you have it split up among several repositories? We have it split up among several repositories now because the new code that's being developed has been

Starting point is 00:41:17 avoiding making any new cycles by just putting it in a separate repository. You can't include a header that you can't see. And that's a pretty good approach. But we are trying to take the old code base and get it into a state where we can also split it up and divide it into many separate components that build separately. And that has many

Starting point is 00:41:37 advantages. For example, your compile time drops by a lot because you're not compiling most of the code anymore. You cannot make any new cyclic dependencies because you can't make one. And you can deploy parts of your software knowing that it will not depend on anything else. But it also has

Starting point is 00:41:53 the obvious downsides of that, which is that if something breaks in a downstream dependency, it will affect you upstream. So if somebody makes a breaking change on a component and then distributes that as the new version, then your build system will prevent anything that depends on that, any user of it, from upgrading

Starting point is 00:42:09 until they have also fixed all their dependencies. And if you have that three layers deep, by the time your new version of the bottom component gets propagated all the way to the top, that could be years. Yeah. I think it's a great conversation to have because we've discussed in the past, like, Google having this giant monolithic codebase with tens of millions...

Starting point is 00:42:32 Hundreds of millions of lines of code. Hundreds of millions of lines of code in one codebase. And I've wondered about this kind of thing that you're discussing. So it seems like you're strongly on the side of saying, split up your codebase and don't have a bunch of spaghetti dependencies. If possible, make your codebase such that any dependency has an obvious up or down direction. So you should never have a cross-direction or an up-pointing link. Okay. And that does

Starting point is 00:42:57 heavily point towards being able to put things into separate repositories, but I would advocate not doing that. Okay. Not doing separate repositories? Yes. Because you get the immediate problem of, given that I changed something in the bottom end component, I have to fix it, check it into the code base,

Starting point is 00:43:15 get all the tests running, ship it as a new release, then go to the next repositories, download the new releases, get everything working again, ship all of those components, go to new repositories. Well, if you didn't have the separate repositories but had a single build, and you should be able to reuse most of the builds from somebody else, but if you had it in a single repository,

Starting point is 00:43:35 you should be able to start on the bottom end and work all the way through all the components, say, for three days in a row, and then be able to ship it. So you would go for the monolithic repository, but with the CPP dependencies checks to make sure that you're not going up or across in your dependencies. It sounds like an advert, and maybe it is, but yeah, that's the direction I would go into.

Starting point is 00:44:02 Okay, yeah, that's, I mean, I've never worked on codebases that were more than maybe a couple hundred thousand lines of code, not in the million, ten, hundred million, for sure. I do know that the people at Google have slightly different opinions about this because they have even more code affected by big changes. For example, if you change a function on a string in our codebase that results in multiple days of fixing all the places where it's used,

Starting point is 00:44:28 but then you're done. If you do that on a codebase that's 15 times as big, you'd be busy for a month. Right. And that's not going to work because A, your boss will not let you, and B, if you try that, everybody else will have changed their code in the meantime

Starting point is 00:44:43 and you will have 20,000 new places where they're using it. So it doesn't even work. So by keeping everything in a monolithic structure, it gives you the freedom, I guess, you need to do refactoring when it's necessary. It gives you the freedom of at least seeing what the impact will be of your change. Because it means that when you change a single thing at the bottom end, you can find out how many things will actually break if you do that. And if only for the exploratory thing of there is this bit of code, it's very ugly, I would like to fix it.

Starting point is 00:45:12 Just knowing how big is the problem that I would create if I do allows you to figure out if it's a thing you want to do. It's a solid point. I've never, like I said, never had to deal with this. I still have a hard time swallowing it because I'm just thinking that's a ginormous repository. But I see the point now more, I think, than I did. To give an example of a case where it worked very well, we had our own shared pointer implementation. And it was very much like a normal shared pointer. And in this case, it happened to have the same names for functions. So thank you, whoever made it that way, because that helped a lot.

Starting point is 00:45:52 I was thinking, can we just take this, throw it out the window, and replace it with the standard shared pointer, or the boost shared pointer in our case? So I tried on my own system, just replace it, inherit from the boost shared pointer, and see what happens. And I found out that everything was just fine. Nothing broke at all. I could even make it a typedef.
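The replacement described here can be sketched as a one-line alias. This is an illustration under assumptions, not TomTom's code: the namespace `legacy`, the name `SharedPtr`, and the function `read_through` are hypothetical, and the sketch uses `std::shared_ptr` to stay self-contained where the original used the Boost shared pointer. It relies on the point made above that the in-house class already had the same function names.

```cpp
#include <memory>
#include <type_traits>

namespace legacy {
// Hypothetical in-house name. Because the old class used the same member
// names as the standard one (get, reset, use_count, ...), an alias is enough
// to retire the custom implementation without touching call sites.
template <typename T>
using SharedPtr = std::shared_ptr<T>;
}  // namespace legacy

// An existing call site written against the old name keeps compiling.
inline int read_through(const legacy::SharedPtr<int>& p) {
    return p ? *p : 0;
}

// The alias introduces no new type, so old and new code interoperate freely.
static_assert(std::is_same<legacy::SharedPtr<int>, std::shared_ptr<int>>::value,
              "legacy::SharedPtr<int> is std::shared_ptr<int>");
```

An alias, unlike inheriting from the library class, leaves no wrapper type behind, so code written against either name can pass pointers back and forth without conversions.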

Starting point is 00:46:09 And that still worked. So I did that, shipped it, and I think half a day later it was in. Wow. If we had the separate code bases, we would have been struggling with this for at least a few weeks in planning, figuring out how big it could be, how much work it would be,

Starting point is 00:46:22 and then possibly not even doing it. Right. Right. Right. And it makes sense to get rid of your own implementation of shared pointers since there's already so many other high-quality options available. Yeah, did you see any performance differences after changing it? We've looked for performance differences in many changes, but we haven't found any.

Starting point is 00:46:40 Interesting. But in this case, it was also a bit about thread safety because on one hand, you have your own shared pointer implementation. It worked correctly for every platform we had so far. But we didn't test it on all the new platforms. And there's somebody else who has a shared pointer that looks just like ours. And he did all the testing for us. Right.

Starting point is 00:46:59 So that's the implicit reason to reuse code. Somebody else made it. They tested it. It works. There's a thousand people using it. You should also be at least considering that. Yeah. Yeah, if a thousand other companies are using this library,

Starting point is 00:47:13 there's a good chance that it has fewer bugs than the library your company wrote. And it has a very good chance of not having any bugs when you go to a platform that you're not using yet, but they are. Right. That's a great selling point. Make a t-shirt or something out of that somehow.

Starting point is 00:47:27 I should give that to Jon Kalb from Boost. That's essentially the whole argument behind why everybody should be using Boost. Yeah. I don't know. Do you have anything else you want to ask, Jason? No, I think that covers it for me. Peter, is there anything else you wanted to go over? I don't think so. Okay.

Starting point is 00:47:54 Well, Peter, it's been great having you on the show today. Where can people find more information about you or find CPP Dependencies, the library? If you want to know more about me, you can go to github.com slash dascandy. And if you want to know more about cpp-dependencies, you can go to either my own repository,

Starting point is 00:48:11 which is dascandy slash cpp-dependencies, or you can go to github.com slash tomtom-international slash cpp-dependencies. Okay, great. And as typical, you can find the more experimental things on my own branch and the more well-tested things on the main branch. Right. Right.

Starting point is 00:48:29 Okay, well, thanks so much for coming on the show today, Peter. Thanks for having me. Thanks for joining us. Thanks so much for listening in as we chat about C++. I'd love to hear what you think of the podcast. Please let me know if we're discussing the stuff you're interested in. Or if you have a suggestion for a topic, I'd love to hear about that too. You can email all your thoughts to feedback at cppcast.com. I'd also appreciate if you like CppCast on Facebook and follow CppCast

Starting point is 00:48:55 on Twitter. You can also follow me at @robwirving and Jason at @lefticus on Twitter. And of course, you can find all that info and the show notes on the podcast website at cppcast.com. Theme music for this episode is provided by podcastthemes.com.
