CppCast - Hippomocks and cpp-dependencies
Episode Date: April 20, 2017
Rob and Jason are joined by Peter Bindels to discuss the Hippomocks mocking library and the cpp-dependencies analyzer.
Peter Bindels is a C++ software engineer who prides himself on writing code that is easy to use, easy to work with and well-readable to anybody familiar with the language. He's worked for a contractor for a few years and then made the switch to work at Tomtom, where he's been working on various parts of the software chain, last of which was a major cleanup in the navigation code base. In doing so he developed a tool to determine, check and improve dependencies between components, which allows quicker structural insight into complicated systems. He also created HippoMocks in 2008, one of the first full fledged C++ mocking frameworks that is still a relevant choice today. He has given two talks at Meeting C++ 2016 and will be giving his third talk, on Mocking in C++, at CppNow 2017.
News
Fluent C++ - The Design of the STL
Fluent C++ - Inserting several elements into an STL container efficiently
2017 Keynote - Ryan Newton - Haskell Taketh Away
CLion 2017.1 released: C++14, C++17, PCH, disassembly view, Catch, MSVC and more
An introduction to Reflection in C++
Peter Bindels
@dascandy42
Peter Bindels' GitHub
Links
Hippomocks framework
cpp-dependencies
Meeting C++ 2016: Peter Bindels - How to understand million-line C++ projects
Lightning Talks Meeting C++ 2016: Peter Bindels - Mocking C++
Sponsors
Conan.io
JetBrains
Transcript
This episode of CppCast is sponsored by JFrog, the universal artifact repository including C++ binaries thanks to the integration of Conan, the C and C++ package manager. Start today at jfrog.com and conan.io. JetBrains is offering a 25% discount for an individual license on the C++ tool of your choice:
CLion, ReSharper C++, or AppCode.
Use the coupon code JETBRAINS for CppCast during checkout at jetbrains.com.
CppCast is also sponsored by Pacific++,
the first major C++ conference in the Pacific region, providing great talks and opportunities for networking.
Get your ticket now during early bird registration until June 1st.
Episode 98 of CppCast with guest Peter Bindels, recorded April 19th, 2017. In this episode, we discuss reflection and new features in CLion.
Then we talk to Peter Bindels from TomTom.
Peter talks to us about his HippoMocks library and the cpp-dependencies analyzer.
Welcome to episode 98 of CppCast, the only podcast for C++ developers by C++ developers.
I'm your host, Rob Irving.
Joined by my co-host, Jason Turner.
Jason, how are you doing today?
I'm doing good, Rob.
I'm operating from a new laptop today.
Here's hoping it doesn't mess up the podcast.
That's both of us. We're both operating on new laptops. So yeah, fingers crossed for us.
And just a warning to our listeners, I think both of us are dealing with some allergies right now.
I don't know how bad it is in Colorado, but here in North Carolina in April, everything is green.
Like my car is normally blue, but right now it looks teal because it's covered in green pollen.
It's so disgusting.
We've got,
I think they might be like some sort of choke cherry tree or something that
has these little white blossoms all through our neighborhood.
And there's enough of the flower petals around that I actually saw kids out
with snow shovels.
I mean,
now granted they only had maybe a few cubic feet worth of them,
but it was still enough.
Yeah, nature can be pretty gross sometimes.
Anyway, at the top of our episode, I'd like to read a piece of feedback.
This week we got a tweet from Jonathan Boccara,
and we've talked about several of his blog posts recently.
He said to us, "CppCast is awesome, and your feedback on episode 88 led me to publish two more
articles," which he linked to. And that was our episode with STL, of course. And I think we talked
about his blog post where he was actually creating, I think he called it an STL learning resource, right?
Right.
Yeah, that sounds right.
Yeah.
So he, based on feedback from us and probably most importantly from STL,
he wrote a couple of new articles.
One is the design of the STL,
and the other is inserting several elements into an STL container.
I haven't had a chance to look at either of those articles in depth quite yet,
but I'm sure they're quite good because he's been making some pretty good blog posts.
I glanced at them, and if he's listening, I'll give another little comment here.
I would like to see an article that covers move iterators also.
That'd be a good one.
Well, as always, we appreciate the feedback,
and Jonathan will be sending you the JetBrains raffle giveaway.
And we'd love to hear your thoughts about the show as well.
You can always reach out to us on Facebook, Twitter, or email us at feedback@cppcast.com.
And don't forget to leave us a review on iTunes.
So joining us today is Peter Bindels.
Peter is a C++ software engineer who prides himself on writing code that is easy to use, easy to work with, and well readable to anybody familiar with the language. He's worked
for a contractor for a few years and then made the switch to work at TomTom, where he's been
working on various parts of the software chain, last of which was a major cleanup in the navigation
code base. In doing so, he developed a tool to determine, check, and improve dependencies between
components, which allows quicker structural insight into complicated systems. He also created HippoMocks
in 2008, one of the first full-fledged C++ mocking frameworks that is still a relevant
choice today. He has given two talks at Meeting C++ 2016 and will be giving his third talk, on
mocking in C++, at C++Now 2017. Peter, welcome to the show.
Hi.
So I'm going to ask a question I haven't asked in a while now.
How did you get started with C++?
That's a funny story.
I was playing a computer game called Crusader: No Remorse back in 1995, I think, or 1996.
And as I was playing it, I ended up with an overheating computer.
It was one of the first ones that needed a cooler on the CPU
and it had physically fallen off because
the mechanisms to attach it weren't as good
back then as they are now.
So as it was overheating,
the game gave an assert and it said
there was an error in this and this line
of C++ code.
So I was looking at that and thinking I can program
in BASIC and it's okay,
but this is a full-fledged game running on my system.
It's running 100 times as fast as the BASIC code I'm doing,
and this is written in C++.
Therefore, I will be learning C++ now.
As a 12-year-old.
It's not the best language to learn as you're 12 years old.
So you're saying as one of the first computers
with a cooling fan, I'm thinking this had to be
like early Pentium, late 486
era? That was
medium Pentium. It's Cyrix M2.
So around the Pentium MMX era.
Okay. So
how did it work? I mean, did you end up
learning enough C++ to solve
the problem? Did you
even have access to that source code?
I didn't have any access to the source code.
It was apparently a debug build they put on the CD-ROM.
I managed to open my case because I noticed that the amount of time I got to play kept
shrinking by a few minutes every time until I let it alone for a few hours, and then it
went back up to five minutes.
So I unscrewed the case and figured out that it was actually just lying on the floor.
Okay.
So you started with a software problem, but ended up solving the hardware problem.
Exactly.
And the software took much, much longer to actually learn to do well.
That's fun.
That's an interesting story, though, for getting you motivated for C++.
Yeah.
That's pretty cool.
Okay, well, Peter, we've got a couple of articles to discuss. Feel free to jump in and comment on any of these, and
then we'll start talking to you about some of your talks and the work you're doing at TomTom.
Okay.
Okay, so the first one is we have yet another keynote announcement from C++Now, which
Peter and Jason, you will both be attending soon.
And in the same theme where they had other talks and keynotes from the D and Rust community,
this third one is going to be from Haskell.
Yeah. Well, what do you guys think?
This is a pretty unusual conference at this point.
Yeah.
What do you think, Peter? Are you looking forward to this?
I have to say that I did kind of expect a Haskell announcement,
given that there was an announcement about D and Rust.
It kind of fits the theme.
It looks close to C++ as it is now,
because it's one of the core functional languages.
And if you look at the talks given by Phil Nash, for example,
there's a lot of functional in C++ now,
and it's getting more and more.
So I was kind of expecting this one.
Yeah, that's a good point. It makes sense to bring in a functional language. I was
expecting Swift or something would be the third one, because I wasn't thinking
quite along those lines.
I'm sure there are a lot of other languages that maybe they considered, but
Haskell definitely does seem like a good fit. I'm really interested to see what you guys think of these keynotes after C++Now
and whether they're well received and generate some good discussions.
Definitely. I'm looking forward to it.
I think it'll be a good fit for this conference.
I do think it's an interesting choice to have different languages present at a C++ conference
because usually it's a case of very much introspection as in looking at
other C++ developers among
each other. And in this case, you get an
outside view to join that, which
should be good for the language.
Yeah, I think so too.
Okay, next we got an update
from CLion. This is their
first major release of the new year,
CLion 2017.1.
And they got a pretty big
feature list in here. Extended
support for C++14,
their first bit of support
for C++17, which is nested
namespaces, support for
precompiled headers, disassembly
view. They added Catch
as a unit test framework, which makes sense
since Phil Nash is over there.
And they also added experimental support for the Visual C++ compiler,
which I thought was interesting.
I'm looking over this list myself, and I see extended support for C++14.
In brief, all except constexpr.
And pretty much constexpr is the only programming I'm doing right now.
Getting ready for my C++ Now talk.
And when they say they don't support constexpr,
I guess that just means they're not going to give you
any code generation help when you're writing something with constexpr
because they're just not going to recognize it.
But you can still compile because you're still using GCC or Clang,
which is capable of constexpr code, right?
Right, yes.
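To make the point about constexpr concrete, here is a minimal sketch of the kind of C++14 code an IDE's own parser would have to evaluate in order to highlight it correctly, even though GCC or Clang compile it without trouble. The function names are invented for this illustration and do not come from the episode.

```cpp
// C++14 relaxed constexpr: local variables and loops are allowed in the body.
constexpr unsigned long long factorial(unsigned n) {
    unsigned long long result = 1;
    for (unsigned i = 2; i <= n; ++i) {
        result *= i;
    }
    return result;
}

// Checked by the compiler at compile time. An editor that wants to show the
// value, or flag a failing static_assert, needs its own constexpr evaluator.
static_assert(factorial(5) == 120, "5! is 120");

// The same function can still be called with values known only at run time.
unsigned long long factorial_at_runtime(unsigned n) {
    return factorial(n);
}
```

The build is unaffected by what the editor understands; only the in-editor analysis depends on the IDE's own front end.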
It might be a code highlighting thing,
where, like the problems you had in Visual Studio
around the 2011-2012 timeframe,
where if you wrote correct code,
you got three different responses from the IntelliSense,
from the code highlighter, and from the compiler.
That's not helpful.
Right, because the IntelliSense compiler
is not the same as the MSVC compiler.
They used to have three complete frontends.
They reduced it to two now, so it's better.
But it might be the same thing for CLion
because I think they have a Java parser,
so they might be expecting...
They might be checking the code themselves.
And in that case, I would have also postponed constexpr to the last
because in C++ 17, you're getting if constexpr,
which is a completely different way of parsing again.
Right.
Do you use an IDE, or CLion, yourself, Peter?
I have tried it, but I found it a bit too slow.
And I know this is an unfair statement to make without quantifying it.
I tried to use it on a really big project.
Okay.
And in that case, it was kind of slow.
I'm still working with Phil Nash to find out why it's slow there,
because it's supposed to be fairly fast.
Is that one of the earlier builds, or something more recent?
It was a relatively recent one, as in nine months ago, I think.
Yeah, it wasn't that long ago
that they made some improvements on the speed,
but I think it was within the last year.
I'm losing track of that.
I noticed one of the things in here,
a comment from someone, actually,
not in the list of coverage,
was from Olaf, who says,
you forgot to bang your drum for the feature
that in my opinion is by far the best improvement to the zero latency typing. And I don't use IDEs
enough to really know what that's in reference to. But it makes me wonder if that's one of the
reasons why I've never really liked using IDEs. Yeah, I'm not really sure what that's in reference
to either. But he also mentions in the same comment, which I did notice, he's asking, is there an ETA for when the Vim plugin will be
released using this zero latency API? So I guess the Vim plugin is not using zero latency typing
currently? I guess. We're all talking about stuff that we don't have direct experience with at the moment
I guess I've definitely never noticed a problem where I'm typing in an IDE and I feel like there's
a noticeable latency like that's just never something that's occurred to me as an issue
personally I feel like when the autocomplete is constantly popping stuff up when I'm typing
even if it's not actually slowing me down, I feel like it's slowing me down
because my brain is wondering,
is it going to autocomplete something I don't want to autocomplete?
Or what? I don't know.
Maybe it's just a mental issue for me.
I'm in the same boat with you for that mental issue.
I try to turn off any autocompletes
or adding things on when I'm typing as much as I can.
So I'm not alone when it comes to that.
You're not the only one.
I feel a little better.
Especially if you have an IDE that tries to help you in adding more brackets, for example.
Oh, I hate that.
And then doesn't delete them when you close them.
And then I'm just typing my full sentence, look up at the screen, and there's five different
parentheses behind my line.
Yeah.
Your IDE is not actually helping out.
It's just slowing you down.
And I ended up at some point in, I believe, some misconfiguration of Visual Studio, where when I would put the opening brace, and then I'd put the close one, it would be smart enough to delete the one that it automatically added,
but the one that was left over would be indented incorrectly.
And I'm like, okay, now I just have to go back in and delete a couple of spaces to re-indent
it to where I want it to be.
That was, yeah.
I do sometimes get frustrated with Visual Studio when it automatically adds quotes for
strings.
That bothers me sometimes.
Oh, that's interesting.
Yeah. Okay, anyway, next up we have another blog article from Jackie Kay, and I think this is actually
only her second one, the first one generating a lot of controversy. This one, I don't think,
should be as controversial, but it was a very long and thorough, well-researched article, An Introduction to Reflection in C++, where she goes over the use case for why we need reflection in C++ and what you can currently do with some of the well-known C++ reflection libraries. And I thought that was pretty interesting.
And she kind of did a deep dive into how these different reflection libraries work.
Yeah, it's a very well-written article.
Yeah.
And she did hint that there will be a part two of this article where she'll actually
go over what types of things you can do with C++ reflection.
This one being just an overview of how reflection works.
Now, considering the HippoMocks library that you've been working on, Peter,
you probably have some insight into this world of reflection.
Yes, but right now SG7...
Is it SG7?
It is SG7.
They are currently looking into doing all reflection except for functions,
which means you can make libraries to serialize, deserialize.
You can introspect into classes to find out what members they have
and metaprogram with that.
But you cannot actually look at a function
and use that to create new functionality
or create a new implementation of an interface.
That's the current proposal going through the standards committee, you're saying?
It is. To not have method reflection?
It is without method reflection.
I think that is to try to keep the size of it down so you don't get a new Concepts, for example,
which has been in the standards committee for 12 years by now.
So I think it's a good choice in them in not adding it.
But on the other hand, for me, the biggest thing would be adding functionality
that allows you to create a new instance of an interface or inherit an interface
so that you can create proxies, you can create logging wrappers,
you can do aspect-oriented programming with that.
Yeah, this is the kind of thing that I need for ChaiScript.
I need to not only be able to walk over what members
a class has, but ideally be able to generate a new class that implements virtual members,
for example. That's exactly the thing that I would like to do. Right. And just thinking,
for example, I'm thinking out loud, if you take Kris Jusiak's DI library, so dependency injection,
it allows you to create an implementation
of an interface. You can register which one you want.
If you had method reflection,
you could take that type
and then create a new implementation that's a decorator
around the same type
and put in the actual implementation and create
logging around every function call
automatically.
And then we'd be doing it
at compile time
instead of having to do weird runtime hooks
like other languages that allow that kind of thing.
And then you do that at compile time
and you can still make it a runtime switch
because you can insert the decorator.
Either you can or you cannot insert the decorator.
Then if you don't insert it,
you don't get any overhead from it.
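As an illustration of the decorator idea, the sketch below writes by hand the logging wrapper that method reflection could in principle generate automatically. The interface `IRoutePlanner`, the classes, and `makePlanner` are all hypothetical names made up for this example; they are not taken from any library mentioned in the episode.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical interface, invented for this sketch.
struct IRoutePlanner {
    virtual ~IRoutePlanner() = default;
    virtual int distanceKm(const std::string& from, const std::string& to) = 0;
};

// Hypothetical real implementation.
struct FixedPlanner : IRoutePlanner {
    int distanceKm(const std::string&, const std::string&) override { return 42; }
};

// Hand-written decorator: every method of the interface has to be repeated
// here. This repetition is what method reflection could generate instead.
class LoggingPlanner : public IRoutePlanner {
public:
    LoggingPlanner(std::unique_ptr<IRoutePlanner> inner, std::vector<std::string>& log)
        : inner_(std::move(inner)), log_(log) {}

    int distanceKm(const std::string& from, const std::string& to) override {
        log_.push_back("distanceKm(" + from + ", " + to + ")");
        return inner_->distanceKm(from, to);
    }

private:
    std::unique_ptr<IRoutePlanner> inner_;
    std::vector<std::string>& log_;
};

// Runtime switch: when logging is off, the caller receives the bare
// implementation, so there is no extra indirection to pay for.
std::unique_ptr<IRoutePlanner> makePlanner(bool withLogging, std::vector<std::string>& log) {
    std::unique_ptr<IRoutePlanner> planner(new FixedPlanner);
    if (!withLogging) {
        return planner;
    }
    std::unique_ptr<IRoutePlanner> decorated(new LoggingPlanner(std::move(planner), log));
    return decorated;
}
```

The decorator class exists at compile time either way; the decision to wrap or not is made once, when the object is created.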
Wow.
Well, hopefully we get something
that lets us do all these fun, crazy
things soon. Hopefully.
I am really going
to be attending Jackie Kay's talk at
C++ Now this year, because
I think she's talking about reflection there.
Is she? I didn't even...
Yeah. I wonder if it conflicts with
anything else I have going.
Okay, well,
Peter,
it says here you just got back from Revision 2017.
What was that conference?
That's actually not a conference.
That's a demo party.
Do you know what a demo party is?
I don't think I do.
So if you look back 20,
30 years, you get to the time when
the Commodore 64 and the Commodore Amiga
were really big computers.
And at that time, people were still
kind of exchanging software directly, because
you could either buy it at high prices
if you knew where to find it, or you could find it
at people coming together and just
sharing software. And the people
tried to share all the software they had, but
sometimes people tried to add copy
protection, which is of course their good
right, making it very hard to do that.
So groups popped up that cracked the copy protection and then released it.
And in doing so, wanted to tag it with their own, sorry, call it an intro that says, this is our group and we did this.
And over time, the second part of it became more important.
And the first part kind of dropped out because, well, copying software is not really nice to the software vendors.
So they kept making new intros and full-scale demos,
which are essentially a long version of an intro,
which is a graphics performance with audio that looks really awesome.
And they do that in a very, very small size.
So if you think about a bit of software,
I think the latest Facebook on your Android phone
would be 270 megabytes.
A demo would typically be 64 kilobytes.
And a small demo would be only 4 kilobytes.
And it would still be 3 minutes of full-screen performance
with audio.
That's really impressive.
And that's still the targets they're reaching for,
is these 4 and 64K demos,
even on Windows systems today?
They have a Windows system with a 1080 Ti,
so a really high-end Windows system.
And they play demos of 64 kilobytes on that.
And you wouldn't be able to tell that it's only 64 kilobytes.
So, I mean, they're still able to use system libraries, I assume,
like DirectX and such.
They do import DirectX or OpenGL,
but they do the importing in a different way so that you don't have the length of the name of the function you're calling.
So they do everything they can to get the size down
and then cram in as much as possible
to get the longest demo with the best music you can
within that size limit.
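The episode does not say exactly how the imports are done. One commonly described way to avoid storing function names is to look functions up by a hash of the name, so the executable carries a 4-byte constant instead of a string. The sketch below assumes that approach and uses a small in-process table as a stand-in for a library's export table; `clearScreen`, `drawTriangle`, and `resolveByHash` are hypothetical names.

```cpp
#include <cstdint>

// FNV-1a string hash, usable at compile time (C++14 constexpr).
constexpr std::uint32_t fnv1a(const char* s) {
    std::uint32_t h = 2166136261u;
    while (*s != '\0') {
        h ^= static_cast<std::uint8_t>(*s);
        h *= 16777619u;
        ++s;
    }
    return h;
}

using Fn = int (*)(int);

// Stand-in for functions exported by a system library (hypothetical).
int clearScreen(int x) { return x + 1; }
int drawTriangle(int x) { return x * 3; }

// Stand-in for the library's export table, which does contain the names.
struct Export {
    const char* name;
    Fn fn;
};
const Export kExports[] = {
    {"clearScreen", &clearScreen},
    {"drawTriangle", &drawTriangle},
};

// Walk the export table and compare hashes. The caller never needs to
// store the name string, only the hash of the name it wants.
Fn resolveByHash(std::uint32_t wanted) {
    for (const Export& e : kExports) {
        if (fnv1a(e.name) == wanted) {
            return e.fn;
        }
    }
    return nullptr;
}

// The size-constrained side keeps only this constant.
constexpr std::uint32_t kDrawTriangleHash = fnv1a("drawTriangle");
```

The cost is a short lookup loop and the risk of hash collisions, traded against not having to store each function name in the executable.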
Okay, so I was getting ready to say, you know, is this kind of cheating compared to what
you had to do on a Commodore 64, but they still are really having to do a lot to cram
it all in.
Well, the thing is, the more you cram in, the more stuff you can add.
And if you can cram that in just a little bit tighter, you can add in just two more
minutes of demo, or you can make the models just a little bit better, or you can add an
extra soundtrack.
So what was the most impressive stuff that you saw at Revision this year?
There was an 8K demo, so that's 8 kilobytes,
about the size of an average email,
and that did voice synthesis.
Wow.
As part of a demo.
There was also a 4K demo that had essentially a replica of Star Wars.
You know the final scene in Episode VI?
Yeah.
Where they fly over the Death Star, launch a rocket into it, and so on.
Oh, okay, yeah.
He basically recreated that, including an exploding star and five fighters and so on, within 4K.
That's amazing.
It is.
So do you do any demo coding yourself?
I would love to answer yes, but the answer is actually no.
Okay.
So what do you do as an observer at Revision?
I mean, are these just presented or are people kind of working on this live?
I'm just trying to get a sense of what happens here.
Well, it's a really big room.
There's a lot of people there who bring their laptops, desktops,
their ancient Commodore Amigas,
because there are still Amiga demos being made, even this year.
And you go around, you talk to people,
you find people who are writing demos,
and sometimes you can join them to help them out a bit.
Wow.
And there's also a bunch of creative people.
For example, there's a compo that's about making executable music.
So there's just making the music from an executable, or somebody making a drawing.
That's stuff I've never really played with. I do, you know, follow the Commodore 64
scene enough to know that they're still releasing Commodore 64 demos also. And recently, even,
it's like they're still discovering new video modes that
they can coerce the hardware into generating and it's hardware that you would have thought would
have been fully understood 35 years ago. It's amazing. Yeah, some of the things that they're
doing is truly crazy. For example, they have figured out that you can switch the amount of
pixels per line in the middle of a frame. So it means you can make the top
half of the frame a lesser resolution
and the bottom half a bigger resolution
and then display something at really high
resolution in the bottom part.
And they're doing that on these Windows demos?
Or on the older hardware?
No, that's all on the older hardware. On the new machines
you don't get that kind of access to hardware anymore.
Yeah, I didn't think so. I was
really wanting to know more about that
if that's what I had understood.
But on the new hardware, you don't really need to
because you can just make a 4K resolution demo.
Right.
That's amazing.
Okay, well, you are going to be attending C++Now.
It looks like you're going to be giving a talk
called Mocking C++.
What are you going to be covering?
That's going to be actually, I think, at the same
time as Jason's talk.
Yes, one of them. I'm sorry, Jason, I will not
be attending your talk on that.
Yeah, I won't be attending yours either. I was planning
to. That's too bad.
The talk will be about
the
same subject I gave a lightning talk about
at Meeting C++ last year,
which is basically,
given that you know what mocking libraries do,
and given that you know that C++
basically doesn't allow it right now,
because the part of the reflection proposal you would need,
as we just discussed, is not going to happen,
how do you do it anyway?
Okay.
So that will be going into the low-level bits
of how linking works, how functions work, and what member function pointers look like.
How do you actually go into that level of detail, get out the information you need, and then use that to create a class that looks and acts like an actual class without making an actual class?
So, I mean, this is a topic we've discussed a little bit on our podcast before, but just give us maybe some teaser, high level, like what this looks like and why people are going to want to come and hear your talk at C++Now.
So at a high level, you have the basic idea of a mocking framework, which is to be able to indicate I have an instance of this class,
and I would like it to behave like that.
As in, for this test, it needs to return false or true or throw an exception, that kind of behavior.
Okay.
And regularly in C++, you would create a new subclass, implement all the methods,
add a lot of functionality to it yourself,
and essentially that's a lot of don't-repeat-yourself being violated.
Because you have an interface specification,
then you have a mock class specification,
which is an exact copy of it.
Then you do a lot of calls on it saying,
I expect this function to be called and then it should return false.
And that's again a duplication of the same function name
with the same arguments.
So the mock class doesn't actually add anything there.
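To show the duplication being described, here is a minimal hand-written mock in plain C++, without any mocking framework. The interface `IConnection` and everything around it are hypothetical names for this sketch. Note how the signature of `send` appears once in the interface and again in the mock.

```cpp
#include <stdexcept>
#include <string>

// The interface the code under test depends on (hypothetical).
struct IConnection {
    virtual ~IConnection() = default;
    virtual bool send(const std::string& message) = 0;
};

// Hand-written mock: repeats the signature of send() and adds bookkeeping
// that has to be rewritten for every method of every interface.
class MockConnection : public IConnection {
public:
    // "I expect send(message) to be called, and it should return result."
    void expectSend(const std::string& message, bool result) {
        expected_ = message;
        result_ = result;
        armed_ = true;
    }

    bool send(const std::string& message) override {
        if (!armed_ || message != expected_) {
            throw std::logic_error("unexpected call: send(" + message + ")");
        }
        armed_ = false;
        return result_;
    }

    // True once every expectation has been consumed.
    bool satisfied() const { return !armed_; }

private:
    std::string expected_;
    bool result_ = false;
    bool armed_ = false;
};

// Code under test (hypothetical).
bool sendGreeting(IConnection& connection) {
    return connection.send("hello");
}
```

A test would call `mock.expectSend("hello", false)`, run `sendGreeting(mock)`, and then check `mock.satisfied()`.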
It's just a bit of busy work you have to do
And for frameworks that try to stay completely within the C++ language boundaries,
there is no way around it.
So that's for Trompeloeil from Björn Fahller
and for Google Mock, for example.
And I am going to show you that you can definitely do without,
and that has a bunch of very, very interesting advantages.
For example, if you delete a mock object and then call a function on it,
so on a dangling pointer, I can just throw you an exception and say,
hey, you did that.
You call a function on the zombie mock, and now it returns.
Okay, how do you do that?
How would you hook into a deleted pointer?
By making it not deleted.
Okay.
One of the interesting things that I found out while doing this
is that when you have a destructor in your vtable,
there are actually, in GCC and Clang, two entries,
and both of those also do the delete.
Okay.
Which essentially means that if you hook them
and hook in a different function,
then you are also supposed to do the deleting.
Which means that if you don't actually do the deleting,
you're fine, you still have an object.
Okay.
So it means that somebody can do delete object X,
and then you still have the object there,
and you can still hook in functions.
That's interesting, but you can still call the destructor
and do the cleanup, you just don't free the
memory.
The user tries to call a destructor,
and he does that by saying
delete pointer to x.
Right. And that invokes a
function on your object
that should be doing the deletion and the destructing.
Okay.
And in this case, you hook in a mock function that does neither of those. It basically just says this is now a zombie
mock, checkmark placed, and we're done.
Okay.
And then you return.
Now, hypothetically, maybe I'm
getting a little too far into the weeds, but for the sake of my test, I needed that destructor to actually...
I needed the body of the destructor to get called.
Is there any way to still have your mock
execute the body of the destructor without freeing
and do this zombie functionality that you're talking about?
Practically, that would be possible,
except that in the case of the mock object,
it's actually not the class that it claims to be.
Right.
Which means that it never called your constructor.
It doesn't have your members initialized.
So running the destructor wouldn't be a logical operation.
Okay.
It is a total mock object replacing the entire object you have, including base classes.
Okay.
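The episode describes hooking entries in the compiler's vtable, which portable C++ cannot express. The sketch below therefore substitutes a hand-rolled table of function pointers for the real vtable, purely to illustrate the "zombie mock" idea: the hooked delete marks the object instead of freeing it, so a later call through the dangling pointer can be detected. All names here are invented for the sketch, and it is not how HippoMocks is implemented.

```cpp
#include <stdexcept>

struct Object;

// Hand-rolled dispatch table standing in for the compiler's vtable.
struct Table {
    int (*value)(Object*);
    void (*destroy)(Object*);  // plays the role of the deleting destructor
};

struct Object {
    const Table* table;
    bool zombie;
};

// What the code under test would write as obj->value() and delete obj.
int callValue(Object* o) { return o->table->value(o); }
void callDelete(Object* o) { o->table->destroy(o); }

// Table installed after "deletion": every call reports the misuse.
int zombieValue(Object*) { throw std::logic_error("call on deleted mock"); }
void zombieDestroy(Object*) { throw std::logic_error("double delete of mock"); }
const Table kZombieTable = {&zombieValue, &zombieDestroy};

// Table of the live mock.
int mockValue(Object*) { return 7; }
// Hooked "destructor": marks the object and swaps tables, but does NOT
// free the memory, so the pointer held by the test subject stays valid.
void mockDestroy(Object* o) {
    o->zombie = true;
    o->table = &kZombieTable;
}
const Table kMockTable = {&mockValue, &mockDestroy};

// The mocking side owns the storage and releases it when the test ends.
struct MockOwner {
    Object* obj;
    MockOwner() : obj(new Object{&kMockTable, false}) {}
    ~MockOwner() { delete obj; }
    MockOwner(const MockOwner&) = delete;
    MockOwner& operator=(const MockOwner&) = delete;
};
```

Because the storage outlives the "delete", a use-after-delete becomes a catchable exception rather than undefined behavior.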
Yeah, I would like to attend your talk.
Okay.
Before we get too deep in the weeds, so we're talking about your mocking library, HippoMocks, now,
and you briefly mentioned Trompeloeil and Google Mock.
What do you think sets HippoMocks apart from some of those other libraries?
Well, the biggest thing is that you don't have to define any mock objects.
You can just use a class as a mock without defining anything in between.
So the moment that you type the semicolon after your interface definition,
you can start a test and use it as if you had implemented it
without having any implementation of it at all.
That's fascinating.
I would really like...
That's a pretty big difference.
I wish that we could show some sample code right now, actually.
Yeah, that would be handy,
but I don't think a podcast lends itself well to showing samples.
It does not.
And I'm afraid that reading it out loud is going to be
confusing.
Yes, but it's a great hook for people
to pay attention and look forward to your talk.
And I'll just give
one more hint.
There's also the ability to hook a free function.
Okay.
So you can make a test that some
object calls assert,
and then check that it calls assert,
and then say, well, that's fine.
The test has now succeeded.
And the same goes for abort and exit.
Huh, interesting.
Now, that's something I've played with a little bit myself,
actually trying to hook free functions
and did some research on that.
So that one I have slightly more knowledge about,
but certainly not the replacing-the-entire-class part.
Replacing the class part is actually easier.
Interesting.
Because replacing free functions comes with a few corner cases that aren't exactly the
case that you want, and it means that it doesn't always work 100%. So you do need to be aware of the corner cases there in order to avoid them.
Right. Okay.
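The episode does not explain how HippoMocks hooks free functions, so the sketch below does not attempt to reproduce it. It shows a much simpler, portable alternative: routing the call through a replaceable function pointer so a test can verify that a failure handler was invoked without the process terminating. `g_fail`, `checkedDivide`, and `ScopedFailHook` are hypothetical names. Unlike true free-function hooking, this only works if the production code was written to call through the pointer.

```cpp
#include <cstdlib>

// Production code calls through this pointer instead of calling
// std::abort() directly (hypothetical seam, an assumption of this sketch).
using FailHandler = void (*)(const char* reason);

void defaultFail(const char*) { std::abort(); }

FailHandler g_fail = &defaultFail;

// Code under test: reports a failure through the seam.
int checkedDivide(int a, int b) {
    if (b == 0) {
        g_fail("division by zero");
        return 0;  // only reached when a test has replaced the handler
    }
    return a / b;
}

// Test-side replacement: records the call instead of terminating.
int g_failCalls = 0;
void recordingFail(const char*) { ++g_failCalls; }

// RAII guard so the original handler is restored when the test scope ends,
// even if the test exits through an exception.
class ScopedFailHook {
public:
    explicit ScopedFailHook(FailHandler replacement) : saved_(g_fail) {
        g_fail = replacement;
    }
    ~ScopedFailHook() { g_fail = saved_; }
    ScopedFailHook(const ScopedFailHook&) = delete;
    ScopedFailHook& operator=(const ScopedFailHook&) = delete;

private:
    FailHandler saved_;
};
```

A test creates a `ScopedFailHook`, triggers the failing path, and asserts that `g_failCalls` went up by one.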
Okay.
So we mentioned that you're working at TomTom,
and you're also working with them on an open-source tool called cpp-dependencies.
Do you want to tell us a little bit about that?
Yes.
So around three and a half years ago,
I joined the navigation team,
and they have a fairly big navigation code base. I think the total is at least 1.5 million lines of self-written code. And that adds on to a whole lot of third-party libraries that are used to not do everything yourself. But even then, you have one and a half million lines of code. So I did a bit of mental math in figuring out how does anybody understand all of this and why can't I figure out how everything hooks together?
And I figured out that if you actually just read the code
one line at a time and tried to finish,
you would be busy for more than a year.
So given that the average developer
would be working on it for two years
and people don't actually read one line a second continually,
it's kind of unrealistic to expect anybody to know the code base. And that makes it fundamentally different from a small project, just to put it in contraposition, where you can basically expect
somebody to go read the code base fully and then come back when they understand it.
Okay.
So as we were doing that, I figured out, well, most of these dependencies are not as people have written down.
So there's a bunch of definitions. We use CMake,
so there's a target_link_libraries statement, which says this library depends on these other libraries.
And we found out that actually a bunch of those aren't there anymore.
There's no link there.
It doesn't use any of those headers.
But there are a few that it does use, and it doesn't mention.
Hmm. Okay.
So that's confusing, as in
there's a bit of
lacking maintenance there.
So, suppose that we try
to fix that, and then I started adding
the dependencies that I knew were supposed to be there,
and the build failed.
Because if you add all the dependencies
that should be there, without filtering out the ones
that shouldn't be there,
you actually create a giant circular dependency
including everything.
And CMake by default
only repeats everything twice.
So that means your link breaks.
You can say repeat it 20 times
and then the link works again,
but it takes forever.
And given the size of the project,
adding a couple of tens of minutes to the build
time was not something people were happy with. So we basically said, well, let's figure out if we
can just figure out how the dependencies should be, and then make it that. Because when you look
at source code, you think I have an include to standard and I have an include to my interface,
I should be able to figure out where this comes from. And humans are pretty good
at that. I mean, if you look at an include statement,
you don't have to look up all the files
in the libraries to figure out which one it goes to.
You already know which one it's supposed to be.
So I figured,
well, let's just write a tool that tries
to do that, and then do it
automatically. See how well it works.
Just the first time and see what the results are.
And the first one that I wrote was in a
shell script and it took about two hours to run
on the codebase.
So that's a terrible user interface.
But it did result in
getting dependency information about everything
and everything we checked was right.
So we figured, well, this is good. Let's
develop it a bit further. And long story short,
we now have it running in two seconds on the same code base.
Wow.
And we can extract graphs from that.
We can extract information from it.
And we can watch everything that's in the code base.
So we are actively preventing new cycles from being introduced.
So you said the original version was a shell script.
What's the final version look like?
The final version was written in, great surprise, C++.
And it's not actually that big.
I think the total is around 2,000 lines of code.
Oh, wow.
And with permission from my managers, I was able to open source it. So anybody can download it and run it on your own code base.
So are you using libclang or anything like that to help you with this, or is it all hand-rolled?
It's exactly the opposite, actually. The thing with parsing C++ is that it's a really heavy-duty
task to do, and it depends on all the headers you included before that point.
Right.
Which means that if you include a header, you can't always use a precompiled header for that one
because it might be different
due to other things you included before that.
Right. Okay.
Which means that if you're actually trying to understand
what the headers do,
you have to use something like libclang,
do a full parse of everything,
have, say, a tenth of a second per file,
times 70,000 files is, well, wait for an hour or so.
So that's an unrealistic
approach if you want to do this, but the only information
I need is, given this header file,
what does it include?
And that's a really simple parser to write.
Right. You're just looking for...
Yeah, go ahead.
At first I just looked for lines and
parsed per line, but now I've actually
implemented a per-character parser,
which is more accurate and a little bit faster still.
But that's even still only about 100 lines of code.
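To make the idea of a per-character include scanner concrete, here is a minimal sketch. It is not the code from cpp-dependencies; the function name `find_includes` and its behaviour are assumptions made for illustration. It walks the text one character at a time, skips comments and quoted literals, and records the target of each `#include`. It deliberately ignores raw string literals, line continuations, and conditional compilation.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not the actual cpp-dependencies code: walks the text
// one character at a time and returns the target of every #include directive,
// ignoring directives that sit inside comments or quoted literals.
std::vector<std::string> find_includes(const std::string& text) {
    std::vector<std::string> found;
    const std::size_t n = text.size();
    std::size_t i = 0;
    bool line_start = true;  // only whitespace or comments seen on this line so far

    auto skip_blanks = [&] {
        while (i < n && (text[i] == ' ' || text[i] == '\t')) ++i;
    };

    while (i < n) {
        const char c = text[i];
        if (c == '\n') {
            line_start = true;
            ++i;
        } else if (c == ' ' || c == '\t' || c == '\r') {
            ++i;
        } else if (c == '/' && i + 1 < n && text[i + 1] == '/') {
            while (i < n && text[i] != '\n') ++i;  // line comment
        } else if (c == '/' && i + 1 < n && text[i + 1] == '*') {
            i += 2;  // block comment, may span several lines
            while (i + 1 < n && !(text[i] == '*' && text[i + 1] == '/')) {
                if (text[i] == '\n') line_start = true;
                ++i;
            }
            i = (i + 1 < n) ? i + 2 : n;
        } else if (c == '"' || c == '\'') {
            ++i;  // quoted literal: skip to the closing quote or end of line
            while (i < n && text[i] != c && text[i] != '\n') {
                if (text[i] == '\\' && i + 1 < n) ++i;  // skip escaped character
                ++i;
            }
            if (i < n && text[i] == c) ++i;
            line_start = false;
        } else if (c == '#' && line_start) {
            ++i;
            skip_blanks();
            if (text.compare(i, 7, "include") == 0) {
                i += 7;
                skip_blanks();
                if (i < n && (text[i] == '"' || text[i] == '<')) {
                    const char close = (text[i] == '"') ? '"' : '>';
                    const std::size_t start = ++i;
                    while (i < n && text[i] != close && text[i] != '\n') ++i;
                    if (i < n && text[i] == close) {
                        found.push_back(text.substr(start, i - start));
                        ++i;
                    }
                }
            }
            line_start = false;
        } else {
            line_start = false;
            ++i;
        }
    }
    return found;
}
```

Called on a file containing `#include <vector>`, a commented-out include, and `#  include "a/b.h"`, this sketch reports only `vector` and `a/b.h`. Because it never expands macros or follows other headers, each file can be scanned independently, which is what keeps the approach cheap compared to a full parse.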
Wow.
So what's the end result of actually applying this to your code base?
I know you said it's keeping your dependencies clean,
but have you noticed an increase in compile time,
or excuse me, a decrease in compile time,
or anything tangible like that?
Well, the biggest thing it does is
introspection. It allows you to see what's in your code base, what dependencies you actually
currently have, and essentially get some numbers from that. The biggest direct result you can get
from that is which file is the biggest impact on my compile time and which files do we have that
do nothing at all. For example, if you have a component that nobody uses, it will just tell
you, hey, there's a component here
and nobody's linking to that.
You probably just want to delete that.
There's a bunch of headers here that you have in your project.
And I've tried running that on multiple projects.
I think every project I tried had at least 10.
10 headers that were not used.
So 10 headers that were not used by anybody at all.
So that's from the smallest hobby project from somebody to the biggest,
LLVM, I ran it on, and there's a bunch of headers there
that nobody uses
that's funny
and the thing is if you delete everything, recompile it
it works
they actually are not used
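As a rough illustration of the unused-header check, here is a small sketch under stated assumptions. The `IncludeMap` type and `unused_headers` function are hypothetical names, the include targets are assumed to be already resolved to project file paths, and a file is treated as a header simply because its name ends in `.h`, which is a simplification and not how the tool itself decides.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical data model: each project file mapped to the project files it
// includes (targets assumed to be resolved to the same path spelling).
using IncludeMap = std::map<std::string, std::vector<std::string>>;

// Simplifying assumption for this sketch: a header is any file ending in ".h".
static bool looks_like_header(const std::string& file) {
    return file.size() >= 2 && file.compare(file.size() - 2, 2, ".h") == 0;
}

// Returns every header that no file in the map includes.
std::vector<std::string> unused_headers(const IncludeMap& includes) {
    std::set<std::string> included;
    for (const auto& entry : includes) {
        included.insert(entry.second.begin(), entry.second.end());
    }
    std::vector<std::string> unused;
    for (const auto& entry : includes) {
        if (looks_like_header(entry.first) && included.count(entry.first) == 0) {
            unused.push_back(entry.first);
        }
    }
    return unused;
}
```

With `main.cpp` including `a.h`, and `old.h` also including `a.h`, the sketch reports `old.h`: it includes something, but nothing includes it.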
right
so does it work well with header only libraries also?
it will work
completely with any kind of structure of library you have.
It doesn't actually look at headers or source code files to see which one is which.
It looks just at whether somebody from outside your project includes it to see if it's a header file.
Okay.
If somebody has an include statement pointing to it, then it must be a header file, even if it's called .cpp.
Right. Yes, I had this conversation actually at my meetup the other day
that the C++ compiler doesn't care what the file is called,
for the most part.
It doesn't even check whether the thing you're compiling
is actually meant to be compiled.
It just tries.
But typically, if you compile a header file,
there's no actual thing being instantiated,
so the output file will be empty.
Right. Right.
Right. Okay.
But that does kind of leave you with the interesting result that
you don't have headers and source files.
You have headers, you have source files,
you have files that are both, and you have files that are
neither. So then,
yeah, how do you
produce that into an output file?
Right. I wanted to interrupt
this discussion for just a moment to bring you a word from our sponsors.
JFrog is the leading DevOps solution provider
that gives engineers the freedom of choice.
Manage and securely host your packages
for any programming language with Artifactory.
With highly available registries
based on on-prem or in the cloud
and integrations with all major build
and continuous integration tools,
Artifactory provides the only universal,
automated end-to-end solution
from development to production.
Artifactory now provides full support
for Conan, the C and C++ Package Manager,
the free open source tool for developers
that works in any OS or platform
and integrates with all build systems,
providing the best support for binary package creation
and reuse across platforms.
Validated in the field to provide a flexible, production-ready solution,
and with a growing community,
Conan is the leading multi-platform C++ package manager.
Together, JFrog and Conan provide an effective, modern, and powerful solution for DevOps
for the C++ ecosystem.
If you want to learn more, join us at the JFrog User Conference SwampUp 2017, running the 24th to 26th of May in Napa Valley, California, or visit Conan.io.
So if you want to run CPP dependencies against your own code base, how easy is it to do that?
Do you just download from GitHub and run it, or do you need to kind of configure your repository to work with it?
Well, right now it uses the CMake files that should be present in a project
to figure out where components are,
but it only looks at it to figure out where they start,
because essentially the only assumption it does is
if it's a source file, it must be compiled,
and any grouping of source files that I can find
will be a project or a component.
So if you're not using CMake and you run it, it will tell you
I have one component and it's
your entire thing, everything.
Okay. Which is kind of pointless.
You can ask it to do the opposite,
which is to assume that any folder that contains
a compilable file will be
a component.
And that works if you
put your headers and source in the same folder,
but it typically gives you too many results
because it takes your well-structured big component
that is actually in four folders and makes it into five components
that are then, of course, cyclically dependent.
Oh, okay.
So both of the extremes are not the most easy to use,
but if you use CMake, it will typically find you the right projects.
Okay.
So what is the... You said that you use this to ensure that you're not introducing cyclical dependencies
or new dependencies in your code base at TomTom.
How do you actually apply that?
Do you do this on check-in with a Git hook or something like this?
We're not using Git
yet, so it's not a Git hook
per se, but we do do this
in the continuous integration
system. So in every commit
that somebody tries to make to the mainline,
we run the tool first to see how many
dependencies it generates, and if
it's more than the amount of cyclic dependencies that we
had before, so just the count,
then we refuse to build it.
Oh.
Well, you're very serious about it then.
It's not just an...
Yeah, we do try to actually get it down.
And we've managed to get it down from a total of 120-ish to around, I think we're around 20 now.
Wow.
That's total components that are in any way cyclically dependent.
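One way to compute a number like "total components that are in any way cyclically dependent" is to find strongly connected components in the dependency graph. The sketch below uses Tarjan's algorithm; it is an illustration and not necessarily how cpp-dependencies counts. The names `Graph`, `count_cyclic_components`, and `gate_allows` are hypothetical, and components are assumed to be numbered from 0.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// graph[v] lists the components that component v depends on.
using Graph = std::vector<std::vector<std::size_t>>;

namespace detail {
// Tarjan's strongly connected components algorithm (recursive form).
struct Tarjan {
    const Graph& g;
    std::vector<int> index, low;
    std::vector<bool> on_stack;
    std::vector<std::size_t> stack;
    int next_index = 0;
    std::size_t in_cycle = 0;  // components that belong to some cycle

    explicit Tarjan(const Graph& graph)
        : g(graph), index(graph.size(), -1), low(graph.size(), 0),
          on_stack(graph.size(), false) {}

    void visit(std::size_t v) {
        index[v] = low[v] = next_index++;
        stack.push_back(v);
        on_stack[v] = true;
        bool self_loop = false;
        for (std::size_t w : g[v]) {
            if (w == v) self_loop = true;
            if (index[w] == -1) {
                visit(w);
                low[v] = std::min(low[v], low[w]);
            } else if (on_stack[w]) {
                low[v] = std::min(low[v], index[w]);
            }
        }
        if (low[v] == index[v]) {  // v is the root of a strongly connected component
            std::size_t size = 0;
            std::size_t w;
            do {
                w = stack.back();
                stack.pop_back();
                on_stack[w] = false;
                ++size;
            } while (w != v);
            if (size > 1 || self_loop) in_cycle += size;
        }
    }
};
}  // namespace detail

// Number of components that take part in at least one dependency cycle.
std::size_t count_cyclic_components(const Graph& g) {
    detail::Tarjan t(g);
    for (std::size_t v = 0; v < g.size(); ++v) {
        if (t.index[v] == -1) t.visit(v);
    }
    return t.in_cycle;
}

// Hypothetical CI gate: accept a commit only if the count did not grow.
bool gate_allows(const Graph& g, std::size_t baseline) {
    return count_cyclic_components(g) <= baseline;
}
```

For a graph where components 0, 1, and 2 depend on each other in a ring and 4 and 5 depend on each other, the count is 5, so a gate with a baseline of 5 passes and one with a baseline of 4 rejects the commit. Comparing against the previous count rather than against zero lets an existing code base adopt the check without first fixing every old cycle.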
That's pretty cool.
Yeah.
So you want to tell us a little bit more about working at TomTom?
You have a pretty large code base.
Are there any other special considerations you have while working on such a large code base?
You said a million and a half lines of code, right?
Yep.
Well, it's like many other large code bases, actually.
If you work at Microsoft on Windows or on Visual Studio, or on Chromium if you're at Google,
those codebases are, as far as I know, even bigger.
And they will have the same problems, the same situations that you have in our codebase,
which is it's really big.
The people that originally set it up did a good job, according to the 1995 thing of doing a good job, which may be different from now.
And you have so much code and so many things that have to keep working that that's going to be the main determinant in how fast you can develop anything.
So what do your compile times currently look like, if you can share that?
I'm not sure if I can share that.
Okay. Do you have a large
monolithic code base, or do you have it split up among several repositories?
We have it split
up among several repositories now, because the new code that's being developed has
avoided making any new cycles by just putting it in a separate repository.
You can't include a header that you can't see.
And that's a pretty good approach.
But we are trying to
take the old code base and get it into a state
where we can also split it up and
divide it into many separate components that build
separately. And that has many
advantages. For example, your compile time drops
by a lot because you're not compiling most of the code
anymore. You cannot
make any new cyclic dependencies because you
can't make one.
And you can deploy parts of
your software knowing that it will not depend on anything
else. But it also has
the obvious downsides of that, which is that
if something breaks in a downstream dependency,
it will affect you upstream.
So if somebody makes a breaking change
on a component and then distributes that as the
new version, then
your build system will avoid any dependency
on that, any user of it from upgrading
until they have also fixed all their dependencies.
And if you have that three layers
deep, by the time your new version of the bottom component
gets propagated all the way to the top,
that could be years.
Yeah.
I think it's
a great conversation to have because we've discussed in the past, like, Google having this giant monolithic codebase with tens of millions...
Hundreds of millions of lines of code.
Hundreds of millions of lines of code in one codebase. And I've wondered about this kind of thing that you're discussing. So it seems like you're strongly on the side of saying, split up your codebase and don't have a bunch of spaghetti dependencies.
If possible, make your codebase
such that any dependency has
an obvious up or down direction.
So you should never have a cross-direction or an
up-pointing link.
Okay. And that does
heavily point towards being able to put things
into separate repositories,
but I would advocate not doing that.
Okay. Not doing separate repositories?
Yes.
Because you get the immediate problem of,
given that I changed something in the bottom end component,
I have to fix it, check it into the code base,
get all the tests running, ship it as a new release,
then go to the next repositories,
download the new releases, get everything working again,
ship all of those components, go to new repositories.
Well, if you didn't have the separate repositories
but had a single build,
and you should be able to reuse most of the builds from somebody else,
but if you had it in a single repository,
you should be able to start on the bottom end
and work all the way through all the components,
say, for three days in a row,
and then be able to ship it.
So you would go for the monolithic repository, but with the CPP dependencies checks to make
sure that you're not going up or across in your dependencies.
It sounds like an advert, and maybe it is, but yeah, that's the direction I would go
into.
Okay, yeah, that's, I mean, I've never worked on codebases that were more than maybe a couple hundred thousand lines of code,
not in the million, ten, hundred million, for sure.
I do know that the people at Google
have slightly different opinions about this
because they have even more code affected by big changes.
For example, if you change a function on a string in our codebase
that results in
multiple days of fixing all the places where it's used,
but then you're done.
If you do that on a codebase that's 15 times as big,
you'd be busy for a month.
Right.
And that's not going to work
because A, your boss will not let you,
and B, if you try that,
everybody else will have changed their code in the meantime
and you will have 20,000 new places where they're using it.
So it doesn't even work.
So by keeping everything in a monolithic structure,
it gives you the freedom, I guess, you need to do refactoring when it's necessary.
It gives you the freedom of at least seeing what the impact will be of your change.
Because it means that when you change a single thing at the bottom end,
you can find out how many things will actually break if you do that.
And if only for the exploratory thing of there is this bit of code, it's very ugly, I would like to fix it.
Just knowing how big is the problem that I would create if I do allows you to figure out if it's a thing you want to do.
It's a solid point.
I've never, like I said, never had to deal with this.
I still have a hard time swallowing it because I'm just thinking that's a ginormous repository. But I see the point now
more, I think, than I did. To give an example of a case where it worked very well, we had our own
shared pointer implementation. And it was very much like a normal shared pointer. And in this
case, it happened to have the same names for functions.
So thank you, whoever made it that way, because that helped a lot.
I was thinking, can we just take this, throw it out the window,
and replace it with the standard shared pointer,
or the boost shared pointer in our case?
So I tried on my own system, just replace it,
inherit from the boost shared pointer, and see what happens.
And I found out that everything was just fine.
Nothing broke at all.
I could even make it a typedef.
And that still worked.
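The replacement described here can be sketched as follows. The namespace `legacy` and the name `SharedPtr` are hypothetical stand-ins for the in-house class, and `std::shared_ptr` with a C++11 alias template is used instead of the Boost pointer and typedef mentioned in the conversation, purely to keep the example self-contained.

```cpp
#include <cstddef>
#include <memory>
#include <string>

namespace legacy {
// Hypothetical name for the in-house smart pointer. Because the old class used
// the same member names as the standard one (get, reset, use_count, ...), an
// alias is enough and calling code does not need to change.
template <typename T>
using SharedPtr = std::shared_ptr<T>;
}  // namespace legacy

// Existing calling code keeps compiling against the old name.
std::size_t name_length(const legacy::SharedPtr<std::string>& name) {
    return name ? name->size() : 0;
}
```

After the switch, `legacy::SharedPtr<std::string>` and `std::shared_ptr<std::string>` are the same type, so values can be passed freely between old and new code while the old name is phased out.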
So I did that, shipped it,
and I think half a day later it was in.
Wow.
If we had the separate code bases,
we would have been struggling with this for at least a few weeks in planning,
figuring out how big it could be,
how much work it would be,
and then possibly not even doing it.
Right.
Right. Right.
And it makes sense to get rid of your own implementation of shared pointers
since there's already so many other high-quality options available.
Yeah, did you see any performance differences after changing it?
We've looked for performance differences in many changes,
but we haven't found any.
Interesting.
But in this case, it was also a bit about thread safety
because on one hand, you have your own shared pointer implementation.
It worked correctly for every platform we had so far.
But we didn't test it on all the new platforms.
And there's somebody else who has a shared pointer that looks just like ours.
And he did all the testing for us.
Right.
So that's the implicit reason to reuse code.
Somebody else made it.
They tested it.
It works.
There's a thousand people using it.
You should also be at least considering that.
Yeah.
Yeah, if a thousand other companies are using this library,
there's a good chance that it has fewer bugs
than the library your company wrote.
And it has a very good chance of not having any bugs
when you go to a platform that you're not using yet,
but they are.
Right.
That's a great selling point.
Make a t-shirt or something out of that somehow.
I should give that to Jon Kalb from Boost.
That's essentially the whole argument behind why everybody should be using Boost.
Yeah.
I don't know. Do you have anything else you want to ask, Jason?
No, I think that covers it for me.
Peter, is there anything else you wanted to go over?
I don't think so.
Okay.
Well, Peter, it's been great having you on the show today.
Where can people find more information about you
or find CPP Dependencies, the library?
If you want to know more about me,
you can go to github.com slash dascandy.
And if you want to know more about CppDependencies,
you can go to either my own repository,
which is dascandy slash cpp-dependencies,
or you can go to github.com slash tomtom-international
slash cpp-dependencies.
Okay, great.
And as typical, you can find the more experimental things
on my own branch and the more well-tested things on the main branch.
Right.
Right.
Okay, well, thanks so much for coming on the show today, Peter.
Thanks for having me.
Thanks for joining us.
Thanks so much for listening in as we chat about C++.
I'd love to hear what you think of the podcast.
Please let me know if we're discussing the stuff you're interested in.
Or if you have a suggestion for a topic, I'd love to hear about that too. You can email all your thoughts to
feedback at cppcast.com. I'd also appreciate if you like CppCast on Facebook and follow CppCast
on Twitter. You can also follow me at robwirving and Jason at lefticus on Twitter. And of
course, you can find all that info and the show notes on the podcast website at cppcast.com.
Theme music for this episode is provided by podcastthemes.com.