CppCast - Blaze

Episode Date: November 2, 2016

Rob and Jason are joined by Klaus Iglberger to discuss the Blaze high performance math library. Klaus Iglberger finished his PhD in computer science in 2010. Back then, he contributed to several massively parallel simulation frameworks and was an active researcher in the high performance computing community. From 2011 to 2012, he was the managing director of the central institute for scientific computing in Erlangen. Currently he is on the payroll at CD-adapco in Nuremberg, Germany, as a senior software engineer. He is the co-organizer of the Munich C++ user group (MUC++) and he is the initiator and lead designer of the Blaze C++ math library.

News

- Recommendations to speed C++ builds in Visual Studio
- void foo(T& out) - How to fix output parameters
- Routing paths in IncludeOS

Links

- Blaze
- Munich C++ User Group
- CppCon 2016: Klaus Iglberger "The Blaze High Performance Math Library"

Sponsor

- JetBrains

Transcript
Starting point is 00:00:00 This episode of CppCast is sponsored by JetBrains, maker of excellent C++ developer tools, including CLion, ReSharper for C++, and AppCode. Start your free evaluation today at jetbrains.com slash cppcast dash cpp. And by Meeting C++, the leading European C++ event for everyone in the programming community. Meeting C++ offers five tracks with seven sessions and two great keynotes. This year, the conference is on the 18th and 19th of November in Berlin. Episode 77 of CppCast with guest Klaus Iglberger, recorded November 2nd, 2016. In this episode, we discuss output parameters and speeding up your builds. Then we talk to Klaus Iglberger.
Starting point is 00:01:02 Klaus tells us about the Blaze High Performance Math Library, for C++ developers by C++ developers. I'm your host, Rob Irving, joined by my co-host, Jason Turner. Jason, how are you doing today? Good, Rob. How about you? Doing pretty good. How did Halloween go for you? I think we had something like 225 kids, maybe. Oh wow, we must have only had a handful, because we just left the bowl out for a while and it was still pretty full when we got home. Okay, so my neighborhood has a Facebook page
Starting point is 00:01:59 where people post like you know with there's kids doing vandalism, that kind of thing, right? People talk about security or whatever. Someone said they left their bowl out because they needed to go to bed or whatever with candy in it. And now they're wondering, has anyone seen my large mixing bowl? Because the kids took the bowl with the candy. Why would you do that?
Starting point is 00:02:23 I don't know. Yeah. I live on a cul-de-sac, and my driveway is a little steep. So I kind of think kids just towards the end of the night were intimidated by the steep driveway and just didn't want to go through the extra effort to reach my bowl. So that's why it was so full still. That's why I guess they didn't deserve that extra piece of candy. I guess not. Anyway, at the top of our episode, I'd like to read a piece of
Starting point is 00:02:50 feedback. This week, Joey writes in saying, I really enjoy the podcast. Keep up the good work. I just had to make one comment though on your offhand remark about a possible celebration for your 100th podcast. And he says, why did you pick such an arbitrary number? You should have had a celebration celebration for your 100th podcast. And he says, why did you pick such an arbitrary number?
Starting point is 00:03:06 You should have had a celebration back on your 64th episode, or maybe for your 120th, which is still a bit far away. Nice round numbers for programmers. Just to be clear, we have not actually planned any kind of spectacular celebration. No, but we could. We could. Plenty of ways away. I do like his idea with uh you know
Starting point is 00:03:26 the bite oriented uh celebration numbers yeah something 256th episode maybe if we should plan for that one that'd be quite a ways away well actually we're zero counted 255th episode maybe yeah well are we zero counted i think the first episode is the first episode no yeah you should be you should be wasn't thinking ahead on that one i didn't think about that with c++ weekly either anyway uh we'd love to hear your thoughts about the show you can always reach out to us on facebook twitter or email us at feedback at cppcast.com and don't forget to leave us reviews on itunes joining us today is Klaus Igelberger. Klaus has finished his PhD in computer science in 2010.
Starting point is 00:04:10 Back then, he contributed to several massively parallel simulation frameworks and was an active researcher in the high-performance computing community. From 2011 to 2012, he was the managing director of the Central Institute for Scientific Computing in Erlangen. Currently, he's on the payroll at CD Adapco in Nuremberg, Germany, as a senior software engineer. He's the co-organizer of the Munich C++ User Group, and he's the initiator and lead designer of the Blaze C++ Math Library. Klaus, welcome to the show. Hi, Rob. Hi, Jason.
Starting point is 00:04:42 Hey, thanks for joining us. So how long has the Munich C++ user group been going? How long it has been going? For about three years. Oh, very good. And is that... I think it's one of the most active communities in Germany. So we have now 830 about active members. Okay, active.
Starting point is 00:04:57 Well, about 80 to 90 show up every month. Wow. Meet once a month. That's good. That is impressive. Is that the same C++ user group that Jens Weller would attend? Jens is doing Berlin and I think Dusseldorf.
Starting point is 00:05:12 And Munich is probably too far for him. Gotcha. Yeah. And I co-organize them, doing this together with Daniel Pfeiffer. Very nice. Well, we have a couple news articles to go through, Klaus,
Starting point is 00:05:23 and then we'll start talking about Blaze. So feel free to jump in on any a couple news articles to go through, Klaus, and then we'll start talking about Blaze. So feel free to jump in on any of these news articles. The first one is from the Visual C++ blog. And this one is recommendations to speed up your C++ builds in Visual Studio. And they have a number of good recommendations, many that our listeners might already be familiar with. The first one is going to go into detail about guidelines for using precompiled headers.
Starting point is 00:05:52 Yeah. Go ahead, Jason. Precompiled headers are great, but it is hard to use them consistently in cross-platform code. Right, because Clang and GCC don't have as good support for it, right? Very different support, too. And trying to kind of fit all that into CMake and get it all to do what you want
Starting point is 00:06:13 to do is a bit of a pain. Right. In addition to pre-compiled headers, they suggest using slash mp, which will paralyze your compilation, slash incremental for incremental linking,
Starting point is 00:06:30 and debug fast link to generate partial PDB files. That's interesting. I don't think I'm too familiar with that one. No, I've never seen that one. Yeah. And the last guideline is to possibly make use of third-party build accelerators one of which is incredibuild which we've talked about before on the show uh and used to sponsor the show and they have another one here named electric cloud which i don't think i've heard of
Starting point is 00:06:55 but i'm guessing they do a similar um distribution of your build process as incredibuild does it seems that way yeah anything you want to mention on this one, Klaus? I think slashmp is something that everybody can use. This would be my favorite. Because today everybody has more than two cores, probably four, perhaps even eight.
Starting point is 00:07:18 That's an easy game. Yeah, absolutely. Okay, next one is from Foo Nathan's blog and it's about how to fix output parameters. And I thought this was a pretty interesting blog post going through. He doesn't want to either defend or advocate against using output parameters, but basically says if you're going to use output parameters, he created a new type to help manage the use of output parameters. What did you think about this one, Jason? I liked that.
Starting point is 00:07:51 I'm like a quarter of the way into it, and I'm like, ooh, I see where he's going with this. And I totally like where he went with it. It seems like a good idea to me to take advantage of the type system as much as we can. Yeah. I like it too. So it's a very nice approach. Also very clean, very obvious what's happening. Yeah, definitely. And he points out just at the end, I'm not sure if you
Starting point is 00:08:15 got down this far, Jason, that he's making use of a bit more sophisticated implementation of this in his type safe library. And I don't know if that was one of the libraries we discussed with him when we had Jonathan on the show. I do not believe we discussed it with him. I did see that on the, on the article.
Starting point is 00:08:37 Yeah. You know, it looks like the library is only a few months old actually. Okay. So this might be a new one. He started after we had him on the podcast. September 29th was the first commit. Oh, okay.
Starting point is 00:08:51 Yeah, so we had him over the summer, I think, right? Yeah, certainly before CppCon. Yeah. Well, we'll put this link to this library in the show notes in addition to the link to the blog post if anyone wants to check out the type safe library yeah okay and the last one is routing paths and include os from javascript to c++ jason do you want to introduce this one i know you're a little bit more familiar with include os
Starting point is 00:09:17 at this point yeah so include os is you know well we we we've had them on the show and talked about it, right? Yeah. But you can write C++ application that basically compiles to something that's bootable and then boot that up in a virtual box or QEMU or whatever. And you can just make an OS from a C++ file. Really easy. It's awesome. And one of their example uses is as a standalone web server mana i guess they're calling it and it is um what they have here is an example for how to do normal like url routing inside of your include os
Starting point is 00:09:55 written c++ web server and it looks um i mean it looks like what you would expect to be able to do with something like Apache. Right. And, oh man, I forget what that plugin is called in Apache. But whatever, the standard kind of URL routing stuff that Apache can do. Okay. Although it's apparently a JavaScript library, so I guess you can do it in JavaScript also. Oh, I guess that explains the title, going from JavaScript to C++. Yeah, they ported a path to RegExp, some popular JavaScript library, into C++ RegEx.
Starting point is 00:10:33 Very cool. For IncludeOS. Okay, well, let's start talking to Klaus about Blaze. Where should we get started? Just want to give us an overview on the library? Yeah, why not? So Blaze is, first of all, a high-performance C++ linear algebra library. So the idea is, you have some vectors, some matrices, you put them together in whatever operation you want,
Starting point is 00:10:56 and hopefully you get the maximum performance out of it. So, I know there's a handful of libraries. We'll probably discuss this later on. But the primary intent is to have a library that really gets the maximum out of a CPU. So why, I mean, you just said that there's a handful of other libraries. Why did you start working on Blaze? What drove you to work on that? So now we have to go back in time, approximately 2010. So by that time, we already had multi-core CPUs. And I was looking for a math library and found that there were a lot of them, so not
Starting point is 00:11:30 as many as today, but quite a few. But none of them were parallel. So none of them provided any kind of real parallelism, two cores, four cores, everything was single core. There were a few exceptions, of course. A couple of libraries had a few operations that were parallelized,
Starting point is 00:11:48 but none of them were really parallel. And I also did a couple of performance comparisons and found that, well, I wasn't completely satisfied. And so I took on a new hobby project. This is what it was in the beginning. I started this in about 2010, and in 2012 I published the first version. Admittedly this version was not parallel but in the one point releases we yeah all the features
Starting point is 00:12:14 that were necessary to do some parallelization were added and 2.0 was really parallel not just a little parallel but we tried to be completely parallel. Everything that made sense in a way that was also, well, perhaps new. So we initially had OpenMP and Boost threads and C++11 threads, so a couple of kinds of parallelization, whatever somebody wanted to use. So what was your personal interest in a linear algebra library in the first place? Was this hobby or research-related? Initially, it was probably more hobby than research-related. So as Rob said from my bio, I was in HPC for quite some time. High performance was always important.
Starting point is 00:12:58 But the linear algebra stuff was something I was personally interested in. So I just tried out initially what you can do, and it grew and grew. I found it was fun, and at some point I just decided, okay, now it's probably good enough to show to others. I don't think I've looked at any linear algebra since my college days. What are the sorts of use cases for a C++ linear algebra library? There's quite a lot.
Starting point is 00:13:31 Whenever you have vectors and matrices, you want to do something with them. So vector additions to matrix multiplications, inverting a matrix, transposing matrices, whatever. Then such a library is useful because of course, you can
Starting point is 00:13:45 write some of these operations very easily. Vector addition is one for loop, reading from two vectors, adding the values, writing it to a third vector. But you would have to think about parallelization, about proper alignment of data, about vectorization
Starting point is 00:14:01 and all this stuff. Using such a library, you can just use whatever is in there very easily, very conveniently. You can basically write your operations as you would write them in a math book. So A is equal to B times C or whatever. That's convenient. So I guess, well, I guess I'm going to jump ahead a little bit here for what, but you just said, so it's a template expression library, right?
Starting point is 00:14:30 Correct. It's header only, based on expression templates like most of these libraries. You essentially only include one header file, and then can start doing some programming. So let's start in the, you've been working on this since before C++ 2011. Correct. How much pain was it to specify, say, the return types of your expressions without being able to do auto return types? So all this pain is not entirely gone. So we have now made the transition to C++14, and you still see a lot of the extra work we had to do without the auto keyword.
Starting point is 00:15:13 So there's a lot of traits classes that contain the return types, and you can always ask these trait classes, okay, what would be the result of this particular expression? So decltype was also not available, of course. So today we're going to use decltype, give me the return type of this expression. We did this manually. It's quite a number of classes that just encapsulate these return types.
Starting point is 00:15:39 I've been... No, no, go ahead, please. So it's not particularly nice, I know. But, okay, it didn't work in any other way back then. I've just been pondering recently how things like auto return types would appear to enable new types of expressions and APIs that simply were very nearly impossible before.
Starting point is 00:16:04 And I'm wondering if you've experienced maybe you're being able to do new things now that you are using C++11-14. Yeah, I completely agree. So we were stuck a little without C++11, but since we have now made the transition, we, for instance, could very easily add Fuse Multiply Add. So I'll explain it in a simple way.
Starting point is 00:16:29 So fuse multiply add is an operation where one addition and one multiplication are done together. One intrinsic operation. This is, of course, if you have additions and multiplications at the same time, in the optimal case, a factor of two in performance. However, you do not want to sprinkle FMA calls all over your code. You wouldn't also, like in expression template code, in other expression template code,
Starting point is 00:16:55 write some addition and some multiplication, and the compiler or the system should realize, oh, I can combine these two, make an FMA operation out of it. And now with the auto return type, what we did is just to implement FMA exactly like this. So now we have expression templates within expression templates, kind of a multi-layered expression template approach.
Starting point is 00:17:17 The other one is still the old one, the traditional one. The inner one, however, is building on the auto return type. So you just return some expression objects. The auto-keyword helps here. No extra work. And then the system assembles this to find, oh, now I can combine an additional multiplication called FMA instead. Cool.
Starting point is 00:17:39 So you recently announced Blaze 3.0. What are some of the recent changes in the new version? So in the one-point and two-point versions, we used C++ 98 primarily. One of the reasons was, okay, back then C++ 11 was not available everywhere, especially on these HPC machines. We wanted everybody to be able to use the library.
Starting point is 00:18:04 However, yeah, as just explained, C++11 and C++14 just provide so much more possibilities that we decided now it's time. Now there should be a new compiler available. And so the major change is truly the transition from C++98 to C++14. We jumped over C++11 because, well, we would have been stuck with C++11 if we would have announced that place 3.0 uses C++11 only. But now we have 14 and can use it for quite some time. So I consider this some kind of breaking change. People want to have the guarantee
Starting point is 00:18:44 that they can use a compiler for a long time. So 3.0 will be, or 3.x will be C++14. This is a guarantee. Okay. So speaking of compilers, what compilers does Blaze work with? It should hopefully work with all C++14 compilers. We do a lot of testing. We explicitly test GCC, Clang,
Starting point is 00:19:06 the Intel compiler, and Visual Studio. Okay. Of course, in all the versions that provide C++14. And this is actually a good thing to do because all compilers are different. You always find that these subtle differences.
Starting point is 00:19:22 I just recently realized that in the Intel compiler, the variable templates are not properly implemented. these subtle differences. I just recently realized that in the Intel compiler, the variable templates are not properly implemented. And so, unfortunately now I cannot use this feature, just because I want to provide C++14
Starting point is 00:19:36 compatibility with all compilers. So you have to really test all of them. I'm curious from a library maintainer perspective, if you are supporting the compilers that say they support C++ 14, or are you supporting the compilers that support C++ 1y? You know what I mean? Like the ones pre-standard 14?
Starting point is 00:19:57 So we tested a couple of 1y compilers, but for instance, GCC 4.8 gave us trouble. Okay. So we basically support every real C++ 14 compiler. Okay. I'd like to interrupt the discussion for just a moment to bring you a word from our sponsors. CLion is a cross-platform IDE for C and C++ from JetBrains.
Starting point is 00:20:20 It relies on the well-known CMake build system and offers lots of goodies and smartness that can make your life a lot easier. CLion natively supports C and C++, including C++11 standard, libc++, and boost. You can instantly navigate to a symbol's declaration or usages too. And whenever you use CLion's code refactorings, you can be sure your changes are applied safely throughout the whole code base. Perform unit testing with ease as CLI integrates with Google Test, one of the most popular C++ testing frameworks, and install one of the dozens of plugins like Vim Emulation Mode or Go Language Support.
Starting point is 00:20:55 Download the trial version and learn more at jb.gg.cppcast dash cline. How does Blaze compare to some of the other math libraries like Eigen? Okay, that's a good question. So, of course, the idea is the same. Provide some operations in such a way that users can very easily use them, and you should get the maximum performance out of it. Since you mentioned Eigen, Eigen is definitely much ahead in terms of features. Blaze is younger, we do not have that many developers, and so we are lacking a lot of features that Eigen already provides or has provided for years.
Starting point is 00:21:39 However, I think Blaze is even more focused on performance. So really the idea is every operation, whatever you use, even if it's something that's very, very rarely used, should run as efficiently as possible. And additionally, I think this is a unique feature, is the parallelization. Everything is running in parallel. So, okay, I should perhaps also explain the everything
Starting point is 00:22:07 it's not really everything for instance what is not parallel is sparse vector addition sparse vector addition just does not parallelize well, there is no performance benefit so I have to say everything that benefits from parallelization is running in parallel
Starting point is 00:22:22 I think this is something that only a few libraries, if at all, offer. Eigen, for instance, is focusing the parallelization only on the dense matrix multiplication. And a couple of other operations. It's very focused. So how do you make the determination as to whether or not something is worth parallelizing? Okay, that's also a good question. Of course, for very small vectors, matrices, parallelization doesn't
Starting point is 00:22:50 work well, so you would reduce performance instead of increase it. So there is always a threshold involved. If certain thresholds and the threshold involves the size of the vectors or matrices is exceeded,
Starting point is 00:23:06 then the parallelization kicks in. There's a configuration file for that. You just can manually adapt these configuration thresholds for your system to really get the maximum out of it. So far, it's set to, hopefully, reasonable values. So, of course, it's not parallelizing for tiny and small vectors, matrices, and it's, of course, parallelizing for large ones. And somewhere in the middle range, it has the switching. So, in some cases, it depends on the data types that you use. In some cases,
Starting point is 00:23:37 everything can be done at compile time. You know the sizes at compile time, so the decision whether to parallelize or not can be done at compile time. In some other cases, especially if you have dynamic sizes, of course, there's a small runtime check. Is it above the threshold? If yes, okay, then call the parallel kernels, else the serial kernels. So do you find that that threshold for when parallelization makes sense is determined by the compiler being used at all? It's probably mostly affected by the system you're running on, by the architecture. How much cache do I have? How much CPUs do I have available? How fast is the CPU? This is probably the main factors.
Starting point is 00:24:26 From a compiler point of view, I have to admit, I did not find a lot of difference. So small differences definitely, but not differences that would drive in to change these thresholds or to make them compiler dependent. So you're talking about parallelization, both like you do vectorization of the data, like at the CPU level and at the thread level, right? Correct. So the
Starting point is 00:24:48 vectorization means that I use SSE AVX, also AVX 512, so the upcoming AVX standard for the new Xeon CPUs. I think the name is Skylake from the new Xeon. This is basically done just
Starting point is 00:25:04 by the compiler. During compilation either the compiler figures out what architecture you're building for or you explicitly state, I want to use SSE. I want to use AVX. And then internally, the according vectorization is used. So the code is generated for this kind
Starting point is 00:25:20 of vectorization. It's just... Do you have to do anything? You're not using compiler intrinsics, you seem to be saying. I use intrinsics, but they're wrapped accordingly. Oh, okay.
Starting point is 00:25:34 Yeah, so there's a small sub-library. Depending on which kind of vectorization you choose, the encoding functionality is enabled. But the interface is all the same. The difference is just how big are those packs. The pack doubles. Is it 2, 4, or 8? Depends on the vectorization.
Starting point is 00:25:57 Okay. And then the parallelization is mainly data parallelism, which means you try to improve the performance of a single operation as much as possible. And there's a wrapper for that too. So if I find that something can be parallelized,
Starting point is 00:26:15 then I distribute the data to some threads. Then I give the threads a task. Okay, you compute this part, you compute that part. And then within each thread, the single core kernel is running, essentially. okay, you compute this part, you compute that part, and then within each thread, the single core kernel is running, essentially. So I don't know if it's clear. So at one point, it says distribute the work,
Starting point is 00:26:36 define smaller work units, and then the smaller work units are then handled by a single core. Okay. So are you following along with the C++17 parallel standard algorithms work that's being done? A little, yes. Do you think that that's something that you would be able to make use of in your library? Probably to some extent. I'm curious how they handled the...
Starting point is 00:27:00 So this is what I don't know, the detail, how they handled the size, because a small number of elements should not be vectorized, sorry, parallelized, as large numbers should be parallelized. This is something I'm very curious to see how this works. So essentially when the parallelization kicks in there, in that case. I believe it's, I don't know if it's an optional parameter or not, but I believe there is a parameter to the
Starting point is 00:27:28 parallel versions of the algorithms to say whether or not you want it to be parallel. So maybe it's user-defined. Okay. And it's easier than in my case. In my case, it's... Yeah. It's not user-defined, it's system-defined. Right. So are you currently writing all this
Starting point is 00:27:43 parallelization code yourself? You're not making use of some other library like HPX? I'm using OpenMP. This is something basically what you have to do yourself, but it's mainly pragmas. Pragmas around for loops in my case. I make use
Starting point is 00:28:00 of C++11 threads and also boost threads because they're so similar. And HPX is on the to-do list. HPX is a little different from a parallelization point of view, from a paradigm point of view, and I definitely want to try it.
Starting point is 00:28:16 I didn't have time to include it yet, unfortunately. HPX is probably very close to what you'll get from any future parallelization in the standard library, I would think. Right, Jason? From the algorithm standpoint, but they also have their lightweight threads,
Starting point is 00:28:32 fibers, or whatever they are called. Because I know in HPX, they said you can run thousands of threads, and it only actually uses so many hardware threads depending on what it needs to do. So you gave a... Oh, go ahead.
Starting point is 00:28:46 I would be very, very interested in seeing some comparison to the traditional approach. So OpenMP is what I would say is traditional. HBX is a little different. And I think they should compare pretty well. You gave a talk at CppCon, right? Correct, yes. How did that go?
Starting point is 00:29:03 So my impression was it was fine was working well unfortunately it was only about 20 people there this was a little disappointing given that there's of course 950 2000 participants however the talk was at nine o'clock in the morning and it was about linear algebra and so i should not have expected a lot of people to come. Still, I think it was an interesting talk. I hope that people watching it on YouTube will agree. I essentially showed what you can do with Blaze, showed a couple of performance comparisons,
Starting point is 00:29:41 a couple of graphs with the most common operations. So, of course, dense arithmetic, but also a couple of sparse benchmarks. Okay. How does the library compare in ease of use compared to something like IGAN or other math libraries? Is it a similar API? Very similar. Okay.
Starting point is 00:29:59 So basically, the algebra operations are always using the infix notation of operators, so plus, minus, times that's very very similar. It's a little different in those operations that it cannot be directly expressed in terms of operators like matrix inversion
Starting point is 00:30:15 and creating views on some vectors or matrices if I now say it's easier to use it's of course a very subjective view now say it's easier to use, it's of course a very subjective view. I hope it's easier. Of course, that is
Starting point is 00:30:31 in the eyes of the V holder. This started as a side project, open source project for you. Correct. If I understand right, you've had, let's say, a very rare success for your open source project, because it got noticed by your employer. Is that correct?
Starting point is 00:30:50 That is correct. So I live in Erlangen, as you said, and very close to Erlangen in Nuremberg is CDR Dapco. So that's the one office in Germany they have. And I was lucky that they actually realized that the person who's writing this library lives close by. And I was lucky that they actually realized that the person who's writing this library lives close by. And so they invited me, I talked about it, and they gave me a job. This was truly a very, very positive,
Starting point is 00:31:16 very lucky thing for me. It's not very many open source developers get that opportunity, I don't think. I have no idea, but as I said, I was truly lucky. So I'm guessing you get to use your library at work?
Starting point is 00:31:31 Yes. So it's now part of the major product from CDDapco called Stasis in Plus. It's not used within the entire product, but in a large portion already. And there's the discussion to use it in other places too so it's working pretty well yeah so it's definitely a success then i would say it is a success of
Starting point is 00:31:53 course yeah are you aware of other products or companies that are making use of it currently not yet unfortunately so i know a couple of um that use it, so probably just institutes that I know better, that know me, and so they use it. But I'm not aware of larger companies except for CDAT Upgrade that use it. This would be actually something I would be very interested in, and I would be willing to even put a couple of advertisement logos on the webpage.
Starting point is 00:32:28 But yeah, if people feel it's a good library and they use it, just send me an email. There's a possibility you'll get some emails after this airs. Hopefully. Do you have a lot of other contributors aside from yourself working on the library? So from an implementation point of view, I'm the one who is doing most of the work. I have help in the form of
Starting point is 00:32:51 a guy who is dealing with the Windows port. Visual Studio is always a little more tricky apparently, and he helps me with this. And I have a consultant from the Allen Computing Center who is helping with performance stuff. So if I run into performance trouble,
Starting point is 00:33:08 I go there, we discuss what's happening in memory, we take a look at the assembly code together, etc. And usually we come up with something that runs better. And there's a couple of smaller contributions from people who wanted to add just one thing.
Starting point is 00:33:24 So that's it. So you said the Windows port's always just a little bit tricky. Do you find yourself having to make compromises for what features you can use in C++14 because of other compilers? Sometimes, yes. Sometimes I have to revert a couple of things because one compiler does not accept it.
Starting point is 00:33:45 And probably the most notable example was that I found that in the Intel compiler, versions 15, 16, even 17, variable templates do not work as expected. So I was implementing variable templates for basically all type traits, all the additional type traits that I have. But unfortunately, I committed everything, and then after that tested it with the Intel compiler and found it doesn't work. This was a little disappointing.
Starting point is 00:34:12 So yes, I have to compromise to be able to provide something that works with all the compilers. But I think still it's worth the trouble, of course. If you only provide something for one particular compiler, people might not be particularly happy. And also, something like the Intel compiler is very, very common in the HPC community.
Starting point is 00:34:32 So I cannot say, oh, I don't support Intel anymore, this doesn't work. That's a good point. You mentioned earlier that one reason you started with C++98 is that supercomputing clusters tend to have older compilers on them. And I've seen tweets about this recently too; I think Bryce mentioned it, or something.
Starting point is 00:34:53 Someone did on Twitter recently. And so are they updating now? What's the story like? Of course, I don't know all the big machines, but the ones that I know, mainly the ones within Germany, by now have a compiler that is able to compile C++14. So they try to update, and they try to provide more compilers. This might not be true for all machines, of course,
Starting point is 00:35:25 but the ones that use Intel architecture usually also have at least one compiler that can do C++14. In many cases, it's GCC 6. Oh, so very recent then. Pretty recent. Less frequently the Clang compiler, but often also Intel, of course. Intel 15 is
Starting point is 00:35:45 kind of available everywhere. Probably by now 16, perhaps even 17. There are a couple of machines with IBM architecture, and IBM has a specific compiler for this architecture. This might be something that still has problems.
Starting point is 00:36:02 But, okay. So does IBM... IBM actually maintains their own C++ compiler? I don't know to what extent, but they have a specific compiler for their architecture. I don't know if they write a C++ compiler on their own.
Starting point is 00:36:18 If it's completely IBM, perhaps it's just a wrapper for something else. Right. Okay. But for those people who cannot use 3.0, I mean C++14, they can still use 2.6. I tried to add a lot to 2.6 before making the transition to C++14. It should be a good point to start from until, at some point, C++14 is available on these machines as well.
Starting point is 00:36:45 Do you find yourself still having to go back and do bug fixes on your 2.6 branch? I'm perhaps proud to say there are very, very few bugs in Blaze. We do a lot of testing. We have an enormous testing effort. About one month before the release
Starting point is 00:37:00 of a version, we start to do testing. This is a really large effort: all the compilers, with about 6,000 tests that each run a couple of hundred operations. Of course, a template library
Starting point is 00:37:15 is difficult in this regard, because you can combine everything with everything, but I think we have found a good compromise. So we get very few bug reports, which hopefully is not because so few people use it. It's hopefully because it's just working properly.
Starting point is 00:37:36 Right. Do you have a sense of how many people are using it compared to other math libraries? Unfortunately, I only know the number of downloads. It's usually in the range of 400 to 600 per version. That's reasonable. Okay. All right.
Starting point is 00:37:53 You have any other questions, Jason? No, I don't think so. Okay. Klaus, where can people find more information about you and more information about Blaze online? So people can always write me an email at klaus.iglberger at gmail.com. That's the usual way to contact me directly. And if you want to find Blaze online, just Google for Blaze C++ Math Library,
Starting point is 00:38:18 and you will definitely find it as the first hit. Okay. Well, thank you so much for your time today, Klaus. Yeah, thank you for the invitation. I really appreciate this. Thank you. Sure. Thanks for joining us. Thanks so much for listening in as we chat about C++. I'd love to hear what you think of the podcast. Please let me know if we're discussing the stuff you're
Starting point is 00:38:38 interested in. Or, if you have a suggestion for a topic, I'd love to hear about that too. You can email all your thoughts to feedback at cppcast.com. I'd also appreciate it if you like CppCast on Facebook and follow CppCast on Twitter. You can also follow me at Rob W. Irving and Jason at Lefticus on Twitter. And of course, you can find all that info and the show notes on the podcast website at cppcast.com. Theme music for this episode is provided by podcastthemes.com.
