CppCast - Insight Toolkit

Starting point is 00:00:00 Episode 314 of CppCast with guest Matt McCormick recorded August 20th, 2021. This episode is sponsored by C++ Builder, a full-featured C++ IDE for building Windows apps five times faster than with other IDEs. That's because of the rich visual frameworks and expansive libraries. Prototyping, developing, and shipping are easy with C++ Builder. Start for free at Embarcadero.com. In this episode, we discuss another blog post on modules. Then we talk to Matt McCormick from Kitware. Matt talks to us about the Insight Toolkit library for image analysis. Welcome to episode 314 of CppCast, the first podcast for C++ developers by C++ developers.

Starting point is 00:01:24 I'm your host, Rob Irving, joined by my co-host, Jason Turner. Jason, how are you doing today? I'm all right, Rob. How are you doing? Doing just fine. I don't think I have anything particular to share. How about you? Well, I got something that's just completely random and only kind of tangentially related to the podcast. Sure, go for it.

Starting point is 00:01:41 I got an email from someone asking me if I wanted to be an influencer for their products on my YouTube channel. Can you say what the product is? Is it relevant to your YouTube channel in any way? Interestingly, and I thought about this because of our guest, it is a standing desk company. And I would say there is a chance that this will actually go through and it'll get featured in an episode. I asked my patrons for their opinion first, and they're basically like, go for it. You know, programmers need a decent desk.

Starting point is 00:02:14 So we'll see what happens. Interesting. Which also means that, for the sake of recording these episodes, you might see me occasionally standing as well. I do like a standing desk. I need to get in the habit of using it more often. I don't have one at my home office, but I do have one at my work office. Yeah. Well, I think one of the things that kind of excites me about this desk that I'm looking at is it will even go four inches lower than my current desk. So depending on my mood, I could be totally slouched back, like destroying my spine, or standing up. I need

Starting point is 00:02:47 to try to not destroy my spine. That's why I need to stand up more. All right. Well, at the top of every episode, I like to read a piece of feedback. Jason, this week we got this hot take from Chris on Include C++, saying: Hot take regarding Justin Meiners' episode. I think the plethora of compilers for C++ is a weakness, not a strength. Until recently, I maintained a project that targeted Windows, Linux, and Mac. We had hundreds of lines of CMake, which was then duplicated across teams, to hide the slightly different compiler flags we had to use for GCC, Clang, and Apple Clang. MSVC was the biggest problem child, and the day I can stop supporting their front end will be a joyous occasion. I consider it a significant selling point of Rust and D that I can use the same compiler across platforms.

Starting point is 00:03:31 Fascinating. I will just go ahead and, because I have the power to do this, address this comment. Sure. I have had to support at least three platforms on every project that I've worked on since 2003. Yeah, so like 18, 19 years now. And Apple has consistently been the problem child for me, but the fact that we have multiple compilers has almost never really been the problem. It's been the ecosystem on the operating system that's been the problem. Sure. Yeah. And then certainly, you know, what he's talking about, having to configure all of this, is a legitimate thing that you need to spend time doing. And I guess you wouldn't,

Starting point is 00:04:15 if we only had one compiler, but as we've talked about before, you know, you get better code from running it on multiple compilers. You find things that you wouldn't if you had only one compiler to work with. Well, and I don't get the argument of code duplicated across the teams, because that's like the whole point of CMake: I've got a couple little modules that set the flags appropriately for that platform. And then everything else is identical.

Starting point is 00:04:44 Yeah. I'm not quite sure what his point is there. I'm not sure. Well, I guess Chris will probably hear this episode. We'll have a very long protracted argument of once a week updates back and forth. Or we can talk to him on the Discord. This was on the Include C++ Discord.

Starting point is 00:05:01 Okay. Okay. We'd love to hear your thoughts about the show. You can always reach out to us on Facebook, Twitter, or email us at feedback at cppcast.com. And don't forget to leave us a review on iTunes or subscribe on YouTube. Joining us today is Matt McCormick. Matt is a principal engineer on Kitware's medical computing team in Carrboro, North Carolina. His experience spans multiple medical, biological, material science, and geospatial imaging applications.

Starting point is 00:05:27 As a subject matter expert, he makes and stewards open-source technical contributions to scientific image analysis communities. He's been coding in C++ since 2008, when his graduate studies got him involved in the Insight Toolkit and CMake communities. Matt, welcome to the show. Thank you for having me.

Starting point is 00:05:42 You're one of the lucky few who is paid professionally to work on open source projects. I'm very honored. Yeah, I count myself lucky every day to do that. And on the topic of having to support multiple platforms, assuming you were paying attention during the intro there. Just out of curiosity, I know we'll get into this later. But can you just list the platforms that you actually do deal with on a regular basis? So on a regular basis on CI, we support Mac, Linux, and Windows and different compilers across them, the Intel compiler too, a little bit. And more recently, we've been supporting ARM. And also a big target is WebAssembly too, so yeah.

Starting point is 00:06:29 So on ARM, do you support, what, Mac ARM? Do you do Windows ARM? Do you do Linux ARM? Do you do all three? We do Python packages, so bindings around the C++, not for Windows, but for Linux and for the Mac M1 processor, too. Windows, we haven't tried. I haven't tried that recently. I'm not sure if that works, but it might. Well, if it supports all the other platforms, I'd say there's a very good chance it wouldn't be very painful. Yeah, so I now work at Kitware, which is the company that helps steward CMake. And you can explain this relationship a little bit more, but I do a lot of work on CMake, of course.

Starting point is 00:07:20 And that's related to the project. And so I write my share of CMake in addition to C++ code too, as I'm sure many of you do and many people listening do too. So I feel like any question we ask you about the utility of CMake actually helping this process might be a little, what's the word, like it's almost like nepotism, right? Yeah, that's a little bit. I can complain about CMake like anybody else can too, but I'm a little biased. Okay. All right. Well, Matt, we'll start talking to you more about, you know, this medical imaging stuff that you work on in a bit, but first we have a couple news articles to discuss. So feel free to comment on these, okay? Cool. So this first one is a blog post, C++

Starting point is 00:08:07 20 Modules with GCC 11. Jason, you want to tell us a little bit about this one? You know, we talked about Microsoft, they had their own blog post about modules. Was that last week or last episode? Here's another one using GCC.

Starting point is 00:08:24 It's very in-depth. And this one even comments that most of the articles that you see about modules seem to come from the Microsoft team. Which he points out, they do seem to have the most fully featured modules implementation so far, but he targeted GCC and got pretty far with it. My favorite part of this article, just for the record, is it kind of teaches in the same way that I like to teach, at least in like my C++ Weekly episodes, where the author makes a bunch of mistakes along the way. Like, well, if you try to do this, it's not going to compile. And sometimes I get comments from people on my YouTube episodes that are like, it seems like you had no idea what

Starting point is 00:09:00 you're doing. And I'm like, no, I was trying to make the same mistakes you would make so that you would see what happens when you make those mistakes. And I really appreciate this author did this because I feel like this made modules make more sense to me than any of the other more technical articles that we've seen in the past on the show. But it's just a step-by-step walkthrough of how you can build with modules in GCC. Nice. Is it time to make a C++ Weekly episode on that yet, by the way? I feel like I almost could based on this article now. What do you think, Matt?

Starting point is 00:09:30 Have you done anything with modules yet? I haven't. I haven't. I read the article. I agree that it was a very good article. I think it's pretty exciting. You know, I've done multiple experiments in the past with the library that we work with. It's highly templated.

Starting point is 00:09:44 So I've done a lot of experiments with pre-compiled headers. And those were painful experiments and mostly failed experiments. So it takes a lot of work, especially making it across different compilers, and the benefits just weren't there. Looking for the improved compile time, and I think it's really exciting, the module work.

Starting point is 00:10:08 And I know a lot of folks, to their credit, at Kitware have been working hard on the build system support there. And all the standards people in the community have done a lot of work to make that happen. It's difficult, as it explains in the article, with all the header history we have in the language. But it's really exciting, working in other languages and seeing how modules and build systems and package ecosystems work. That opportunity coming officially to C++ is quite exciting, I think. So do you have any idea what the actual status is there? When can we expect CMake to actually say we officially support modules or something like that? I mean,

Starting point is 00:10:50 ballpark. Yeah, I'm not sure. There is, you know, some support. I'm not sure the status of the support in the community. Maybe you guys have a better idea too. I've heard that C++23 might have, even though technically it's in 20, 23 is when you should really start trying to use it. I guess this article pointed out that it's not completely available with C++20 and GCC. Right. Yeah, I think the attitude of C++23 is when we'll actually be able to use it is just because people expect that's when our compilers and tools

Starting point is 00:11:22 will actually support it. All right. Okay, this next one is an update to JSON for Modern C++, version 3.10. I think we've certainly commented on this library a couple times before. This is actually the first release they've put out in a year. And the big feature here is in their diagnostics: if you throw an exception from the library, you'll now be able to see the JSON blob of data to help you debug, which sounds pretty handy. That does sound handy. And also a GDB pretty printer is available. Very nice. Very nice. Anything else you wanted to call out?

Starting point is 00:12:06 To me, one of the more interesting things here is that they've updated their CI toolchain, which is relevant to a conversation that Rob and I were having off the air a moment ago, so that it'll help them actually make releases faster, if I understand that correctly. Yeah, fully reworked, overworked CI,

Starting point is 00:12:22 which performs a lot of checks for every commit, which should allow for more frequent releases in the future. Future. Goodness gracious. More frequent feature releases in the future. Very cool. Very cool. Okay.

Starting point is 00:12:34 And then the last thing we have is this post, Qt Multimedia Has a New Friend. Qt Multimedia is now available to run on Qt for WebAssembly. And I don't think we knew that Qt for WebAssembly was a thing. Did we, Jason? I don't recall talking about it before. I think it might have come up briefly before. I'm not sure. Matt, you mentioned that you target WebAssembly, right? Yes, yes. Building to it and then also interfacing with it. And it's mostly numerical computing, but

Starting point is 00:13:08 I know a lot of projects too who've had success with related projects that have WebGL support. And I wasn't aware that they had, I know the Emscripten project was made for games and they have some audio support, but I'm really

Starting point is 00:13:24 impressed by the fact that that works. Yeah. Still, at Kitware you all use Qt, at least in some of your projects. Do your projects use Qt with ITK and WebAssembly, or is that like two different things going on here? Right. Yes, there are some projects that use Qt and ITK together, but for the web work, yeah, we usually do native web interfaces, so HTML, JavaScript, and all the nice frameworks that are out there. They're quite amazing in the web world. I know Qt is very strong and very good for our desktop applications, but the web world does have a lot to offer in that place, and it's more native.

Starting point is 00:14:08 Okay. Well, Matt, we've kind of hinted at it a little bit, but do you want to start off by telling us what exactly the Insight Toolkit is? Sure. The Insight Toolkit, it's a library for image analysis, so scientific image analysis. It was built for medical images specifically, which are images like ultrasound images and MRI images, CT images. And it's used for medical imaging and also things like microscopy and material science, remote sensing images. I don't know if those are some of the projects you've been involved with, Jason, with Kitware,

Starting point is 00:14:46 but the difference is really the difference with your camera, where you have a 2D image that might be unsigned char pixels, and they have uniform sizes of the pixels. This type of data is larger, it can be 3D, and it can be oriented in different ways and have different sampling rates in different directions. And so the library supports processing these images and doing some traditional image processing with them. Okay. So, no, I've never used ITK, I don't think. So the projects that I work on tend to have relatively simple 2D

Starting point is 00:15:27 visualizations, maybe like a heat map kind of thing, but I don't think anything as complex as what you're talking about with ITK. Yeah, so it's made for the processing, but also the few tasks that you have, especially with these scientific images that are a little more complex. It does things like reducing noise. But in many cases, you'll have to do image registration. And that's where you're finding out the alignment between multiple images. So you have a tumor evolving over time: what is the change of volume of that tumor? Or you have different imaging modalities and you want to compare them. And so it does

Starting point is 00:16:13 that type of operation, and it also helps with segmentation, so identifying structures in these three-dimensional images. So that's kind of the role it plays. It's more the traditional image processing. It's used with a lot of the machine learning AI libraries that you have today in conjunction, although it doesn't explicitly do that itself. Before we dig more into the capabilities of the library, could you just tell us a little bit about the history?
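The image geometry Matt describes above (3D data, different sampling rates per axis, arbitrary orientation) comes down to mapping a voxel index to a physical position. Here is a minimal C++ sketch of that mapping, assuming the usual convention physical = origin + direction * (spacing * index). The struct and function names are hypothetical illustrations for this transcript, not ITK's actual classes or API.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical, simplified image geometry (not an ITK type).
struct ImageGeometry3D {
    std::array<double, 3> origin;   // physical position of voxel (0,0,0), e.g. in mm
    std::array<double, 3> spacing;  // voxel size along each index axis, e.g. in mm
    // direction[r][c]: component r of the physical direction of index axis c
    std::array<std::array<double, 3>, 3> direction;
};

// Map a voxel index to physical space:
// physical = origin + direction * (spacing .* index)
std::array<double, 3> IndexToPhysicalPoint(const ImageGeometry3D& g,
                                           const std::array<std::size_t, 3>& index) {
    std::array<double, 3> point = g.origin;
    for (std::size_t r = 0; r < 3; ++r) {
        for (std::size_t c = 0; c < 3; ++c) {
            assert(g.spacing[c] > 0.0);  // spacing must be positive
            point[r] += g.direction[r][c] * g.spacing[c] * static_cast<double>(index[c]);
        }
    }
    return point;
}
```

With an identity direction matrix, an origin of (10, 20, 30) and a spacing of (0.5, 0.5, 2.0), the index (2, 4, 3) would land at (11, 22, 36). The thicker spacing on the third axis is the "different sampling rates in different directions" case.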

Starting point is 00:16:44 It seems like ITK has been around for quite a while. It has been around for quite a while. So back in the GCC 2, 3 days or so. I've been with the project for over 10 years, but it's over 20 years old. It started in 1999. And it has a really unique and interesting history. It was started when they had the Visible Human Project at NIH. So that was kind of like the Human Genome Project where people sequenced the entire human genome. In this case, they imaged an entire human being from toe to head, or two actual human beings. These were people who were on death row, and they contributed their body to science, and they imaged them with all the different imaging modalities they have,

Starting point is 00:17:37 and did high-resolution imaging, which was an incredible data set. But of course, just having the data doesn't tell you as much as you'd like, so they created the Insight Toolkit so you could get insights from those pixels, those bytes. So when you say all the different modalities, can you give us slightly more specifics? Like, we're talking MRI data, CAT scan data, actual visual image photographs, or what? Yes. So for the Visible Human Project, that was MRI, which is magnetic resonance, and that's where you're looking at the response to the magnetic environment of tissues, and the water content and how it moves there.

Starting point is 00:18:29 And then there's also the CT images where you're irradiating and seeing the attenuation of the x-rays. And then they also did slicing. For this data set, they sliced them and then took an RGB image of the slices at a high resolution, which is quite unique. So we're talking, not to get too gruesome, but actual slices of the body have been imaged and then stacked together to reconstruct the 3D visualization. Exactly, exactly. So those are the types of modalities,

Starting point is 00:18:59 and now it's used with microscopy, which is, again, kind of a lot of 3D modalities. I'm sorry, microscopy, if you can just... There's all different types of microscopy methods, but in many of these cases they're using light, or different ways of using light, or higher energy types of radiation, and looking at how the tissue or the medium interacts with that light. So those are the different types of modalities we deal with. Also PET, that's another medical imaging modality where radiation is coming from inside your body.

Starting point is 00:19:44 You get injected. You might have gotten one of these if they're looking for cancer in your body. They'll inject you with something that's radioactive and see where that goes, and see if it concentrates, and look for signals in that data. Okay. Yeah. So then with ITK, all of this can be merged into one data set in some way. Right, right. And you can process it in 3D. So back when it was started, you needed to have C++ because 3D is times N and larger data sets. And so the system was designed to do things like stream processing

Starting point is 00:20:22 of the data sets, not loading the entire data set into memory at one time. And multithreading and these types of operations are important too for working with that type of data. I'm just trying to wrap my mind around this. So if I wanted to say I have all these data sets that are all merged together in this Visible Human Project, and now as the programmer or the user of the visualization in some way, what do I just say?

Starting point is 00:20:47 I'm curious about this 3D volume right here, and then you give me back the things that exist there or what? Yes. So you have a giant chunk of data bytes, and that doesn't have much meaning. That's just a lot of raw data. So segmentation is the process of identifying and labeling what I'm interested in, like this is the liver

Starting point is 00:21:11 or this is a tumor and isolating that. And then that allows you to either visualize it in a meaningful way or quantify, get quantifications, which is ideal so you can really put numbers onto what you're seeing in the data. So then does it do like, once you mentioned like 3D data, like does it do something kind of like photogrammetry?
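The "label it, then put numbers on it" idea described above can be sketched in a few lines of C++. This is an illustration only, assuming the simplest possible segmentation (an intensity threshold) and a flat voxel buffer. The function names are hypothetical and are not ITK filters, and real segmentation of a liver or tumor generally needs far more than a threshold.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Label every voxel whose intensity lies in [lower, upper] as 1, all others as 0.
std::vector<std::uint8_t> ThresholdSegment(const std::vector<float>& voxels,
                                           float lower, float upper) {
    assert(lower <= upper);
    std::vector<std::uint8_t> labels(voxels.size(), 0);
    for (std::size_t i = 0; i < voxels.size(); ++i) {
        if (voxels[i] >= lower && voxels[i] <= upper) {
            labels[i] = 1;
        }
    }
    return labels;
}

// Quantify the labeled structure: number of labeled voxels times the
// physical volume of one voxel (product of the spacing along each axis).
double LabeledVolume(const std::vector<std::uint8_t>& labels,
                     const std::array<double, 3>& spacing) {
    std::size_t count = 0;
    for (std::uint8_t label : labels) {
        if (label != 0) {
            ++count;
        }
    }
    return static_cast<double>(count) * spacing[0] * spacing[1] * spacing[2];
}
```

Running the same two steps on scans taken at different times and subtracting the results would give a change in volume, which is the kind of quantification mentioned earlier for a tumor evolving over time (once the images have been registered).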

Starting point is 00:21:36 Like it has all of these things and now it can give you a three-dimensional reconstruction and tell you like a three-dimensional structure or point cloud or anything like that that exists? So the derived data structures and types you get to are important too. So it does support meshes and points, point sets, and, you know, identifying tubes. There's a lot of great work that's done there because vessels are important in our body. So working with those other data structures is important too. So yeah. I'm sorry, I just for some reason in my head thought of ITK as a visualization thing, not as a data processing thing, because I've seen those initials around before, but

Starting point is 00:22:17 I guess I missed the point. Well, yeah, visualization is important. So it's the analysis side, and there's another toolkit that has the same pedigree; the same kind of people worked on it in the beginning. It's called the Visualization Toolkit, VTK. So you often see together ITK, VTK. And I have an ITK, VTK application. So the most common thing would be an ITK, VTK, Qt application. And they build off each other, right?

Starting point is 00:22:44 So the analysis, just trying to figure out and isolate what is important to visualize, is coupled with the visualization too. So if you're trying to do visualization, you typically want to use these things together. So is this high level enough that I can throw all the data at it? You just said identifying tubular structures. Throw all the data at it and be like, show me the blood vessels in here. Unfortunately, no. Unfortunately, no.

Starting point is 00:23:11 You do have to have some domain knowledge and some algorithmic knowledge. Well, there goes my plans for the weekend. Yeah, that's okay. That's okay. We have a book. We have a book that's 1,000 pages. It's only 1,000 pages. But if you read that book, you'll be able to figure it out. But yeah.

Starting point is 00:23:30 Interesting. So aside from having, you know, lots of domain knowledge, what does it look like using these tools to create an application? Are lots of applications being built all the time using this, or is it just kind of a handful of tools built around them? Yeah, so most people who use the software are using it from an end-user application. So there's applications out there like 3D Slicer, which is a popular application. It's used in research, in many research contexts, to help people who are doing research analyze their data and quantify their data. It's also used in commercial projects, so a lot of the

Starting point is 00:24:12 commercial imaging systems might use it underneath the hood for the software that they use. But most users are using these end-user tools that are Qt or maybe a web-based tool. And going down a layer from that, there's Python bindings too. So if you want to program yourself, people that aren't as technical are using the Python bindings. And then there are the wizards, the C++ wizards. Of course, the C++ library is at its core, and you can use it from the C++ context too. It sounds like a computationally

Starting point is 00:24:48 intensive system. Are you taking advantage of any parallel, cluster, GPU kinds of things? We do use multithreading; it's kind of ubiquitous throughout the toolkit, just the native. We've moved from hand-spun platform-specific threading to the C++11 thread pool, which is kind of the standard, what we use as the default. And then we've been looking to add more and more support for GPUs. That's been difficult. We've tried many things in the past. Of course, you know, GPU programming is difficult to do in general.
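The portable, standard-library threading mentioned above can be illustrated with a short C++11-style sketch that splits a pixel buffer into contiguous chunks and processes each chunk on its own std::thread. This is an assumption-laden illustration, not ITK's implementation: the function name is hypothetical, and it spawns fresh threads on every call, whereas a thread pool keeps its workers alive and reuses them.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Apply `op` to every pixel in place, one contiguous chunk per worker thread.
// `op` must be safe to call concurrently from several threads.
void ParallelForEachPixel(std::vector<float>& pixels, unsigned numThreads,
                          const std::function<float(float)>& op) {
    assert(numThreads > 0);
    const std::size_t n = pixels.size();
    const std::size_t chunk = (n + numThreads - 1) / numThreads;  // ceiling division
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < numThreads; ++t) {
        const std::size_t begin = std::min(n, t * chunk);
        const std::size_t end = std::min(n, begin + chunk);
        if (begin == end) {
            break;  // no pixels left for further threads
        }
        // Chunks do not overlap, so the workers never write to the same pixel.
        workers.emplace_back([&pixels, &op, begin, end] {
            for (std::size_t i = begin; i < end; ++i) {
                pixels[i] = op(pixels[i]);
            }
        });
    }
    for (std::thread& worker : workers) {
        worker.join();
    }
}
```

A call such as ParallelForEachPixel(pixels, 4, [](float v) { return v * 2.0f; }) would double every pixel using up to four threads. Depending on the toolchain, linking may require a flag such as -pthread. The same region-splitting idea is also what makes the streaming described earlier possible, since each chunk can be processed without the whole image being in memory.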

Starting point is 00:25:30 We also have a big focus on cross-platform support and making things work cross-platform. So there's some OpenCL and CUDA support for GPU, but more is needed in the future.

Starting point is 00:25:47 I'm really excited. I haven't done any work with the C++ executor support that's supposed to be coming, but I've seen that and I think it might finally allow us to do that and program it in a reasonable way and hopefully get cross-platform support in a reasonable way. Someday in the future, C++ and the community will help us make that happen.

Starting point is 00:26:08 I'm hoping. I want to interrupt the discussion for just a moment to bring you a word from our sponsor, C++ Builder, the IDE of choice to build Windows applications five times faster while writing less code. It supports you through the full development lifecycle to deliver a single-source codebase

Starting point is 00:26:24 that you simply recompile and redeploy. Featuring an enhanced Clang-based compiler, Dinkumware STL, and packages like Boost and SDL2 in C++ Builder's Package Manager, and many more. Integrate with continuous build configurations quickly with MSBuild, CMake, and Ninja support, either as a lone developer or as part of a team. Connect natively to almost 20 databases like MariaDB, Oracle, SQL Server, Postgres, and more with FireDAC's high-speed direct access. The key value is C++ Builder's frameworks, powerful libraries that do more than other C++ tools. This includes the award-winning VCL framework for high-performance native Windows apps and the powerful FireMonkey framework for cross-platform UIs. Smart developers and agile software teams write better code faster using modern OOP practices and C++ Builder's robust frameworks and feature-rich IDE.

Starting point is 00:27:12 Test drive the latest version at Embarcadero.com. So are you actively, you know, keeping track of the latest C++ versions, based on what you were just saying, I'm guessing? So, you know, are you using C++14, 17, 20 in ITK? Right now we require C++11, and we're going to require C++14 soon. But we have a large community and many different users and many different platforms,

Starting point is 00:27:41 so we have to support the... What's the oldest compiler that people are using out there? GCC 2.95. The greatest. Oh my goodness. It sounds like from what you said about your experience and when you got started, I'm guessing when you got started,

Starting point is 00:27:58 people were still using 2.95, even though it was already out of date. Yes. Yes. And then, yeah, 3 came, and it's really amazing to see. I know you've been doing C++ for even longer, Jason, but I think it's really amazing to see how much better it is now and how much faster things have evolved, especially in the last five years or so. So yeah, I used to see non-standard behavior, non-compliance across Visual Studio

Starting point is 00:28:25 to a much greater degree than there is today. And big kudos to folks like the compiler developers who do things in a standard way and are advancing the standards now. It's really exciting to be working with C++ and working with the community now. Yeah, it does seem like there's much faster adoption of newer compilers today. I know my first job, GCC 3.2, was already pretty well established as a good solid version of GCC. But across the board, you still saw people refusing to move past 2.95, specifically 2.95. I don't know what made that particular version so special.

Starting point is 00:29:05 I should go back and look that up at some point, but it was crazy. So with all the platform support you just mentioned, I'm curious if people are doing things like this kind of advanced image processing on like Android or iOS. Yes. So that's one part of the motivations for the ARM support, too, in terms of the interesting systems that people are targeting, Android and related to the image processing. One of the topics I work on is ultrasound, and ultrasound has gotten smaller and more portable recently,

Starting point is 00:29:43 and that's really exciting. You can go out there and buy handheld portable ultrasound systems for less than $10,000, less than $5,000, and power it and hopefully program it with these ARM tools. So that's one of the potential areas for really exciting development and growth. I saw something recently about how super portable ultrasounds could literally change people's lives in developing nations. Absolutely. Absolutely. You know, it's affordable and it can be used for so many things. So it's a great opportunity for

Starting point is 00:30:22 people who want to try things and program against it. You don't have to buy a multimillion-dollar MRI system. You can get access to these systems, which is really exciting. So you've mentioned medical has come up a bunch. Are there other kinds of image datasets that people process with ITK? Another big one that we're working on recently is microscopy. There's new types of microscopy images that will generate data that is terabyte size for a single data set. Yeah, so there's a lot of new challenges there in terms of computing. And you know,

Starting point is 00:30:57 how do you handle that type of data? How do you work with that type of data? And so that's where a lot of the new challenges, the new developments are: being able to take that data and not just collect it, just like we were doing in the past with the Visible Human data. It is being able to learn quantitatively from those data sets and process them. You've got to forgive my ignorance on this again. So if we can just come back to microscopy, we're talking about imaging of very small things, right? Yes, yes. But there are ambitions to take that in aggregate and get the entire body. So there's some groups that would like to characterize, at a microscopic level, the entire body. It's a very ambitious goal and it's very difficult to do, but, you know, there are new frontiers that are being taken on in that sense. Yeah, so

Starting point is 00:31:53 it's cells and subcellular structures, is what these systems are. Oh, wow. Yeah. For the entire body. Next step, transporters. You bet. So we're talking, you said, you know, like visible light, so microscopes or related, or electron microscopes, those kinds of things for imaging at the very small level, correct? Right. Okay. I just feel like there's other imaging technologies that we just haven't mentioned that I don't know anything about. But I don't know what questions to ask about how this could be used in other ways. I think, I mean, there are a lot of interesting imaging modalities. There's more and more open datasets out there, which is exciting. You know, that's a big focus, not just the open source software we work with, but open source data sets is something that we try to advocate for and make available in the community.

Starting point is 00:32:50 So there are more open data sets that you can get access to, to process and explore, that are coming online. For medical imaging data sets, the NIH has a great resource, the TCIA, the Cancer Imaging Archive. So there's a lot of datasets out there that you can actually get access to. So are you implying that there's new things to discover in these publicly available datasets and people are still analyzing those things? I think it's exciting. It's exciting and interesting. And, you know, the medical field, our bodies and how we work, has been fascinating in the past few years.

Starting point is 00:33:32 I think a lot of students have kind of been enamored with AI. Everything is AI and everybody in the industry is looking at AI, but, um, like maybe the, the enamorment there might wear off a little bit and people should come back. And there are a lot of interesting things in the field and welcome collaborators and contributors to the project. Let's talk more about that. I mean, how large is the community working on ITK and are you always looking for more developers to come join?

Starting point is 00:34:04 We're definitely open and welcoming new developers to join the project. We have maybe 50 contributors every release. We do roughly biannual releases, feature releases, and patch releases after that. But it's a broad spectrum of people from working in universities or companies or people in the open source community contributing. And we welcome contributions. We're on GitHub. So you can do the Insight Software Consortium is kind of the group that is a nonprofit consortium group that supports the development of the toolkit. What is your specific role on the team? I don't think we've actually discussed that yet. So I do a lot of the release management

Starting point is 00:34:51 and supporting and stewarding development in the community. So code reviews and providing introductions and guidance for people in the community. Is it? I'm sorry, go ahead. That's what I spend a lot of my time doing, but it's also, you know, applying it for different applications, projects. So we do commercial projects and we do government-funded projects from the national labs, and research grants too, at Kitware. So is it fair to assume you do have a fairly extensive

Starting point is 00:35:28 automated build environment of some sort? Yeah, we moved to GitHub CI a few years ago. It was a new challenge with C++ and a very large templated library to make things run under the time constraints and the resource constraints that they have. But, yeah, the GitHub CI keeps us on the straight and narrow

Starting point is 00:35:51 and developing confidently, relatively confidently. And we also use, I don't know if, I'm sure everyone's familiar with CMake, but there's also a dashboarding system, CDash, and CTest. So we use all those tools. They're part of the CMake suite of tools for our testing

Starting point is 00:36:09 and monitoring, parsing all those different warnings and errors that we get as part of our builds. So I'm not sure if we've talked about those before. CDash? No, I believe we've never mentioned it. I have actually used it, but it's been a long time. Do you guys use CTest? I use CTest, definitely, yes.

Starting point is 00:36:30 So CTest can be used with GTest and other instrumentation, but it outputs a specific XML format that is uploaded to this web-based dashboard that will parse and tell you what is the platform, what are the different warnings, and what are the different test failures, the errors, what is the output, and provides a nice visualization. You can see the open source projects that are using it.

Starting point is 00:37:02 Our hosted instance is open.cdash.org. It's hosted by Kitware. And you can use that. Recently, I think CMake also added support for JUnit output too. So you can use that in addition, but CDash is one option. I think for the sake of our listeners here,

Starting point is 00:37:22 you mentioned GTest, and I only just realized that GTest support, parsing of, what am I trying to say, extracting your GTest tests, is built into CMake. And there's also a module that ships with Catch2 to do the same thing: you just point it at your C++ file and it'll extract the list of tests so that you can get those straight from CTest. Cool, cool. I'm not familiar with Catch2. What is Catch2? Another unit testing framework. Another unit testing framework, yeah. It's very similar to GTest.

Starting point is 00:37:59 Well, in nature, in principle, it's quite a bit different. You mentioned that, you know, it puts out, I think, multiple releases a year. What are some of the new features being added to the toolkit after it's been around for like 20 years? Right, right. Yeah, good question. So more recently we've been bridging with the machine learning, AI libraries, is a lot of our focus. And mostly that's done on the Python level. So that's kind of how we talk to a lot of other things, is through our Python bindings. But bridging with PyTorch,

Starting point is 00:38:33 and there's actually a very nice higher-level interface for machine learning called MONAI, especially for medical applications, that makes it easier to do reproducible and effective machine learning. So we've been bridging with the MONAI community. And then also for these very large datasets, we're talking about these terabyte-sized microscopy images.

Starting point is 00:38:57 We do some bridging with Dask, which is a way to do distributed computing effectively in Python too. So those are some of the developments there. In terms of C++, we've been, right, we're at C++14. We have some people in the community that are amazing, do some amazing C++ work. They've taken advantage of the nice syntax that you can create

Starting point is 00:39:25 with C++, modern C++. So we have these n-dimensional images, right? So we have a contributor who created a way to do a range over an n-dimensional set of pixels. So some just nice modern C++ syntax. Yeah, so how many dimensions can there be to an image? N, but in practice, you know, three or four. Three or four is usually what works well without running out of RAM. Okay, four I can get. Four gives me like, okay. I mean, three, excuse me, three, I can get that's like a voxel or something, right? Like that's what I am imagining when I see, when I think three or a point cloud or something like that.
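To make the idea of ranging over an n-dimensional set of pixels concrete, here is a minimal sketch in Python rather than C++. It is not ITK's API: the names `index_range` and `flat_offset` are hypothetical, and the row-major flat buffer is an assumption made only for illustration.

```python
from itertools import product


def index_range(size):
    """Yield every index tuple of an N-dimensional grid of the given size.

    The last axis varies fastest, matching the row-major layout assumed below.
    """
    return product(*(range(n) for n in size))


def flat_offset(index, size):
    """Return the row-major offset of `index` inside a flat pixel buffer."""
    offset = 0
    for i, n in zip(index, size):
        offset = offset * n + i
    return offset


# A small 3-D image stored as a flat list of pixel values (illustrative data).
size = (2, 3, 4)
pixels = list(range(2 * 3 * 4))

# One loop covers every pixel, whatever the number of dimensions.
total = sum(pixels[flat_offset(idx, size)] for idx in index_range(size))
```

The same two helpers work unchanged for a 4-D size such as `(2, 3, 4, 5)`, which is the point of writing the loop against N dimensions instead of nesting one loop per axis.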

Starting point is 00:40:14 What, what is the, what is the fourth dimension of an image? Yes. So in the library, you know, it has good support too. That's kind of unique versus a lot of the other image processing libraries. For the pixels, they can be floats. It's more common to have a float pixel when you're working with ITK, or a long 64-bit int. And so you can have that type of pixel type, but you can also have these multi-component pixel types, so that can be the different ways of looking at a piece of tissue from different imaging modalities. And what is very common now with machine learning, right, is you have many different channels. So it used to be the different components, multiple components of a voxel or pixel, were called components, but now it's usually called channels, and those are the different ways of regressing or modeling a certain location in space.

Starting point is 00:41:16 So that's the fourth dimension. Okay. Mm. I was offered a job what feels like a lifetime ago now where I would have been working for Medtronic, which I know, on their toolkit. This is, it was a Qt-based job, doing visualization of the placement. And so the idea is someone's having brain surgery done, they've got special tools with 3D tracking on them, and they can overlay where in the human's body that tool is, based on a CT scan that was taken before the surgery. And I'm thinking that if I had gone down that route in a different lifetime, I might have ended up spending the last 10 years or whatever working with ITK and VTK and those things.

Starting point is 00:42:10 Yes, that's exactly the type of application where it gets used. So we use a lot of care because of how we handle things like metadata. It turns out it's really important because you don't want someone sticking a needle in the wrong part of your brain in that type of context. So there's some sort of responsibility you feel working on libraries like this. Yes, surgeon, could you please not put the needle in the wrong part of my brain? I would pay extra money to have that happen, I think. That's why tests are important. How do you actually test a system like this? So CTest is just looking at the output.

Starting point is 00:42:59 So a lot of our tests are regression tests of a baseline image. So we have an expected image and do some processing, and we compare the output, and then we look at how that compares to the input. And of course you find all these wonderful things that you experience with C++ and different platforms and different processors, and especially floating point numbers and things. You know, you have this experience where you learn that computers are not as deterministic as you expect them to be, which is very beautiful. And multi-threading, how that impacts things.
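The baseline-image regression testing described here can be sketched roughly as follows. This is an illustration only, not ITK's actual test driver: the function names, the flat-list image representation, and the two tolerance parameters are all assumptions.

```python
def count_differing_pixels(output, baseline, intensity_tolerance=0.0):
    """Count pixels whose absolute difference exceeds the intensity tolerance."""
    if len(output) != len(baseline):
        raise ValueError("output and baseline must have the same number of pixels")
    return sum(
        1 for o, b in zip(output, baseline) if abs(o - b) > intensity_tolerance
    )


def matches_baseline(output, baseline, intensity_tolerance=0.0,
                     max_differing_pixels=0):
    """Return True if the output is close enough to the stored baseline image."""
    differing = count_differing_pixels(output, baseline, intensity_tolerance)
    return differing <= max_differing_pixels


# Hypothetical data: one pixel differs by a tiny floating point amount, the kind
# of platform-dependent drift mentioned in the conversation.
baseline = [0.0, 0.5, 1.0, 1.5]
output = [0.0, 0.5000001, 1.0, 1.5]

exact_match = matches_baseline(output, baseline)
tolerant_match = matches_baseline(output, baseline, intensity_tolerance=1e-5)
```

With these values the exact comparison fails and the tolerant one passes, which is why a regression test over floating point images usually needs some explicit notion of acceptable difference.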

Starting point is 00:43:34 That's interesting because I sometimes get an attitude from people that's basically like, -ffast-math is good enough. You should be allowed to use that whenever you want to. And then I hear things from people like, you know, compiler developers that say only use fast math if you don't actually care about the results. And I'm curious now, like in your quest for both performance and accuracy being extremely important here, are there like optimization flags that you're like, no, you can't use that. It breaks the system. There are, yeah, I mean, that's a great

Starting point is 00:44:09 question. Yeah, fast math, we don't use that, but there are a lot of issues with floating point numbers, especially floats that we encounter and have to deal with. For the different processors,

Starting point is 00:44:25 you get the x86 extensions. And then also when you're working with a lot of pixels, adding up floating point numbers, it just becomes unstable. That's kind of another problem that we see arise in many cases. So I ended up putting a lot of work into making sure that we can add, there's an algorithm called Kahan summation, if you've heard of

Starting point is 00:44:52 this before. I think I have, actually, yeah, but no, please explain it. Um, I mean, it's something that, you know, maybe folks may encounter, where, you know, a floating point number only has so much precision. And if you have a number and you're adding a small number to a sum that you're summing up, of course, that sum, its precision, it doesn't bring in the precision of the smaller number in a very effective way, essentially. And there's techniques that you can use to avoid that causing issues in your results, losing precision there. That's something that we encounter in practice. Another issue that we encounter a lot in practice is when you're doing things like summing

Starting point is 00:45:39 or doing that computation with multiple threads. And, you know, how those operations occur, depending on your system, you might have many different threads, and then you get a different result just depending on how many threads you're using, even if it's the same input data, the same operation. So those are the kind of issues that we encounter. Do you, well, I know you said you have regression tests, but I just am imagining this world now when you're working on it, someone adds a new bit of math, and you have to call into question, like, every summation or multiplication and floating point that they do. You're like, no, no, no, no, no. Like, is it that bad? It's, it's not like a lot of these nuclear

Starting point is 00:46:21 simulation codes where you cannot lose any energy in the system. The answer is really, it depends. It depends on the task that you're trying to do, how critical it is. Sometimes a little wiggle room is okay. Okay. I feel like we should ask you a little bit about CMake since we have you here and you work on that too. What does the CMake build of ITK look like? Well, of course, we use CMake.
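The two floating point issues from the preceding exchange, precision lost when small values are added to a large running sum and results that change with the number of threads, can be illustrated with a short sketch. It uses plain Python floats (64-bit doubles) and simulates threads by splitting the data into chunks. None of this is ITK code, and the function names are hypothetical.

```python
def naive_sum(values):
    """Add values left to right with no error compensation."""
    total = 0.0
    for v in values:
        total += v
    return total


def kahan_sum(values):
    """Kahan (compensated) summation: carry the rounding error forward."""
    total = 0.0
    compensation = 0.0  # running estimate of the low-order bits lost so far
    for v in values:
        y = v - compensation            # apply the correction from the last step
        t = total + y                   # low-order bits of y may be lost here
        compensation = (t - total) - y  # recover what was lost
        total = t
    return total


def chunked_sum(values, num_chunks):
    """Simulate per-thread partial sums that are combined at the end."""
    chunk = -(-len(values) // num_chunks)  # ceiling division
    partials = [
        naive_sum(values[i:i + chunk]) for i in range(0, len(values), chunk)
    ]
    return naive_sum(partials)


# One huge value followed by many small ones: each 1.0 is below the precision
# of the running total, so the naive sum drops every one of them.
values = [1e16] + [1.0] * 1000

naive_result = naive_sum(values)
kahan_result = kahan_sum(values)

# Same data, same operation, different "thread" counts.
results_by_chunks = {n: chunked_sum(values, n) for n in (1, 2, 4)}
```

Here the naive sum returns the large value unchanged, the compensated sum recovers the thousand small additions, and each chunk count produces a different total, which is the thread-count dependence described in the conversation.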

Starting point is 00:46:54 We advise it strongly. It looks pretty good. You can blame me for CMake to some degree, or thank me, I think, depending on who you're talking to. But something that we are pretty proud of is CMake, it actually came out of ITK many years ago. Yes, yes. So the project was started with this visible human goal, and CMake was just the itch scratching for allowing different users to build the code across platforms.

Starting point is 00:47:28 And it's just open source projects started from there, and then it's finally made its way to its use today. So if I go and look at the way CMake is used in ITK, is this like the premier example that I should duplicate in all of my projects? No, don't. Yeah, we need to do some modernization, too, of our CMake as it evolves, too. But do look at the CMake docs and Kitware as an example of what to do. I did have one more question.

Starting point is 00:48:11 I think we're getting about close to wrap up here, but you mentioned that your Python libraries are a large part of this. How do you create Python bindings for C++? Do you use any tools or is it all like hand spun or what? Yeah, that's another great point and something that folks might be interested in. Brad King, who leads a lot of development of CMake now, he worked on ITK for a while. He helped develop some of the tooling for our Python bindings. That's very useful. It used to be a project called GCC XML

Starting point is 00:48:45 because it was using GCC. Now it's using Clang, and it's called CastXML. And so that's something that parses C++ and generates an abstract syntax tree in XML, and it can be useful for many different things. So we use that for generating our Python bindings as a very large toolkit. They're more or less auto-generated

Starting point is 00:49:10 in that way with CMake code. A lot of CMake macros in this tool and some Python. And then we generate SWIG bindings and use that there. CastXML is a great tool folks should check out and use.

Starting point is 00:49:26 I have never heard of that. Yeah. That's interesting. Okay. Well, it was great having you on the show today, Matt. Where should people go to go and learn and find out more about ITK, and where can they find you online? Great.

Starting point is 00:49:39 Yeah. Thank you so much for having me. Yeah, if you want to learn more about ITK and use it or contribute to it, very welcome. Go to itk.org, and that's where you can find the project. You can find me at Twitter or GitHub at T-H-E-W-T-E-X is my handle there. And I look forward to talking to people. Okay.

Starting point is 00:50:02 Thanks so much. Thanks a lot. Thank you. Thanks so much for listening in as we chat about C++. We'd love to hear what you think of the podcast. Please let us know if we're discussing the stuff you're interested in, or if you have a suggestion for a topic, we'd love to hear about that too. You can email all your

Starting point is 00:50:16 thoughts to feedback at cppcast.com. We'd also appreciate if you can like CppCast on Facebook and follow CppCast on Twitter. You can also follow me at Rob W. Irving and Jason at Lefticus on Twitter. We'd also like to thank all our patrons who help support the show through Patreon. If you'd like to support us on Patreon, you can do so at patreon.com slash cppcast. And of course, you can find all that info and the show notes on

Starting point is 00:50:40 the podcast website at cppcast.com. Theme music for this episode was provided by podcastthemes.com.
