CppCast - OpenVDB

Starting point is 00:00:00 Episode 227 of CppCast with guest Ken Museth, recorded December 18th, 2019. Sponsor of this episode of CppCast is the PVS Studio team. The team promotes regular usage of static code analysis and the PVS Studio static analysis tool. And by JetBrains, makers of smart IDEs to simplify your challenging tasks and automate the routine ones. Exclusively for CppCast, JetBrains is offering a 25% discount for a yearly individual license, new or update, on the C++ tool of your choice, CLion, ReSharper C++, or AppCode. Use the coupon code JETBRAINS for CppCast during checkout at www.jetbrains.com. In this episode, we talk about std::embed and a Qt update.

Starting point is 00:01:07 Then we talk to Ken Museth, CEO of Voxel Tech. Ken talks to us about his work on the Academy Award-winning OpenVDB library. Not that I hear. Yeah. Yep. Wait, no, but see, you just got quieter again. Then that's Skype doing that. You can go in and change your Skype auto gain adjustment setting. No worse. No worse. It's funny for it to come up right now. It's under your audio and video settings, and then there's an automatically adjust microphone settings slider. Yeah, mine at the moment's at nine, which is higher than it used to need to be, and I'm just a little confused by that, but I don't know if it was one of the Windows updates recently changed something. Yes? Yep. You just went away again, Rob, I swear. Yeah. It's not just me. I'm not crazy.

Starting point is 00:02:25 No. Oh, it just, it auto-adjusted even though it was turned off. I've seen that happen with Skype specifically. No, it's not Audacity doing it. It's Skype. I think also if you exit and reopen Skype, it should actually save the setting. Good old Skype. Microsoft. All right, well, theoretically, if you're speaking, it will do the right thing. We have actually recorded well over 200 episodes, just so you know. Okay, it has to be a recent update to Skype causing problems.

Starting point is 00:03:06 Right. Yep. Yeah. Welcome to episode 227 of CppCast, the first podcast for C++ developers by C++ developers. I'm your host, Rob Irving. Joined by my co-host, Jason Turner. Jason, how's it going today? I'm all right, Rob.

Starting point is 00:03:20 Are you doing? You know what I just realized we didn't do? What's that? Ken, I'm sorry. You said you would be okay with the video, but we might need to edit that too, right? Correct. Yes. Okay.

Starting point is 00:03:29 I'm going to turn on the Skype video recording then. The last few guests didn't want it for some reason, so I'm out of the habit. Yeah, no, it's fine. Oh, and you can blur your background if you didn't know that. You can click on your own face and choose blur my background, which is why my black background is blurred here if you care. Cool. Okay. There we go.

Starting point is 00:03:49 Yeah. Welcome to episode 227 of CppCast, the first podcast for C++ developers by C++ developers. I'm your host, Rob Irving, joined by my co-host, Jason Turner. Jason, how's it going today? All right, Rob. We are C++ developers, but we are not professional audio engineers or podcasters. We're doing our best. You'd think after 200 episodes we would have this down, but we're having some technical difficulties today before we get started. If you notice anything weird like audio dropping out or something, we don't know what's going on. Sorry, we'll figure it out.

Starting point is 00:04:24 Skype can't keep its stuff together. It's okay. Anyway, how are you doing, Jason? We're a week away from Christmas. How are you doing? A week away, yeah. You're ready? Yeah. It's probably our last episode of the year, I think. Yeah, I just found out that my in-laws are coming into town for Christmas. Yeah, I got my wife's entire family coming here too. Yeah. The entire family? She's one of five.

Starting point is 00:04:55 So her two brothers and one of her two sisters are coming. And parents? And her parents, yeah. So most of the family. And cousins or nieces and nephews from your perspective, I mean? No. There are no others. The one sister who is not coming has kids, but the other ones don't yet.

Starting point is 00:05:08 Yeah. Wow. That will be a pretty full house, though. Yes, it will be a full house. Okay. Well, at the top of every episode, I'd like to read a piece of feedback. This week, we got an email from Matt related to the ongoing discussion we've been having about C++ ABI. He said, another situation where ABI is important is expensive third-party proprietary libraries. A company where I work requires a high level of support for all software we don't write, so they go to software companies for this. In my current project, that company requires a new support

Starting point is 00:05:39 contract for each compiler, which runs in the six-figure cost. So cost itself is another common situation in the industry where ABI compatibility is important. Luckily, our latest project is on new hardware and compilers, which required an ABI break, and they moved from C++98 to C++14. Love the podcast. Thanks for your efforts. So, yeah, I mean, that's not something we talked a whole lot about,

Starting point is 00:06:00 but that makes sense that if you are dependent on libraries that you pay for, that you're not going to want to, you know, want them to change very often. You know, I'm thinking I could have come up with a support contract model for ChaiScript where I charge six figures for each build that needed to be made. It's not too late. Okay, well, we'd love to hear your thoughts about the show. You can always reach out to us on Facebook, Twitter

Starting point is 00:06:26 or email us at feedback@cppcast.com. And don't forget to leave us a review on iTunes or subscribe on YouTube. Joining us today is Ken Museth. Ken is CEO and co-founder of Voxel Tech, which contracts to tech companies, primarily in the movie and aerospace industries. Ken currently does contract work for SpaceX

Starting point is 00:06:42 and Weta Digital. Previously, he was Director of R&D and Senior Principal Engineer at DreamWorks Animation. He was also a professor in computer graphics at Linköping University and a visiting faculty member and research scientist at Caltech. He worked on trajectory design for the Genesis space mission at NASA's JPL, and in 2015 received an Academy Award for the development of OpenVDB. Ken, welcome to the show. Thanks. Great. Great to be here. Yeah, so your bio has built right into it a question that I've been wondering about recently, and that's professorship. So you were a professor. Do you have a PhD in a related field,

Starting point is 00:07:16 or is that because of your experience that you were made a professor? How did that work out? That's actually a long story. So my background, my PhD is in physics, in quantum mechanics. Okay. And I came to Caltech in, I believe it was 1999, as a physicist or chemist. And after about a year and a half, I changed field to computer science. So I spent about five years sort of transitioning from one field to another. And after that, became a professor in computer graphics in Sweden. Okay.

Starting point is 00:07:50 So strictly speaking, my background is actually not computer science, but I've been a professor in computer science. Interesting. And you said your background was quantum physics? Yep. Quantum dynamics. So the idea that you can simulate molecules, very small molecules, I should say, like three- or four-body molecules,

Starting point is 00:08:12 using the principles of quantum mechanics from scratch. So it's funny, because it's a field where physicists would say, you know, it's not really physics because there are too many particles, and chemists would say it's not really chemistry because there are not

Starting point is 00:08:29 enough. So it's sort of in between the two fields. It sounds like a strange transition to go from quantum mechanics to computer science, but it's not that big of a stretch, in the sense that everything we did was simulated on computers.

Starting point is 00:08:46 Does that background give you any insight into the hype around quantum computing? Probably not, no. But I think it has certainly, if you look at what I've done in computer graphics and computer science, it's definitely flavored by the physics background. Sure. So most of what I've done in the movie industry has been physics-based simulation. Also the stuff we do at SpaceX is physics-based,

Starting point is 00:09:17 computational fluid dynamics. So it's there somewhere in the background. Wow. Well, Ken, we have a lot to talk to you about, obviously, with OpenVDB and everything else you work on. But first, we just have a couple news articles to discuss. Feel free to comment on any of these and then we'll start talking more about everything

Starting point is 00:09:34 you do, okay? Cool. Okay, so this first one is an article from ThePhD. We've had him on twice on the podcast before, and it's about going full circle on std::embed in C++. And he's kind of going over the latest efforts to get the embed proposal standardized. And it starts off great.

Starting point is 00:09:57 He presented the paper in front of the Evolution Working Group, and he got a very great response. But then he went on to SG7, where they kind of pushed back on him a lot, said it should have a more generic API, and told him to go investigate the way things are done with this Circle programming language, which I had not heard of before. Have you, Jason? Maybe.

Starting point is 00:10:20 Maybe? I'm not sure, actually. Okay. Well, and then the interesting thing is he kind of does a bunch of tests to kind of show the committee how embed works. And then it's the author of that programming language who decides he's going to go ahead and make an embed feature for Circle instead of doing whatever it was the committee is trying to have him do. Yeah. Right. So I think maybe filling in a little bit of background in this story, as I understand it, that wasn't explicitly listed here: with Circle, if I understand correctly, Circle can do anything at compile time that could be done at run time. Okay. So they are able to open a file, just use

Starting point is 00:11:12 iostreams or whatever and load that into a compile-time buffer. Right. And I believe that's what the standards committee was asking JeanHeyd to look into. Like, as far as a more, because I read this several times going, what the heck do you mean by a more generic approach? Like, I read over that sentence. I believe that's what they're looking for: can we do this like regular file I/O at compile time? And so you can do that in Circle, but it's crazy,

Starting point is 00:11:40 crazy slow. So the author of Circle is like, oh no, it's actually better to have a standard language feature for this. Exactly. Which is what JeanHeyd is trying to do for C++. Yeah. So that's why he felt like he came full circle.

Starting point is 00:11:54 So I looked at some of the comments on Reddit and I know JeanHeyd seems a little sad at the end of this blog and a little unmotivated maybe, but I do hope he kind of goes back to the committee and kind of

Starting point is 00:12:09 presents this. Says, look, even the language author of Circle is going to go and make this a feature because it doesn't make sense to do it the way you suggested. I hope he goes forward with it. It seems like a great feature. Yeah, I'm totally in favor of this, and we haven't let our guest speak at all on this. I've worked on several projects, and in fact I'm working

Starting point is 00:12:31 on one right now, where we need to embed raw file data in the C++ executable. And there's certainly linker hacks, and there's conversion to, like, BinHex kinds of things that you can do. And they're all just kludgy. And being able to just directly embed the file right there, to me, would be hugely helpful. Is this anything you run into in your work? We typically roll our own I/O routines. So I/O is very, very important for, actually, OpenVDB.

Starting point is 00:13:04 We support delayed loading. So the idea that you load data on demand and can sometimes unload it again. And the best way of getting really good performance is to sort of make use of the fact that you know, you have firsthand knowledge about the data layout in memory, and you reflect that in your out-of-memory representation. So, yeah, we tend to roll our own, to be honest. So then you're never really concerned about, like, including things at compile time, like embedding resources in? No, I don't think so. No. We can just go through these other articles real quick.

Starting point is 00:13:45 Qt has a 5.14 update. A couple interesting things here. They have a new graphics stack that is not based on just OpenGL anymore. You can have an abstract layer on top of Vulkan, Metal, or Direct3D, which seems great. But I'm not really an expert on Qt, so I can't speak too much to this. I do have a question, I guess, for our listeners, if anyone feels like chiming in on Twitter or Reddit later, is the last large Qt project that I started on was now about 10 years ago. And when I was doing the GUI work on it, so I had a little

Starting point is 00:14:24 bit of a gap in my Qt experience. It started in like 2003 and I used it for a few years and then I took a couple of years off and then I went back to it. And I'm like, well, it seems that everyone's using like QML or whatever now to do the layout of your GUI instead of hard coding that directly into the C++, which is the old school way of doing it. I could not get anyone else on that team to go along with using QML or Qt... well, I don't think Qt Quick existed yet. But QML anyhow, to do the markup for the GUI layout. And I'm just curious for our listeners, what do you do? Do you hard code your GUIs in C++, or do you use one of these modeling languages that Qt has provided?

Starting point is 00:15:08 Yeah, I mean, I know we've done episodes on Qt before, but maybe it'd be nice to get someone on who has been using a lot of Qt to make GUI applications. And ideally someone who's been doing it for like 20 years so they know the whole history of how these things went. Right. Okay. And then the last thing is C++Now is accepting student

Starting point is 00:15:30 volunteer applications. So if you're interested in going to Aspen in May, you should definitely sign up. Absolutely. Yeah. Anything else you want to mention about that, Jason? Well,

Starting point is 00:15:39 I mean, we've had so many students and ex-students come on the show. I feel like there's a couple of testimonials from people about how it kind of changed their career and stuff. And, yeah, I mean, it's almost certainly not a bad idea to get a free trip to Aspen as a student and meet a bunch of people in the C++ industry. Yeah, definitely. Okay. Well, a listener wrote to us a few months ago, Ken, suggesting we do an episode on OpenVDB, which I mentioned in your bio you won an Academy Award for.

Starting point is 00:16:11 Could you tell us a little bit about what OpenVDB is? Sure. So OpenVDB is essentially two things. It's a compact data structure. It's a data structure that essentially allows you to associate values with coordinates, three-dimensional coordinates. So given an XYZ

Starting point is 00:16:34 index, which actually can be negative, they can have negative values too, you can associate a value and it's templated so it can be a float, it can be a vector, it can even be a set of points. And the idea is that this data structure,

Starting point is 00:16:54 this layout is sparse in the sense that it's only allocating exactly, or not exactly, it's allocating as little memory as allows for fast performance. So the footprint grows as you add more and more values to this volume. So that's one component. And the other component are a very large library of tools that are written on top of this data structure. So the data structure itself is what's called VDB. And then there is this pretty

Starting point is 00:17:27 significant tool sets. And it's been developed primarily at DreamWorks Animation and it's now become sort of the de facto standard for volumes in visual effects. But it's also used outside of the visual effects

Starting point is 00:17:44 industry. That's sort of the short version. So I think it's important to realize that it's fairly easy to write a fast volume data structure. It's also fairly easy to write one that is sparse and doesn't occupy a lot of memory, but it's actually surprisingly difficult to write one that's both fast and sparse. And that's sort of what this,

Starting point is 00:18:09 that's a claim to fame for this data structure is that it's fast. It's obviously not as fast as a dense volume, but pretty close. It has the same API, so you can set values. This is what we call random access. So given an XYZ, you can set a value or you can set values. This is what we call random access. So given an XYZ, you can set a value or you can get a value.
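To make the "same API as a dense volume" idea concrete, here is a minimal sketch of a sparse container with random-access set and get on signed 32-bit coordinates. Everything in it (`Coord`, `CoordHash`, `ToySparseGrid`, the background value) is a hypothetical illustration, not the OpenVDB API, and it uses a plain hash map rather than the tree that VDB is built on.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Hypothetical toy types for illustration only; this is NOT the OpenVDB API.
struct Coord {
    std::int32_t x, y, z;  // signed, so negative indices are allowed
    bool operator==(const Coord& o) const { return x == o.x && y == o.y && z == o.z; }
};

struct CoordHash {
    std::size_t operator()(const Coord& c) const {
        // Simple multiplicative mixing of the three components.
        std::uint64_t h = static_cast<std::uint32_t>(c.x);
        h = h * 1000003u ^ static_cast<std::uint32_t>(c.y);
        h = h * 1000003u ^ static_cast<std::uint32_t>(c.z);
        return static_cast<std::size_t>(h);
    }
};

// Templated on the value type; memory grows only as values are set.
template <typename ValueT>
class ToySparseGrid {
public:
    explicit ToySparseGrid(ValueT background) : background_(background) {}

    // Random-access write: associate a value with (x, y, z).
    void setValue(const Coord& c, const ValueT& v) { values_.insert_or_assign(c, v); }

    // Random-access read: unset coordinates return the background value.
    const ValueT& getValue(const Coord& c) const {
        auto it = values_.find(c);
        return it == values_.end() ? background_ : it->second;
    }

    std::size_t activeCount() const { return values_.size(); }

private:
    ValueT background_;
    std::unordered_map<Coord, ValueT, CoordHash> values_;
};
```

With this sketch, `ToySparseGrid<float> g(0.0f); g.setValue({-5, 2, 1000000}, 1.5f);` stores exactly one value, and reading any other coordinate returns the background. A per-voxel hash map like this is sparse but gives up the spatial coherence that a block-based tree provides.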

Starting point is 00:18:29 And it also uses iterators to visit values that you already set inside the data structure. So it's a little bit, I mean, you can draw a lot of analogies to the standard library containers like a std::vector or some of the other ones. A std::vector, just to use as an example, has very fast random access because we can look up a value associated with an offset in O(1) time complexity, so super quick.

Starting point is 00:18:58 It's not as fast to insert values because, as we all know, a std::vector doubles its allocated memory pool when we sort of exceed the current buffer. So in worst case, we actually have to reallocate and copy values over, whereas VDB actually achieves O(1) complexity

Starting point is 00:19:19 both for inserting and deleting values. And then, again, so the same analogy. So we have, just like we have iterators on a std::vector, we also have iterators on a VDB container. So as you set values, you can then revisit the values and sort of massage the values, either if you do physics simulations or geometry manipulation. That's a very convenient way of doing it.

Starting point is 00:19:47 I think some of the other highlights is it's unbounded, which actually turned out to be... I was very excited about it when I achieved it, but there was a lot of pushback from a lot of the users because it's sort of a paradigm shift. Because normally when you deal with volumes, you sort of define a box. It's almost like a sandbox.

Starting point is 00:20:09 And you do your computation inside that box. And you just pray that you never sort of run out of space because if you do, then you have to reallocate everything and sort of continue. Whereas VDB, there is no box. Like you can literally associate a value with any index value within 32-bit precision. So let's say you're doing a fluid simulation and you may not exactly know where the fluid

Starting point is 00:20:38 is actually ending up or you do a smoke simulation. The traditional way of doing it is you set up a box, you release your fluid, and you keep simming, and as soon as it hits the boundary, depending on your boundary condition, you may, you're sort of, you're certainly losing anything that goes on beyond the boundary.

Starting point is 00:20:58 Whereas for these types of grids, they keep adding values. So it can literally go on almost forever. Now, of course, you're still paying the cost of allocating the memory. There's no free lunch.

Starting point is 00:21:14 At least you're only allocating what you need. That's really the main benefit. I think another nice feature is that it sort of has a built-in what we call a bounding volume hierarchy. So it's almost like an acceleration structure that tells you where you have interesting information in the volume. And that's very useful for things like rendering or

Starting point is 00:21:41 ray tracing. So when you're shooting a ray, you're trying to create essentially an image of, let's say, a smoke cloud or a water simulation. You need to know where you have information. You don't want to do ray integration through all of space because potentially there's a lot of empty space. And because VDB is actually underneath, it's a tree, it's a special type of tree. The nodes of the tree also tell you, they give you spatial information about where you have

Starting point is 00:22:12 information where you don't. So you can sort of do intersection tests of the ray against these, the extensions of the nodes. So that's very useful for several applications. So when you iterate, just taking a step back for a moment, when you iterate, you're not going to iterate over every possible point in space. You're going to iterate over all the points that have a value currently. That's correct. Yep. So sort of the one extreme, hypothetically, the one extreme would be I'm only going to store in this data structure the exact values that you push into it. That's typically how a std::vector would work. The problem with that is that the cost of inserting an element and looking it up can be costly in 3D because you don't know where these values are. They can be anywhere. They can be

Starting point is 00:23:05 very sparsely populated in space. So what VDB does, instead of only inserting that one value, it inserts a tiny box, a cutout of the surrounding space that the point belongs to.
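The "tiny box" idea can be sketched as follows: setting one voxel allocates the whole block that contains it, and neighboring voxels then land in the same block. The names (`blockOrigin`, `blockOffset`, `ToyBlockGrid`) and the use of `std::map` as the top-level lookup are assumptions made for illustration; the real library uses a tree of nodes instead. The bit tricks assume two's-complement integers, which C++20 guarantees and earlier compilers provide in practice.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <tuple>

constexpr std::int32_t kLog2Dim = 3;            // 2^3 = 8 voxels per axis
constexpr std::int32_t kDim = 1 << kLog2Dim;    // 8
constexpr std::int32_t kMask = kDim - 1;        // 0b111

// Origin of the 8-wide block containing c: clear the low 3 bits.
// Negative coordinates work too, e.g. -1 maps to -8.
constexpr std::int32_t blockOrigin(std::int32_t c) { return c & ~kMask; }

// Linear offset of (x, y, z) inside its 8x8x8 block, in [0, 512).
constexpr std::size_t blockOffset(std::int32_t x, std::int32_t y, std::int32_t z) {
    return static_cast<std::size_t>(((x & kMask) << (2 * kLog2Dim)) |
                                    ((y & kMask) << kLog2Dim) |
                                    (z & kMask));
}

// Hypothetical toy grid: a sorted map from block origin to a dense 8x8x8 block.
class ToyBlockGrid {
public:
    using Key = std::tuple<std::int32_t, std::int32_t, std::int32_t>;
    using Block = std::array<float, kDim * kDim * kDim>;

    void setValue(std::int32_t x, std::int32_t y, std::int32_t z, float v) {
        // operator[] allocates a zero-filled block the first time it is touched.
        Block& b = blocks_[Key{blockOrigin(x), blockOrigin(y), blockOrigin(z)}];
        b[blockOffset(x, y, z)] = v;
    }

    float getValue(std::int32_t x, std::int32_t y, std::int32_t z) const {
        auto it = blocks_.find(Key{blockOrigin(x), blockOrigin(y), blockOrigin(z)});
        return it == blocks_.end() ? 0.0f : it->second[blockOffset(x, y, z)];
    }

    std::size_t blockCount() const { return blocks_.size(); }

private:
    std::map<Key, Block> blocks_;
};
```

In this sketch a block cannot tell a stored zero from an untouched voxel, which is the gap that a per-voxel bitmask fills.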

Starting point is 00:23:22 So it is over allocating a little bit. And so as you allocate a new this is actually what's called a leaf node in the tree. As you allocate a new leaf node, put a value into the leaf node, you

Starting point is 00:23:38 need a way of marking that this is a value that you inserted. And it does this using bitmasks. So there's a single bit associated with the exact value that you just pushed in there. And this turns out to be extremely useful for many different applications

Starting point is 00:23:56 because this bitmask can be used not only to sort of identify the value again, but you can also use it for iterations, for these, when you implement the iterators, because the iterators now essentially become they're searching through the bitmask. They're searching for on bits and off bits.
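As a rough illustration of iterating by searching a bitmask, the sketch below keeps one bit per voxel of an 8x8x8 leaf (512 bits in eight 64-bit words) and visits only the bits that are on. `ToyLeafMask` and `lowestSetBit` are hypothetical names; the portable loop in `lowestSetBit` stands in for a hardware count-trailing-zeros instruction (or `std::countr_zero` in C++20), which is what a performance-minded implementation would likely use.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Index of the lowest set bit in v. Precondition: v != 0.
inline std::size_t lowestSetBit(std::uint64_t v) {
    assert(v != 0);
    std::size_t n = 0;
    while ((v & 1u) == 0) {
        v >>= 1;
        ++n;
    }
    return n;
}

// Hypothetical toy mask: one bit per voxel of an 8x8x8 leaf.
class ToyLeafMask {
public:
    static constexpr std::size_t kBits = 512;

    void setOn(std::size_t i) {
        assert(i < kBits);
        words_[i >> 6] |= std::uint64_t{1} << (i & 63);
    }

    bool isOn(std::size_t i) const {
        assert(i < kBits);
        return ((words_[i >> 6] >> (i & 63)) & 1u) != 0;
    }

    // Calls fn(offset) for every on bit, in increasing offset order.
    // Empty words are skipped in a single comparison.
    template <typename Fn>
    void forEachOn(Fn&& fn) const {
        for (std::size_t w = 0; w < words_.size(); ++w) {
            std::uint64_t bits = words_[w];
            while (bits != 0) {
                fn(w * 64 + lowestSetBit(bits));
                bits &= bits - 1;  // clear the lowest set bit
            }
        }
    }

private:
    std::array<std::uint64_t, kBits / 64> words_{};  // all bits start off
};
```

The cost of a full sweep depends on the number of on bits plus the eight words, not on all 512 voxel slots.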

Starting point is 00:24:20 And this can be done very efficiently. You can also alter other operations that you can perform. So imagine you have two different grids, two different trees, and you want to perform a Boolean operation. Like you want to do essentially an intersection. You can actually do that using these bitmasks as well because you can do bitwise OR operations in this case.
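A small sketch of Boolean operations on such masks, using `std::bitset` from the standard library. The function names are made up for illustration. In terms of which voxels end up set, an intersection of two leaves corresponds to a bitwise AND of their masks, while a bitwise OR gives the union.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// One bit per voxel of an 8x8x8 leaf (hypothetical alias for this sketch).
using ToyMask512 = std::bitset<512>;

// Voxels that are set in both leaves.
inline ToyMask512 activeIntersection(const ToyMask512& a, const ToyMask512& b) {
    return a & b;
}

// Voxels that are set in either leaf.
inline ToyMask512 activeUnion(const ToyMask512& a, const ToyMask512& b) {
    return a | b;
}

// Voxels that are set in a but not in b.
inline ToyMask512 activeDifference(const ToyMask512& a, const ToyMask512& b) {
    return a & ~b;
}
```

Each operation touches the mask words only, so the topology of the result is known before any voxel value is read.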

Starting point is 00:24:42 So how large – I'm sitting here basically thinking, okay, you're trying to minimize the number of dynamic allocations, you're trying to maximize your cache friendliness and performance. How large are the over-allocated hunks that you do? So that's... you can configure that. The default configuration

Starting point is 00:25:00 has a leaf node size that is 8x8x8. So we've done many experiments that seem to suggest that that is an optimal sort of balance between performance and memory footprint. On the GPU, interestingly, this tree tends to operate faster

Starting point is 00:25:21 if it's a larger leaf node. So 16x16x16 is typically what we use there. But you can actually configure it. So the implementation is highly templated. In fact, the leaf node itself is templated on the value type. It could be double, it could be float, it could be a vector, it could be a point. And then the child node of the leaf node, the internal node, is then templated on the leaf node type. And you sort of sequentially or hierarchically template

Starting point is 00:25:52 until you reach the root node. And the root node is then templated on all the nodes below it. And this turns out to be, it's very, very good for performance because we get a lot of inlining and we can do template metaprogramming. So a lot of the executions

Starting point is 00:26:11 that are needed to traverse the tree are actually resolved at compile time. So that's part of the secret sauce to the good performance. Does that affect compile time adversely? It does. So this is one of the complaints. In fact, if you listen to the keynote

Starting point is 00:26:31 from CppCon, the keynote speaker, Mark Elendt, actually used OpenVDB as an example of, I think he even called it overuse of templating. I obviously disagree with that, but it is true. You can get some pretty hairy compile times. But honestly, I would much prefer longer compile times and faster execution run times.
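The hierarchical templating described a little earlier, and why it lets traversal arithmetic be resolved at compile time, can be sketched like this. All type names are hypothetical, and the internal-node sizes (4 and 5) are assumptions chosen for illustration; only the 8x8x8 leaf (log2 dimension 3) comes from the conversation.

```cpp
#include <cassert>
#include <cstdint>

// Leaf node, templated on the value type and its log2 dimension.
template <typename ValueT, unsigned Log2Dim>
struct ToyLeaf {
    using ValueType = ValueT;
    static constexpr unsigned LOG2DIM = Log2Dim;
    static constexpr unsigned TOTAL = Log2Dim;  // log2 of voxels spanned per axis
};

// Internal node, templated on its child node type.
template <typename ChildT, unsigned Log2Dim>
struct ToyInternal {
    using ValueType = typename ChildT::ValueType;
    static constexpr unsigned LOG2DIM = Log2Dim;
    static constexpr unsigned TOTAL = Log2Dim + ChildT::TOTAL;

    // Child slot along one axis that contains coordinate c.
    // The shift and the mask are compile-time constants for each node type.
    static constexpr std::uint32_t childIndex1D(std::int32_t c) {
        return (static_cast<std::uint32_t>(c) >> ChildT::TOTAL) & ((1u << Log2Dim) - 1u);
    }
};

// Root node, templated on everything below it.
template <typename ChildT>
struct ToyRoot {
    using ValueType = typename ChildT::ValueType;
    using ChildType = ChildT;
};

using ToyLower = ToyInternal<ToyLeaf<float, 3>, 4>;  // spans 2^7 voxels per axis
using ToyUpper = ToyInternal<ToyLower, 5>;           // spans 2^12 voxels per axis
using ToyTree = ToyRoot<ToyUpper>;

static_assert(ToyLower::TOTAL == 7, "leaf (3) + lower internal (4)");
static_assert(ToyUpper::TOTAL == 12, "plus upper internal (5)");
```

Because every node type carries its sizes as constants, a call such as `ToyUpper::childIndex1D(x)` compiles down to a shift and a mask with literal operands. The trade-off is the one raised in the conversation: each distinct configuration instantiates its own set of templates, which costs compile time and compiler memory.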

Starting point is 00:26:57 Okay. I'm not going to argue with you. I've been involved in lots of discussions recently about which do you prefer and how do you set that balance and how do you weigh the developer's time versus what is the most important thing for your project. And it sounds like runtime performance is the most important thing to you. Yeah. But I see the other side of this as well. I understand that it can be frustrating at times. Right. So just

Starting point is 00:27:29 out of curiosity, how long are long build times for tools using your OpenVDB? It's funny because building the library itself takes no time because it's just essentially header files. Right. But building client code,

Starting point is 00:27:45 actually building, let's say, the unit test that the library comes with, that can take, I think, depending on your hardware, it can take five minutes building and running. I've heard much, much, much worse. Right, right. Actually, I think the biggest complaint is the memory usage that you use.

Starting point is 00:28:09 So if you just, let's say you have a 64-core machine or even 32-core machine, and you just let it go, like make -j, you can actually run out of memory even with 64 gigabytes because it just keeps spawning build jobs and they take quite a lot of memory. So sometimes you've got to be careful.

Starting point is 00:28:36 I just want to rewind a little bit and make sure listeners understand what it is that OpenVDB is used for. You mentioned smoke and fluid analysis. Right. So as I said, it was developed initially for visual effects, so volumetric simulation.

Starting point is 00:28:58 But it's not sort of limited to that application. It is actually being used in engineering as well. So the stuff that I'm involved in at SpaceX makes use of VDB. So there's nothing limiting it to visual effects. But anything that involves a sparse volume, so a volume that doesn't require information everywhere. And, you know, that is...

Starting point is 00:29:24 Not every application lends itself to these sparse applications or sparse data structures, but a lot of them do. So, yes, fluid animation is a great example.

Starting point is 00:29:36 Manipulating geometry, fire simulation, there's also, in engineering, things like topology optimization, where you're trying to determine the shape of a material to meet certain constraints, like

Starting point is 00:29:52 material constraints, stress, things like that. It can be used for scientific visualization, medical imaging. Yeah. It's really up to you. We've had lots of

Starting point is 00:30:08 people reach out from fields that we never thought of would make use of this, which is quite exciting. Since I've never done anything like this, I find myself slightly surprised that integer coordinates give you the resolution that you need for these kinds

Starting point is 00:30:24 of simulations. I would have just assumed that you would want floating point resolution or something. Right. So that's a good point. So oftentimes when you solve problems in engineering, and that translates to graphics too, you're trying to solve these partial differential equations. And a very common technique is to sort of discretize them on a grid, a background grid. But you are right.

Starting point is 00:30:52 There are other ways of doing it. You could also have used particles instead. Where the particles, you could essentially think of it as grid points that are free to move around. And VDB actually supports both. So you can also store particles associated in the grid

Starting point is 00:31:08 So each voxel... so a voxel is a 3D analog of a 2D pixel, so it's the smallest unit that you can address in a volume. And you can have particles that live inside that voxel,

Starting point is 00:31:33 or you can have a single value, a single floating point value. But we tend to use these grids for solving things like fluids. But as I said, you can certainly also use particles and floating point positions. I wanted to interrupt the discussion for just a moment to talk about the sponsor of this episode of CppCast, the PVS Studio team. The team promotes the practice of writing high-quality code, as well as the methodology of static code analysis. In their blog, you'll find many articles on programming,

Starting point is 00:32:02 code security, checks of open-source projects, and much more. For example, they've recently posted an article which demonstrates not in theory, but in practice, that many pull requests on GitHub related to bug fixing could have been avoided if code authors regularly used static code analysis. Speaking of which, another recent article shows how to set up regular runs of the PVS Studio static code analyzer on Travis CI. Links to these publications are in the show notes for this episode. Try PVS Studio. The tool will help you find bugs and potential vulnerabilities in the code of programs written in C, C++, C#, and Java.

Starting point is 00:32:38 You mentioned that there's a lot of tools that make use of OpenVDB. Are these part of the library? Yeah, so the library itself has implemented a lot of high-level algorithms. Things like, you know, how do I go from a polygonal representation of a surface to one of these VDBs? How do I go back again? How do I go from a volume

Starting point is 00:33:05 to a polygonal representation? This is called scan conversion. So the truth is most of the surface representations that we encounter certainly in visual effects but also in engineering are actually modeled

Starting point is 00:33:20 using triangles or quads. So polygons. Think of a computer-aided design CAD model. It's typically triangles. So how do you take that and sort of represent it as a volume instead? That's quite a challenging task, and we have algorithms in the library that does that for you. But there's also all the

Starting point is 00:33:46 mechanics you need to solve these equations that I talked about earlier. Let's say you're doing a fluid animation, you're doing fire or explosion. We actually base that on what's called the Navier-Stokes equation or equations. These are

Starting point is 00:34:03 equations that come out of physics. It tells you exactly how the fluid is changing over time. And it involves these partial differential equations. And we can discretize it. We can solve it on the grid itself. And we have libraries, support libraries in there. So most of what ships with VDB is quite low level. And so there are many

Starting point is 00:34:27 commercial packages that sort of pick up VDB and implement it inside their own node graph or inside their own renderer. So I think probably the most well-known example is Houdini. So Houdini is a

Starting point is 00:34:44 third-party commercial product, sort of the de facto standard for visual effects. And it internally uses VDB for sparse volumes and makes use of a lot of these algorithms. And you mentioned on the side, and I let it slip past, you said when you're doing these things on the GPU. So does that imply you have built-in support for doing these calculations on GPU?

Starting point is 00:35:10 Great question. No. So VDB, which if you download it today, is strictly for the CPU. But I and others have certainly been playing, toying with what does it mean to take the ideas behind the data structure and porting them to the GPU.

Starting point is 00:35:27 So I would say a lot of this is ongoing work. But yeah, so my own discovery, and that's been backed up by other people I've talked to, is that the configuration that we use on the CPU is not ideal for the GPU. So especially the leaf node sizes are too small for the GPU. But this is actually an area I would love to work some more on, porting the library and porting some of the algorithms to GPUs.

Starting point is 00:36:01 I mean, obviously you're using C++, and you were just talking about templates and template metaprogramming. I'm just curious, have you been following the C++ standards? What do you require today? Yes, we do. The industry

Starting point is 00:36:17 as a whole, the visual effects industry, tend to be a little bit slow. There's a committee called the Visual Effects VFX Platform that keeps track of the different versions of libraries and compilers

Starting point is 00:36:33 that the industry as a whole uses. And right now we're up to C++14. So that's also what we use in OpenVDB. And we're very eager to move forward, but we have to sort of follow this standard. So we're trying to be good citizens.

Starting point is 00:36:54 Sounds almost as bad as the auto industry. Oh, yeah. I can imagine. Yeah. Yeah. So we're at GCC, I think it's 6.3 or 6.2. Oh, well, that has all of C++17 language support,

Starting point is 00:37:09 but not all of C++ library support. I was just looking this up yesterday. Oh, okay. Coincidentally. Cool. I was looking at the... Oh, I'm sorry. Go ahead, Jason. No, no, go ahead. Go ahead, please, Rob.

Starting point is 00:37:20 I was going to say I was looking at the last changelog for, I think, version 7 of OpenVDB, and it mentioned new measurement analytics for level sets. And I was curious, what exactly is a level set? So this is, I have to admit, this is one of my absolute favorite topics. So I'll try and restrain myself and be short. So as I mentioned earlier, most of the representations that we use for surfaces are polygons. And polygons are great for modeling, but they're not ideal when a surface deforms or changes. Imagine that you have a surface that represents the interface between water and air, so essentially a water surface. And let's say you throw something into the water

Starting point is 00:38:08 and you create a splash, a crown splash, with lots of droplets that are being, you know, that are formed and breaking off the main surface. So essentially you have a very, very complicated surface that changes over time, that deforms over time. Now, this is a great example where these polygon representations really struggle.

Starting point is 00:38:32 Because as you can imagine, as a surface breaks up and collides with itself again, figuring out exactly how to represent that with polygons and triangles and vertices is a very, very challenging task. So it turns out that there's a much, much simpler

Starting point is 00:38:51 way of representing surfaces that change what we call topology. So essentially they break up or merge. And that is to use what's called a level set. And a level set is sort of a glorified isosurface. What is that?

Starting point is 00:39:10 What is that? So I think most of us are familiar with height fields, right? So if you have a topographic map and you have – it's in 2D. And every point on this map gives you the height of the landscape or the mountain. And you can sort of, you can draw contour lines

Starting point is 00:39:32 and these contour lines connect all points at the exact same height. Right. So it's a great way of sort of, you know, conveying visual information about how tall is the landscape, how does it vary. Now if you take that idea

Starting point is 00:39:48 and add on an extra dimension, so we're no longer talking about a map, we're talking about a volume. So at every single point, x, y, z, we have a single value associated with it. So now we have a volume with values.

Starting point is 00:40:04 And if you now imagine that you defined a surface that consists of all the points with one specific value. So it's sort of a height field but in 3D. This is what we call an isosurface. It's like trying to visualize

Starting point is 00:40:20 a hypercube, basically. It's very difficult. Yeah. If you've never tried it before, I'm sure you'll get a headache. But so that's sort of the simplest idea, an isosurface. That's been around for many, many years. Now level sets take this slightly further,

Starting point is 00:40:38 to put it mildly. So it adds a lot of mathematics underneath that tells you how to massage the values in order to deform the surface. So let's say you're actually doing a fluid simulation. We represent the surface, the water surface itself, using one of these level sets. And we can move the water surface around and sort of create the illusion of a crown splash or breaking dam or whatever, some complicated behavior of the water by solving equations that massage these values, these levels and values.
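
As a hedged aside, the "equations that massage these values" are commonly written in the level set literature as an advection equation. The symbols below (phi for the stored values, v for a velocity field, S for the tracked surface) are notation assumed here for illustration; they are not named in the episode.

```latex
\frac{\partial \phi}{\partial t} + \mathbf{v} \cdot \nabla \phi = 0,
\qquad
S(t) = \{\, \mathbf{x} : \phi(\mathbf{x}, t) = 0 \,\}
```

Read this way, moving the water surface means updating the grid values phi over time, and the surface is wherever those values pass through zero.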

Starting point is 00:41:18 And then you may ask, so what do these values really correspond to? And typically they are the distance to the surface. And it's actually often the signed distance. And the convention is typically every point that is inside the surface has a negative value and every point that is outside the surface has a positive value.
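
A minimal sketch of that sign convention, using a sphere because its distance has a closed form. The names `Vec3` and `sphereSignedDistance` are hypothetical helpers made up for this illustration, not OpenVDB API.

```cpp
#include <cmath>

// Hypothetical helper type for illustration; not part of OpenVDB.
struct Vec3 { double x, y, z; };

// Signed distance from point p to a sphere: negative inside the sphere,
// zero exactly on its surface, positive outside.
double sphereSignedDistance(const Vec3& p, const Vec3& center, double radius) {
    const double dx = p.x - center.x;
    const double dy = p.y - center.y;
    const double dz = p.z - center.z;
    return std::sqrt(dx * dx + dy * dy + dz * dz) - radius;
}
```

Sampling this function at every grid point would give a dense signed distance volume for the sphere.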

Starting point is 00:41:40 And then obviously the isosurface that I was talking about before now is the zero level set. So all points that are exactly zero are exactly on the surface. And that's the surface that you're tracking. That's the surface you're interested in. And there's one more link to VDB. So as you can imagine, since you're really only interested in the zero isosurface, because

Starting point is 00:42:06 that's where the surface is, everything else is kind of redundant. You don't actually need to track the exact values that are very far away from the surface. So if you take this to the extreme, all you need is a very narrow band around the surface itself. This is what's called a narrow band level set. It just means I only need distance values in close proximity to the surface that you're deforming. And that, as you can imagine, is a very sparse data set, right? Because as soon as you're a certain distance away from the surface, you don't care about the value anymore. But when you're close, that's when you want to store the value. And this is where VDB is ideal because you have a very sparse volume that only acquires values in close proximity to the surface. So instead of storing a box of values everywhere

Starting point is 00:43:06 which as you can imagine will be very very memory expensive or memory heavy especially if this box grows over time, VDB allows you to only allocate values in close proximity to the surface and the values will be updated as the surface starts

Starting point is 00:43:22 moving. So in that sense it's almost the ideal memory footprint. So you guys look like a continuous sledgehammer. I'm thinking, like, when you first presented this topic of what OpenVDB is and how it can store, you know, any objects in the 3D grid, my immediate assumption is like, okay, at this point, one, two, three, I have a particle with these properties that's moving in this direction, or I've got a voxel that is this color. But it sounds like that is just the wrong way to think

Starting point is 00:43:58 about this entirely. Like it can be used for those things. But if you really want to get the most benefit out of it, you need to think of it at a meta level, like it can just be used to do anything that you need to be keeping track of that happens to be, or like properties about properties or something. Sure, but you actually nailed it, because at the core it is exactly what you said. It's the ability to associate values with X, Y, Z, or I, J, K coordinates. Right. If you can do that, there's a lot of other things you can do. Like you can actually deform it.
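
The idea of associating values with I, J, K coordinates while storing only a narrow band can be sketched as follows. This is a toy under stated assumptions: a hash map stands in for the VDB tree, every type and function name (`Coord`, `SparseGrid`, `makeNarrowBandSphere`) is hypothetical, and all unstored points report one background value, which ignores the inside/outside sign for simplicity.

```cpp
#include <cmath>
#include <cstddef>
#include <functional>
#include <unordered_map>

// Hypothetical types for illustration only; OpenVDB uses a tree, not a hash map.
struct Coord {
    int i, j, k;
    bool operator==(const Coord& o) const { return i == o.i && j == o.j && k == o.k; }
};

struct CoordHash {
    std::size_t operator()(const Coord& c) const {
        // Mix the three indices with large primes before hashing.
        return std::hash<long long>()(c.i * 73856093LL ^ c.j * 19349663LL ^ c.k * 83492791LL);
    }
};

class SparseGrid {
public:
    explicit SparseGrid(float background) : mBackground(background) {}
    void setValue(const Coord& c, float v) { mValues[c] = v; }
    // Points that were never stored report the background value.
    float getValue(const Coord& c) const {
        auto it = mValues.find(c);
        return it == mValues.end() ? mBackground : it->second;
    }
    std::size_t activeCount() const { return mValues.size(); }
private:
    float mBackground;
    std::unordered_map<Coord, float, CoordHash> mValues;
};

// Store signed distances to an origin-centered sphere, but only at grid
// points whose distance to the surface is at most halfWidth.
SparseGrid makeNarrowBandSphere(int radius, float halfWidth) {
    SparseGrid grid(halfWidth);
    const int extent = radius + static_cast<int>(std::ceil(halfWidth));
    for (int i = -extent; i <= extent; ++i) {
        for (int j = -extent; j <= extent; ++j) {
            for (int k = -extent; k <= extent; ++k) {
                const double dist =
                    std::sqrt(static_cast<double>(i * i + j * j + k * k)) - radius;
                if (std::fabs(dist) <= halfWidth) {
                    grid.setValue(Coord{i, j, k}, static_cast<float>(dist));
                }
            }
        }
    }
    return grid;
}
```

For a sphere of radius 10 with a band half-width of 3, only the shell of points near the surface is stored, which is well under the 27 x 27 x 27 points of the enclosing box.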

Starting point is 00:44:35 You can – as the surface is moving through space, you need to allocate and move these grid points around. So when I say move, I don't mean move in the sense of pushing particles around. What I mean is you need to allocate new grid points and maybe remove old grid points. So as you think of a grid

Starting point is 00:45:03 in three-dimensional space, and as the surface sort of sweeps out this 3D space, it's intersecting certain grid points in this volume. Just so I can pause you. Just saying, make sure I'm following along. Because at the moment, I'm visualizing, as you're saying, this surface of the water, right? Because this is where your example started. I'm visualizing the outside of a fish tank where I can see the surface of the water in a 3D volume. That's limited. Right. Is that okay?

Starting point is 00:45:41 That's great, except I want to get rid of the tank itself. Okay, okay. In the sense that, I mean, it's a good mental picture because this is actually how people would traditionally do volumes. You would have a fish tank, and inside the fish tank you would have a grid, a 3D grid. And you can associate a value with every single grid point in the tank. What VDB does, it gets rid of the tank itself.

Starting point is 00:46:07 Right. Because there's sort of no spatial limitations to where you can have values. Okay. But it also sort of trims away values that are away from the surface itself because they're kind of redundant. You don't really need them.

Starting point is 00:46:22 Water at the bottom of the tank is not going to move. That's right. So can you set a limit and say, anything that's outside of the range of these particles I don't care about, or do you manually have to prune them? That's part of

Starting point is 00:46:35 the application that you write on top of VDB. VDB has low-level methods to add values and remove values and even dilate values. You can imagine that I have inserted a few discrete values into the volume, and now I want to add all the neighbor points. That's an operation called dilation.
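
A minimal sketch of dilation on a set of active grid points, assuming "neighbor" means the six face-adjacent points. The names `Ijk`, `ActiveSet`, and `dilate` are hypothetical and chosen for this illustration; they are not the OpenVDB interface.

```cpp
#include <set>
#include <tuple>

// Hypothetical types for illustration; not part of OpenVDB.
using Ijk = std::tuple<int, int, int>;
using ActiveSet = std::set<Ijk>;

// Return the input set plus the six face neighbors of every active point.
ActiveSet dilate(const ActiveSet& active) {
    static const int offsets[6][3] = {
        {1, 0, 0}, {-1, 0, 0}, {0, 1, 0}, {0, -1, 0}, {0, 0, 1}, {0, 0, -1}};
    ActiveSet result = active;
    for (const Ijk& c : active) {
        for (const auto& o : offsets) {
            result.insert(Ijk(std::get<0>(c) + o[0],
                              std::get<1>(c) + o[1],
                              std::get<2>(c) + o[2]));
        }
    }
    return result;
}
```

Dilating a single active point yields that point and its six face neighbors; diagonal points such as (1, 1, 0) are only reached by a second pass.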

Starting point is 00:46:59 You can do the opposite. You can say, I've added lots of values, and I now want to shrink it. That's erosion. So the data structure itself supports these fairly primitive operations and the client code can make use of it to do things like tracking how a surface of

Starting point is 00:47:18 water moves or taking geometric pieces and gluing them together, unioning them together. These are the higher level applications you write on top of it. But the data structure itself supports these fairly simple operations

Starting point is 00:47:36 that become sort of the building blocks. But yeah, so the whole gag is that you want to perform certain operations using as little memory as possible. Right. And the reason you do that, obviously, is that you want fidelity. You want your surface to be high resolution, as high as possible. And if you take the traditional approach, which is when you have your fish tank, if you make the fish tank really big, then the overhead of allocating this redundant memory can very

Starting point is 00:48:13 quickly kill your performance and even your computer you can quickly run out of memory so you can care about just the particles that are on the surface and if the surface moves up trim the particles that the data that used to be below it. If it moves back down again, trim the old data.

Starting point is 00:48:31 And you're always using the optimal amount of memory. Yeah, that's exactly right. So how many particles is a lot in your world? I'd say over a billion. Over a billion is a lot? Yeah, that's starting to push. But, I mean, yeah, it obviously depends on your hardware.

Starting point is 00:48:48 It depends on your application. So there's sort of a golden rule in visual effects, which is your simulation should never be longer than you can, you know, an artist can run it overnight. Whereas in scientific computing, you know, oftentimes we run simulations

Starting point is 00:49:03 for weeks or even months. So, yeah, it depends on the application. So a lot of the kind of questions that I'm personally asking you right now is I've been having classrooms full of physics students this past year. And I've been trying to think, okay, how do I tell them about this library and what application does it have to them when simulating subatomic particles, basically. Ooh, right. Oh, I don't know. Does your application involve volumes of some kind? So actually, there's another thing,

Starting point is 00:49:38 there's another line of applications that it's also very useful for, which is, let's say you have, you actually have particles and the particles move around. And in many applications, you need to know about the local sort of topology of the points.

Starting point is 00:49:57 You need to search your immediate nearest neighbors. So, you know, the classic way of solving this problem is to use things like a KD tree as an acceleration structure. The problem with a KD tree is as you add more and more points, these search operations have an unfavorable scaling. They can get very, very slow.

Starting point is 00:50:22 So this is another example where VDB can actually be used and has been used. So you can imagine that you use VDB as a way of bucketing points. So you have your points and you shove them into a VDB tree and they may fall in different voxels, different nodes, but because VDB can do O(1) lookup, you can very quickly

Starting point is 00:50:48 ask the following questions. How many particles are stored at this position, IJK? Now, that can be sort of a building block for exactly answering the question of how many particles do I have in my immediate neighborhood? So that's another.
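
A rough sketch of the bucketing idea, assuming a uniform voxel size and using `std::map` as a stand-in for the VDB tree. Note that `std::map` lookup is logarithmic, unlike the constant-time lookup described for VDB. All names here (`Point`, `PointBuckets`, `countAt`, `countNear`) are hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <map>
#include <tuple>
#include <vector>

// Hypothetical types for illustration; not part of OpenVDB.
struct Point { double x, y, z; };
using Ijk = std::tuple<int, int, int>;

class PointBuckets {
public:
    explicit PointBuckets(double voxelSize) : mVoxelSize(voxelSize) {}

    // Drop the point into the bucket of the voxel that contains it.
    void insert(const Point& p) { mBuckets[toIjk(p)].push_back(p); }

    // "How many particles are stored at this position, IJK?"
    std::size_t countAt(const Ijk& ijk) const {
        auto it = mBuckets.find(ijk);
        return it == mBuckets.end() ? 0 : it->second.size();
    }

    // Count particles in the 3x3x3 block of voxels around the voxel holding p.
    std::size_t countNear(const Point& p) const {
        const Ijk c = toIjk(p);
        std::size_t total = 0;
        for (int di = -1; di <= 1; ++di)
            for (int dj = -1; dj <= 1; ++dj)
                for (int dk = -1; dk <= 1; ++dk)
                    total += countAt(Ijk(std::get<0>(c) + di,
                                         std::get<1>(c) + dj,
                                         std::get<2>(c) + dk));
        return total;
    }

    // Map a world-space point to integer voxel coordinates.
    Ijk toIjk(const Point& p) const {
        return Ijk(static_cast<int>(std::floor(p.x / mVoxelSize)),
                   static_cast<int>(std::floor(p.y / mVoxelSize)),
                   static_cast<int>(std::floor(p.z / mVoxelSize)));
    }

private:
    double mVoxelSize;
    std::map<Ijk, std::vector<Point>> mBuckets;
};
```

A neighbor search then only has to examine the points in a handful of nearby buckets rather than every point in the data set.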

Starting point is 00:51:04 Maybe that is an application for your students. I don't know. I think, well, we're, I think, running low on time now. But their simulations currently are closer to ray tracing. And they need to know, is there an object in this point in space that I'm going to intersect? And what are its properties? And then what does that do to the refraction of this particle?

Starting point is 00:51:28 Right, yep. That is definitely sort of a common application. So you can, if, whatever object you're trying to ray intersect with, if you can sort of voxelize it, if you can give it a gridded representation, imagine that it's, I don't know, it could be a polygon, it could be some shape. If you have

Starting point is 00:51:51 a way of essentially detecting what grid nodes does my object intersect with, these grid nodes you can then mark as active in the grid, and then you can shoot a ray against

Starting point is 00:52:08 the tree structure itself. Because you can do very quick ray intersection tests against the tree structure. So essentially what you want to know is does my ray intersect any grid, any objects as it's going in a certain direction.

Starting point is 00:52:24 And exactly that type of test can be done very efficiently against the tree itself. It sounds like you could choose between accuracy, like you could say, well, no, nothing was there, so I'm going to give up a little bit of accuracy. Or you can say, okay, there might have been something there. Now I have to actually verify that, yes, I did exactly intersect something.

Starting point is 00:52:46 Correct. Yep. Okay. So you can skip empty space quickly, and then you have a candidate. Right. So you're right. You still have to do a ray intersection test

Starting point is 00:52:57 against whatever object you have, but you want to limit that because that's typically very costly, and you certainly would prefer only to check the ray against a single candidate or maybe a handful of candidates, not all of them. So yeah, that's a very common type of test.
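
The "skip empty space, then test a candidate" idea can be sketched with a standard grid traversal (a 3D DDA in the style of Amanatides and Woo) over a flat set of active voxels. This is not how OpenVDB implements it, since the real library walks its tree; the names `Ray`, `Ijk`, and `firstActiveVoxel` are hypothetical, and unit-sized voxels are assumed.

```cpp
#include <cmath>
#include <limits>
#include <set>
#include <tuple>

// Hypothetical types for illustration; not part of OpenVDB.
using Ijk = std::tuple<int, int, int>;
struct Ray { double ox, oy, oz; double dx, dy, dz; };  // origin and direction

// Walk the ray voxel by voxel (unit voxels) and report the first active voxel.
// Returns false if none is found within maxSteps voxel crossings.
bool firstActiveVoxel(const Ray& ray, const std::set<Ijk>& active,
                      int maxSteps, Ijk& hit) {
    const double inf = std::numeric_limits<double>::infinity();
    const double o[3] = {ray.ox, ray.oy, ray.oz};
    const double d[3] = {ray.dx, ray.dy, ray.dz};
    int cell[3], step[3];
    double tMax[3], tDelta[3];
    for (int a = 0; a < 3; ++a) {
        cell[a] = static_cast<int>(std::floor(o[a]));
        if (d[a] > 0) {
            step[a] = 1;
            tDelta[a] = 1.0 / d[a];
            tMax[a] = (cell[a] + 1 - o[a]) / d[a];  // ray time at next boundary
        } else if (d[a] < 0) {
            step[a] = -1;
            tDelta[a] = -1.0 / d[a];
            tMax[a] = (cell[a] - o[a]) / d[a];
        } else {
            step[a] = 0;  // ray never crosses a boundary on this axis
            tDelta[a] = inf;
            tMax[a] = inf;
        }
    }
    for (int n = 0; n <= maxSteps; ++n) {
        const Ijk ijk(cell[0], cell[1], cell[2]);
        if (active.count(ijk)) { hit = ijk; return true; }
        // Step across whichever voxel boundary the ray reaches first.
        int a = 0;
        if (tMax[1] < tMax[a]) a = 1;
        if (tMax[2] < tMax[a]) a = 2;
        if (tMax[a] == inf) return false;  // zero direction vector
        cell[a] += step[a];
        tMax[a] += tDelta[a];
    }
    return false;
}
```

The voxel returned is only a candidate: the exact and more expensive intersection test against the actual object would still be run, but only for objects overlapping that voxel.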

Starting point is 00:53:15 Thanks for letting me pick your brain on that. Hopefully for our listeners, that was interesting as well. Yeah, cool. Absolutely. Well, I think we are running out of time, but before we let you go, I don't think we ever said, what does VDB actually stand for in OpenVDB? It's a funny question. I get it a lot.

Starting point is 00:53:34 The dirty secret is actually nothing. It's an acronym. And the story behind it is, it was called many, many different things in the past. And sort of the last iteration, I was calling it DB plus grid. And as we were sort of releasing it into the wild at DreamWorks, a lot of the artists were saying, you know, this is too nerdy. We have no idea what that means. So we actually had, we asked people like, what should we call it then?

Starting point is 00:54:04 And VDB was one of the top candidates that came out. I mean, you could still, you know, an obvious candidate would be a volumetric database or volumetric dynamic B plus tree or whatever. But the truth is it means nothing.

Starting point is 00:54:20 It's just a name. And you also mentioned that there's been Academy Awards involved here. Is that for any particular movie's work, or is it one of these Technical Academy Awards for Technical Achievement or something like that? Yeah, it was a Technical Academy Award, Technical Achievement Award. But it has been used in more than 150 movies at this point.

Starting point is 00:54:43 But you're right, it wasn't for a particular movie. It was as a technique. 150 movies, that's all fairly good. Yeah. Was there a particular movie you were working on that kind of initiated the development of OpenVDB? I think the first one that we applied it to was Puss in Boots at DreamWorks.

Starting point is 00:55:03 We were modeling these very intricate and detailed clouds. I wouldn't say it was developed for that movie, but that was our first test. But since it's now found its way into some of the commercial packages,

Starting point is 00:55:20 many artists may not even realize that they're using it. So it is a free open source license, so that it can be integrated into other commercial packages. Absolutely. Yeah, absolutely. And we didn't discuss that at all. That's true.

Starting point is 00:55:34 Actually, I should probably mention that it was primarily developed at DreamWorks. I had three other guys joining the project. It was Peter, Mihaly, and David. Since then, all of us have left and it's now actually run not by DreamWorks, but

Starting point is 00:55:56 by the Academy Software Foundation. So it's an official library, I guess. Oh yes, it's now sort of a neutral entity that's running it. So that's quite exciting. So the project is getting a second wind. There are some very talented people from other studios

Starting point is 00:56:17 that are now pitching in and helping. Looks like everything's up on GitHub if any of our listeners are interested. Yep, that's right. Very cool. Okay, well, thanks so much for coming on the show today, Ken. Well, thanks for having me. Yeah, thanks for coming on.

Starting point is 00:56:30 Thanks so much for listening in as we chat about C++. We'd love to hear what you think of the podcast. Please let us know if we're discussing the stuff you're interested in, or if you have a suggestion for a topic, we'd love to hear about that too. You can email all your thoughts to feedback at cppcast.com. We'd also appreciate if you can like CppCast on Facebook and follow CppCast on Twitter. You can also follow me at Rob W. Irving and Jason at Lefticus on Twitter. We'd also like to thank all our patrons who help support the show through Patreon.

Starting point is 00:57:00 If you'd like to support us on Patreon, you can do so at patreon.com slash cppcast. And of course, you can find all that info and the show notes on the podcast website at cppcast.com. Theme music for this episode is provided by podcastthemes.com.
