CppCast - OpenVDB
Episode Date: December 19, 2019
Rob and Jason are joined by Ken Museth, the CEO of Voxel Tech. They first discuss a blog post about std::embed and the new version of Qt that was just released. Then they talk to Ken Museth about OpenV...DB a C++ library for working with volumetric data used in Visual Effects, Scientific Simulations and more.
News: Going Full Circle on Embed in C++; Qt 5.14 released; C++Now 2020 Accepting Student/Volunteer Applications
Links: OpenVDB; OpenVDB's GitHub Repository; 2014 Sci-Tech Awards: Ken Museth, Peter Cucka and Mihai Aldén
Sponsors: Write the hashtag #cppcast when requesting the license here; One Day from PVS-Studio User Support; JetBrains
Transcript
Episode 227 of CppCast with guest Ken Museth, recorded December 18th, 2019.
Sponsor of this episode of CppCast is the PVS Studio team.
The team promotes regular usage of static code analysis and the PVS Studio static analysis tool.
And by JetBrains, makers of smart IDEs to simplify your challenging tasks and automate the routine ones.
Exclusively for CppCast, JetBrains is offering a 25% discount for a yearly individual license,
new or update, on the C++ tool of your choice, CLion, ReSharper C++, or AppCode.
Use the coupon code JETBRAINS for CppCast during checkout at www.jetbrains.com.
In this episode, we talk about std::embed and a Qt update.
Then we talk to Ken Museth, CEO of Voxel Tech.
Ken talks to us about his work on the Academy Award-winning OpenVDB library.
Not that I hear. Yeah. Yep. Wait, no, but see, you just got quieter again.
Then that's Skype doing that. You can go in and change your Skype auto gain adjustment setting.
No worse, no worse. It's funny for it to come up right now.
It's under your audio and video settings, and then there's an automatically adjust microphone settings slider.
Yeah, mine at the moment is at nine, which is higher than it used to need to be, and I'm just a little confused by that, but I don't know if one of the recent Windows updates changed something.
Yes? Yep.
You just went away again, Rob, I swear. Yeah. It's not just me. I'm not crazy.
No.
Oh, it just, it auto-adjusted even though it was turned off.
I've seen that happen with Skype specifically.
No, it's not Audacity doing it.
It's Skype.
I think also, if you exit and reopen Skype, it should actually save the setting.
Good old Skype, Microsoft. All right, well, theoretically, if you're speaking, it will do the right thing.
We have actually recorded well over 200 episodes, just so you know. Okay, it has to be a recent update to Skype causing problems.
Right.
Yep.
Yeah.
Welcome to episode 227 of CppCast, the first podcast for C++ developers by C++ developers.
I'm your host, Rob Irving.
Joined by my co-host, Jason Turner.
Jason, how's it going today?
I'm all right, Rob.
How are you doing?
You know what I just realized we didn't do?
What's that?
Ken, I'm sorry.
You said you would be okay with the video, but we might need to edit that too, right?
Correct.
Yes.
Okay.
I'm going to turn on the Skype video recording then.
The last few guests didn't want it for some reason, so I'm out of the habit.
Yeah, no, it's fine.
Oh, and you can blur your background if you didn't know that.
You can click on your own face and choose blur my background, which is why my black background is blurred here if you care.
Cool.
Okay.
There we go.
Yeah.
Welcome to episode 227 of CppCast, the first podcast for C++ developers by C++ developers.
I'm your host, Rob Irving, joined by my co-host, Jason Turner.
Jason, how's it going today?
All right, Rob.
We are C++ developers, but we are not professional audio engineers or podcasters.
We're doing our best. You'd think after 200 episodes we would have this down, but we're having some technical difficulties today before we get started.
If you notice anything weird like audio dropping out or something, we don't know what's going on. Sorry, we'll figure it out.
Skype can't keep its stuff together. It's okay.
Anyway, how are you doing, Jason? We're a week away from Christmas. How are you doing?
A week away, yeah.
You're ready?
Yeah. It's probably our last episode of the year, I think.
Yeah. I just found out that my in-laws are coming into town for Christmas.
Yeah, I got my wife's entire family coming here too.
Yeah.
The entire family?
She's one of five.
So her two brothers and one of her two sisters are coming.
And parents?
And her parents, yeah.
So most of the family.
And cousins or nieces and nephews from your perspective, I mean?
No.
There are no other.
Her one sister who is not coming has kids, but the other ones don't yet.
Yeah.
Wow.
That will be a pretty full house, though.
Yes, it will be a full house.
Okay.
Well, at the top of every episode, I'd like to read a piece of feedback.
This week, we got an email from Matt related to the ongoing discussion we've been having about the C++ ABI. He said: Another situation where ABI is important is expensive third-party proprietary libraries. A company where I work requires a high level of support for all software we don't write, so they go to software companies for this.
In my current project, that company requires a new support contract for each compiler, which runs in the six-figure cost. So cost itself is another common situation in the industry where ABI compatibility is important.
Luckily, our latest project is on new hardware and compilers, which required an ABI break, and they moved from C++98 to 14.
Love the podcast. Thanks for your efforts.
So, yeah, I mean, that's not something we talked a whole lot about,
but that makes sense that if you are dependent on libraries
that you pay for, that
you're not going to want to, you know, want them to change very often.
You know, I'm thinking I could have come up with a support contract model for ChaiScript
where I charge six figures for each build that needed to be made.
It's not too late.
Okay, well, we'd love to hear your thoughts about the show.
You can always reach out to us on Facebook, Twitter, or email us at feedback@cppcast.com.
And don't forget to leave us a review on iTunes or subscribe on YouTube.
Joining us today is Ken Museth.
Ken is CEO and co-founder of Voxel Tech, which contracts to tech companies, primarily in the movie and aerospace industries.
Ken currently does contract work for SpaceX and Weta Digital.
Previously, he was Director of R&D and Senior Principal Engineer at DreamWorks Animation.
He was also a professor in computer graphics at Linköping University and a visiting faculty member and research scientist at Caltech.
He worked on trajectory design for the Genesis space mission at NASA's JPL, and in 2015 received an Academy Award for the development of OpenVDB.
Ken, welcome to the show.
Thanks. Great. Great to be here.
Yeah, so your bio has built right into it a question that I've been wondering about recently,
and that's professorship. So you were a professor. Do you have a PhD in a related field,
or is that because of your experience that you were made a professor? How did that work out?
That's actually a long story. So my background, my PhD is in physics, in quantum mechanics.
Okay.
And I came to Caltech in, I believe it was 1999, as a physicist or chemist.
And after about a year and a half, I changed field to computer science.
So I spent about five years sort of transitioning from one field to another.
And after that, became a professor in computer graphics in Sweden.
Okay.
So strictly speaking, my background is actually not computer science, but I've been a professor
in computer science.
Interesting. And you said your background was quantum physics?
Yep. Quantum dynamics. So the idea that you can simulate molecules, very small molecules, I should say, like three- or four-body molecules, using the principles of quantum mechanics from scratch.
So it's funny, because it's a field where physicists would say, you know, it's not really physics because there are too many particles, and chemists would say it's not really chemistry because there are not enough. So it's sort of in between the two fields.
It sounds like a strange transition to go from quantum mechanics to computer science, but it's not that big of a stretch, in the sense that everything we did was simulated on computers.
Does that background give you any insight into the hype around quantum computing?
Probably not, no.
But I think it has certainly, if you look at what I've done in computer graphics and computer science,
it's definitely flavored by the physics background.
Sure.
So most of what I've done in the movie industry
has been physics-based simulation.
Also the stuff we do at SpaceX is physics-based,
computational fluid dynamics.
So it's there somewhere in the background.
Wow.
Well, Ken, we have a lot to talk to you about, obviously,
with OpenVDB and everything else you work on.
But first, we just have a couple news articles
to discuss. Feel free to comment on any of these
and then we'll start talking more about everything
you do, okay? Cool.
Okay, so this first one is an article from ThePhD, JeanHeyd Meneide. We've had him on twice on the podcast before, and it's "Going Full Circle on Embed in C++".
And he's kind of going over the latest efforts to get the embed proposal standardized.
And it starts off great.
He presented the paper in front of the Evolution Working Group, and he got a very great response. But then he went on to SG7,
where they kind of pushed back on him a lot,
said it should have a more generic API,
and told him to go investigate the way things are done
with this Circle programming language,
which I had not heard of before.
Have you, Jason?
Maybe.
Maybe?
I'm not sure, actually.
Okay.
Well, and then the interesting thing is he kind of does a bunch of tests to kind of show the committee how embed works, and then it's the author of that programming language who decides he's going to go ahead and make an embed feature for Circle instead of doing whatever it was the committee is trying to have him do.
Yeah, right. So I think maybe filling in a little bit of background in this story, as I understand it, that wasn't explicitly listed here: with Circle, if I understand correctly, Circle can do anything at compile time that could be done at runtime.
Okay.
So they are able to open a file, just use iostreams or whatever, and load that into a compile-time buffer.
Right.
And I believe that's what the standards committee was asking JeanHeyd to look into, as far as a more generic approach. Because I read this several times going, what the heck do you mean by a more generic approach? Like, I read over that sentence.
I believe that's what they're looking for: can we do this like regular file I/O at compile time?
And so you can do that in Circle, but it's crazy, crazy slow.
So the author of Circle is like, oh no, it's actually better to have a standard language feature for this.
Exactly.
Which is what JeanHeyd is trying to do for C++.
Yeah. So that's why he felt like he came full circle.
So I looked at some of the comments on Reddit, and I know JeanHeyd seems a little sad at the end of this blog, and a little unmotivated maybe, but I do hope he kind of goes back to the committee and kind of presents this.
Says, look, even the language author of Circle is going to go and make this a feature, because it doesn't make sense to do it the way you suggested.
I hope he goes forward with it. It seems like a great feature.
Yeah, I'm totally in favor of this, and we haven't let our guest speak at all on this. I've worked on several projects, and in fact I'm working on one right now, where we need to embed raw file data in the C++ executable. And there's certainly linker hacks, and there's conversion to, like, BinHex kinds of things that you can do.
And they're all just kludgy.
And being able to just directly embed the file right there, to me, would be hugely helpful.
Is this anything you run into in your work?
We typically roll our own IO routines.
So I/O is actually very, very important for OpenVDB.
We support delayed loading.
So the idea that you load data on demand and can sometimes unload it again.
And the best way of getting really good performance is to sort of make use of the fact that you know,
you have firsthand knowledge about the data layout in memory, and you reflect that in your out-of-memory representation.
So, yeah, we tend to roll our own, to be honest.
So then you're never really concerned about, like, including things at compile time, like embedding resources in?
No, I don't think so. No.
We can just go through these other articles real quick.
Qt has a 5.14 update.
A couple interesting things here.
They have a new graphics stack that is not based on just OpenGL anymore.
You can have an abstract layer on top of Vulkan, Metal, or Direct3D, which seems great.
But I'm not really an expert on Qt,
so I can't speak too much to this. I do have a question, I guess, for our listeners,
if anyone feels like chiming in on Twitter or Reddit later, is the last large Qt project that
I started on was now about 10 years ago. And when I was doing the GUI work on it, so I had a little
bit of a gap in my
Qt experience. It started in like 2003 and I used it for a few years and then I took a couple of
years off and then I went back to it. And I'm like, well, it seems that everyone's using like
QML or whatever now to do like the layout of your GUI instead of hard coding that directly into
the C++, which is the old school way of doing it. I could not get anyone
else on that team to go along with using QML or Qt... well, I don't think Qt
Quick existed yet. But QML anyhow, to do the markup for the GUI layout. And I'm just curious
for our listeners, what do you do? Do you hard code your GUIs in C++, or do you use one of these modeling languages that Qt has provided?
Yeah, I mean, I know we've done episodes on Qt before,
but maybe it'd be nice to get someone on
who has been using a lot of Qt to make GUI applications.
And ideally someone who's been doing it for like 20 years
so they know the whole history of how these things went.
Right.
Okay.
And then the last thing is C++Now is now accepting student/volunteer applications.
So if you're interested in going to Aspen in May, you should definitely sign up.
Absolutely.
Yeah.
Anything else you want to mention about that,
Jason?
Well, I mean, we've had so many students and ex-students come on the show.
There's a couple of testimonials from people about how it kind of changed their career and stuff.
And, yeah, I mean, it's almost certainly not a bad idea to get a free trip to Aspen as a student and meet a bunch of people in the C++ industry.
Yeah, definitely.
Okay.
Well, a listener wrote to us a few months ago, Ken, suggesting we do an episode on OpenVDB, which I mentioned in your bio you won an Academy Award for.
Could you tell us a little bit about what OpenVDB is?
Sure.
So OpenVDB is essentially two things.
It's a compact data structure. It's a data structure that essentially allows you to associate values with coordinates, three-dimensional coordinates.
So given an XYZ index, which actually can be negative, you can associate a value. And it's templated, so it can be a float, it can be a vector, it can even be a set of points.
And the idea is that this data structure, this layout, is sparse in the sense that it's only allocating, not exactly, but as little memory as allows for fast performance.
So the footprint grows as you add more and more values to this volume.
So that's one component.
And the other component is a very large library of tools that are written on top of this data structure.
So the data structure itself is what's called VDB, and then there is this pretty significant tool set.
And it's been developed primarily at DreamWorks Animation, and it's now become sort of the de facto standard for volumes in visual effects. But it's also used outside of the visual effects industry.
That's sort of the short version.
So I think it's important to realize that it's fairly easy to write a fast volume data structure.
It's also fairly easy to write one that is sparse and doesn't occupy a lot of memory, but it's actually surprisingly difficult to write one that's both fast and sparse.
And that's sort of the claim to fame for this data structure: it's fast.
It's obviously not as fast as a dense volume, but pretty close.
It has the same API, so you can set values. This is what we call random access.
So given an XYZ, you can set a value or you can get a value.
And it also uses iterators to visit values that you already set inside the data structure.
So it's a little bit, I mean, you can draw a lot of analogies to the standard library containers like a std::vector or some of the other ones.
A std::vector, just to use as an example, has very fast random access, because we can look up a value associated with an offset in O(1) time complexity, so super quick.
It's not as fast to insert values because, as we all know, a std::vector doubles its allocated memory pool when we exceed the current buffer.
So in the worst case, we actually have to reallocate and copy values over, whereas VDB actually achieves O(1) complexity both for inserting and deleting values.
And then, again, the same analogy: just like we have iterators on a std::vector, we also have iterators on a VDB container.
So as you set values, you can then revisit the values and sort of massage the values, whether you do physics simulations or geometry manipulation.
That's a very convenient way of doing it.
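As a rough illustration of the interface described here (random-access set and get on signed integer coordinates, a templated value type, and iteration over only the values that were set), here is a minimal sketch. It is not OpenVDB's API or implementation: `SparseGrid`, `Coord`, `CoordHash`, and `forEachActive` are hypothetical names, and a `std::unordered_map` stands in for VDB's tree, so the O(1) behavior here is only the hash map's average case.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Signed 3D index, so negative coordinates are valid keys.
struct Coord {
    std::int32_t x, y, z;
    bool operator==(const Coord& o) const { return x == o.x && y == o.y && z == o.z; }
};

// Simple hash mixing the three coordinates (an arbitrary choice for this sketch).
struct CoordHash {
    std::size_t operator()(const Coord& c) const {
        std::uint64_t h = static_cast<std::uint32_t>(c.x) * 73856093ull;
        h ^= static_cast<std::uint32_t>(c.y) * 19349663ull;
        h ^= static_cast<std::uint32_t>(c.z) * 83492791ull;
        return static_cast<std::size_t>(h);
    }
};

// Hypothetical sparse volume: memory grows only as values are added.
template <typename ValueT>
class SparseGrid {
public:
    explicit SparseGrid(ValueT background) : background_(background) {}

    // Random-access write.
    void setValue(const Coord& c, const ValueT& v) { values_[c] = v; }

    // Random-access read; coordinates never written return the background value.
    const ValueT& getValue(const Coord& c) const {
        auto it = values_.find(c);
        return it == values_.end() ? background_ : it->second;
    }

    std::size_t activeCount() const { return values_.size(); }

    // Visit only the values that were set (order is unspecified).
    template <typename Fn>
    void forEachActive(Fn fn) {
        for (auto& kv : values_) fn(kv.first, kv.second);
    }

private:
    ValueT background_;
    std::unordered_map<Coord, ValueT, CoordHash> values_;
};

// Example usage of the sketch.
inline void sparseGridExample() {
    SparseGrid<float> density(0.0f);
    density.setValue(Coord{-5, 2, 1000000}, 1.5f);
    assert(density.getValue(Coord{-5, 2, 1000000}) == 1.5f);
    assert(density.getValue(Coord{0, 0, 0}) == 0.0f);  // never set
    density.forEachActive([](const Coord&, float& v) { v *= 2.0f; });
    assert(density.getValue(Coord{-5, 2, 1000000}) == 3.0f);
}
```

Storing one entry per value like this is the simplest possible design; it says nothing about cache behavior or spatial queries, which is where a tree of small blocks comes in.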
I think some of the other highlights: it's unbounded, which actually turned out to be... I was very excited about it when I achieved it, but there was a lot of pushback from a lot of the users, because it's sort of a paradigm shift.
Because normally when you deal with volumes, you sort of define a box. It's almost like a sandbox.
And you do your computation inside that box.
And you just pray that you never run out of space, because if you do, then you have to reallocate everything and continue.
Whereas with VDB, there is no box.
Like, you can literally associate a value with any index value within 32-bit precision.
So let's say you're doing a fluid simulation and you may not exactly know where the fluid is actually ending up, or you do a smoke simulation.
The traditional way of doing it is you set up a box, you release your fluid, and you keep simming, and as soon as it hits the boundary, depending on your boundary condition, you're certainly losing anything that goes on beyond the boundary.
Whereas for these types of grids, they keep adding values. So it can literally go on almost forever.
Now, of course, you're still paying the cost of allocating the memory. There's no free lunch.
At least you're only allocating what you need.
That's really the main benefit.
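To make the "no box" idea concrete, the sketch below allocates storage only for the regions that are actually touched, anywhere in the signed 32-bit index range. Everything here is an assumption made for illustration: `UnboundedVolume` and `blockOrigin` are hypothetical names, the 8x8x8 block size is just a convenient choice, and a `std::map` stands in for whatever lookup structure a real implementation would use.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <tuple>

using Key = std::tuple<std::int32_t, std::int32_t, std::int32_t>;
using Block = std::array<float, 8 * 8 * 8>;

// Origin of the 8-wide block containing v along one axis. Clearing the low
// three bits rounds toward negative infinity on two's complement integers.
inline std::int32_t blockOrigin(std::int32_t v) { return v & ~7; }

class UnboundedVolume {
public:
    void setValue(std::int32_t x, std::int32_t y, std::int32_t z, float v) {
        // operator[] creates a zero-filled block the first time a region is touched.
        Block& b = blocks_[Key{blockOrigin(x), blockOrigin(y), blockOrigin(z)}];
        // The low three bits of each coordinate give the position inside the block.
        b[static_cast<std::size_t>(((x & 7) * 8 + (y & 7)) * 8 + (z & 7))] = v;
    }

    std::size_t allocatedBlocks() const { return blocks_.size(); }
    std::size_t allocatedBytes() const { return blocks_.size() * sizeof(Block); }

private:
    std::map<Key, Block> blocks_;
};

// Example usage: memory follows the data, not a predefined bounding box.
inline void unboundedExample() {
    UnboundedVolume smoke;
    smoke.setValue(0, 0, 0, 1.0f);
    smoke.setValue(7, 7, 7, 1.0f);       // same block as (0, 0, 0)
    assert(smoke.allocatedBlocks() == 1);
    smoke.setValue(-1, 0, 0, 1.0f);      // negative side: a new block
    smoke.setValue(2000000000, -2000000000, 5, 1.0f);  // very far away
    assert(smoke.allocatedBlocks() == 3);
}
```

The cost described in the discussion is still visible: each touched region pays for a whole block, so the footprint is `allocatedBlocks()` times the block size rather than the number of values written.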
I think another nice feature is that it has a built-in what we call a bounding volume hierarchy. So it's almost like an acceleration structure that tells you where you have interesting information in the volume. And that's very useful for things like rendering or ray tracing. So when you're shooting a ray, you're trying to create essentially an image of, let's say, a smoke cloud or a water simulation.
You need to know where you have information.
You don't want to do ray integration through all of space, because potentially there's a lot of empty space.
And because VDB underneath is actually a tree, a special type of tree, the nodes of the tree also give you spatial information about where you have information and where you don't. So you can do intersection tests of the ray against the extents of the nodes. So that's very useful for several applications.
So when you iterate, just taking a step back for a moment, when you iterate, you're not going to iterate over every possible point in space. You're going to iterate over all the points that have a value currently.
That's correct. Yep. So sort of the one extreme, hypothetically, the one extreme would be I'm only going to store in this data structure the exact values that you push into it.
That's typically how a std::vector would work.
The problem with that is that inserting an element and looking it up can be costly in 3D, because you don't know where these values are.
They can be anywhere. They can be very sparsely populated in space.
So what VDB does, instead of only inserting that one value, is insert a tiny box, a cutout of the surrounding space that the point belongs to.
So it is over-allocating a little bit.
This is actually what's called a leaf node in the tree. As you allocate a new leaf node and put a value into the leaf node, you need a way of marking that this is a value that you inserted.
And it does this using bitmasks.
So there's a single bit associated with the exact value that you just pushed in there.
And this turns out to be extremely useful for many different applications, because this bitmask can be used not only to identify the value again, but also for iteration, when you implement the iterators, because the iterators now essentially become searches through the bitmask. They're searching for on bits and off bits.
And this can be done very efficiently.
There are also other operations that you can perform.
So imagine you have two different grids, two different trees, and you want to perform a Boolean operation.
Like, you want to do essentially an intersection.
You can actually do that using these bitmasks as well, because you can do bitwise OR operations in this case.
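The leaf node and bitmask idea can be sketched as follows. This is a toy version under stated assumptions, not OpenVDB's leaf class: `Leaf`, `activeOffsets`, `intersectTopology`, and `unionTopology` are hypothetical names, and `std::bitset` is used for the mask purely for brevity. In this sketch, intersecting the active states of two leaves is a bitwise AND of their masks, and taking their union is a bitwise OR.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 8x8x8 leaf: 512 values plus one "active" bit per value.
template <typename ValueT>
struct Leaf {
    static constexpr int DIM = 8;
    static constexpr std::size_t SIZE = DIM * DIM * DIM;  // 512 voxels

    ValueT values[SIZE] = {};
    std::bitset<SIZE> active;  // bit i is on if values[i] was explicitly set

    // Local coordinates in [0, 8) map to a linear offset.
    static std::size_t offset(int x, int y, int z) {
        assert(x >= 0 && x < DIM && y >= 0 && y < DIM && z >= 0 && z < DIM);
        return (static_cast<std::size_t>(x) * DIM + y) * DIM + z;
    }

    void setValue(int x, int y, int z, const ValueT& v) {
        const std::size_t i = offset(x, y, z);
        values[i] = v;
        active.set(i);  // remember that this voxel was inserted
    }

    // "Iterating" over active voxels is a scan of the mask for on bits.
    std::vector<std::size_t> activeOffsets() const {
        std::vector<std::size_t> out;
        for (std::size_t i = 0; i < SIZE; ++i)
            if (active.test(i)) out.push_back(i);
        return out;
    }
};

// Topology-only set operations reduce to bitwise operations on the masks.
template <typename ValueT>
std::bitset<Leaf<ValueT>::SIZE> intersectTopology(const Leaf<ValueT>& a,
                                                  const Leaf<ValueT>& b) {
    return a.active & b.active;  // on only where both leaves are on
}

template <typename ValueT>
std::bitset<Leaf<ValueT>::SIZE> unionTopology(const Leaf<ValueT>& a,
                                              const Leaf<ValueT>& b) {
    return a.active | b.active;  // on where either leaf is on
}
```

A production implementation would scan the mask a machine word at a time with find-first-set style instructions rather than testing bit by bit; the linear loop here just keeps the sketch short.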
So how large... I'm sitting here basically thinking, okay, you're trying to minimize the number of dynamic allocations, you're trying to maximize your cache friendliness and performance.
How large are the over-allocated hunks that you do?
So that's... you can configure that. The default configuration has a leaf node size that is 8x8x8.
So we've done many experiments that seem to suggest that that is an optimal balance between performance and memory footprint.
On the GPU, interestingly, this tree tends to operate faster if it's a larger leaf node.
So 16x16x16 is typically what we use there.
But you can actually configure it. So the implementation is highly templated.
In fact, the leaf node itself is templated on the value type. It could be double, it could be float, it could be a vector, it could be a point. And then the parent node of the leaf node, the internal node, is then templated on the leaf node type.
And you sort of sequentially, or hierarchically, template until you reach the root node.
And the root node is then templated on all the nodes below it.
And this turns out to be very, very good for performance, because we get a lot of inlining and we can do template metaprogramming.
So a lot of the computations that are needed to traverse the tree are actually resolved at compile time.
So that's part of the secret sauce to the good performance.
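The nesting of templates described above can be sketched like this. The class names (`LeafNode`, `InternalNode`, `RootNode`) and the helper `childIndexAlongAxis` are written for illustration and are not meant to reproduce OpenVDB's actual class definitions; the 5-4-3 branching used below (32, then 16, then 8 per axis) is chosen to match the 8x8x8 leaf size mentioned in the discussion. The point is that every shift and mask needed to descend the tree is a compile-time constant.

```cpp
#include <cassert>
#include <cstdint>

// Each node type is templated on what sits below it, so the region it spans
// is known at compile time.
template <typename ValueT, int Log2Dim>
struct LeafNode {
    using ValueType = ValueT;
    static constexpr int LOG2DIM = Log2Dim;  // 3 means 8 voxels per axis
    static constexpr int TOTAL = Log2Dim;    // log2 of voxels spanned per axis
};

template <typename ChildT, int Log2Dim>
struct InternalNode {
    using ValueType = typename ChildT::ValueType;
    static constexpr int LOG2DIM = Log2Dim;                // log2 of children per axis
    static constexpr int TOTAL = Log2Dim + ChildT::TOTAL;  // log2 of voxels per axis
};

template <typename ChildT>
struct RootNode {
    using ChildType = ChildT;
    using ValueType = typename ChildT::ValueType;
};

// A 5-4-3 configuration: 32^3 children, then 16^3 children, then 8^3 voxels.
using LeafT = LeafNode<float, 3>;
using LowerT = InternalNode<LeafT, 4>;
using UpperT = InternalNode<LowerT, 5>;
using TreeT = RootNode<UpperT>;

static_assert((1 << LeafT::TOTAL) == 8, "a leaf spans 8 voxels per axis");
static_assert((1 << LowerT::TOTAL) == 128, "a lower node spans 128 voxels per axis");
static_assert((1 << UpperT::TOTAL) == 4096, "an upper node spans 4096 voxels per axis");

// Index (along one axis) of the child of NodeT that contains coordinate x.
// The shift and mask are compile-time constants for each node type.
template <typename NodeT>
constexpr std::uint32_t childIndexAlongAxis(std::int32_t x) {
    return (static_cast<std::uint32_t>(x) >> (NodeT::TOTAL - NodeT::LOG2DIM)) &
           ((1u << NodeT::LOG2DIM) - 1u);
}

// Example: coordinate 300 decomposes as 2 * 128 + 5 * 8 + 4.
inline void traversalExample() {
    assert(childIndexAlongAxis<UpperT>(300) == 2u);
    assert(childIndexAlongAxis<LowerT>(300) == 5u);
    assert(childIndexAlongAxis<LeafT>(300) == 4u);
}
```

The trade-off raised in the conversation follows directly: every distinct value type and tree configuration instantiates its own copy of all this code, which is good for inlining and hard on compile times.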
Does that affect compile time adversely?
It does.
So this is one of the complaints.
In fact, if you listen to the keynote from CppCon, the keynote speaker, Mark Elendt, actually used OpenVDB as an example of, I think he even called it, overuse of templating.
I obviously disagree with that,
but it is true.
You can get some pretty hairy compile times.
But honestly, I would much prefer longer compile times and faster execution run times.
Okay. I'm not going to argue with you.
I've been involved in lots of discussions recently about which do you prefer, and how do you set that balance, and how do you weigh the developer's time versus what is the most important thing for your project.
And it sounds like runtime performance is the most important thing to you.
Yeah. But I see the other side of this as well.
I understand that it can be frustrating at times.
Right. So just out of curiosity, how long are long build times for tools using OpenVDB?
It's funny, because building the library itself takes no time, because it's essentially just header files.
Right.
But building client code, actually building, let's say, the unit tests that the library comes with, that can take, I think, depending on your hardware, five minutes building and running.
I've heard much, much, much worse.
Right, right.
Actually, I think the biggest complaint is the memory usage.
So let's say you have a 64-core machine or even a 32-core machine, and you just let it go, like make -j, you can actually run out of memory even with 64 gigabytes, because it just keeps spawning build jobs and they take quite a lot of memory.
So sometimes you've got to be careful.
I just want to rewind a little bit and make sure listeners understand what it is that OpenVDB is used for.
You mentioned smoke and fluid analysis.
Right.
So as I said, it was developed initially for visual effects, so volumetric simulation.
But it's not limited to that application.
It is actually being used in engineering as well.
So the stuff that I'm involved in at SpaceX makes use of VDB.
So there's nothing limiting it to visual effects.
But anything that involves a sparse volume, so a volume that doesn't require information everywhere.
And, you know, not every application lends itself to these sparse data structures, but a lot of them do.
So, yes, fluid animation is a great example. Manipulating geometry, fire simulation. There's also, in engineering, things like topology optimization, where you're trying to determine the shape of a material to meet certain constraints, like material constraints, stress, things like that.
It can be used for scientific visualization, medical imaging.
Yeah.
It's really up to you. We've had lots of people reach out from fields that we never thought would make use of this, which is quite exciting.
Since I've never done anything like this, I find myself slightly surprised that integer coordinates give you the resolution that you need for these kinds of simulations. I would have just assumed that you would want floating point resolution or something.
Right. So that's a good point.
So oftentimes when you solve problems in engineering, and that translates to graphics too, you're trying to solve these partial differential equations.
And a very common technique is to discretize them on a grid, a background grid.
But you are right.
There are other ways of doing it.
You could also have used particles instead, where the particles, you could essentially think of them as grid points that are free to move around.
And VDB actually supports both.
So you can also store particles associated with the grid. So each voxel... a voxel is the 3D analog of a 2D pixel, so it's the smallest unit that you can address in a volume.
And you can have particles that live inside that voxel, or you can have a single value, a single floating point value.
But we tend to use these grids for solving things like fluids.
But as I said, you can certainly also use particles and floating point positions.
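One way to see how integer voxel coordinates and floating point positions fit together is a transform between index space and world space. The sketch below assumes a uniform voxel size and places each voxel's minimum corner at `index * voxelSize`; both are conventions picked for this example, and `UniformTransform`, `worldToIndex`, and `indexToWorld` are hypothetical names rather than a real library's interface.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

struct Coord { std::int32_t x, y, z; };
struct Vec3 { double x, y, z; };

// Hypothetical uniform transform: world position = index * voxelSize.
struct UniformTransform {
    double voxelSize;

    // Voxel that contains a world-space point. std::floor (rather than a
    // plain cast) keeps negative positions in the correct voxel.
    Coord worldToIndex(const Vec3& p) const {
        assert(voxelSize > 0.0);
        return Coord{static_cast<std::int32_t>(std::floor(p.x / voxelSize)),
                     static_cast<std::int32_t>(std::floor(p.y / voxelSize)),
                     static_cast<std::int32_t>(std::floor(p.z / voxelSize))};
    }

    // World-space position of a voxel's minimum corner.
    Vec3 indexToWorld(const Coord& c) const {
        return Vec3{c.x * voxelSize, c.y * voxelSize, c.z * voxelSize};
    }
};
```

With a voxel size of 0.5, for example, the point (1.2, -0.1, 0.0) falls in voxel (2, -1, 0). The resolution of a simulation is then set by the voxel size, while a particle stored inside a voxel can keep its full floating point position.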
I wanted to interrupt the discussion for just a moment to talk about the sponsor of this episode of CppCast,
the PVS Studio team.
The team promotes the practice of writing high-quality code,
as well as the methodology of static code analysis.
In their blog, you'll find many articles on programming,
code security, checks of open-source projects, and much more. For example, they've recently posted an article which demonstrates not in
theory, but in practice, that many pull requests on GitHub related to bug fixing could have been
avoided if code authors regularly used static code analysis. Speaking of which, another recent
article shows how to set up regular runs of the PVS Studio static code analyzer on Travis CI.
Links to these publications are in the show notes for this episode.
Try PVS Studio.
The tool will help you find bugs and potential vulnerabilities
in the code of programs written in C, C++, C#, and Java.
You mentioned that there's a lot of tools that make use of OpenVDB.
Are these part of the library?
Yeah, so the library itself has implemented a lot of high-level algorithms.
Things like, you know, how do I go from a polygonal representation of a surface to one of these VDBs?
How do I go back again?
How do I go from a volume to a polygonal representation?
This is called scan conversion.
So the truth is, most of the surface representations that we encounter, certainly in visual effects but also in engineering, are actually modeled using triangles or quads. So polygons.
Think of a computer-aided design, CAD, model.
It's typically triangles.
So how do you take that and represent it as a volume instead?
That's quite a challenging task, and we have algorithms in the library that do that for you.
But there's also all the mechanics you need to solve these equations that I talked about earlier.
Let's say you're doing a fluid animation, you're doing fire or an explosion.
We actually base that on what's called the Navier-Stokes equations.
These are equations that come out of physics.
They tell you exactly how the fluid is changing over time.
And they involve these partial differential equations.
And we can discretize them.
We can solve them on the grid itself.
And we have libraries, support libraries, in there.
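As a small example of what discretizing a differential operator on a grid means, the sketch below evaluates a second-order central-difference Laplacian, the operator that appears in the viscous term of the Navier-Stokes equations. It uses a tiny dense array to stay short; `Field` and `laplacian` are hypothetical names, and this is not the library's own solver code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Dense n*n*n scalar field with grid spacing h (dense only for brevity).
struct Field {
    std::size_t n;
    double h;
    std::vector<double> data;

    Field(std::size_t n_, double h_) : n(n_), h(h_), data(n_ * n_ * n_, 0.0) {}

    double& at(std::size_t i, std::size_t j, std::size_t k) {
        return data[(i * n + j) * n + k];
    }
    double at(std::size_t i, std::size_t j, std::size_t k) const {
        return data[(i * n + j) * n + k];
    }
};

// Seven-point central-difference Laplacian at an interior grid point:
// (sum of the six face neighbors - 6 * center) / h^2.
inline double laplacian(const Field& f, std::size_t i, std::size_t j, std::size_t k) {
    assert(i > 0 && i + 1 < f.n && j > 0 && j + 1 < f.n && k > 0 && k + 1 < f.n);
    const double sum = f.at(i + 1, j, k) + f.at(i - 1, j, k) +
                       f.at(i, j + 1, k) + f.at(i, j - 1, k) +
                       f.at(i, j, k + 1) + f.at(i, j, k - 1) -
                       6.0 * f.at(i, j, k);
    return sum / (f.h * f.h);
}
```

For the field f(x, y, z) = x² + y² + z² sampled with h = 1, this stencil returns 6 at every interior point, which matches the analytic Laplacian. On a sparse grid the same stencil is applied only at active voxels, with the background value standing in for unset neighbors.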
So most of what ships with VDB is quite low level.
And so there are many commercial packages that pick up VDB and implement it inside their own node graph or inside their own renderer.
So I think probably the most well-known example is Houdini. Houdini is a third-party commercial product, sort of the de facto standard for visual effects.
And it internally uses VDB for sparse volumes and makes use of a lot of these algorithms.
And you mentioned on the side, and I let it slip past, you said when you're doing these things on the GPU.
So does that imply you have built-in support for doing these calculations on the GPU?
Great question.
No.
So VDB, if you download it today, is strictly for the CPU.
But I and others have certainly been toying with what it means to take the ideas behind the data structure and port them to the GPU.
So I would say a lot of this is ongoing work.
But yeah, my own discovery, and that's been backed up by other people I've talked to, is that the configuration that we use on the CPU is not ideal for the GPU.
So especially the leaf node sizes are too small for the GPU.
But this is actually an area I would love to work some more on, porting the library and porting some of the algorithms to GPUs.
I mean, obviously you're using C++,
and you were just talking about templates and template metaprogramming.
I'm just curious,
have you been following the C++ standards?
What do you require
today?
Yes, we do.
The industry
as a whole, the visual effects industry, tend to
be a little bit slow.
There's
a committee called the Visual Effects
VFX Platform
that keeps track of
the different versions of libraries
and compilers
that the industry as a whole
uses. And right now we're using
we're up to C++14.
So that's also what
we use in OpenVDB.
And we're very eager to move forward,
but we have to sort of follow this standard.
So we're trying to be good citizens.
Sounds almost as bad as the auto industry.
Oh, yeah.
I can imagine.
Yeah.
Yeah.
So we're at GCC,
I think it's 6.3 or 6.2.
Oh, well, that has all of C++17 language support,
but not all of C++ library support.
I was just looking this up yesterday.
Oh, okay.
Coincidentally.
Cool.
I was looking at the...
Oh, I'm sorry. Go ahead, Jason.
No, no, go ahead. Go ahead, please, Rob.
I was going to say I was looking at the last changelog
for, I think, version 7 of OpenVDB, and it mentioned new measurement analytics for level sets.
And I was curious, what exactly is a level set?
So this is, I have to admit, this is one of my absolute favorite topics.
So I'll try and restrain myself and be short. So as I mentioned earlier, most of the representations that we use for surfaces are polygons.
And polygons are great for modeling, but they're not ideal when a surface deforms or changes.
Imagine that you have a surface that represents the interface between water and air, so essentially a water surface.
And let's say you throw something into the water
and you create a splash, a crown splash,
with lots of droplets that are being, you know,
that are formed and breaking off the main surface.
So essentially you have a very, very complicated surface
that changes over time, that deforms over time.
Now, this is a great example where these polygon representations
really struggle.
Because as you can imagine,
as a surface breaks up
and collides with itself again,
figuring out exactly how to represent that
with polygons and triangles and vertices
is a very, very challenging task.
So it turns out
that there's a much, much simpler
way of representing surfaces
that change what we call topology.
So essentially they break
up or merge.
And that is to use
what's called a level set.
And a level set is sort of a glorified isosurface.
What is that?
So I think most of us are familiar with height fields, right?
So if you have a topographic map and you have – it's in 2D.
And every point on this map
gives you the height of the landscape
or the mountain.
And you can sort of,
you can draw contour lines
and these contour lines
connect all points
at the exact same height.
Right.
So it's a great way of sort of,
you know, conveying visual information
about how tall is the landscape, how does it vary.
Now if you take that idea
and
add on an extra dimension,
so we're no longer talking about a map,
we're talking about a volume. So at every
single point, x, y, z,
we have a single value associated
with it. So now
we have a volume with values.
And if you now imagine that
you
defined a surface
that consists of all the points
with one specific value.
So it's sort of a height field
but in 3D. This is what we
call an isosurface. It's like trying to visualize
a hypercube, basically. It's very
difficult. Yeah.
If you've never tried it before,
I'm sure you'll get a headache.
But so that's sort of the simplest idea,
an isosurface.
That's been around for many, many years.
Now level sets take this slightly further,
to put it mildly.
So it adds a lot of mathematics underneath
that tells you how to massage the values in order to deform the surface.
So let's say you're actually doing a fluid simulation.
We represent the surface, the water surface itself, using one of these level sets.
And we can move the water surface around and sort of create the illusion of a crown splash or breaking dam or whatever,
some complicated behavior of the water by solving equations that massage these values,
these level set values.
And then you may ask, so what do these values really correspond to?
And typically they are the distance to the surface.
And it's actually often the signed distance.
And the convention is typically
every point that is inside the surface
has a negative value
and every point that is outside the surface
has a positive value.
And then obviously the isosurface
that I was talking about before
now is the zero level set.
So all points that are exactly zero are exactly on the surface.
And that's the surface that you're tracking.
That's the surface you're interested in.
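To make the sign convention concrete, here is a minimal C++ sketch. It is not taken from OpenVDB; `Vec3`, `sphereSignedDistance`, and `classify` are hypothetical names, and a sphere is used only because its signed distance has a simple closed form.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper type for this sketch (not an OpenVDB type).
struct Vec3 { double x, y, z; };

// Signed distance from point p to a sphere: negative inside, zero on the
// surface, positive outside, matching the convention described above.
double sphereSignedDistance(const Vec3& p, const Vec3& center, double radius) {
    assert(radius > 0.0);
    const double dx = p.x - center.x;
    const double dy = p.y - center.y;
    const double dz = p.z - center.z;
    return std::sqrt(dx * dx + dy * dy + dz * dz) - radius;
}

enum class Side { Inside, OnSurface, Outside };

// Classify a level set value; the zero level set is the tracked surface.
Side classify(double phi, double tolerance = 1e-9) {
    if (phi < -tolerance) return Side::Inside;
    if (phi > tolerance) return Side::Outside;
    return Side::OnSurface;
}
```

In a simulation these values would live on grid points and be updated over time rather than evaluated from a formula.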
And there's one more link to VDB.
So as you can imagine, since you're really only interested in the zero isosurface, because
that's where the surface is, everything else is kind of redundant. You don't actually need
to track the exact values that are very far away from the surface. So if you take this to the
extreme, all you need is a very narrow band around the surface itself. This is what's
called a narrow band level set. It just means I only need distance values in close proximity to
the surface that you're deforming. And that, as you can imagine, is a very sparse data set, right?
Because as soon as you're a certain distance away from the surface, you don't care about the value anymore.
But when you're close, that's when you want to store the value.
And this is where VDB is ideal because you have a very sparse volume that only acquires values in close proximity to the surface. So instead of storing a box of values everywhere
which as you can imagine
will be very very memory expensive
or memory heavy
especially if this box grows
over time, VDB allows you to
only allocate values in close
proximity to the surface and the
values will be updated as the surface starts
moving. So in that sense
it's almost the ideal memory footprint.
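The narrow band idea can be sketched in a few lines of C++. This is an illustration under stated assumptions, not how VDB is implemented: a `std::map` stands in for the sparse tree, the shape is a sphere centered at the origin, and `Coord`, `NarrowBand`, `buildSphereBand`, and `denseVoxelCount` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>

// Hypothetical stand-ins, not OpenVDB types: an integer voxel coordinate and
// a sparse container that only holds the voxels we explicitly insert.
using Coord = std::array<int, 3>;
using NarrowBand = std::map<Coord, float>;

// Sample the signed distance to a sphere centered at the origin (radius in
// voxel units) and keep only voxels within halfWidth voxels of the surface.
NarrowBand buildSphereBand(int radius, int halfWidth) {
    assert(radius > 0 && halfWidth > 0);
    NarrowBand band;
    const int extent = radius + halfWidth;
    for (int i = -extent; i <= extent; ++i) {
        for (int j = -extent; j <= extent; ++j) {
            for (int k = -extent; k <= extent; ++k) {
                const double length =
                    std::sqrt(static_cast<double>(i * i + j * j + k * k));
                const double dist = length - radius;
                if (std::fabs(dist) <= halfWidth) {
                    band[Coord{i, j, k}] = static_cast<float>(dist);
                }
            }
        }
    }
    return band;
}

// Number of values a dense box covering the same region would store.
std::size_t denseVoxelCount(int radius, int halfWidth) {
    const std::size_t side =
        static_cast<std::size_t>(2 * (radius + halfWidth) + 1);
    return side * side * side;
}
```

Comparing `buildSphereBand(r, w).size()` with `denseVoxelCount(r, w)` shows the gap: the dense count grows with the cube of the resolution, while the band grows roughly with the surface area.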
So you guys look like a continuous sledgehammer.
I'm thinking, like, when you first presented this topic of what OpenVDB is
and how it can store, you know, any objects in the 3D grid,
my immediate assumption is like, okay, at this point,
one, two, three, I have a particle with these properties that's moving in this direction,
or I've got a voxel that is this color. But it sounds like that is just the wrong way to think
about this entirely. Like, it can be used for those things. But if you really want to get the most
benefit out of it, you need to think of it at a meta level, like it can just be used to do anything that you need to be
keeping track of that happens to be, or like properties about properties or something.
Sure, but you actually nailed it, because at the core it is exactly what you said. It's the ability to associate values with X, Y, Z, or I, J, K coordinates.
Right.
If you can do that, there's a lot of other things you can do.
Like you can actually deform it.
You can – as the surface is moving through space, you need to allocate and move these grid points around.
So when I say move,
I don't mean move in the sense of
pushing particles around.
What I mean is
you need to allocate new grid points
and maybe remove old grid points.
So as you think of a grid
in three-dimensional space, and as the surface sort of sweeps out this 3D space, it's intersecting certain grid points in this volume.
Just so I can pause you.
Just saying, make sure I'm following along. Because at the moment, I'm visualizing, as you're saying, this surface of the water, right?
Because this is where your example started.
I'm visualizing the outside of a fish tank where I can see the surface of the water in a 3D volume.
That's limited.
Right.
Is that okay?
That's great, except I want to get rid of the tank itself.
Okay, okay.
In the sense that, I mean, it's a good mental picture
because this is actually how people would traditionally do volumes.
You would have a fish tank,
and inside the fish tank you would have a grid, a 3D grid.
And you can associate a value with every single grid point in the tank.
What VDB does, it gets rid of the tank itself.
Right.
Because there's sort of no spatial limitations
to where you can have values.
Okay.
But it also sort of trims away values
that are away from the surface itself
because they're kind of redundant.
You don't really need them.
Water at the bottom of the tank is not going to move.
That's right.
So can you set a limit and say anything that's
outside of the range of these particles I don't care
about, or do you manually
have to prune them?
That's part of
the application that you write on top of VDB.
VDB has
low-level methods to add values
and remove values and
even dilate values.
You can imagine that I have inserted a few discrete values into the volume,
and now I want to add all the neighbor points.
That's an operation called dilation.
You can do the opposite.
You can say, I've added lots of values, and I now want to shrink it.
That's erosion.
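Dilation and erosion as described here can be sketched on a plain set of active voxel coordinates. This is an illustration only and does not use the OpenVDB API; `ActiveSet`, `dilate`, and `erode` are hypothetical names, and the choice of the six face neighbors is an assumption.

```cpp
#include <array>
#include <cassert>
#include <set>

// Hypothetical stand-ins for this sketch (not OpenVDB types).
using Coord = std::array<int, 3>;
using ActiveSet = std::set<Coord>;

// Offsets to the six face-adjacent neighbors of a voxel.
const int kFaceOffsets[6][3] = {
    {1, 0, 0}, {-1, 0, 0}, {0, 1, 0}, {0, -1, 0}, {0, 0, 1}, {0, 0, -1}};

// Dilation: every active voxel also activates its six face neighbors.
ActiveSet dilate(const ActiveSet& active) {
    ActiveSet result = active;
    for (const Coord& c : active) {
        for (const auto& d : kFaceOffsets) {
            result.insert(Coord{c[0] + d[0], c[1] + d[1], c[2] + d[2]});
        }
    }
    return result;
}

// Erosion: a voxel stays active only if all six face neighbors are active.
ActiveSet erode(const ActiveSet& active) {
    ActiveSet result;
    for (const Coord& c : active) {
        bool interior = true;
        for (const auto& d : kFaceOffsets) {
            if (active.count(Coord{c[0] + d[0], c[1] + d[1], c[2] + d[2]}) == 0) {
                interior = false;
                break;
            }
        }
        if (interior) result.insert(c);
    }
    return result;
}
```

Dilating a single voxel and then eroding the result gives back the single voxel, which is a quick sanity check on both operations.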
So the data structure itself supports these fairly
primitive operations and the client
code can make use of it to
do things like
tracking how a surface of
water moves or
taking
geometric pieces and gluing them
together, unioning them together.
These are the higher level applications
you write on top of it.
But the data structure itself
supports these fairly simple operations
that become sort of the building blocks.
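As one example of such a higher-level operation, unioning two shapes stored as signed distances is commonly done by taking the minimum of the two values at each point, given the negative-inside convention. The sketch below assumes that approach; it is not taken from OpenVDB, and `Vec3`, `sphereSdf`, `csgUnion`, and `csgIntersection` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper type for this sketch (not an OpenVDB type).
struct Vec3 { double x, y, z; };

// Signed distance to a sphere: negative inside, positive outside.
double sphereSdf(const Vec3& p, const Vec3& center, double radius) {
    assert(radius > 0.0);
    const double dx = p.x - center.x;
    const double dy = p.y - center.y;
    const double dz = p.z - center.z;
    return std::sqrt(dx * dx + dy * dy + dz * dz) - radius;
}

// A point is inside the union if it is inside either shape, so the smaller
// (more negative) of the two values wins.
double csgUnion(double a, double b) { return std::min(a, b); }

// A point is inside the intersection only if it is inside both shapes.
double csgIntersection(double a, double b) { return std::max(a, b); }
```

Applied per grid point, this merges two surfaces without any explicit handling of how their polygons would intersect.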
But yeah, so the whole gag is that you want to perform certain operations using as little memory as possible.
Right.
And the reason you do that, obviously, is that you want fidelity.
You want your surface to be high resolution, as high as possible.
And if you take the traditional approach, which is when you have your fish tank, if you make the fish tank really big,
then the overhead of
allocating this
redundant memory can very
quickly kill your performance
and even your computer.
You can quickly run out of memory.
So you can care about
just the particles that are on the surface,
and if the surface moves up,
trim the data that used to be below it.
If it moves back down again, trim the old data.
And you're always using the optimal amount of memory.
Yeah, that's exactly right.
So how many particles is a lot in your world?
I'd say over a billion.
Over a billion is a lot?
Yeah, that's starting to push.
But, I mean, yeah,
it obviously depends on your hardware.
It depends on your application.
So there's sort of a golden rule
in visual effects,
which is your simulation
should never be longer than you can,
you know, an artist can run it overnight.
Whereas in scientific computing,
you know, oftentimes we run simulations
for weeks or even months.
So, yeah, it depends on the application.
So a lot of the kind of questions that I'm personally asking you right now is I've been having classrooms full of physics students this past year.
And I've been trying to think, okay, how do I tell them about this library and what application does it have to them when simulating subatomic particles, basically.
Ooh, right.
Oh, I don't know.
Does your application involve volumes of some kind?
So actually, there's another thing,
there's another line of applications
that it's also very useful for,
which is, let's say you have,
you actually have particles
and the particles move around.
And in many applications,
you need to know
about the local sort of topology of the points.
You need to search your immediate nearest neighbors.
So, you know, the classic way of solving this problem
is to use things like a KD tree
as an acceleration structure.
The problem with a KD tree is
as you add more and more points,
these search operations have an unfavorable scaling.
They can get very, very slow.
So this is another example where VDB can actually be used
and has been used.
So you can imagine that you use VDB as a way of bucketing points.
So you have your points and you shove them into a VDB tree
and they may fall in different voxels, different nodes,
but because VDB can do O(1) lookup,
so you can very quickly
ask the following questions.
How many particles are stored
at this position, IJK?
Now, that can be sort of a building block
for exactly answering the question
of how many particles do I have
in my immediate neighborhood?
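The bucketing idea can be sketched as follows. This is a simplified illustration, not OpenVDB code: a hash map keyed by voxel coordinate stands in for the VDB tree, and `PointBuckets`, `CoordHash`, `countAt`, and `countInNeighborhood` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <unordered_map>

// Hypothetical stand-ins for this sketch (not OpenVDB types).
using Coord = std::array<int, 3>;
struct Point { double x, y, z; };

struct CoordHash {
    std::size_t operator()(const Coord& c) const {
        std::size_t h = 0;
        for (int v : c) h = h * 1000003u + std::hash<int>()(v);
        return h;
    }
};

class PointBuckets {
public:
    explicit PointBuckets(double voxelSize) : voxelSize_(voxelSize) {
        assert(voxelSize > 0.0);
    }

    // Map a position to the integer coordinate of the voxel containing it.
    Coord toCoord(const Point& p) const {
        return Coord{cell(p.x), cell(p.y), cell(p.z)};
    }

    void insert(const Point& p) { ++counts_[toCoord(p)]; }

    // How many particles are stored at voxel ijk?
    std::size_t countAt(const Coord& ijk) const {
        const auto it = counts_.find(ijk);
        return it == counts_.end() ? 0 : it->second;
    }

    // How many particles are in the 3x3x3 block of voxels around ijk?
    std::size_t countInNeighborhood(const Coord& ijk) const {
        std::size_t total = 0;
        for (int di = -1; di <= 1; ++di)
            for (int dj = -1; dj <= 1; ++dj)
                for (int dk = -1; dk <= 1; ++dk)
                    total += countAt(Coord{ijk[0] + di, ijk[1] + dj, ijk[2] + dk});
        return total;
    }

private:
    int cell(double v) const {
        return static_cast<int>(std::floor(v / voxelSize_));
    }

    double voxelSize_;
    std::unordered_map<Coord, std::size_t, CoordHash> counts_;
};
```

A neighbor search then only has to visit a fixed number of voxels around the query point, regardless of how many particles exist in total.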
So that's another.
Maybe that is an application for your students.
I don't know.
I think, well, we're, I think, running low on time now.
But their simulations currently are closer to ray tracing.
And they need to know, is there an object in this point in space
that I'm going to intersect?
And what are its properties?
And then what does that do to the refraction of this particle?
Right, yep.
That is definitely sort of a common application.
So you can, if, whatever object you're trying to ray intersect with,
if you can sort of voxelize it,
if you can give it a gridded representation,
imagine that it's,
I don't know, it could be a polygon, it could be some
shape. If you have
a way of
essentially detecting what
grid nodes does my
object intersect with,
these grid nodes you can then
mark as
active in the grid, and then you can shoot a ray
against
the tree structure itself.
Because you can do very quick ray intersection
tests against the tree structure.
So essentially what you want to know
is does my ray intersect
any grid,
any objects as it's
going in a certain direction.
And exactly that type of test can be done
very efficiently against the tree itself.
It sounds like you could choose between accuracy,
like you could say, well, no, nothing was there,
so I'm going to give up a little bit of accuracy.
Or you can say, okay, there might have been something there.
Now I have to actually verify that, yes,
I did exactly intersect something.
Correct.
Yep.
Okay.
So you can skip empty space quickly,
and then you have a candidate.
Right.
So you're right.
You still have to do a ray intersection test
against whatever object you have,
but you want to limit that because that's typically very costly,
and you certainly would prefer
only to check the ray against a single
candidate or maybe a handful of candidates,
not all of them.
So yeah, that's a very common
type of test.
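A rough sketch of this kind of test: walk the ray through a grid of unit voxels and stop at the first active one, which becomes the candidate for an exact intersection test. It assumes a flat set of active voxels rather than a tree, so it visits every voxel along the ray instead of skipping large empty regions at once. `Ray`, `ActiveSet`, and `firstActiveVoxel` are hypothetical names, not OpenVDB API.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <limits>
#include <set>

// Hypothetical stand-ins for this sketch (not OpenVDB types).
using Coord = std::array<int, 3>;
using ActiveSet = std::set<Coord>;
struct Ray { double origin[3]; double direction[3]; };

// Step voxel by voxel along the ray (a 3D DDA on unit voxels). Returns true
// and writes the voxel to `hit` if an active voxel is found within maxSteps.
bool firstActiveVoxel(const Ray& ray, const ActiveSet& active, int maxSteps,
                      Coord& hit) {
    const double inf = std::numeric_limits<double>::infinity();
    Coord ijk{};
    int step[3];
    double tMax[3];    // ray parameter at the next voxel boundary per axis
    double tDelta[3];  // ray parameter needed to cross one voxel per axis
    for (int a = 0; a < 3; ++a) {
        ijk[a] = static_cast<int>(std::floor(ray.origin[a]));
        const double d = ray.direction[a];
        if (d > 0.0) {
            step[a] = 1;
            tDelta[a] = 1.0 / d;
            tMax[a] = (ijk[a] + 1 - ray.origin[a]) / d;
        } else if (d < 0.0) {
            step[a] = -1;
            tDelta[a] = -1.0 / d;
            tMax[a] = (ijk[a] - ray.origin[a]) / d;
        } else {
            step[a] = 0;
            tDelta[a] = inf;
            tMax[a] = inf;
        }
    }
    for (int n = 0; n < maxSteps; ++n) {
        if (active.count(ijk) != 0) {
            hit = ijk;
            return true;
        }
        // Cross whichever voxel boundary the ray reaches first.
        int a = 0;
        if (tMax[1] < tMax[a]) a = 1;
        if (tMax[2] < tMax[a]) a = 2;
        if (tMax[a] == inf) break;  // zero direction: the ray never moves
        ijk[a] += step[a];
        tMax[a] += tDelta[a];
    }
    return false;
}
```

Only when this returns a candidate voxel does the more expensive ray test against the actual object need to run.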
Thanks for letting me pick your brain on that. Hopefully
for our listeners, that was interesting as well.
Yeah, cool. Absolutely.
Well, I think we are running out of time, but
before we let you go, I don't think we ever said,
what does VDB actually stand for in OpenVDB?
It's a funny question.
I get it a lot.
The dirty secret is actually nothing.
It's not an acronym.
And the story behind it is,
it was called many, many different things in the past.
And sort of the last iteration, I was calling it DB+Grid.
And as we were sort of releasing it into the wild at DreamWorks, a lot of the artists were saying, you know, this is too nerdy.
We have no idea what that means.
So we actually had, we asked people like, what should we call it then?
And VDB was one of the top candidates
that came out.
I mean, you could still,
you know, an obvious candidate
would be a volumetric database
or volumetric dynamic B+ tree
or whatever.
But the truth is it means nothing.
It's just a name.
And you also mentioned
that there's been Academy Awards involved here.
Is that for any particular movie's work,
or is it one of these Technical Academy Awards for Technical Achievement
or something like that?
Yeah, it was a Technical Academy Award, Technical Achievement Award.
But it has been used in more than 150 movies at this point.
But you're right, it wasn't for a particular movie.
It was as a technique.
150 movies, that's all fairly good.
Yeah.
Was there a particular movie you were working on
that kind of initiated the development of OpenVDB?
I think the first one that we applied it to
was Puss in Boots at DreamWorks.
We were modeling these
very intricate and detailed
clouds.
I wouldn't say it was developed for that movie,
but that was our first test.
But since
it's now found its way
into some of the commercial packages,
many artists may not
even realize that they're using it.
So it has a free open source license,
so that it can be integrated into other commercial packages?
Absolutely.
Yeah, absolutely.
And we didn't discuss that at all.
That's true.
Actually, I should probably mention that
it was primarily developed at DreamWorks.
I had three other guys joining the project.
It was Peter, Mihai,
and David.
Since then, all of us have left
and it's
now actually run not by DreamWorks, but
by the Academy Software Foundation.
So it's an official
library, I guess.
Oh yes, it's now sort of a neutral
entity that's running it.
So that's quite exciting.
So the project is getting a second wind.
There are some very talented people from other studios
that are now pitching in and helping.
Looks like everything's up on GitHub
if any of our listeners are interested.
Yep, that's right.
Very cool.
Okay, well, thanks so much for coming on the show today, Ken.
Well, thanks for having me.
Yeah, thanks for coming on.
Thanks so much for listening in as we chat about C++.
We'd love to hear what you think of the podcast.
Please let us know if we're discussing the stuff you're interested in,
or if you have a suggestion for a topic, we'd love to hear about that too.
You can email all your thoughts to feedback at cppcast.com.
We'd also appreciate if you can like CppCast on Facebook and follow CppCast on Twitter.
You can also follow me at Rob W. Irving and Jason at Lefticus on Twitter.
We'd also like to thank all our patrons who help support the show through Patreon.
If you'd like to support us on Patreon, you can do so at patreon.com slash cppcast. And of course, you can find all that info and the show notes on the podcast website at cppcast.com.
Theme music for this episode is provided by podcastthemes.com.