CppCast - Data Oriented Design
Episode Date: January 18, 2018
Rob and Jason are joined by Balázs Török to talk about his work in the Video Game Industry and his thoughts on Data Oriented Design.
Balázs Török is a Senior Tech Programmer at Techland. He has more than 10 years of experience in the games industry. Balázs learned the ropes at Hungarian companies by making smaller titles and then moved to Poland to work on The Witcher series. He was the Lead Engine programmer on The Witcher 3 and now he is working at Techland on another promising project.
News: Matt Godbolt: Meltdown and Spectre; CppCast YouTube Channel; Free ebook on C++ Notes for Professionals; Conan C/C++ Package Manager hits 1.0; Meltdown checker/PoC written in C++; Guy Davidson - Diversity and Inclusion - Secret Lightning Talks @ Meeting C++ 2017
Balázs Török: @m0radin
Links: CppCon 2014: Mike Acton "Data-Oriented Design and C++"; StackOverflow: What is Data Oriented Design?
Sponsors: Backtrace; Embo++
Hosts: @robwirving; @lefticus
Transcript
Episode 134 of CppCast with guest Balázs Török, recorded January 17th, 2018.
This episode of CppCast is sponsored by Backtrace, the turnkey debugging platform that helps you spend less time debugging and more time building.
Get to the root cause quickly with detailed information at your fingertips.
Start your free trial at backtrace.io/cppcast.
CppCast is also sponsored by Embo++.
The upcoming conference will be held in Bochum, Germany, from March 9th to 11th.
Meet other embedded systems developers working on microcontrollers, alternative kernels, and highly customizable zero-cost library designs.
Get your ticket today at embo.io.
In this episode, we talked about updates to the Conan C++ Package Manager.
Then we talked to Balázs Török, game engine developer at Techland.
Balázs talks to us about data-oriented design and some of the confusion around the concept.
Welcome to episode 134 of CppCast, the only podcast for C++ developers by C++ developers.
I'm your host, Rob Irving, joined by my co-host, Jason Turner.
Jason, how are you doing today?
All right, Rob. How are you doing?
I'm doing okay. I'm hoping I don't get snowed into my office right now.
And depending on what part of the world you're from, snowed in means about two or three inches of snow.
One to three inches in North Carolina can be pretty severe.
And it's starting to build up a little bit, so I'm probably going to be heading home once this interview is over.
Yeah, to be fair, around here, if it was the first snow of the season, that would have an impact.
If it was the third or fourth snow of the season, people mostly wouldn't notice.
Yeah.
Yeah.
Anyway, at the top of our episode, I'd like to read a piece of feedback.
This week, we got a lot of tweets from last week's episode.
This one is from Sandeep, and he wrote to Matt Godbolt saying,
"This is great. I listened to your explanation about this on CppCast as well. Thanks for the video."
And this is in reference to Matt, who, in addition to being on our show last week talking about Meltdown and Spectre, also released a video on YouTube talking about Meltdown and Spectre.
So if you felt like you were missing anything from just hearing us talk about it and you want to see some visuals to go along with the explanation, I highly recommend going out and looking at Matt's video.
Yeah, I don't know how often Matt's planning on releasing videos, but you should probably subscribe to his channel just in case.
Yeah, and speaking of YouTube channels, we talked, I think on the last or second-to-last episode of 2017, about possibly doing YouTube uploads of CppCast videos. And I did go ahead and start doing that. So I'll put a link to our new YouTube channel. So far, I think I've put 30 or 40 of the most recent episodes up, and I'll continue uploading videos from the back catalog and new videos as we release them every week.
But it is still just audio content only. We're not going to start doing actual video anytime soon.
No, I'm not dressed for that.
No, neither am I.
But that has the interesting side effect of giving us at least some level of a transcription for people who care about that.
Right. And is that something you just automatically get in YouTube? Because I might need to find out how to dig that out. Do you have a transcript?
Yes, it is automatically there, and I believe you can disable it, but there are automatic transcriptions by default. And I know that at least one of our listeners had been looking at it and sent it to me.
Okay, great.
Well, we'd love to hear your thoughts about the show as well.
You can always reach out to us on Facebook, Twitter, or email us at feedback@cppcast.com.
And don't forget to leave us a review on iTunes.
Joining us today is Balázs Török.
Balázs is a Senior Tech Programmer at Techland.
He has more than 10 years of experience in the games industry.
Balázs learned the ropes at a Hungarian company by making smaller titles
and then moved to Poland to work on The Witcher series.
He was the Lead Engine Programmer on The Witcher 3
and now is working at Techland on another promising project.
Balázs, welcome to the show.
Hi.
Hey, I'm kind of curious because I know I've talked to a lot of people in Europe about,
you know, this movement of people through Europe and how basically a lot of the work
just gets done in English because you might have someone who's Spanish or French or German
all working in the same office.
Is it similar for you going from Hungary to Poland?
Yeah, it's kind of similar.
There's still quite a bit done in Polish when it's between only Polish people. But yeah, I have my own bubble of English moving around with me.
Okay. Well, Balázs, we've got a couple of articles to talk about. Feel free to comment on any of these, and then we'll start talking to you more about your game development work.
Okay, cool.
Okay, so this first one is a free ebook, C++ Notes for Professionals.
And this is kind of interesting because it's an ebook that's produced from Stack Overflow content.
Is that what it is?
That appears to be what it is, yeah. They took a whole bunch of Stack Overflow C++ posts and condensed them into an ebook and obviously organized it into different chapters.
So I guess there's multiple Stack Overflow posts to produce each chapter.
But it seems to cover a lot.
Well, yes, I definitely noticed it covered a lot.
I didn't realize where the content came from.
I was looking at it thinking, I don't think I would have organized this the same way,
but there's a lot of information in there
and it covers through C++17
at least.
It says at the very bottom
the book is compiled from Stack Overflow
documentation. The content is written by the
beautiful people at Stack Overflow.
Wow.
I'm not sure if they've done
other books similar to this,
but it's a pretty neat idea.
Yeah.
Well, actually, a few years ago,
there were books done a similar way from Wikipedia.
Just like someone started a Wikipedia search on a certain page,
followed all the links, collected them into a book, and published it.
How well did that work out?
No idea.
Okay.
No idea.
It actually, people caught on pretty quick, but I have no idea how it did money-wise.
Interesting.
I mean, the Wikipedia ecosystem, I guess, allows reproduction of the content, right?
Right, yeah, sure.
Yeah, the other day I was searching for something relatively esoteric, I don't recall what it was, and the first link I found was Wikipedia, and then the next four or five links were all mirrors of Wikipedia. And I'm like, no, I'm looking for different information.
Yeah.
I think as long as there is a human being selecting those
things, like in this case from
Stack Overflow,
it's still okay. I mean, if
someone checks and says like, oh,
this is a really good answer, maybe
this way it can
reach people who wouldn't check stack
overflow. I don't know.
Sure. Okay, this next
one is an announcement from the
Conan blog that the C++
package manager has hit 1.0.
And there's not really
too much discussion about any of the new
features in Conan with this
post, but they're basically committing
that now they're 1.0,
they're not going to be putting in any new breaking changes,
and they're putting out a big thank you to the community
for helping them with feedback
and allowing them to make breaking changes
while they were in earlier versions.
But now they're committing to stability.
Well, there is a comment here about help
for better cross-platform support
which, that's true, cross-compilation support. That's neat.
Yeah, and if you haven't looked, they've got... shoot, there's a link here that I cannot find at the moment. They are talking about their
package databases that they've got going. So they're still working on building up their official set
of known stable good
packages that are maintained by them
and not just random
things.
So it'll be great to see that set
of known good packages
get larger also.
Yeah, and they're also putting out a call
for if anyone is looking
for a job in Madrid,
they're looking to hire more people.
I think they do a combination of C++ and Python for Conan.
I think it's mostly Python, but yeah.
It's certainly people who have to be familiar with C++.
Sure.
Have you made use of any package managers like Conan, Balázs?
Yeah, actually, in one of the projects at a previous company we used Conan, or let's say, started using it. So I have some minimal experience with it, but it's nothing major. There was another guy setting up the whole thing.
But yeah, I mean, it's not as nice as Python with everything, but it's definitely better than nothing.
Because it would be great to have something official, obviously.
Yeah.
Something like pip or gem.
Yeah. Well, sometimes we hear things happening in the JavaScript world and that's a bit scary, so let's not go that far if we can.
Yeah, I think when we had Conan on the show a year or so ago, we talked about that. Was it the left-pad controversy?
Something like that, or left trim.
Yeah, something like that.
And I just read an article from a guy who said that they could easily distribute code
that would collect passwords and whatever names from websites,
even, what is it, credit card, not just the codes,
but the CVCs and stuff like that.
That's scary.
Yeah.
Okay, the next thing we have is,
we talked about Meltdown and Spectre a lot last week, obviously.
Someone put out a GitHub project where you can run some code and find out if your system is affected by Meltdown.
Yes, so this is pretty cool. I did not try it. It looks like it was built for Linux.
Linux only?
Yes, it says Linux only, yeah. Did you have a chance to check it out, Jason?
No, I'm actually, I don't want to know.
Actually, most of my Linux environments that I'm running
are virtual machines that are just for work.
So the possibility of data being collected from one of them
that would be bad is low.
But no, I haven't tested it yet.
Well, if you're interested in seeing if your Linux boxes still need to be patched, you can definitely check this out.
Okay, and then the last thing we have is, all the Meeting C++ talks are now online, and we wanted to call this one out. Well, the lightning talks, not all of the Meeting C++ talks.
All the lightning talks are now out.
This one is from Guy Davidson, who we've had on the show before,
another game developer, actually.
And he put out this secret lightning talk about diversity and inclusion.
And I pretty much agree with everything he's saying here.
It's definitely a very important topic.
We should be trying to expand the pool of C++ developers that we can hire
because the industry is mostly white males.
And it'd be nice if we could have more people out there to hire.
I'm not really sure what the average C++ developer can do about this unless you're a hiring manager
or something.
Do you have any thoughts on this, Jason?
Not directly, because I haven't actually worked in a regular organization for eight years,
and I haven't hired anyone in a very long time.
I will say, the pool of people that I've ever had the chance
to interview was extremely not diverse, but I wasn't even involved in the collection process
of who got an interview in the first place.
Right.
Yeah.
I mean, maybe the HR people should be watching this too.
I don't know.
But I mean, I've definitely been talking to people during an interview, but yeah, I'm not responsible for who walks in that door.
Yeah. I don't know, do you have any thoughts on this, Balázs?
Well, I actually had the luck to work with female programmers in multiple companies in my life. But the ratio of them wasn't very good, right? So from a team of 20 people we had one girl, so that's not very good. And I just don't believe that it has to be like that. But even when I went to uni, we had, I think, a class of like 400 people, or like a year, you know, not a class, and five of us were women.
Yeah. I believe there were only two women in my computer science curriculum for my career at college.
Yeah. So I think it has to be fixed there first, right? So if we want to hire more of them, then let's try to have more of them in uni and in high schools and stuff like that get interested.
Yeah. One of the interesting things he pointed out is there's this really neat graph where he showed the percentage of women in various industries, including computer science, medical, I think lawyers as well.
And all of them had a pretty similar trend line up until like the 1980s
where the number of women in those industries was increasing steadily
and at a similar rate.
I think it was like 35% of the industry was female in the 1980s.
And then suddenly, just for computer science, the trend line went down, whereas medical and lawyers continued going up closer to 50 percent.
And his belief was that that changed because the personal computer came out during the 80s and it was marketed more towards men and boys. And as we started getting more programming jobs,
the people who were interested in it were more men
because they grew up being more interested in computers.
So I think projects like Sara Chipps's Jewelbots
will hopefully help change that,
but it might be a generation before we see the results of that type of work.
Right.
Yeah.
With education, I think if we would follow
like northern European
countries, like in Finland, I think
they have programming
in the regular curriculum.
Yeah, that's correct. It doesn't matter
if you're a boy or a girl, you just learn
programming.
Yeah, makes sense.
Okay, well, Balázs, let's start talking more about some of your experience as a game engine designer. I just need to say right up front, I'm a huge, huge fan of The Witcher 3, which you worked on.
It's by far the best role-playing game I've ever played, and it's so good that I have trouble playing other RPGs now
because I go back and compare it to The Witcher 3.
So thank you for that.
Thank you.
Thank you for saying that,
and I hope that others who worked on it will hear that as well.
So what other games have you worked on, though?
Well, there were smaller titles.
As I worked in Hungary, I worked on some smaller games. The first game I worked on was called Battlestations: Pacific,
which is interesting because it's not so popular in the US,
as far as I remember.
It wasn't.
But it's about the Pacific front in the Second World War.
And then I worked on a small game called SkyDrift. It was like an Xbox Live Arcade and PSN and Steam
title. It was about racing and fighting with
planes. Kind of like Mario Kart
with planes. Wow.
That sounds like fun. Yeah, it was
fun, definitely.
And then I
moved to Poland, worked
on the Xbox 360 version of The Witcher 2,
and then Witcher 3,
then early little bits of Cyberpunk,
and then moved to GOG.
That wasn't games, but still game-related.
Worked a little bit there,
and then moved to another company.
I can't really talk about that project,
and now I'm in Techland and I still can't talk about
this one either but
yeah
so that's it
game to be announced eventually
yeah
I'm very curious about the first game you mentioned
that you said was not popular
in the US as far as you know
Is it a World War I game
or World War II?
No, World War II.
It was like a
strategy action kind of game.
Recently,
I'm not a big gamer.
The games that I play tend to be
adventure games, not role-playing
games like Witcher, like you're talking about,
Rob. But I recently
read an article in Gamasutra about
how in Eastern Europe,
and in Russia, there's this subculture of brutally difficult games that just don't really
make their way west as far. And I'm just curious if it was one of those, like a
supremely difficult thing.
No, no, not really. It was just
not...
Back then, the publisher
was called Eidos.
Oh, yeah.
Sure.
Tomb Raider is their biggest name
back then, right?
And then they were bought
out by Square Enix.
And they basically shut down the studio,
so there were no more of these Battlestations games.
But there were two, Midway and Pacific.
Those were the two games from that studio.
I'm also curious about your experience at GOG.
Since I am a big fan of retro stuff, what was the work like there?
So at GOG, I was working on the overlay.
I don't know if you know what an overlay is,
but it's basically in Steam as well
and in many other digital content
distribution systems for games.
There's like you press shift tab or some other key combination
and then there's something appearing on top of the game
that shows you, let's say, achievements, chat window,
news about the game,
whatever, the current time, and stuff like this.
I was working on this for Galaxy, which is the platform of GOG.
Okay. How does that kind of thing work?
How does it hook into the game or whatever to be able to...
Yeah, it's a super interesting topic.
And maybe we will do another chat about that.
Okay.
Because I could talk about that for a while.
But basically, WinAPI is very...
Well, it has some dark corners, let's say.
So you can use that
and hook into another game.
Basically, you can hook into any other process
that runs on Windows.
Okay.
Obviously, the processes of the operating system
are protected in some way,
but yeah, you can hook into them,
and then when they just try to render something,
you just say, hey, after you rendered your own thing,
then maybe you should render this,
and you just basically hook into the rendering of the game.
Interesting.
Yeah, and because of this, I had to support APIs
that I haven't worked with before,
like DirectX 8 from a very long time ago,
especially since GOG has a lot of old games on Galaxy,
so I had to work with such APIs.
Right.
Yeah, it was very interesting.
I always thought it would be fun to work with them
just because of the games they work with,
but I don't live in Poland.
Can you tell us a little bit about
what your experience has been like in general in the games industry?
I know The Witcher, I believe the release date was delayed by a few months.
We've heard from lots of game developers of a kind of intense game dev crunch with deadlines.
Have you had that type of experience or is it a little bit different in Eastern Europe?
No, no, it's the same.
I don't want to really dig deep into this because it's like a hot topic.
And companies might not like people talking about this.
But yeah, definitely this is the stigma of the game development industry,
that there's a load of crunch and people suffer
and there's relationships that get broken up.
Even marriages fail because of...
Yeah, it's not a joke.
It does happen. I've seen it. I've seen it happen.
Yeah.
This, unfortunately, is true. Nowadays,
more and more companies try to
fight
with this.
And
they introduce policies where people
cannot crunch over a certain
amount of time and so on and so on.
That sounds good.
Yeah, my answer is take note.
Yeah, like in the last few years, there has been definitely progress with this.
I keep saying, you know, not that I like read a bunch of studies directly or anything, but
at least links to studies that say that like basically no human is really productive over 35 hours a week.
You think you're being productive, but you're really not.
Your productivity keeps going down the longer you sit in that chair.
Unfortunately, these kinds of studies you can show to managers or project managers or producers, as we call them sometimes in the games industry.
And they will even agree with you. They will even say sometimes that, yeah, yeah, we know.
And then still nothing changes. So, yeah. But anyway, I heard from people who work in movies, like effects for movies, that it's even worse.
Yeah, I think I can believe that.
Yeah, because, you know, they get a contract, the movie is coming out in a year, the movie cannot slip,
there are like 15 scenes that need to be done, and yeah, there's a hard deadline,
and it's creative work, so there's no knowing beforehand how long it's going to take, right?
Yeah, and delaying the release of a video game, you know, delays the download from Steam or maybe the shipment of boxes
to GameStop or something, but delaying of a movie
would mess up theater schedules literally around the world.
It seems like it's an expense they would not want to pay for.
Yeah, I think it's also the fact that movies are...
Like, the way they schedule them is so precise.
Like, okay, on this day, this movie has to come out.
Yeah, this time.
It's a Christmas release or whatever.
Like, it must be done, yeah.
There's a lot of merchandise.
And with games, it's usually all this other additional stuff
is done when the game is very popular
but with movies
there's deals with cinemas
for the cup that you sell
in the kiosk
or whatever
and you can't back out of that, right?
Right.
This reminds me, I still need to see Star Wars.
Whoa.
Me too.
I've had a busy year.
So one thing we talked about before getting you on the show is that you wanted to clear up some confusions you saw around data-oriented design.
But before we get into that, I thought maybe we could do a quick overview of data-oriented design.
Yeah, so before we get into this, I would like to mention that the reason I wanted to talk about this
is that this confusion came from a lot of people who contacted me and they were asking advice.
How should they think about this?
So that's why I thought that this would be a very good way to talk about this
and reach a lot of people. So data-oriented design is basically a different approach to
problems. And it has been popularized in the games industry by Mike Acton through multiple presentations, actually. And the core principle
of data-oriented design is that programs are basically just transforming from data to data.
So without knowing the data, there's no way to actually make a good application, because the sole purpose of this application is to
transform that data. So from this core principle, basically, it tries to define
how to approach the data and how to approach programming based on that data.
And this is kind of coming from a background where in the PlayStation 3 times this was
super important because back then the design of the PS3, the SPUs, kind of forced people to think super low level.
And they were thinking in ways that are not really object-oriented,
and they got used to this.
And even before that, right?
Like even in the PS2 times, there was a lot of low-level programming.
And because of this, people just got into this different mindset
where they really concentrate on the low level, while object-oriented programming, on the
other hand, is trying to build abstractions on top of that data. And very much in games, but also in other applications, these abstractions can be
actually limiting sometimes, and limiting not necessarily in the ways we work with the code,
but limiting in terms of performance. And that's the core of data-oriented design:
that to achieve the best possible performance,
you actually have to think about the data
and how this data is transformed through the whole application.
And then you can reason about this performance
because you know that, okay, you have this much data,
you're processing it this way, you have these loops,
you have these instructions in these loops.
And then you have a certain performance characteristic that you can expect,
because you know the hardware. It's very much a different approach from object-oriented
programming. So I don't know if I gave a very good explanation of what it does or how data-oriented design fits in programming,
but this is how it is in my head.
So basically it came out of necessity from the PlayStation days, you're saying?
What I'm saying is that it came out of necessity
not necessarily from the PlayStation days,
but from any days when people had to dig so deep
to gain that one last bit of performance.
Okay.
And it's very good for, obviously,
for embedded systems as well,
where not only the instructions,
but even the size of the data is very important.
I wanted to interrupt this discussion for just a moment to bring you a word from our sponsors.
Backtrace is a debugging platform that improves software quality, reliability, and support
by bringing deep introspection and automation throughout the software error lifecycle.
Spend less time debugging and reduce your mean time to resolution [...] code to classify errors and highlight important signals such as heap corruption, malware, and much more.
This data is aggregated and archived in a centralized object store, providing your team
a single system to investigate errors across your environments.
Join industry leaders like Fastly, Message Systems, and AppNexus that use Backtrace to
modernize their debugging infrastructure.
It's free to try, minutes to set up, fully featured with no commitment necessary.
Check them out at backtrace.io/cppcast.
So what are some of the confusions relating to data-oriented design that you wanted to clear up?
Yeah, so people contact me about this every few weeks, and they ask,
basically, I have this system in my head.
I thought about using object-oriented programming because that's what everyone else uses.
But then I read this or saw this video from this conference where these people who use data-oriented design are saying that
you shouldn't have virtual functions because that's bad, or you shouldn't have
a base class because that just shows that you don't know your own data, and so on and so on.
And they're super confused because they don't know what to do with this.
How should they then start making their own code
and how should they approach it?
And I think that the problem comes from the fact
that data-oriented design is very good
when you have a lot of data
and you have some small transformation
on that data that you want to execute,
and then you have some huge amount of resulting data, or even a small amount of resulting data
because you're filtering in the middle.
In these cases, data-oriented design can give you a lot
because you will have better cache coherency,
which is another key concept in data-oriented design.
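As a rough illustration of the kind of workload described here (a lot of input data, one small transformation, and an optional filtering step that shrinks the result), the pattern might be sketched as follows. The function names `scale_all` and `filter_below` and the use of plain `float` values are assumptions made for this example, not something from the episode.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical small transformation applied to every element of a large,
// contiguous input array. The loop reads and writes memory linearly.
std::vector<float> scale_all(const std::vector<float>& input, float factor) {
    std::vector<float> output(input.size());
    for (std::size_t i = 0; i < input.size(); ++i) {
        output[i] = input[i] * factor;
    }
    assert(output.size() == input.size());
    return output;
}

// Hypothetical filtering step: returns the indices of all values below
// `limit`, so the result can be much smaller than the input.
std::vector<std::size_t> filter_below(const std::vector<float>& values,
                                      float limit) {
    std::vector<std::size_t> kept;
    kept.reserve(values.size());
    for (std::size_t i = 0; i < values.size(); ++i) {
        if (values[i] < limit) {
            kept.push_back(i);
        }
    }
    return kept;
}
```

Both loops walk one tightly packed array from start to end, which is the access pattern the speaker associates with good cache behavior.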
But on the other hand, object-oriented programming is popular for a reason.
And in my opinion, that reason is that people like to think in things
and not in these abstract pipes where you pipe in some data on one end
and it comes out on the other end
somehow different.
People actually like to think when I have an object of a class that actually is a thing.
Even when we talk about it between programmers, we always talk about these, like, you know, the mesh or the texture or
whatever, when we really just mean this class or this object. But we really think
about or talk about them as things that exist while they don't. But it's much easier. It's
a much easier conceptual model to think about them as if they existed.
And people have this in their mind, and I think this is really good.
This is why object-oriented programming is so popular and so successful,
that the mental model is very close to what we see around us and it's easy to pick up. While data-oriented design
is not easy to pick up and it really is powerful, but everyone should know where to use it, right?
And it adds to the confusion that all the presentations about data-oriented design are actually worded in a way
that you have to do this or you have to do that, and it's not really worded in a
way like, okay, when you have this kind of problem, then this is the best tool and you should use this
tool, but there are all these other problems where this is not the best tool.
So don't try to force this
when this is not the best tool.
So,
I'm sorry, go ahead.
No, no, go on.
I was just curious,
then you're saying data-oriented design
is a tool that should be used
when you're moving lots of data.
But how do we make that determination
between, if you will,
lots of objects and lots of data?
Well, yeah, like, I mean,
when you have lots of objects or lots of data,
that's basically the same case for us.
When you think about it,
like in games,
and as I said, it was popularized for games mostly,
when you think about the games, you think about, okay, I have this world and it's full of these, I
don't know, boxes, whatever. And you know that, okay, if I have a few
thousand of these, then you will have a lot of corresponding data.
Even if the boxes look exactly the same or whatever, you will just have,
at the minimum, you will have a vector,
not a C++ vector,
just like a position, a three-dimensional or two-dimensional position somewhere. You will have that stored, right? And from the data-oriented design perspective, the
way you should do this is you will have an array where you store all the
positions, and when you have the object that actually is placed on that position,
then it will just have an index into
this array and say, I'm storing my position at this index. So when you want to transform these
objects, let's say you want to move them somewhere, then all you do is take this array, run through
all the objects and say, okay, plus one to all the positions or whatever, and then
the cache coherence is awesome, obviously, and you still have all your objects transformed
without touching the object itself, right? While in object-oriented design,
when you think about it, the data
should be encapsulated in the object. So you should have an object that is,
let's say, a box, a Box class, and you have an object of that class, and the
position is inside this. But it is possible that you have, let's say, color,
you have, I don't know, some pointers there, because you
have some textures on those boxes, and so on and so on. And when you want to just transform these
objects, you just want to move them. Then when you are iterating through these boxes, in the best
case scenario, you still have them in an array. But your cache coherence is already not so good, because you have all these data members. So you can try moving all the relevant data together
inside one class, but you can't move the relevant data together between the objects, right?
So because of this, data-oriented design promotes arrays of objects and not objects of arrays, right?
Or, yes.
Wait, am I confusing that right now?
No, I don't think so.
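The idea of an object that only stores an index into a shared array of positions could look roughly like the sketch below. The names `Position`, `Box`, `World`, `add_box`, and `move_all` are hypothetical and chosen only for this illustration; a real engine would also have to deal with removing objects and keeping indices valid, which is left out here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Position {
    float x, y, z;
};

// Hypothetical game object: it does not own its position, it only
// remembers where that position lives in the shared array.
struct Box {
    std::size_t position_index;
};

struct World {
    std::vector<Position> positions;  // all positions, stored contiguously
    std::vector<Box> boxes;

    Box add_box(Position p) {
        positions.push_back(p);
        Box box{positions.size() - 1};
        boxes.push_back(box);
        return box;
    }

    const Position& position_of(const Box& box) const {
        assert(box.position_index < positions.size());
        return positions[box.position_index];
    }

    // Moves every object by walking only the position array; the Box
    // objects themselves are never touched.
    void move_all(float dx) {
        for (Position& p : positions) {
            p.x += dx;
        }
    }
};
```

After a call such as `world.move_all(1.0f)`, every box sees its new position through `position_of`, even though the loop never read a single `Box`.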
So, you would have like an array of all the positions, an array of all the colors, an array of all the names, or something like that.
Indeed.
Indeed.
Okay.
Interesting.
Indeed.
And that's very good, right?
That's what our branch predictors, our pre-cachers,
basically our CPUs are built for this.
And this is what data-oriented design
tries to emphasize, that
if you know your data
and you know the hardware that you're trying to run
it on, then you
should think this way, like from
these two sources,
the way you should lay out
your data should be obvious.
The best way, let's say.
Okay.
The problem is that some people
try to apply this to everything, and then we end up trying to design, let's say, a UI system
where this is not necessarily the best way to do it. I mean, okay, maybe it is, but there are many cases where this might not be the best way to do it.
Or the biggest confusion for people so far has been, how do I implement an entity component system
on top of data-oriented design? And this is all doable. It definitely requires a lot of thinking. But if someone is like a totally new programmer,
like as in new to the field of games, then it's probably not the best way to start thinking
about entity component systems. So this is something I definitely wanted to clear up.
So the conceptual model of objects that's easy for us
humans to reason about. Yes. Good for beginners.
Yes. And then when you need to get the performance out of it, you can take a
data-oriented approach when you have many, many objects.
Yeah, I believe that
this is actually what
even those companies have been doing.
I mean, the companies where the people
work who gave those
presentations, they have been
using object-oriented
programming, or at least
something similar. Maybe they didn't use
virtual functions or something like that, but they did
compose data this way.
And then when they saw that there is
a way that is better for the performance, then they said,
okay, how can we make this more formal?
How can we collect all the ideas into one and promote it?
I believe this is how it happened and not in a vacuum.
Is there any particular presentation that you think explains the data-oriented design concepts well? I think
I saw the Mike Acton talk from
CppCon 2014. Are there any other
particular presentations that you would
recommend watching if you're interested in data-oriented
design? No, I think
that's a very good one.
I've been
watching
a few other presentations from Mike Acton
and he's very, like, he can
explain this very well and much better than I can.
And I think the only problem here is that he tries to give this very good advice to
everyone, but not everyone is ready for it, I think.
You really need to achieve a certain level of understanding
to really appreciate what he's saying.
Okay.
I'm curious.
I mean, we talked about the gains that can be made
and why from data-oriented design,
but it seems like if you needed, for instance, to talk about in your code,
you needed to reason about both the name, the position, and the color, if you will,
of a particular box using the box analogy,
then you pay a cost now because now you're having to index three different arrays instead of having,
now you're giving up cache coherency when you need to talk about all the properties of a thing.
Does that come into play? Is that a consideration when you're doing this kind of programming?
So, yes, it is. But the answer is, how frequently do you do this?
If you do this all the time, accessing all those three things at the same time,
then you should keep them together.
Okay.
Right?
So the answer is always know your data.
If you know your data, then you will know how frequently you are doing this,
and then you can make this decision.
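A small sketch of that trade-off, assuming (purely for illustration) that position and color are read together every frame while the name is only read occasionally. The types `BoxDrawData` and `Boxes` are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

struct Vec3 {
    float x, y, z;
};

// Assumption for this sketch: drawing reads position and color together for
// every box, every frame, so they share one array element.
struct BoxDrawData {
    Vec3 position;
    std::uint32_t color;
};

struct Boxes {
    std::vector<BoxDrawData> draw;   // read together, stored together
    std::vector<std::string> names;  // rarely read, kept out of the hot array

    // Adds a box to both arrays and returns the index they share.
    std::size_t add(Vec3 position, std::uint32_t color, std::string name) {
        draw.push_back(BoxDrawData{position, color});
        names.push_back(std::move(name));
        return draw.size() - 1;
    }
};
```

If profiling showed the name being read on every access as well, the same reasoning would argue for moving it into `BoxDrawData`; the layout follows the access pattern, not the other way around.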
Okay.
Okay.
That makes sense.
The problem is that it makes total sense,
and everyone agrees with it,
but you can't start making your toy game engine or whatever
because the requirement is to know your data
and you don't have any. And for people who worked in the industry for, I don't know,
a few years at least, they have at least some idea what the data looks like, right? But a lot of people contact me and they're
like, I was watching this presentation, I would love to start out with something, how do I do it?
And it's like, oh, I'm sorry, but you have to know your data.
Do you find that when you're working
on something like a new game engine,
are you going to understand where you should be applying data-oriented design up front? Or is it something you need to go back retroactively and say,
oh, we're applying this algorithm across all these objects at once.
We should really be using a data-oriented approach as opposed to an object-oriented approach.
In some cases, I think it's very easy to make this decision.
There are some scenarios where you know, like,
for example, effects or particle systems are a very good example of this.
They are...
How should I explain this? Basically, these
are usually visual effects that are made out of smaller
pieces that are rendered onto the screen. And there are
usually hundreds or even thousands of them. So you know that there
will be a large amount of data that you can run through
and just do that one thing that needs to be done in that one frame. And this is a very
good thing to do there. And even the one I mentioned before, like entity component systems,
this can be a very good example if the game is big enough, right?
When the game is not like, let's say, 100 objects every level, but like 10,000 objects
a level, then this definitely is a good way to do it.
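As a rough illustration of the particle case, here is a sketch of a per-frame update that does one thing to every particle in a single pass. The `Particles` layout, its field names, and the simple "position plus velocity times time step" rule are assumptions made for the example, not a description of any particular engine.

```cpp
#include <cstddef>
#include <vector>

// One array per attribute; element i of every array belongs to particle i.
struct Particles {
    std::vector<float> x, y, z;     // position
    std::vector<float> vx, vy, vz;  // velocity, units per second
    std::vector<float> life;        // remaining lifetime in seconds

    void spawn(float px, float py, float pz,
               float velx, float vely, float velz, float seconds) {
        x.push_back(px);
        y.push_back(py);
        z.push_back(pz);
        vx.push_back(velx);
        vy.push_back(vely);
        vz.push_back(velz);
        life.push_back(seconds);
    }

    std::size_t size() const { return life.size(); }
};

// The per-frame work: advance every particle by dt seconds in one pass.
void update(Particles& p, float dt) {
    for (std::size_t i = 0; i < p.size(); ++i) {
        p.x[i] += p.vx[i] * dt;
        p.y[i] += p.vy[i] * dt;
        p.z[i] += p.vz[i] * dt;
        p.life[i] -= dt;
    }
}
```

A real system would also remove particles whose lifetime has run out; that step is left out to keep the sketch short.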
Can you explain what an entity component system is?
Okay, that's another very big topic.
But entity component system,
actually the name should technically be
entity component system system,
because the system is also part of the...
these are three things in this concept. So
an entity in this concept is... let's say you have a car in your game. That's your entity.
A component in the system is a part of that car. Not necessarily a physical part,
but a logical part.
So let's say a car is in your game,
you can drive your cars.
So a component is something like drivable.
So when you have, let's say, a plane in your game,
then you can add the same component to that entity,
and that becomes drivable as well. Okay. So the third part is the system,
which is basically doing a very similar thing to what I described in the data-oriented design.
It takes all these components and does something with all those components. So usually, this is
just a frame update. Like let's say, the drivable components
need to be processed every frame to figure out exactly which
direction the car is going or something like this. So the
system is the one that does this every frame. Okay.
So these are the three parts of the system. And usually the entity is meant to be like
pure in the sense that it doesn't contain logic. It basically just contains
pointers or IDs or indices
for the components,
depending on how they are stored.
Right?
So the entity doesn't
do anything. You can't really
say, like, I'm moving this
entity. What you're saying is
that the entity has
a transform component and you are doing a
transformation on that component.
This is kind of difficult to reason about in some scenarios, but usually this kind of
reasoning only has to be done by programmers because everything is hidden from the designers or artists. And this is used
in almost every game engine I know.
So it's a super popular thing.
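The three parts described above can be sketched in a few lines of C++. Everything here is hypothetical and simplified: the `Drivable` fields, the `kNoComponent` sentinel, and the `DrivableSystem` interface are illustrative assumptions, and real engines differ widely in how they store and look up components.

```cpp
#include <cstddef>
#include <vector>

// Component: plain data describing one logical aspect of an entity.
struct Drivable {
    float heading;    // current direction, in degrees
    float turn_rate;  // degrees per second
};

// Sentinel meaning "this entity has no such component".
constexpr std::size_t kNoComponent = static_cast<std::size_t>(-1);

// Entity: no logic, only indices into component storage.
struct Entity {
    std::size_t drivable = kNoComponent;
};

// System: owns all components of one kind and updates them every frame.
class DrivableSystem {
public:
    // Creates a component and returns the index an entity should store.
    std::size_t create(float heading, float turn_rate) {
        components_.push_back(Drivable{heading, turn_rate});
        return components_.size() - 1;
    }

    // Frame update: one pass over every drivable component.
    void update(float dt) {
        for (Drivable& d : components_) {
            d.heading += d.turn_rate * dt;
        }
    }

    const Drivable& get(std::size_t index) const {
        return components_.at(index);
    }

private:
    std::vector<Drivable> components_;
};
```

A car and a plane would each be an `Entity` whose `drivable` field holds an index returned by `create`; nothing is ever "moved" on the entity itself, only on the component it points at.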
Another thing you mentioned we wanted to talk
about while we were setting up the show was the debuggability of C++
and how you're concerned
that with newer features we're adding to the language, you think it's actually getting
worse?
Yeah, so I had this problem, and it actually ties a little bit into the previous topic,
because what is happening right now is that we are adding very useful abstractions to the language,
but we have this saying that these abstractions are zero cost.
That is mostly true until you try to debug.
In which case, some of these abstractions are super heavy.
Even like a unique pointer can be very, very heavy in debug.
And that is becoming more of a problem because I worked on projects where we were actually
not able to switch to debug because the performance was so bad. Now, I'm not saying that this was because of C++.
I'm saying that we are enhancing this problem by these new features. And this is obviously not
related just to the language features. It's more like the language features and the standard library or any library that is built on that.
So this is becoming a problem where even if you just do some simple tests on Compiler
Explorer, just switching between the release version and the debug
version and looking at the same
code, it's like
sometimes 10 times more
code, 10 times more instructions,
I mean. And
that just means that
in some projects you
have to go and switch
off certain parts of the
project or the game.
I'm familiar with games in this sense,
but you have to just switch off certain parts
to get good enough performance in other parts to be able to debug it.
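One way to try this comparison yourself is a pair of tiny functions like the following, compiled once without optimization and once with it (for example `-O0` versus `-O2` on GCC or Clang). The `Counter` type and the two function names are made up for this sketch. With optimizations on, compilers typically inline the `std::unique_ptr` access so both functions look much alike; without optimizations, the smart pointer version usually goes through real calls into the library. The exact instruction counts depend on the compiler and standard library, so none are quoted here.

```cpp
#include <memory>

struct Counter {
    int value = 0;
};

// Access through std::unique_ptr: operator-> is a library function that an
// unoptimized build usually keeps as an actual call.
int bump_unique(const std::unique_ptr<Counter>& counter) {
    return ++counter->value;
}

// The same operation through a raw pointer, for comparison.
int bump_raw(Counter* counter) {
    return ++counter->value;
}
```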
So I know a couple of years ago,
GCC added a new optimization flag, -Og,
which gives you optimizations that don't hinder your ability to debug the
application.
Have you had any success with that kind of selectively using optimizations to
still let you debug it?
Yeah,
not on the compiler level,
but what we do sometimes is have different modules compiled in different
modes. So the module
that I'm debugging is compiled
in debug, but all the other modules
are compiled in release.
Sometimes this is doable.
I was going to say, don't you sometimes
have problems with
ODR violations or
something else cropping up?
The most common thing
is the iterator debug level problem
when you have this error.
But yeah, this can be problematic in some cases.
Fortunately, it works in other cases,
especially when you are using DLLs more frequently in the project
and not statically linking,
then this can work quite well.
But yeah, it's a big problem.
And the other thing, for example,
kind of my pet peeve,
is being able to read dumps easily.
And this was a sad moment for me
when lambdas were added
and there's no way to name a lambda
in the sense that in the call stack
it would show up with a name.
Yes.
And it happens to me every, let's say, few months.
I get a call stack from somewhere,
and something is called from a lambda.
And when you have a generic system that executes that lambda,
like a job system or something like that,
then it's just impossible to know what happened.
And this is just a huge problem, and I
don't understand why this doesn't come up.
Like,
someone should propose something
to solve this, right?
Because, well,
it just hinders
the debuggability.
It's interesting. It's definitely
not something I've heard anyone
else talk about, but I can also see where it could be difficult and could cause problems
when you need both the performance and the debuggability.
Yeah, I mean, in this case, in the lambda case, it's not about performance.
It's just generally a feature introduced where nobody really thought about
how we are going to debug this.
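The language has no way to name a lambda, but a common workaround is to write the job as a named function object instead. The sketch below assumes a very simple, hypothetical job queue (`JobQueue`) and a made-up job type (`BumpCounter`); it is not a proposal or a standard facility. Because `BumpCounter` is an ordinary named type, a symbolized call stack normally shows a frame such as `BumpCounter::operator()`, whereas a lambda's closure type only has a compiler-generated name.

```cpp
#include <functional>
#include <queue>
#include <utility>

// A deliberately tiny job system: stores type-erased jobs and runs them later.
class JobQueue {
public:
    void push(std::function<void()> job) {
        jobs_.push(std::move(job));
    }

    void run_all() {
        while (!jobs_.empty()) {
            std::function<void()> job = std::move(jobs_.front());
            jobs_.pop();
            job();
        }
    }

private:
    std::queue<std::function<void()>> jobs_;
};

// A named function object used where a lambda would otherwise be written.
// The type name is what makes the frame recognizable in a dump.
struct BumpCounter {
    int* counter;

    void operator()() const {
        ++*counter;
    }
};
```

Usage is the same as with a lambda: `queue.push(BumpCounter{&n});` instead of `queue.push([&n] { ++n; });`. The cost is the extra boilerplate that lambdas were meant to remove.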
And another thing
I can mention is
I'm super excited
about
all the metaclasses
and
I really
think that this will add
good capabilities to the language,
especially for game developers with our custom RTTIs,
because every engine has its own custom RTTI.
But during the presentations,
Herb Sutter was mentioning that this can be debugged easily, right?
Because the generated code can be shown in the debugger.
And I agree, this would be great.
I love the idea.
But then why don't we do this for macros?
Because that's still a huge pain
to debug anything that's in a macro,
like expanded by, you know?
Yeah, right.
Yeah, that's the main reason that they say to not use macros,
because debugging them is nearly impossible.
Yeah, and then if we can do this with metaclasses,
then why can't we do this with macros?
I can honestly say I don't know enough
to say what the issues would be or not.
Yeah, I would love to see this done.
And these things make me say that we are adding features and
someone should be there on the committee with this big red flag, like, what about debuggability? Because I personally am spending like 60% of my programming time debugging stuff.
Either my own or mostly not my own.
But that's it.
Makes me think of Kate Gregory's Stop Teaching C talk, where she focused on how we should really be teaching new programmers to use the debugger as opposed to using printf-style debugging.
And yeah, we should maybe have someone in the committee
making sure that they're thinking about the debugger
when introducing new features.
It's certainly something, and I know from my experience
working with Ben after our Constexpr All the Things talk
that he submitted just, it wasn't a specific proposal,
but just a paper that, you know,
something that the committee might want to think about,
basically lessons that we've learned
on using constexpr.
You could certainly write up a similar paper
on lessons that you've learned
for debugging new features
and what the committee should maybe be thinking about,
and maybe at least get a few people thinking
and submit that to the
next meeting.
When would the next meeting be? February?
March? Something like that.
Yeah.
Maybe this is a good idea. I think
the same...
Like, for example,
the STL would
be way more popular with game developers
if the debuggability
would be much better.
And I think there was a huge
step forward when
Visual Studio started
allowing people to add their custom
debug
visualizations, right?
And that was a great
step forward,
but we definitely need more than that.
Do you think it is something that just even better tooling could help with,
or do you think it's just at the language level
we need more to improve some of these issues?
Well, in some cases, better tooling would be great.
In some cases, like the lambda case,
I think there should have been a name added.
Even if it's just optional, just find a way to add a name to a lambda. Because obviously,
originally lambdas were thought to be just this simple thing you are passing to a function, like into
some function
that iterates on an array and does this
small thing on that array, right?
Right. But nowadays
lambdas are so much more than that.
Yeah, and we're getting even more features
added to them in C++20
also.
I haven't read up on that,
but this is really
one part where I think the language is responsible
for making this debuggable.
Okay, well, Balázs, it's been great having you on the show today.
Can people find you online anywhere?
Yeah, I have my Twitter handle.
I sent that, but I mean, I think that will show up on the page.
Sure, yeah, I'll put that in the show notes.
Sure, so that's the best way.
Okay, well, thank you so much for your time today.
Yeah, thank you.
Cool, thank you.
Thank you, guys.
Thanks so much for listening in as we chat about C++.
I'd love to hear what you think of the podcast.
Please let me know if we're discussing the stuff you're interested in.
Or if you have a suggestion for a topic, I'd love to hear about that too.
You can email all your thoughts to feedback at cppcast.com.
I'd also appreciate if you like CppCast on Facebook and follow CppCast on Twitter.
You can also follow me @robwirving and Jason @lefticus on
Twitter. And of course, you can find all that info and the show notes on the podcast website