CppCast - Data Oriented Design

Episode Date: January 18, 2018

Rob and Jason are joined by Balázs Török to talk about his work in the Video Game Industry and his thoughts on Data Oriented Design.

Balázs Török is a Senior Tech Programmer at Techland. He has more than 10 years of experience in the games industry. Balázs learned the ropes at Hungarian companies by making smaller titles and then moved to Poland to work on The Witcher series. He was the Lead Engine programmer on The Witcher 3 and now he is working at Techland on another promising project.

News
Matt Godbolt: Meltdown and Spectre
CppCast YouTube Channel
Free ebook on C++ Notes for Professionals
Conan C/C++ Package Manager hits 1.0
Meltdown checker/PoC written in C++
Guy Davidson - Diversity and Inclusion - Secret Lightning Talks @ Meeting C++ 2017

Balázs Török
@m0radin

Links
CppCon 2014: Mike Acton "Data-Oriented Design and C++"
StackOverflow: What is Data Oriented Design?

Sponsors
Backtrace
Embo++

Hosts
@robwirving
@lefticus

Transcript
Starting point is 00:00:00 Episode 134 of CppCast with guest Balázs Török, recorded January 17th, 2018. This episode of CppCast is sponsored by Backtrace, the turnkey debugging platform that helps you spend less time debugging and more time building. Get to the root cause quickly with detailed information at your fingertips. Start your free trial at backtrace.io/cppcast. CppCast is also sponsored by Embo++. The upcoming conference will be held in Bochum, Germany from March 9th to 11th. Meet other embedded systems developers
Starting point is 00:00:30 working on microcontrollers, alternative kernels, and highly customizable zero-cost library designs. Get your ticket today at embo.io. In this episode, we talked about updates to the Conan C++ Package Manager. Then we talked to Balázs Török, game engine developer at Techland. Balázs talks to us about data-oriented design and some of the confusion around the concept. Welcome to episode 134 of CppCast, the only podcast for C++ developers by C++ developers. I'm your host, Rob Irving, joined by my co-host, Jason Turner. Jason, how are you doing today?
Starting point is 00:01:40 All right, Rob. How are you doing? I'm doing okay. I'm hoping I don't get snowed into my office right now. And depending on what part of the world you're from, snowed in means about two or three inches of snow. One to three inches in North Carolina can be pretty severe. And it's starting to build up a little bit, so I'm probably going to be heading home once this interview is over. Yeah, to be fair, around here, if it was the first snow of the season, that would have an impact. If it was the third or fourth snow of the season, people mostly wouldn't notice. Yeah.
Starting point is 00:02:13 Yeah. Anyway, at the top of our episode, I'd like to read a piece of feedback. This week, we got a lot of tweets from last week's episode. This one is from Sandeep, and he wrote to Matt Godbolt saying, this is great. I listened to your explanation about this on CppCast as well. Thanks for the video. And this is in reference to Matt,
Starting point is 00:02:34 who, in addition to being on our show last week talking about Meltdown and Spectre, also released a video on YouTube talking about Meltdown and Spectre. So if you felt like you were missing anything out of just hearing us talk about it and you want to see some visuals to help go along with the explanation, I
Starting point is 00:02:51 highly recommend going out and looking at Matt's video. Yeah, I don't know how often Matt's planning on releasing videos, but you should probably subscribe to his channel just in case. Yeah, and speaking of YouTube channels, we talked, I think on the last or second to last episode of 2017, about possibly doing YouTube uploads of CppCast videos. And I did
Starting point is 00:03:14 go ahead and start doing that. So I'll put a link to our new YouTube channel. So far, I think I've put like 30 or 40 of the most recent episodes up, and I'll continue uploading videos from the back catalog and new videos as we release them every week. But it is still just audio content only. We're not going to start doing actual video anytime soon. No, I'm not dressed for that. No, neither am I. But that has the interesting side effect of giving us at least some level of a transcription for people who care about that. Right. And is that something you just automatically get in YouTube? Because I might need to find out how to dig that out. Do you have a transcript? Yes, it is automatically there, and I believe you can disable it, but it is their automatic transcriptions
Starting point is 00:04:03 by default, and I know that at least one of our listeners had been looking at it and sent it to me. Okay, great. Well, we'd love to hear your thoughts about the show as well. You can always reach out to us on Facebook, Twitter, or email us at feedback@cppcast.com. And don't forget to leave us a review on iTunes. Joining us today is Balázs Török. Balázs is a senior tech programmer at Techland. He has more than 10 years of experience in the games industry.
Starting point is 00:04:27 Balázs learned the ropes at a Hungarian company by making smaller titles and then moved to Poland to work on the Witcher series. He was a lead engine programmer on The Witcher 3 and now is working at Techland on another promising project. Balázs, welcome to the show. Hi. Hey, I'm kind of curious because I know I've talked to a lot of people in Europe about, you know, this movement of people through Europe and how basically a lot of the work
Starting point is 00:04:51 just gets done in English because you might have someone who's Spanish or French or German all working in the same office. Is it similar for you going from Hungary to Poland? Yeah, it's kind of similar. There's still quite a bit done in Polish when it's between only Polish people, but yeah, I have my own bubble of English moving around with me. Okay. Well, Balázs, we've got a couple of articles to talk about. Feel free to comment on any of these, and then we'll start talking to you more about your game development work. Okay, cool. Okay, so this first one is a free ebook, C++ Notes for Professionals,
Starting point is 00:05:31 and this is kind of interesting because it's an ebook that's produced from Stack Overflow content. Is that what it is? That appears to be what it is. Yeah, they took a whole bunch of Stack Overflow C++ posts and condensed them into an ebook, and obviously organized it by different chapters. So I guess there's multiple Stack Overflow posts to produce each chapter. But it seems to cover a lot. Well, yes, I definitely noticed it covered a lot. I didn't realize where the content came from. I was looking at it thinking, I don't think I would have organized this the same way,
Starting point is 00:06:05 but there's a lot of information in there and it covers through C++17 at least. It says at the very bottom the book is compiled from Stack Overflow documentation. The content is written by the beautiful people at Stack Overflow. Wow.
Starting point is 00:06:22 I'm not sure if they've done other books similar to this, but it's a pretty neat idea. Yeah. Well, actually, a few years ago, there were books done a similar way from Wikipedia. Just like someone started a Wikipedia search on a certain page, followed all the links, collected them into a book, and published it.
Starting point is 00:06:46 How well did that work out? No idea. Okay. No idea. Actually, people caught on pretty quick, but I have no idea how it did money-wise. Interesting. I mean, the Wikipedia ecosystem, I guess, you know, allows reproduction of the content, right? Right. Yeah, sure. Yeah, the other day I was searching for something relatively
Starting point is 00:07:13 esoteric, I don't recall what it was, and the first link I found was Wikipedia, and then the next like four or five links were all mirrors of Wikipedia, and I'm like, no, I'm looking for different information. Yeah. I think as long as there is a human being selecting those things, like in this case from Stack Overflow, it's still okay. I mean, if someone checks and says like, oh,
Starting point is 00:07:39 this is a really good answer, maybe this way it can reach people who wouldn't check Stack Overflow. I don't know. Sure. Okay, this next one is an announcement from the Conan blog that the C++ package manager has hit 1.0.
Starting point is 00:07:57 And there's not really too much discussion about any of the new features in Conan with this post, but they're basically committing that now they're 1.0, they're not going to be putting in any new breaking changes, and they're putting out a big thank you to the community for helping them with feedback
Starting point is 00:08:13 and allowing them to make breaking changes while they were in earlier versions. But now they're committing to stability. Well, there is a comment here about help for better cross-platform support, which, that's true, cross-compilation support. That's neat. Yeah, and if you haven't looked, they've got... shoot, there's a link here that I cannot find at the moment, where they are talking about their package databases that they've got going. So they're still working on building up their official set
Starting point is 00:08:46 of known stable good packages that are maintained by them and not just random things. So it'll be great to see that set of known good packages get larger also. Yeah, and they're also putting out a call
Starting point is 00:09:02 for if anyone is looking for a job in Madrid, they're looking to hire more people. I think they do a combination of C++ and Python for Conan. I think it's mostly Python, but yeah. It's certainly people who have to be familiar with C++. Sure. Have you made use of any package managers like Conan, Balázs? Yeah, actually in one of the
Starting point is 00:09:26 projects at a previous company we used Conan, or let's say started using it, so I have some minimal experience with it, but it's nothing major.
Starting point is 00:09:41 There was another guy setting up the whole thing. But yeah, it's... I mean, it's not as nice as Python with everything, but it's definitely better than nothing. Because it would be great to have something official, obviously. Yeah. Something like pip or gem. Yeah, well, sometimes we hear things
Starting point is 00:10:11 happening in the JavaScript world, and that's a bit scary. So let's not go that far if we can. Yeah, I think when we had Conan on the show, like a year or so ago, we talked about that, the whole, was it the left-pad controversy? Something like that, or left trim. Yeah, something like that. And I just read an article from a guy who said that they could easily distribute code that would collect passwords and whatever names from websites, even, what is it, credit card, not just the codes, but the CVCs and stuff like that.
Starting point is 00:10:54 That's scary. Yeah. Okay, the next thing we have is, we talked about Meltdown and Spectre a lot last week, obviously. Someone put out a GitHub project where you can run some code and find out if your system is affected by Meltdown. Yes, so this is pretty cool. I did not try it. It looks like it was built for Linux. Linux only? Yes, it says Linux, yeah. Did you have a chance to check it out, Jason? No, I'm actually, I don't want to know.
Starting point is 00:11:33 Actually, most of my Linux environments that I'm running are virtual machines that are just for work. So the possibility of data being collected from one of them that would be bad is low. But no, I haven't tested it yet. Well, if you're interested in seeing if your Linux boxes still need to be patched, you can definitely check this out. Okay, and then the last thing we have is, all the Meeting C++ talks are now online. And we wanted to call this one out. Well, the lightning talks are not all of Meeting C++. All the lightning talks are now out.
Starting point is 00:12:09 This one is from Guy Davidson, who we've had on the show before, another game developer, actually. And he put out this secret lightning talk about diversity and inclusion. And I pretty much agree with everything he's saying here. It's definitely a very important topic. We should be trying to expand the pool of C++ developers that we can hire because the industry is mostly white males. And it'd be nice if we could have more people out there to hire.
Starting point is 00:12:41 I'm not really sure what the average C++ developer can do about this unless you're a hiring manager or something. Do you have any thoughts on this, Jason? Not directly, because I haven't actually worked in a regular organization for eight years, and I haven't hired anyone in a very long time. I will say, the pool of people that I've ever had the chance to interview was extremely not diverse, but I wasn't even involved in the collection process of who got an interview in the first place.
Starting point is 00:13:16 Right. Yeah. I mean, maybe the HR people should be watching this too. I don't know. But I mean, I've definitely been talking to people during an interview, but yeah, I'm not responsible for who walks in that door. Yeah, I don't know. Do you have any thoughts on this, Balázs? Well, I actually had the luck to work with female programmers in multiple companies in my life. So, well, the ratio of them wasn't very good, right? So from a team of 20
Starting point is 00:13:49 people we had one girl. So that's not very good, and I just don't believe that it has to be like that. But even when I went to uni, we had, I think, a class of like 400 people, or like a year, you know, not another class, and five of us were women. So yeah. Yeah, I believe there were only two women in my computer science curriculum for my career at college. Yeah. So I think it has to be fixed there first, right? So if we want to hire more of them, then let's try to have more of them in uni and in high schools and stuff like that get interested. Yeah, one of the interesting things he pointed out is there's this really neat graph where he showed the percentage of women in various industries, including computer science, medical, I think lawyers as well. And all of them had a pretty similar trend line up until like the 1980s
Starting point is 00:14:53 where the number of women in those industries was increasing steadily and at a similar rate. I think it was like 35% of the industry was female in the 1980s. And then suddenly, just for computer science, the trend line went down, whereas medical and lawyers continued going up closer to 50 percent. And his belief was that that changed because the personal computer came out during the 80s and it was marketed more towards men and boys. And as we started getting more programming jobs, the people who were interested in it were more men
Starting point is 00:15:31 because they grew up being more interested in computers. So I think Sara Chipps' project, like Jewelbots, will hopefully help change that, but it might be a generation before we see the results of that type of work. Right. With education, I think if we would follow like northern European
Starting point is 00:15:52 countries, like in Finland, I think they have programming in the regular curriculum. Yeah, that's correct. It doesn't matter if you're a boy or a girl, you just learn programming. Yeah, makes sense. Okay, well, Balázs, let's start talking more about some of your experience as a game engine designer. I just need to say
Starting point is 00:16:13 right out front, I'm a huge, huge fan of The Witcher 3, which you worked on. It's by far the best role-playing game I've ever played, and it's so good that I have trouble playing other RPGs now because I go back and compare it to The Witcher 3. So thank you for that. Thank you. Thank you for saying that, and I hope that others who worked on it will hear that as well. So what other games have you worked on, though?
Starting point is 00:16:42 Well, there were smaller titles. As I worked in Hungary, I worked on some smaller games. The first game I worked on was called Battlestations: Pacific, which is interesting because it's not so popular in the US, as far as I remember. It wasn't. But it's about the Pacific front in the Second World War. And then I worked on a small game called SkyDrift. It was like an Xbox Live Arcade and PSN and Steam title. It was about racing and fighting with
Starting point is 00:17:26 planes. Kind of like Mario Kart with planes. Wow. That sounds like fun. Yeah, it was fun, definitely. And then I moved to Poland, worked on Xbox 360 version of The Witcher
Starting point is 00:17:42 2, and then Witcher 3, then early little bits of Cyberpunk, and then moved to GOG. That wasn't games, but still game-related. Worked a little bit there, and then moved to another company. I can't really talk about that project,
Starting point is 00:18:04 and now I'm at Techland and I still can't talk about this one either. But yeah, so that's it. Game to be announced eventually. Yeah. I'm very curious about the first game you mentioned that you said was not popular
Starting point is 00:18:19 in the US as far as you know. Is it a World War I game or World War II? No, World War II. It was like a strategy action kind of game. Recently, I'm not a big gamer.
Starting point is 00:18:34 The games that I play tend to be adventure games, not role-playing games like Witcher, like you're talking about, Rob. But I recently read an article in Gamasutra about how in Eastern Europe and in Russia there's this subculture of brutally difficult games that just don't really make their way west as far. And I'm just curious if it was like one of those, like was it a
Starting point is 00:18:59 supremely difficult thing? No, no, not really. It was just not... Back then, the publisher was called Eidos. Oh, yeah. Sure. Tomb Raider was their biggest name back then, right?
Starting point is 00:19:19 And then they were bought out by Square Enix. And they basically shut down the studio, so there were no more of these Battlestations games. But there were two, Midway and Pacific. Those were the two games from that studio. I'm also curious about your experience at GOG. Since I am a big fan of retro stuff, what was the work like there?
Starting point is 00:19:49 So at GOG, I was working on the overlay. I don't know if you know what an overlay is, but it's basically in Steam as well and in many other digital content distribution systems for games. There's like you press shift tab or some other key combination and then there's something appearing on top of the game that shows you, let's say, achievements, chat window,
Starting point is 00:20:23 news about the game, whatever, the current time, and stuff like this. I was working on this for Galaxy, which is the platform of GOG. Okay. How does that kind of thing work? How does it hook into the game or whatever to be able to... Yeah, it's a super interesting topic. And maybe we will do another chat about that. Okay.
Starting point is 00:20:51 Because I could talk about that for a while. But basically, WinAPI is very... Well, it has some dark corners, let's say. So you can use that and hook into another game. Basically, you can hook into any other process that runs on Windows. Okay.
Starting point is 00:21:18 Obviously, the processes of the operating system are protected in some way, but yeah, you can hook into them, and then when they just try to render something, you just say, hey, after you rendered your own thing, then maybe you should render this, and you just basically hook into the rendering of the game. Interesting.
Starting point is 00:21:46 Yeah, and because of this, I had to support APIs that I haven't worked with before, like DirectX 8 from a very long time ago, especially since GOG has a lot of old games on Galaxy, so I had to work with such APIs. Right. Yeah, it was very interesting. I always thought it would be fun to work with them
Starting point is 00:22:15 just because of the games they work with, but I don't live in Poland. Can you tell us a little bit about what your experience has been like in general in the games industry? I know The Witcher, I believe the release date was delayed by a few months. We've heard from lots of game developers of a kind of intense game dev crunch with deadlines. Have you had that type of experience or is it a little bit different in Eastern Europe? No, no, it's the same.
Starting point is 00:22:47 I don't want to really dig deep into this because it's like a hot topic. And companies might not like people talking about this. But yeah, definitely this is the stigma of the game development industry, that there's a load of crunch and people suffer and there are relationships that get broken up. Even marriages fail because of... Yeah, it's not a joke. It does happen. I've seen it. I've seen it happen.
Starting point is 00:23:26 Yeah. This, unfortunately, is true. Nowadays, more and more companies try to fight with this. And they introduce policies where people cannot crunch over a certain
Starting point is 00:23:42 amount of time and so on and so on. That sounds good. Yeah, my answer is take note. Yeah, like in the last few years, there has been definitely progress with this. I keep seeing, you know, not that I read a bunch of studies directly or anything, but at least links to studies that say that basically no human is really productive over 35 hours a week. You think you're being productive, but you're really not. Your productivity keeps going down the longer you sit in that chair.
Starting point is 00:24:15 Unfortunately, this kind of study you can show to managers or project managers or producers, as we call them sometimes in the games industry. And they will even agree with you. They will even say sometimes that, yeah, yeah, we know. And then still nothing changes. So, yeah. But anyways, I heard from people who work in movies, like effects for movies, that it's even worse. So, not really... Yeah, I think I can believe that. Yeah, because, you know, they get a contract, the movie is coming out in a year, the movie cannot slip, there's like 15 scenes that need to be done, and yeah, there's a hard deadline, and it's creative work, so there's no knowing beforehand how long it's going to take, right? Yeah, and delaying the release of a video game, you know, delays the download from Steam or maybe the shipment of boxes
Starting point is 00:25:27 to GameStop or something, but delaying of a movie would mess up theater schedules literally around the world. It seems like it's an expense they would not want to pay for. Yeah, I think it's also the fact that movies are... Like, the way they schedule them is so precise. Like, okay, on this day, this movie has to come out. Yeah, this time. It's a Christmas release or whatever.
Starting point is 00:25:55 Like, it must be done, yeah. There's a lot of merchandise. And with games, it's usually all this other additional stuff is done when the game is very popular but with movies there's deals with cinemas for the cup that you sell in the kiosk
Starting point is 00:26:14 or whatever and you can't back out of that, right? Right. This reminds me, I still need to see Star Wars. Whoa. Me too. I've had a busy year. So one thing we talked about before getting you on the show is that you wanted to clear up some confusions you saw around data-oriented design.
Starting point is 00:26:38 Yeah, so before we get into this, I would like to mention that the reason I wanted to talk about this is that this confusion came from a lot of people who contacted me and they were asking advice. How should they think about this? So that's why I thought that this would be a very good way to talk about this and reach a lot of people. So data-oriented design is basically a different approach to problems. And it has been popularized in the games industry by Mike Acton through multiple presentations, actually. And the core principle of data-oriented design is that programs are basically just transforming from data to data. So without knowing the data, there's no way to actually make a good application, because the sole purpose of this application is to
Starting point is 00:27:45 transform that data. So from this core principle basically it tries to define how to approach the data and how to approach programming based on that data. And this is kind of coming from a background where in the PlayStation 3 times this was super important because back then the design of the PS3, the SPUs, kind of forced people to think super low level. And they were thinking in ways that are not really object-oriented, and they got used to this. And even before that, right? Like even in the PS2 times, there was a lot of low-level programming.
Starting point is 00:28:42 And because of this, people just got into this different mindset where they really concentrate on the low level, while object-oriented programming, on the other hand, is trying to build abstractions on top of that data. And very much in games, but also in other applications, these abstractions can actually be limiting sometimes, and limiting not necessarily in the ways we work with the code, but limiting in terms of performance. And that's the core of data-oriented design: that to achieve the best possible performance, you actually have to think about the data and how this data is transformed through the whole application.
Starting point is 00:29:33 And then you can reason about this performance because you know that, okay, you have this much data, you're processing it this way, you have these loops, you have these instructions in these loops. And then you have a certain performance characteristic that you can expect, because you know the hardware. It's very much a different approach from object-oriented programming. So I don't know if I gave a very good explanation of what it does or how data-oriented design fits in programming, but this is how it is in my head.
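As a rough illustration of the kind of reasoning described here, the sketch below estimates how many cache lines a single pass over an array has to touch. The 64-byte line size, the name `cacheLinesSpanned`, and the element sizes in the note that follows are assumptions chosen for illustration; none of them come from the episode.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: 64-byte cache lines. Check the actual target hardware.
constexpr std::size_t kCacheLineBytes = 64;

// Cache lines spanned by `count` elements laid out back to back,
// `strideBytes` apart, assuming the array starts on a line boundary.
std::size_t cacheLinesSpanned(std::size_t count, std::size_t strideBytes) {
    assert(strideBytes > 0 && "elements must occupy at least one byte");
    const std::size_t totalBytes = count * strideBytes;
    // Round up: a partially used line still has to be fetched.
    return (totalBytes + kCacheLineBytes - 1) / kCacheLineBytes;
}
```

Under these assumptions, updating only a 12-byte position inside 10,000 objects of 64 bytes each walks 10,000 cache lines, while the same positions packed into their own array span 1,875. Real hardware adds prefetching and alignment effects that this arithmetic ignores.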
Starting point is 00:30:12 So basically it came out of necessity from the PlayStation days, you're saying? What I'm saying is that it came out of necessity not necessarily from the PlayStation days, but from any days when people had to dig so deep to gain that one last bit of performance. Okay. And it's very good for, obviously, for embedded systems as well,
Starting point is 00:30:40 where not only the instructions, but even the size of the data is very important. I wanted to interrupt this discussion for just a moment to bring you a word from our sponsors. Backtrace is a debugging platform that improves software quality, reliability, and support by bringing deep introspection and automation throughout the software error lifecycle. Spend less time debugging and reduce your mean time to resolution [...] code to classify errors and highlight important signals such as heap corruption, malware, and much more. This data is aggregated and archived in a centralized object store, providing your team a single system to investigate errors across your environments.
Starting point is 00:31:33 Join industry leaders like Fastly, Message Systems, and AppNexus that use Backtrace to modernize their debugging infrastructure. It's free to try, minutes to set up, fully featured with no commitment necessary. Check them out at backtrace.io/cppcast. So what are some of the confusions relating to data-oriented design that you wanted to clear up? Yeah, so people contact me about this every few weeks, and they ask, basically, I have this system in my head. I thought about using object-oriented programming because that's what everyone else uses.
Starting point is 00:32:17 But then I read this or saw this video from this conference where these people who use data-oriented design they're saying that like you shouldn't have virtual functions because that's bad or you shouldn't have like a base class because that just shows that you don't know your own data and so on and so on and and they're like super confused because they they don't know what to do with this. How should they then start making their own code and how should they approach it? And I think that the problem comes from the fact that data-oriented design is very good
Starting point is 00:32:57 when you have a lot of data and you have some transformation, small transformation on that data that you want to execute, and then you have some huge amount of resulting data or even small amount of resulting data because you're filtering in the middle, in these cases, data-oriented design can give you a lot because you will have better cache coherency, which is another key concept in data-oriented design.
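The "lots of data, small transformation, filtering in the middle" case can be sketched as a single loop over plain contiguous arrays. This is only a minimal sketch: the function name `scaleVisible`, the visibility flags, and the scaling step are hypothetical stand-ins for whatever filter and transformation a real system would apply.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical pass: keep the values whose flag is set and scale them.
// `values` and `visible` are parallel arrays: element i of each belongs
// to the same logical item.
std::vector<float> scaleVisible(const std::vector<float>& values,
                                const std::vector<std::uint8_t>& visible,
                                float scale) {
    assert(values.size() == visible.size() && "parallel arrays must match");
    std::vector<float> result;
    result.reserve(values.size());  // upper bound on the output size
    for (std::size_t i = 0; i < values.size(); ++i) {
        if (visible[i] != 0) {                    // filter step
            result.push_back(values[i] * scale);  // transform step
        }
    }
    return result;
}
```

Both inputs are read front to back and the output is written front to back, so every access is sequential in memory, which is the access pattern the discussion here is pointing at.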
Starting point is 00:33:23 But on the other hand, object-oriented programming is popular for a reason. And in my opinion, that reason is that people like to think in things and not in these abstract pipes where you pipe in some data on one end and it comes out on the other end somehow different. People actually like to think when I have an object of a class that actually is a thing. Even when we talk about it between programmers, we always talk about these like, like, you know, the mesh or the texture or whatever, when we really just mean like this class or this object. But we really think
Starting point is 00:34:14 about or talk about them as things that exist while they don't. But it's much easier. It's a much easier conceptual model to think about them as if they would be existing. And people have this in their mind, and I think this is really good. This is why object-oriented programming is so popular and so successful, that the mental model is very close to what we see around us and it's easy to pick up. While data-oriented design is not easy to pick up and it really is powerful, but everyone should know where to use it, right? And it adds to the confusion that all the presentations about data-oriented design are actually worded in a way that you have to do this or you have to do that, and it's not really worded in a
Starting point is 00:35:14 way like okay when you have this kind of problem then this is the best tool and you should use this tool but there are all these other problems where this is not the best tool. So don't try to force this when this is not the best tool. So, I'm sorry, go ahead. No, no, go on. I was just curious,
Starting point is 00:35:36 then you're saying data-oriented design is a tool that should be used when you're moving lots of data. But how do we make that determination between, if you will, lots of objects and lots of data?
Starting point is 00:35:52 Well, yeah, like, I mean, when you have lots of objects or lots of data, that's basically the same case for us. When you think about it, like in games, and as I said, it was popularized for games mostly, when you think about the games, you think about, okay, I have this world and it's full of these, I don't know, like boxes, whatever. And you know that, okay, if I have like a few thousand of these, then you will have a lot of corresponding data. Even if the boxes look exactly the same or whatever, you will just have, like,
Starting point is 00:36:31 at the minimum you will have a vector. Not a C++ vector, just like a position, a three-dimensional or two-dimensional position somewhere. You will have that stored, right? And from the data-oriented design perspective, the way you should do this is you will have an array where you store all the positions, and when you have the object that actually is placed on that position, then it will just have an index into this array and say, I'm storing my position in this index. So when you want to transform these objects, let's say you want to move them somewhere, then all you do is take this array, run through
Starting point is 00:37:20 all the objects and say, okay, plus one to all the positions or whatever. And then the cache coherence is awesome, obviously, and you still have all your objects transformed, but not touching the object itself, right? While in object-oriented design, when you think about it, the data should be encapsulated into the object. So you should have an object that is, like, let's say a box, a box class, and you have an object of that class, and the position is inside this. But it is possible that you have, let's say, color, you have, I don't know, like you have some pointers there, because you
Starting point is 00:38:07 have some textures on those boxes, and so on and so on. And when you want to just transform these objects, you just want to move them. Then when you are iterating through these boxes, in the best case scenario, you still have them in an array. But your cache coherence is already not so good, because you have all these data members. So you can try moving all the relevant data together inside one class, but you can't move the relevant data together between the objects, right? So because of this, data-oriented design promotes arrays of objects and not objects of arrays, right? Or, yes. Wait, am I confusing that right now? No, I don't think so.
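As a rough sketch of the two layouts being compared here, the following C++ contrasts a box class that carries all of its own data with one array per property, where a box is only an index. The names `Vec3`, `BoxObject`, `Boxes`, and `translate_all` are made up for illustration and do not come from the episode; a real engine would make different choices about storage and removal.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical types for illustration only.
struct Vec3 { float x, y, z; };

// Object-oriented layout: every box encapsulates all of its data.
struct BoxObject {
    Vec3 position;
    unsigned color;
    std::string name;
};

// Data-oriented layout: one array per property; a box is just an index.
struct Boxes {
    std::vector<Vec3> positions;
    std::vector<unsigned> colors;
    std::vector<std::string> names;

    // Appends one box and returns the index that identifies it.
    std::size_t add(Vec3 position, unsigned color, std::string name) {
        positions.push_back(position);
        colors.push_back(color);
        names.push_back(std::move(name));
        assert(positions.size() == colors.size() && colors.size() == names.size());
        return positions.size() - 1;
    }
};

// Moving every box reads and writes only the contiguous positions array.
inline void translate_all(Boxes& boxes, Vec3 delta) {
    for (Vec3& p : boxes.positions) {
        p.x += delta.x;
        p.y += delta.y;
        p.z += delta.z;
    }
}

// The same operation on whole objects steps over color and name each time.
inline void translate_all(std::vector<BoxObject>& boxes, Vec3 delta) {
    for (BoxObject& b : boxes) {
        b.position.x += delta.x;
        b.position.y += delta.y;
        b.position.z += delta.z;
    }
}
```

Both overloads produce the same positions; the difference is only in how much unrelated data sits between consecutive positions in memory. Whether that matters in practice depends on how many boxes there are and how often they are moved.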
Starting point is 00:38:58 So, you would have like an array of all the positions, an array of all the colors, an array of all the names, or something like that. Indeed. Indeed. Okay. Interesting. Indeed. And that's very good, right? That's what our branch predictors, our pre-cachers,
Starting point is 00:39:19 basically our CPUs are built for this. And this is what data-oriented design tries to emphasize, that if you know your data and you know the hardware that you're trying to run it on, then you should think this way, like from these two sources,
Starting point is 00:39:38 the way you should lay out your data should be obvious. The best way, let's say. Okay. The problem is that some people try to apply this to everything, and then we end up trying to design, let's say, a UI system where this is not necessarily the best way to do it. I mean, okay, maybe it is, but there are many cases where this might not be the best way to do it. Or the biggest confusion for people so far has been, how do I implement an entity component system on top of data-oriented design? And this is all doable. It definitely requires a lot of thinking. But if someone is like a totally new programmer,
Starting point is 00:40:28 like as in new to the field of games, then probably not the best way to start thinking about entity component systems. So this is something I definitely wanted to clear up. So the conceptual model of objects that's easy for us humans to reason about. Yes. Good for beginners. Yes. And then when you need to get the performance out of it, you can take a data-oriented approach when you have many, many objects. Yeah, I believe that this is actually what
Starting point is 00:41:07 even those companies have been doing. I mean, the companies where the people work who gave those presentations, they have been using object-oriented programming, or at least something similar. Maybe they didn't use
Starting point is 00:41:24 like virtual functions or something like that, but they did compose data this way. And then when they saw that there is a way that is better for the performance, then they said, okay, how can we make this more formal? How can we collect all the ideas into one and promote it? I believe this is how it happened and not in a vacuum. Is there any particular presentation that you think explains the data-oriented design concepts well? I think
Starting point is 00:42:06 I saw the Mike Acton talk from CppCon 2014. Are there any other particular presentations that you would recommend watching if you're interested in data-oriented design? No, I think that's a very good one. I've been watching
Starting point is 00:42:22 a few other presentations from Mike Acton and he's very, like, he can explain this very well and much better than I can. And I think the only problem here is that he tries to give this very good advice to everyone, but not everyone is ready for it, I think. You really need to achieve a certain level of understanding to really appreciate what he's saying. Okay.
Starting point is 00:42:58 I'm curious. I mean, we talked about the gains that can be made and why from data-oriented design, but it seems like if you needed, for instance, to talk about in your code, you needed to reason about both the name, the position, and the color, if you will, of a particular box using the box analogy, then you pay a cost now because now you're having to index three different arrays instead of having, now you're giving up cache coherency when you need to talk about all the properties of a thing.
Starting point is 00:43:32 Does that come into play? Is that a consideration when you're doing this kind of programming? So, yes, it is. But the answer is, how frequently do you do this? If you do this all the time, accessing all those three things at the same time, then you should keep them together. Okay. Right? So the answer is always know your data. If you know your data, then you will know how frequently you are doing this,
Starting point is 00:44:08 and then you can make this decision. Okay. Okay. That makes sense. The problem is that it makes total sense, and everyone agrees with it, but you can't start making your toy game engine or whatever because the requirement is to know your data
Starting point is 00:44:28 and you don't have any. And for people who worked in the industry for, I don't know, a few years at least, they have at least some idea what the data looks like, right? But a lot of people contact me and they're like, I was watching this presentation, I would love to start out with something, how do I do it? And it's like, oh, I'm sorry, but you have to know your data. Do you find that when you're working on something like a new game engine, are you going to understand where you should be applying data-oriented design up front? Or is it something you need to go back retroactively and say, oh, we're applying this algorithm across all these objects at once. We should really be using a data-oriented approach as opposed to an object-oriented approach.
Starting point is 00:45:22 In some cases, I think it's very easy to make this decision. There are some scenarios where you know, like, for example, effects or particle systems is a very good example for this. They are... How should I explain this? Basically, these are usually visual effects that are made out of smaller, like pieces that are rendered onto the screen. And there's like usually hundreds or even thousands of them. So you know that there
Starting point is 00:46:02 will be a large amount of data that you can run through and just do that one thing that needs to be done in that one frame. And this is a very good thing to do there. And even the one I mentioned before, like entity component systems, this can be a very good example if the game is big enough, right? When the game is not like, let's say, 100 objects every level, but like 10,000 objects a level, then this definitely is a good way to do it. Can you explain what an entity component system is? Okay, that's another very big topic. But entity component system,
Starting point is 00:46:52 actually the name should be technically entity component system system because the system is also part of the, these are three things in this concept. So an entity in this concept is... let's say you have a car in your game. That's your entity. A component in the system is parts of that car. So you can say not necessarily physical parts, but logical parts. So let's say a car is in your game,
Starting point is 00:47:31 you can drive your cars. So a component is something that is something like drivable. So when you have, let's say, a plane in your game, then you can add the same component to that entity, and that becomes drivable as well. Okay. So the third part is the system, which is basically doing a very similar thing to what I described in the data-oriented design. It takes all these components and does something with all those components. So usually, this is just a frame update. Like let's say, the drivable components
Starting point is 00:48:14 need to be processed every frame to figure out exactly which direction the car is going or something like this. So the system is the one that does this every frame. Okay. So these are the three parts of the system. And usually the entity is meant to be like pure in the sense that it doesn't contain logic. It basically just contains pointers or IDs or indices for the components, depending on how they are stored.
Starting point is 00:48:51 Right? So the entity doesn't do anything. You can't really say, like, I'm moving this entity. What you're saying is that the entity has a transform component and you are doing a transformation on that component.
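To make the three parts a bit more concrete, here is a minimal C++ sketch, assuming components are stored in plain arrays and an entity holds only indices into them. All names (`World`, `Entity`, `Transform`, `Drivable`, `drive_system`, `add_vehicle`) are hypothetical and chosen for this example; real engines differ a lot in how they store and look up components.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sentinel meaning "this entity has no such component" (an assumption of this sketch).
constexpr std::size_t kNone = static_cast<std::size_t>(-1);

// Components: plain data, no logic.
struct Transform {
    float x = 0.0f;
    float y = 0.0f;
};

struct Drivable {
    std::size_t transform = kNone; // index of the transform this component moves
    float vx = 0.0f;               // velocity in units per second
    float vy = 0.0f;
};

// Entity: only indices into the component arrays, nothing else.
struct Entity {
    std::size_t transform = kNone;
    std::size_t drivable = kNone;
};

struct World {
    std::vector<Transform> transforms;
    std::vector<Drivable> drivables;
    std::vector<Entity> entities;

    // Creates an entity (a car, a plane, ...) that has both components.
    std::size_t add_vehicle(float vx, float vy) {
        transforms.push_back(Transform{});
        Drivable d;
        d.transform = transforms.size() - 1;
        d.vx = vx;
        d.vy = vy;
        drivables.push_back(d);

        Entity e;
        e.transform = d.transform;
        e.drivable = drivables.size() - 1;
        entities.push_back(e);
        return entities.size() - 1;
    }
};

// System: runs once per frame over all drivable components.
// It never touches an Entity; it only updates the referenced transforms.
inline void drive_system(World& world, float dt) {
    for (const Drivable& d : world.drivables) {
        assert(d.transform < world.transforms.size());
        Transform& t = world.transforms[d.transform];
        t.x += d.vx * dt;
        t.y += d.vy * dt;
    }
}
```

In this sketch "moving an entity" is never expressed directly: the system updates transform components, and the entity is only the thing that ties a transform and a drivable component together.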
Starting point is 00:49:10 This is kind of difficult to reason about in some scenarios, but usually this kind of reasoning only has to be done by programmers because everything is hidden from the designers or artists. And this is used in almost every game engine I know. So it's a super popular thing. Another thing you mentioned we wanted to talk about while we were setting up the show was the debuggability of C++ and how you're concerned that with newer features we're adding to the language, how you think it's actually getting
Starting point is 00:49:49 worse? Yeah, so I had this problem and actually a little bit ties into the previous topic because what is happening right now is that we are adding very useful abstractions to the language, but we have this saying that these abstractions are zero cost. That is mostly true until you try to debug. In which case, some of these abstractions are super heavy. Even like a unique pointer can be very, very heavy in debug. And that is becoming more of a problem because I worked on projects where we were actually
Starting point is 00:50:37 not able to switch to debug because the performance was so bad. Now, I'm not saying that this was because of C++. I'm saying that we are enhancing this problem by these new features. And this is obviously not related just to the language features. It's more like the language features and the standard library or any library that is built on that. So this is becoming a problem where even if you just do some simple tests on Compiler Explorer to just switch between the release version or the debug version and see the same code, it's like sometimes 10 times more
Starting point is 00:51:31 code, and 10 times more instructions, I mean. And that just means that in some projects you have to go and switch off certain parts of the project or the game. I'm familiar with games in this sense,
Starting point is 00:51:47 but you have to just switch off certain parts to get good enough performance in other parts to be able to debug it. So I know a couple of years ago, GCC added a new optimization flag, -Og, which gives you optimizations that don't hinder your ability to debug the application. Have you had any success with that kind of selectively using optimizations to still let you debug it?
Starting point is 00:52:15 Yeah, not on the compiler level, but what we do sometimes is have like different modules compiled in different modes. So the module that I'm debugging is compiled in debug, but all the other modules are compiled in release.
Starting point is 00:52:32 Sometimes this is doable. I was going to say, don't you sometimes have problems with ODR violations or something else cropping up? The most common thing is the iterator debug level problem when you have this error.
Starting point is 00:52:51 But yeah, this can be problematic in some cases. Fortunately, it works in other cases, especially when you are using DLLs more frequently in the project and not statically linking, then this can work quite well. But yeah, it's a big problem. And the other thing is that, for example, kind of my pet peeve,
Starting point is 00:53:20 to be able to read dumps easily. And this was a sad moment for me when Lambdas were added and there's no way to name a Lambda in the sense that in the call stack it would show up with a name. Yes. And it happens to me every, let's say, few months.
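One way to picture the naming problem is to compare a lambda with a hand-written function object handed to the same generic executor. This is only a sketch of a possible workaround, not something proposed in the episode: `JobQueue`, `RecalculatePathsJob`, and `enqueue_examples` are hypothetical names, and how a closure type is displayed in a call stack depends on the compiler and debugger.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical minimal job queue standing in for a generic job system.
struct JobQueue {
    std::vector<std::function<void()>> jobs;

    void push(std::function<void()> job) { jobs.push_back(std::move(job)); }

    // Runs every queued job, then empties the queue.
    void run_all() {
        for (auto& job : jobs) {
            job();
        }
        jobs.clear();
    }
};

// A named function object: its call operator belongs to a type whose name
// the programmer chose, so a stack frame inside it carries that name.
struct RecalculatePathsJob {
    int* counter;
    void operator()() const { ++*counter; }
};

inline void enqueue_examples(JobQueue& queue, int& counter) {
    // Lambda: the closure type is unnamed, so its frame is reported under a
    // compiler-generated name.
    queue.push([&counter] { counter += 10; });

    // Named functor doing comparable work under a readable type name.
    queue.push(RecalculatePathsJob{&counter});
}
```

The trade-off is the usual one: the functor costs a few more lines and moves the code away from the call site, in exchange for a type name that is easier to recognize in a dump.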
Starting point is 00:53:45 I get a call stack from somewhere, and something is called from a Lambda. And when you have a generic system that executes that Lambda, like a job system or something like that, then it's just impossible to know what happened. And this is just a huge problem, and I don't understand why this doesn't come up like as
Starting point is 00:54:10 someone should propose something to solve this, right? Because, well, it just hinders the debuggability. It's interesting. It's definitely not something I've heard anyone else talk about, but I can also see where it could be difficult and could cause problems
Starting point is 00:54:30 when you need both the performance and the debuggability. Yeah, I mean, in this case, in the lambda case, it's not about performance. It's just generally a feature introduced where nobody really thought about how we are going to debug this. And another thing I can mention is I'm super excited about
Starting point is 00:54:55 all the meta classes and I really think that this will add good capabilities to the language, especially for game developers with our custom RTTIs, because every engine has its own custom RTTI. But during the presentations,
Starting point is 00:55:21 Herb Sutter was mentioning that this can be debugged easily, right? Because the generated code can be shown in the debugger. And I agree, this would be great. I love the idea. But then why don't we do this for macros? Because that's still a huge pain to debug anything that's in a macro, like expanded by, you know?
Starting point is 00:55:45 Yeah, right. Yeah, that's the main reason that they say to not use macros, because debugging them is nearly impossible. Yeah, and then if we can do this with metaclasses, then why can't we do this with macros? I can honestly say I don't know enough to say what the issues would be or not. Yeah, I would love to see this done
Starting point is 00:56:14 and these things make me say that we are adding features, and someone should be there on the committee with this big red flag, like, what about debuggability? Because I personally am spending like 60% of my programming time debugging stuff. Either my own or mostly not my own. But that's it. Makes me think of Kate Gregory's Stop Teaching C talk, where she focused on how we should really be teaching new programmers to use the debugger as opposed to using printf-style debugging. And yeah, we should maybe have someone in the committee making sure that they're thinking about the debugger when introducing new features. It's certainly something, and I know from my experience
Starting point is 00:56:58 working with Ben after our Constexpr All the Things talk that he submitted just, it wasn't a specific proposal, but just a paper that, you know, something that the committee might want to think about, basically lessons that we've learned on using constexpr. You could certainly write up a similar paper on lessons that you've learned
Starting point is 00:57:18 for debugging new features and what the committee should maybe be thinking about, and maybe at least get a few people thinking and submit that to the next meeting. When would the next meeting be? February? March? Something like that. Yeah.
Starting point is 00:57:33 Maybe this is a good idea. I think the same... Like, for example, the STL would be way more popular with game developers if the debuggability would be much better. And I think there was a huge
Starting point is 00:57:49 step forward when Visual Studio started allowing people to add their custom debug visualizations, right? And that was a great step forward, but we definitely need more than that.
Starting point is 00:58:06 Do you think it is something that just even better tooling could help with, or do you think it's just at the language level we need more to improve some of these issues? Well, in some cases, better tooling would be great. In some cases, like the lambda case, I think there should have been a name added. Even if it's just optional, just find a way to add a name to a lambda. Because obviously, originally lambdas were thought to be just this simple thing that you pass into
Starting point is 00:58:45 some function that iterates on an array and does this small thing on that array, right? Right. But nowadays lambdas are so much more than that. Yeah, and we're getting even more features added to them in C++20 also.
Starting point is 00:59:02 I haven't read up on that, but this is really one part where I think the language is responsible for making this debuggable. Okay, well, Balazs, it's been great having you on the show today. Can people find you online anywhere? Yeah, I have my Twitter handle. I sent that, but I mean, I think that will show up on the page. Sure, yeah, I'll put that in the show notes.
Starting point is 00:59:33 Sure, so that's the best way. Okay, well, thank you so much for your time today. Yeah, thank you. Cool, thank you. Thank you, guys. Thanks so much for listening in as we chat about C++. I'd love to hear what you think of the podcast. Please let me know if we're discussing the stuff you're interested in.
Starting point is 00:59:50 Or if you have a suggestion for a topic, I'd love to hear about that too. You can email all your thoughts to feedback at cppcast.com. I'd also appreciate if you like CppCast on Facebook and follow CppCast on Twitter. You can also follow me @robwirving and Jason @lefticus on Twitter, and of course you can find all that info and the show notes on the podcast website
