CppCast - Soagen
Episode Date: August 18, 2023

Mark Gillard joins Timur and guest co-host Jason Turner. Mark talks to us about reflection, SIMD, and his library soagen, a structure-of-arrays generator for C++.

News:
- What is Low Latency C++? C++Now 2023, part 1
- What is Low Latency C++? C++Now 2023, part 2
- Inside STL: The vector
- Inside STL: The string
- Experimenting with Modules in Flux
- pycmake
- cpptrace

Links:
- Soagen on GitHub
- Soagen documentation
- Mike Acton: Data-Oriented Design and C++ at CppCon 2014
- Bryce Adelstein Lelbach on SoA and reflection at ACCU 2023
- Data-Oriented Design and Modern C++ at C++Now 2023
- Godbolt's law
- toml++ on GitHub
- PVS-Studio: 60 terrible tips for a C++ developer
Transcript
Episode 367 of CppCast with guest Mark Gillard, recorded 9th of August 2023.
This episode is sponsored by the PVS-Studio team.
The team promotes regular usage of static code analysis and the PVS-Studio static analysis tool. In this episode, we talk about several new library releases
and about how standard containers are implemented in the Microsoft Standard Library.
Then we are joined by Mark Gillard.
Mark talks to us about his library soagen, a structure-of-arrays generator for C++.
Welcome to episode 367 of CppCast, the first podcast for C++ developers by C++ developers.
I'm your host, Timur Doumler.
Joined by my co-host for today, Jason Turner.
Jason, how are you doing today?
I'm excited today, Timo, actually.
Thank you for inviting me back on as a guest co-host.
It's been a little while. How many episodes have you all done without
me here, basically? 17, 18, depending on whether we count the one that we did together.
Right, right. Yeah, quite a lot. Yeah. It's been fun so far.
So everything's going well? Did I leave the podcast in good hands?
Yes, yes. It's been great fun. And yeah, thank you so much. As you can tell,
Phil is still on vacation. So I'm very, very excited to have you here. You're back on the
show for the first time in over a year, aren't you? If you don't count the episode, the special
Christmas one that we did together. Yeah, definitely over a year. Yeah,
if you don't count the Christmas episode. So how have you been since then? What are
you up to these days?
Well, not a whole lot has changed for me professionally, anyhow. I'm still doing training. I do have a C++ Best Practices workshop coming up at NDC TechTown at the end of September. I'm assuming this will air before then, right?
Oh yeah, this will air next week on Friday.
Oh, okay, then yes, definitely. And I'm also planning to do a Best Practices workshop post-conference at CppCon. So there's that. And I'm also sitting here like a fool staring at my YouTube subscriber count, because I'm currently at 99,807 subscribers. I just want to see that 100,000 tick over.
That's amazing. So if you haven't yet subscribed to Jason's YouTube channel,
please do so now to help him get that sixth digit in there.
I'm hoping to hit that by the end of the week. That's my plan right now.
So what about you? You've had some career things going on lately, right?
Yes. One thing I've wanted to do for a long time is to write a book about C++. But that's not really compatible with being full-time employed at a big tech company, unless you want to work weekends and evenings, which is not something I'm a particular fan of. So I kind of had that in my backlog, but always wanted to do it. And now I finally will have time to pursue it. So I'm very excited about that.
Do you have a title or a plan, something to get the listeners excited about?
So the tentative title is Low Latency C++. You might have seen my three-
hour talk at C++Now this year, which had the same title, where I was talking about all of the different techniques that you can use if you're into audio or finance or video games or any of those kinds of fields where you're optimizing for latency rather than just general performance. Where it really matters how many milliseconds or microseconds a particular piece of code takes, and it shouldn't be over a certain deadline; it should be as fast as possible. So you have this hyper-focus on latency, and that leads to particular ways of writing C++ that have some overlap with general performance optimization, but sometimes you also take very different approaches. And so I want to provide an overview of all these techniques that you might want to use in those industries. So it's kind of a summary of a bunch of talks
that I've given over the last five, six, seven years, and also other material that I have researched in parallel. I've been asked by people... it was kind of fun: when I first did the one-hour version of this low latency overview talk, people were like, yeah, this is really exciting, but one hour is nowhere near enough to give a proper overview, so can you do a longer one? So I did a three-hour version at C++Now. And then after the three-hour version, people were like, well, but that's nowhere near enough time to actually explain how this all works, so can you do a one-day workshop? And then I was like, yeah, but okay, let me just write it up, right? So that's what I want to do. And yeah, that's going to be another one of my side projects for the next few months.
Are you working with a publisher or self-publishing?
So I haven't decided that yet. I actually had a couple of publishers who approached me and said they were interested in doing something together, but
I haven't yet decided if I go for that or if I do self-publishing or how I'm going to approach this.
I want to first get like a kind of table of contents and this is what's going to be in this
book and have like a really good idea of that in my head.
And then think about how I'm going to make that happen and exactly with whom.
And yeah, that sounds like a pretty good approach to me. I've only self-published myself, but I have worked with publishers on other things, and it makes sense to get as much done as you can before you talk to a publisher, and then decide if you want to work with a publisher or if you want to go the self-publishing route. And just as an aside to listeners who might be thinking about the same kind of thing, there's nothing stopping you from self-publishing and then selling your book to a publisher. That is a thing.
I actually had someone offer to buy my book and I said,
no, I'm good. Thank you very much.
That is interesting. I did not know that was a thing. Thank you, Jason.
You own the copyright to your book, so no one can stop you from selling it.
Right. Well, so at the top of every episode, I'd like to read a piece of feedback. But actually, this week I didn't receive any feedback, neither by email, nor by Twitter, nor by Mastodon, nor on Reddit. So I actually don't have any new feedback this week. But if you have any, please let us know. If you'd like to share thoughts about the show, you can always reach out to us on Mastodon, or on Twitter, or is it now officially called X? I'm not sure. I saw someone write T-X-I-T-T-E-R.
All right.
Or you can email us,
and that's definitely going to still work.
Email us at feedback at cppcast.com.
Joining us today is Mark Gillard.
Mark is a soft body physics engine developer
and low-level tooling guy at Osgenic,
a surgical training company based in Helsinki, Finland.
Prior to his current role,
he was the chief architect of an internal graphics engine
used by the company in prototypes during their startup phase.
Before coming to Finland,
Mark was a teacher, researcher, and consultant
at Flinders University in South Australia,
working with haptic controllers to find novel ways
of modeling and teaching
different surgical interactions.
Mark first learned to code as a teenager, making mods in Unreal Script for Unreal Tournament
2004.
And these days, almost all of his work is C++.
Mark, welcome to the show.
Hi, thanks for having me.
Mark, I feel like your bio is backward, because so many people say they got into programming because they wanted to do gaming stuff. It sounds like you wanted to make tools, to make things that go in games.
So, you know, the path I've taken satisfies that quite nicely.
I can totally see that.
I can feel that.
I mean, I'm curious, though. You just casually throw soft body physics into your bio, versus hard body physics, rigid body physics. Like, what's the deal here?
So it's a platform for simulating the interactions between different tissues, right?
So it's not just rigid body physics as you might have in, say, NVIDIA's PhysX, where
you've got spheres and cubes and various things.
It's more about pretty much entirely focusing
on the interactions between the soft tissues themselves.
Okay.
Yeah.
And how we might drive, say, we have these three-dimensional
haptics controllers that we use to act as a proxy for, say,
a scalpel or a drill or something, and that's capable of rendering
some force feedback.
And so from the physics simulation, the interactions between the tool
and the soft tissue, we can pull forces out of that
and have the tool render some force as it would in real life
if you passed an instrument through some flesh during surgery.
So what does this practically look like, soft body physics modeling?
Is it like a bunch of particles connected by
springs, or am I overthinking this?
No, that's essentially the bare-bones description of what it is. It's not a mass-spring system, but it shares similarities with that. And of the people that work on the project, I'm not the physicist, so I don't want to get too much into the specifics, because I'm going to fudge the description. But I would say it would be fair for me to describe it as being a mass-spring system on uber steroids.
Okay.
And that's about as technical as I can be on the physics side of it, because, you know, the research and the physical principles that go into making it work aren't really my area of expertise. I wrap it up in a software engineering framework and make it fast. That's sort of where I live.
Interesting.
Right. So, Mark, we'll get more into your work in just a few minutes. But
before we do that, we have a couple of news articles to talk about. So,
feel free to comment on any of these, okay? So the first one I have this time is actually a whole series of blog posts called Inside STL by Raymond Chen that came out last week.
And probably there's going to be more coming out
about how the containers in the Microsoft Standard Library
are implemented under the hood.
So I thought that was really interesting.
There was one about the vector.
It's called Inside STL: The vector.
That was cool because I always thought that vectors,
like std vector is implemented as like three members,
like pointer size and capacity.
But it actually turns out that the Microsoft version
is implemented with three pointers,
first, last, and end.
So Raymond talks about that.
Then he has a blog post about the string,
the Microsoft string.
There's another one about the pair,
the lists, like the maps.
And there's like a blog post for each one of those.
So if you want to dig into Microsoft STL implementation
and see how things are done there,
I think that's a really cool kind of series of blog posts
that caught my attention.
Well, and the string article also does a comparison
on how the
other two standard libraries implement their small string optimization. So for anyone who's curious
about what the heck is small string or short string optimization, how does it work? This is
a super succinct overview of that, because, well, Raymond publishes an article literally every weekday, right? So they're never very long. So it's all compact in here. We interviewed him back in the day on CppCast.
Oh, that's interesting. I haven't listened to that one yet. I actually started listening to all the CppCast episodes, all the way from the very beginning when Rob was just doing them on his own, and then later and later. But I haven't caught up to this one yet.
I'm derailing the conversation now, but I think if we looked at this post's number, it might literally be post 108,532 or something ridiculous.
That's a lot of blog posts.
Maybe I'm wrong, but it's a lot. I think it's every weekday that he publishes one, right?
So there's another blog post that I also found really interesting this week, from Tristan Brindle, whom we had on the show a couple of episodes ago. He talked about his Flux library, kind of an alternative way to do iterators and ranges, which is really cool.
And he updated his library to support C++20 modules
and he wrote a blog post about it.
And that's really interesting,
not just because modules are great
and because the Flux library is great,
but also because the blog post
actually explains how it all works, right?
So he shows, first of all,
how to compile his library using modules on all three major compilers (Clang 16, GCC 13, MSVC 17.6), like what compiler flags you need to compile with modules, all of that stuff. He also talks about how you can try it out using CMake. He mentions that CMake has this new built-in module support, but he actually doesn't use it. He uses Victor Zverovich's modules.cmake thing for that.
He explains how that works.
And then also he talks about how to modularize a library.
So you can actually apply that to your own library.
If you want to make your library
compatible with C++20 modules,
he kind of goes through that as well.
So I thought that was like a really,
really cool and comprehensive blog post
for people who are interested
in actually using modules in practice. So I know we're going to get into Mark's library a little bit later on,
but I'm curious if you looked at modules at all yet so far for the library you've been working on,
Mark?
No. Admittedly, modules conceptually have been a bit of a black hole for me. Other things just keep coming up when I've set aside time to learn about them. So no, I haven't.
Yeah. Well, I mean, I'm in the same boat because i'm waiting for the tooling story to be complete yes so that i can just use them not have to figure out how to use them but anyhow
yeah so speaking of cmake someone actually made a branch of cmake that supports python scripting
in addition to regular CMake scripting.
There's a GitHub repository.
It's called PyCmake.
There's a delightful Reddit discussion
about whether that's a good idea or not.
Yeah, I thought that was another interesting project
that I wanted to mention that surfaced this week.
Let's do a show of hands.
Who thinks that Python and CMake is a good idea?
Mark, Timur, either one of you think it's a good idea?
I won't raise my hand, but I have a bit of a non-straightforward answer. I think replacing CMake's DSL with literally anything else is a goal worth pursuing. I don't think replacing it with a Turing-complete, full programming language is the right way to do it. On one of the former projects I was working on, we had CMake augmented with Lua scripts, which was really cool for that particular use case. You could do cool things that are a pain to do in CMake. But whether you really should have to do these things, probably my answer would be no.
Well, every single CMake best practices talk in the last five years has been: stick with a declarative style in your CMake, don't have a lot of ifs and branching and stuff. And so I'm leaning more towards the side of: making it too easy to program your CMake might not be a good idea. But I've also been in situations where I could have used that.
Yeah, and there's also situations like Mark's library
that we're going to get into later
where we actually have to generate code
at certain points.
And you have to somehow integrate that into your CMake.
Before we get to that, I want to mention one more library
that also popped up this time around.
So lots of interesting new libraries recently.
This library is called CppTrace.
And it's a lightweight stack trace library that we can use while we're waiting for the C++23 <stacktrace> header to actually be universally available. So currently, I think the Microsoft compiler is actually the only one that has a full implementation of C++23 std::stacktrace.
I think GCC has kind of a partial one.
Clang is lagging behind.
They don't have anything at the moment.
If you want to use that stuff cross-platform today,
it seems like this is a new library that you can use instead.
Cool.
And yeah, the last newsworthy library
that I want to mention on this episode
is a new library called soagen, a structure-of-arrays generator for C++,
and that's Mark's library.
As it so happens, the author of that library is our guest for today.
So hello again, Mark.
Greetings.
So first of all, how do I pronounce soagen? Is it "so-gen"?
Is it S-O-A-G-E-N?
How do I pronounce the name of your library correctly?
I don't think anybody had literally spoken it out loud until today.
So I'm happy with Sojin.
That's how it is in my head.
Okay.
I vaguely remember it's like a Japanese character from some kind of video game or something like that.
What?
Yeah, yeah.
Oh, I was just, I just took the words SOA generator and stuck them together.
That's as deep as any thought that went into it is
Is it actually an anime character, Timur, or are you just making stuff up? What's happening?
Um, I might be making stuff up. I'm going to research this. I have a vague memory, like I've heard this name before, but maybe I'm mistaken.
To me, that sounds like a shortening of sojourner or something.
Oh, yes.
So Sojin is a character in Ghost of Tsushima, which is a video game.
Oh, okay.
But it's spelled S-O-G-E-N without the A.
Okay.
Apparently, it's impossible to make up a new word today.
Right.
Anyway, so what is Sojin?
What problem does it solve? And what's this structure-of-arrays thing, and why do we need it in C++? What is this about?
Okay, so the problem that it's aiming to solve is essentially the cache locality problem inherent in large data sets. Say you have an array of many, many objects, those objects have quite a few fields, and you need to whip through that array and only do some processing on one particular field in each element of the array. You're going to essentially take a cache hit every single time, because one or two instances of your struct are going to fill a cache line, where really what you just want is that one field laid out side by side. So struct-of-arrays is saying: okay, instead of having one array with each of our individual objects, let's not have an explicit object anymore, and let's have many arrays, one for each field, and then whip through the particular array that we want.
So, that's not ideal for, say, most scenarios where you would have –
the example I use in the documentation for the library
is an employee database piece of software.
Now, obviously, in reality, you would use SQL or something for this,
but we'll just roll with this.
In that sort of application, you're going to want the array-of-structs model. You're going to want the objects to be self-contained, actual objects, because accessing one field across all employees without touching any of the others is pretty unusual there. But for the scenarios where you do need to do that, things like low latency applications, like collision detection in a game engine, for instance, or rendering applications, it's often worthwhile to restructure your data that way.
is that you now lose the explicit object that models your thing, and you instead have to
implicitly connect all these different arrays together and say, okay, all of the elements at
index seven are the one I'm interested in. And that can be quite annoying because if you add a new element to any one of those
arrays, you need to ensure that you do it to all of them.
If you shuffle them, sort them, whatever.
Otherwise, you end up with data going out of sync and all sorts of crazy bugs.
So this project is fundamentally two things. It's a library, a set of abstractions, for working with data like that in C++ as though it were one contiguous collection. And it's also a generator for solving some additional problems on top of that.
So when I was looking at your project before the interview, I was immediately reminded of Mike Acton's data-oriented design talk from CppCon like 2015 or whatever that was.
Is that a fair comparison?
This is a data-oriented design principle?
Yes, very much so.
And I thought maybe I linked to that talk somewhere in the description for the project.
Maybe I didn't.
But yes, certainly that sort of design is firmly in mind.
Yes.
Okay.
So it's a way of like formalizing that kind of design.
Yeah, and wrapping it up in, you know... okay, the vocabulary type everybody uses in C++ for containers is vector, right? Even when you don't want to use a vector, you want to use a vector; that's the whole joke. So that's an interface that we're all familiar with. I sort of wanted to have that same interface, but for this style of data.
Okay. All right.
And so how do I use Sogen?
Like what's the workflow like? Because the title suggests that it's a generator for C++,
so it's not actually just a C++ library,
so I need to generate code.
How does that work?
How do I use this?
Like do I define my struct and then I run a script over it to generate some other C++ code that I then compile into my program, something along those lines?
Yeah, I should be careful to clarify that you don't have to use the generator.
The features you get without the generator are about 90% of what the generator provides. So the generator is for fairly specific use cases on top,
which I'll explain in a bit, I suppose. But if you were to use the generator, yes, you would be,
you describe your struct in a configuration file that says what the members are, if you have any particular alignment requirements, and if you want to do any code injection stuff. You run the tool, and it spits out a header file for you with the code, which then uses the library as a dependency. But I feel like I've sort of buried the lede there. I should clarify what you might use the generator for on top of the base library. Perhaps that's worth me clarifying.
Yeah. So, because... my background is not game development,
but it's game dev adjacent.
And in those sorts of environments,
you have to do a lot of reflection-based tasks.
So whenever you need to deal with deserializing
and serializing assets, for instance,
there's all sorts of different assets in a game engine.
Even if you're not making games, my company's not making games,
we still use Unreal Engine, and that has its own built-in reflection system.
And these reflection systems, because C++ doesn't have reflection proper,
invariably end up being based on some combination of source code scanners,
stringification, magic macros, that sort of thing.
And indeed, Unreal is no exception.
And that works very well for the way they use it,
but it does mean it's a little bit hard to bring in, you know,
if you want to bring in third-party libraries and have them integrate
into whatever the reflection system is natively.
If they depend on magic macros and they depend on various injections that you need to do, you either need to maintain forks of these libraries, or you need to create wrappers for everything that you bring in, which is its own maintenance burden.
So the goal of the generator is for you to be able to just say,
okay, I need you to put in this magic macro
as part of my class definition and have these magic macros
as being part of the various functions
and expose it to whatever system and to do that quite simply.
And that way you don't have to maintain a bunch of wrapper classes, because that's essentially what the code generator is doing for you. Oh, and of course, the other thing you get is names, which was one of the questions: what do you get out of using a generator that you don't get from just template metaprogramming? For this particular application, you get names for everything.
Okay.
So if you have like a row abstraction, say – sorry, I should –
my mental model for how this works is that it's a table with rows and columns.
Right.
So each column being each member of your struct
and each row being the implicit data members that all share the same index.
If you want to address a row, let's say you have some std::tuple or something, a struct of references to each member of that row, you want to be able to do .id. You don't want to have to do .get angle-brackets zero. You know, we're humans; we like names. Now, you can do that with templates, if you're willing to use specialization macros, or do specialization tricks by injecting something into your type using, like, CRTP. But that always ends up being a very tedious thing to maintain that is easy to accidentally break. So by having everything in a nice little config file that has its own set of diagnostics associated with it, the generator creates all the named members for you, and you don't have to worry about maintaining any template specialization soup.
Okay.
Of course, it's there.
It's just that the generator is doing it for you.
All right.
I want to, if you don't mind, just make sure I understand,
without the generator, I have all of the tools
that let me have a bunch of these, this table, as you're describing it, rows and columns.
And if I want to sort based on the third element in my row, the third column in my row,
it will do that and keep everything nice and organized. It'll sort all of the other columns at the same time for me.
I can access the members by index.
I can do things with index.
But if I want names, then I want to use your generator.
Correct.
Okay.
Names and anything that might be integrated into a reflection system.
Yeah, that's what you're getting with a generator.
So what kind of reflection then do you provide?
Do I get like compile time tables of the names and members and that kind of thing?
Yes, you do.
There's a compile time.
There's a template variable for accessing the names of the columns.
There's the null terminated string for the names of each column.
There's an enum in each class that has the indices of each column.
And then there's the config file.
You can inject various annotations into your class so that if you want to say,
in Unreal Engine, if you want to expose a class to the visual programming,
the blueprint system, you need to use the U class magic macro,
and you can trivially inject that into your types without having to create
a bunch of wrapper classes for third-party libraries or whatever.
Okay.
Once we have the generated reflection-y thingy,
we can pretend like we have our old school structs
with all the members in one thing.
Yep, correct.
And I can just do like a ranged for loop over them
and just pretend like my world is what I thought it had been.
Yep.
Okay.
And so how does the generator know the members of the struct?
Do you have like a thing that actually scans the code
and somehow parses C++ class declarations?
Or do you have to put annotations on your members with magic macros?
Or do you have to duplicate the declaration in some kind of script
that you feed to the generator?
What is this magic?
It's a TOML config file.
So in the config file you have an entry for each SoA struct you want to create, and you run the tool over the config file.
Okay.
Is TOML, like, I don't know, related to JSON, or the YAML that we're familiar with? Why did you pick it?
It's related insofar as it's in the same class of config formats. I picked it because, well, okay, I don't think JSON is very human-friendly. And I think YAML is far too complex.
It's too easy to shoot yourself in the foot with YAML in what is ostensibly a config file format. So I like TOML because it's hard to get wrong.
It kind of looks like old-school Windows INI files, from what I'm looking at here.
Yep, that's a good way to describe it. It's INI files, but with some sort of standard applied to them.
Standard? Who needs a standard? So these reflection... I'm just curious, because before we started recording, you and I were bantering a little bit about constexpr, all the things.
Yes.
So I'm assuming that these reflection structures that you provide are like constexpr static tables of things that people can do anything at compile time they want to know about the types
that they are working with.
Yes.
Yep.
Nice.
All right.
So speaking of reflection,
I think actually this array of structs,
struct of arrays transformation
actually often comes up as a use case
for like proper reflection,
which we obviously don't have in the language yet.
Like I think Bryce mentioned this
in his ACCU keynote this year.
I sat in on a talk at this year's C++Now conference where the speaker was doing similar things, like, oh, let's make the data layout configurable, but let's still have the same class interface as we always do. And then he also very quickly reached a point where he said, we can only really do this with reflection. It's really cool that you can do this basically the way you do it. But if you actually had proper reflection in the language, how would you do it then? Would that make things a lot easier? What's your take on reflection and why we need it?
Okay, so I don't know that I really want to dare to say how I might do it, because I'm nowhere near enough of an expert to speculate what it might look like in the language. But I can tell you, well, I've already sort of touched on one thing that's very, very difficult without it: the names aspect of it, right? You know, you can do some complex template injection nonsense, but that's just it. It's complex nonsense.
There is one other thing that the generated code from soagen can do, which I would love reflection to be able to do,
but as I understand it, there's not really any elegant way of doing it in vanilla C++, and that is: say you have some static interface, a class of types that have all got a push_back method with some varying number of arguments. It might be the case that in some situations you want one or two of them at the end to have a sensible default, and otherwise there may not be any sensible defaults. And there's no way that I know of to have, as part of the function definition, some sort of conditional default-argument thing as part of the actual function. I can think of ways to do it: if, instead of taking all the arguments individually, you took them as a struct, you could have that struct be built up as a composite of base classes that each represent each member, and maybe they have an in-class member initializer, or maybe they don't, depending on some template specialization stuff. But again, just me trying to explain that off the top of my head, I've stressed myself out. That's the sort of thing where it would be nice to have a cool little syntactic thing to say: okay, maybe there'll be an equals-whatever for this function parameter, or maybe there won't, and to have that be syntactically valid, and to maybe source that from some consteval function or something, you know, who knows.
That's the example that comes to mind.
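The workaround Mark sketches verbally, a composite of per-parameter base classes whose in-class member initializers are toggled by template specialization, can be illustrated roughly like this. All the names here (`StiffnessParam`, `PushBackArgs`) are made up for illustration; this is not soagen's actual generated code.

```cpp
// Hypothetical sketch: each parameter becomes a base class, and a template
// specialization decides whether it carries an in-class member initializer
// (a "default") or must be supplied by the caller.
template <bool HasDefault>
struct StiffnessParam { float stiffness; };              // no sensible default

template <>
struct StiffnessParam<true> { float stiffness = 1.0f; }; // defaulted

// The "argument struct" composes the per-parameter bases (C++17 aggregates
// may have public base classes, so aggregate initialization still works).
template <bool StiffnessHasDefault>
struct PushBackArgs : StiffnessParam<StiffnessHasDefault>
{
    float mass; // always required, never defaulted
};
```

With the default enabled, `PushBackArgs<true>{}` yields `stiffness == 1.0f`; with it disabled, the caller must write something like `PushBackArgs<false>{ {0.5f}, 2.0f }`, which is exactly the kind of ceremony Mark would rather replace with a conditional `= whatever` in the signature itself.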
That's interesting because the enumerating,
being able to enumerate members of a struct is,
I think one of the first things that any reflection proposal always says,
you know, we need to do this, but yeah,
being able to basically express
that it's kind of variable whether or not a function parameter has a default value, and
you determine somewhere else in code whether that's the case. Yeah, that's really cool. I
don't even know if any of the reflection proposals that were discussed in the last few
years can actually do something like this.
I don't know.
Maybe they can, maybe they cannot.
I should say that it's a testament to my familiarity
with homebrew reflection systems in video games that the
idea of iterating through all the members of a structure didn't even occur
to me.
Because that's just like, duh.
So I was immediately thinking of something a lot more
niche. But yeah. Also, I think it's really interesting, because some of these
reflection papers, I think, were written from this more academic point of view. It's like, okay,
let's build this up from first principles: we need to do this and this and this and this.
And then I think it's actually really cool to come from the other end and say, well, these are the problems we need to solve.
You know, we need a library that can do this,
and we need a piece of code that can do that.
And what do we need to get that?
And I think coming at it from the other end,
I think that's a really cool approach to kind of figure out
how reflection should work.
But yeah,
I think there's not actually that much going on currently in the reflection
study group.
I think the work there has kind of stalled. I think they kind of ran out of funding to keep working
on these papers and compiler forks or something. I don't know. I don't think there's much going on
there at the moment. Unfortunate. Yeah. I'm curious if you can speak at all to how you've actually used soagen up to this point and
what kind of performance benefit perhaps you've seen by moving something from an array of structs
to structs of arrays layout. Yes. Okay. So I'll give you a little bit of context about the nature
of the data I work with at my job, for instance, which also depends pretty heavily on not this, but a thing very much like it.
We originally, in an earlier version of things, did work with conventional array of structs.
We've got particles in our physics simulation system, and they have things you might expect
them to have, position, mass, et cetera.
We've also got constraints which act on those particles, and they have different properties
depending on what type of constraint they are, for instance.
A bend constraint, for instance, is sat between two triangles,
and it's related to the bend angle between the two of them.
And so it'll have the indices of the triangles
and the various weights of the coefficients for the math, basically.
And that's just a whole bunch of floats.
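The two layouts being compared can be sketched briefly. The field names below (`px`, `mass`, etc.) are assumptions for illustration, not the actual fields from Mark's physics system.

```cpp
#include <cstddef>
#include <vector>

// Array-of-structs: one struct per particle; the fields of consecutive
// particles interleave in memory (px, py, pz, mass, px, py, pz, mass, ...).
struct ParticleAoS
{
    float px, py, pz;
    float mass;
};

// Struct-of-arrays: one contiguous array per field, so e.g. all the masses
// sit next to each other, which is what the cache (and later SIMD) wants.
struct ParticlesSoA
{
    std::vector<float> px, py, pz;
    std::vector<float> mass;

    std::size_t size() const noexcept { return px.size(); }

    // Appending a "row" means pushing onto every column.
    void push_back(float x, float y, float z, float m)
    {
        px.push_back(x);
        py.push_back(y);
        pz.push_back(z);
        mass.push_back(m);
    }
};
```

Hand-writing that second form (and keeping every column in sync) is exactly the boilerplate a generator like soagen exists to take off your hands.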
So when we transitioned that from array of structs to struct of arrays, immediately we saw a speedup of about 30-ish percent.
Oh, wow.
Which alone is pretty good. Like, that's how well suited our data and our sort of access patterns were to this, that we got such a marked speedup immediately. But then we got a secondary speedup, because the whole reason
we were investigating structure of arrays to begin with
was not the performance that it grants.
At the time, it was a surprise, but intuitively it makes sense.
The reason we were actually doing it is because we wanted
to SIMDify everything, where previously we'd only used it
in a few places because the data wasn't structured in a way
that made it easy to do.
If I might, for just a second, if you can clarify SIMDify.
Sure.
I can't think of off the top of my head what SIMD is short for.
Single instruction multiple data?
Right, exactly.
So we were transitioning from just basic scalar math
to using SIMD registers,
using SIMD compiler intrinsics to do it.
We didn't necessarily do it all raw.
We used a library for that,
but we still needed to change the layout of all our data to be able to do that.
And we transitioned from AOS to SOA.
We got the 30% speedup then because all our data was
contiguous and appropriately aligned, so we could swap out the math for SIMD math. And then we went
from doing scalar math to vector math on eight lanes or something. So
that was, wow, you know, now much, much faster. Like triple-digit percentage speed increases.
It's at most eight times faster
if you're now doing eight things
at the same time you were previously doing one.
So that's amazing.
And we would not have been able to do that
if not for changing the layout of our data
to SOA to begin with.
So it has sort of compounding impacts if you pair it with SIMD and things like that.
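Why contiguity enables this can be shown with a small sketch: once a field lives in one contiguous array, the inner loop can be strip-mined into fixed-width batches (eight lanes here, mirroring an 8-float-wide register). This is plain scalar code that a compiler can auto-vectorize, not the intrinsics library Mark's team actually used; the function and parameter names are invented for illustration.

```cpp
#include <cstddef>

// Advance particle positions by velocity * dt over contiguous SoA columns.
void integrate(float* px, const float* vx, float dt, std::size_t n)
{
    constexpr std::size_t lanes = 8; // assumed SIMD width in floats
    std::size_t i = 0;

    // Batch loop: each iteration touches a full register's worth of
    // contiguous elements, which is trivially vectorizable.
    for (; i + lanes <= n; i += lanes)
        for (std::size_t l = 0; l < lanes; ++l)
            px[i + l] += vx[i + l] * dt;

    // Scalar remainder for the last n % lanes elements.
    for (; i < n; ++i)
        px[i] += vx[i] * dt;
}
```

With an AoS layout the same loop would stride over interleaved fields, and neither the compiler nor hand-written intrinsics can load eight positions in one aligned load.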
It just reminded me of Godbolt's Law, which isn't very well known, I don't think, but it's from Matt Godbolt
of Compiler Explorer. Godbolt's Law: if any single optimization makes a routine run two or
more times faster, then you've broken the code. And you've just broken Godbolt's Law
with your assertions here.
And that is all.
I'm sorry, Matt.
That's interesting.
So I had another point about SIMD.
I actually talked about this with Matthias Kretz,
who's the author of std::simd,
which we are hopefully going to get in C++26.
And we were talking about this SoA, AoS, kind of like the speedups that you can get there. And he said something
interesting. He said that often you get the best speedup not from actually transforming
AoS to SoA, but from doing an in-between. So the fastest thing often is to have an array of structs
that then inside have arrays,
which are SIMD-register-width sized.
So you have an array of structs of SIMD-register-width-sized arrays.
Do you have any opinion on that?
Have you tried anything like this?
I do.
Or does soagen even support this stuff?
Yes, yes, and yes, I think in that order.
That's amazing.
Okay.
So, yes, it does support it.
We can do all of those things.
I am familiar with that workflow.
The main reason you do that sort of thing is because you need essentially the alignment of the batches to meet whatever your SIMD requirement is, so 16 bytes or 32 bytes or whatever.
And then, of course, that matches the size of the thing
so that you step forward by that amount
and it all stays aligned nicely
and you can do your load aligned and store aligned, et cetera,
as you move through into your calculations.
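That nested "array of structs of SIMD-width arrays" (AoSoA) layout can be sketched like this; the width of eight floats (32 bytes, e.g. an AVX register) and the field names are assumptions for illustration.

```cpp
#include <array>
#include <cstddef>
#include <vector>

constexpr std::size_t lanes = 8; // assumed SIMD register width in floats

// Each block holds one register's worth of every field, and the alignas
// keeps every block (and therefore every per-field batch) 32-byte aligned.
struct alignas(32) ParticleBlock
{
    std::array<float, lanes> px;
    std::array<float, lanes> mass;
};

// Two-level addressing: logical element i lives in block i / lanes,
// lane i % lanes -- the "blah divided by eight, blah mod eight" dance.
inline float& mass_at(std::vector<ParticleBlock>& blocks, std::size_t i)
{
    return blocks[i / lanes].mass[i % lanes];
}
```

The payoff is that every batch is register-sized and aligned for free; the cost is exactly the two-level indexing that the conversation turns to next.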
By nesting it in a second level of array like that,
you sort of get that out of the compiler
just by using a strategically placed alignas. But that does, of course, mean that you
now have two layers of data. You need two little square brackets everywhere if you want to address
those members individually, and if you've got a SIMD register
width of eight and you want to access element 17, you then have to do, like,
blah divided by eight, blah mod eight, which is pretty annoying. You have abstractions for that, but I think humans tend to prefer data being flat. So the way that I've addressed that problem
is to have it so that if you specify over-alignment for a column in one of the tables,
you'll get an aligned stride, which is a static constexpr size_t, just as a member of the class
description for each column, which is just calculated
based on the alignment that you've specified.
And that says that if you step through this collection
by this amount at a time, everything stays nice
and perfectly aligned.
So you can just have one level of floats,
and you can step through it by that value,
and you get the same benefit without having two sets
of square brackets everywhere.
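The arithmetic behind such an aligned stride can be sketched as follows. This is a plausible reconstruction of the idea Mark describes, not soagen's actual implementation: the smallest element count you can advance by so that, starting from a suitably over-aligned base pointer, every step lands on an equally aligned address.

```cpp
#include <cstddef>

// Smallest number of elements whose total byte size is a multiple of the
// requested alignment, i.e. lcm(alignment, elem_size) / elem_size.
constexpr std::size_t aligned_stride(std::size_t elem_size, std::size_t alignment)
{
    std::size_t bytes = elem_size;
    while (bytes % alignment != 0) // find the first multiple of elem_size
        bytes += elem_size;        // that is also a multiple of alignment
    return bytes / elem_size;
}

// 32-byte-aligned float columns: stepping 8 elements keeps 32-byte alignment,
// so one flat index plus a known stride replaces two sets of brackets.
static_assert(aligned_stride(sizeof(float), 32) == 8);
static_assert(aligned_stride(sizeof(double), 32) == 4);
```

Exposing that value as a `static constexpr` member lets user code loop over a single flat array in aligned batches without ever writing the divide-and-modulo indexing by hand.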
Cool.
Right.
So we'll be back in just one second,
but I would like to mention a few words from our sponsor.
This episode is supported by PVS Studio.
PVS Studio is a static code analyzer created to detect errors
and potential vulnerabilities in C, C++, C Sharp, and Java code.
Podcast listeners can get a one-month trial
of the analyzer with the CPP Cast 23 promo code. Besides, the PVS Studio team regularly writes
articles that are entertaining and educational at the same time on their website. For example,
they've recently published a mini book called 60 Terrible Tips for a C++ Developer. You can find
links to the book and to the trial license
in the notes for this episode.
And now we're back to Mark and Jason.
Hello again, both of you.
So I actually have a couple more soagen questions.
So what C++ standards and compilers does Sojin support?
And do you support CMake?
17, the big three, and no.
All right. Not because I don't want to support CMake. I'm a CMake novice. I tend to choose
Meson as a preference, and I've never gotten around to essentially learning enough
about CMake other than learning what I need to know to fix a problem in someone else's project,
then probably forgetting it immediately afterwards.
So I would not have any objection to someone adding CMake support,
but I currently haven't done it myself.
Right.
So you're open to contributions and pull requests and things like that? Yep, absolutely.
Yep.
All right.
Amazing.
So where can we find your code?
Is it on GitHub?
It is on GitHub.
marzer slash soagen, S-O-A-G-E-N.
All right.
We're going to post the link to that repository in the show notes.
And what license do you use?
Can people just go and use that code for their own projects?
Yep.
MIT.
MIT.
Amazing.
All right.
And do you have any kind of roadmap, what you want to do next with this library?
Is there going to be some kind of 1.0 release at some point?
There is a roadmap.
It's actually currently the only issue on the repository
is my own notes as a roadmap.
So yeah, I suppose eventually there'll be a 1.0.
So I mentioned earlier that predominantly the features you get
by using the generator are the reflection stuff
and the default arguments and names.
Apart from those, which I don't think I'll ever be able to close that gap
in the absence of C++ having actual reflection,
there are some other things currently that it does,
just some class interface stuff,
but I would like to bring the two to like at parity.
So that would be, I guess, my 1.0.
So you essentially get all of the features
you possibly could without using the generator. And it's not currently there. It's almost there.
Silly question. What language is the generator written in?
Python.
I had a suspicion it might be. I didn't actually look on the project.
So from the perspective of do you support CMake, the rest of the library, I'm assuming,
is header only, right?
Yes, correct.
Right.
So, yeah, so supporting CMake should be relatively easy to just have a custom build step that calls your Python script
that generates the thing and then have other things rely on the output of that.
So from our listeners' standpoint,
if you use CMake, it shouldn't be difficult to use this library.
All right.
So you actually have a few other libraries on your GitHub.
So you mentioned that the language that the user specifies
their struct layout in, in soagen, is TOML.
I also noticed you have a library called TOML++ on your repository,
and that has 1,200 stars on GitHub.
So that's not a low number.
So it seems like it's a popular library.
What's that one about?
Yeah, it's a TOML parsing and serializing library in C++, I guess.
Okay, so a bit of brief history as to why this library exists.
I needed to use TOML for a personal project a few years ago.
There were two options in C++.
One was abandonware.
Author hadn't touched it in three years, and it didn't support the current version of TOML.
And the other one was much more actively maintained, but it didn't really suit the programming
model that I wanted to use with it.
It was still being developed at the time, too, so I was still missing a few features.
And I thought, well, okay, how hard could this be? Which is, of course,
famous last words, because writing a parser, if you've never done that before, it was like,
oh, okay, actually this is kind of complex. And then, oh, okay, now I've published an open source
library that's hilariously gotten popular, and I wasn't expecting that. Now I've got to maintain
it. Oh, crap. So how's that going?
Oh, it's going okay.
Admittedly, my enthusiasm for the project is a bit lower than what it was
now that it's relatively mature, but I come back to it occasionally
and, you know, tinker.
It's on the back burner in terms of, like, new features and stuff,
but, yeah, it's still maintained in that I fix bugs.
I've been pondering an episode of C++ Weekly
titled something like, How to Responsibly Abandon Your Open Source Project. Yeah, I can
relate to that. Yeah. Right. Are there any other projects that you're working
on that you want to share with us? No. I had intended to... I was building a ray tracer to make use of
soagen as part of my, hey,
here's this thing, and I'm using a ray tracer.
But I realized that I could either
do one or the other and not both.
I might release that at some point.
But no, nothing that's maybe
worthy of discussion on the podcast.
Alright.
You also recently attended your first
C++ committee meeting in Varna.
That was in June, if I remember correctly. You were both there.
And so you joined the Finnish standardization body.
So you're now an official member of the committee as far as I know.
So what was that like going to your first committee meeting?
Are you kind of interested in the progress of standardization? Do you want to do this more?
Yes.
Okay.
So what was it like?
It was good.
I went in with about a million questions,
and I came out with not really any questions anymore.
It was a very good learning experience.
I don't feel like I went there with any intention of making waves.
I just wanted to learn how everything worked.
How the sausage is made, as they say.
Exactly, how the sausage is made, precisely.
So I feel like I got a good overview about that.
All of the misconceptions I had were dispelled
and all of the questions I had were answered, so that was good.
And would I like to participate further?
Yeah, I've got an idea for a relatively simple change,
I think, that I'm exploring.
I know, I know, famous last words. I've got two ideas. One is, I think, relatively simple,
and the other is not. And it'd be interesting to find if that guess holds up if I were to
write them both up. So it might turn out to be the other way around.
Are you going to participate in the reflection work? Because it sounds like you should.
Yeah, but if it's in a situation where it sort of needs people
to pick it up and take the lead on it, I don't want to put my hand up for that. I'll participate
in discussions, but I don't necessarily want to be the driving force behind them, we'll say.
I see. Yeah. No, I think I can understand why it's taken so long and how challenging
it actually is. Because when you really drill down into not only what reflection is,
but how do you express it programmatically, syntactically?
This is like a computer science.
This is a hardcore computer science thing, and I'm very much not that.
So I'll participate in discussions,
but I don't necessarily think I want to take any sort of lead
in designing that sort of thing.
Yeah, I think on top of designing,
you also have to implement it in a compiler.
So you kind of have to be a compiler engineer
or have a compiler engineer working with you as well.
And so, yeah, I think it's an enormous amount of work
to make progress on this.
And yeah, I do get that, you know,
it's very time-consuming, expensive to do this work.
It needs experienced people.
I'm very sorry that it's kind of stalled.
I hope to see progress there.
And I think one way you could contribute very well is to just provide
real-world experience of use cases.
Like, this is what we actually need reflection for.
These are actual things that pop up in my day-to-day work.
Can this or that syntax or proposal actually do that for us? And if it doesn't, you know, maybe you're missing something here, right?
He seems skeptical. You know, having only recently started being involved in the whole
proceedings, I haven't really got a good sense for, you know, how much time it would actually consume if I were to be
actively involved on particular proposals or whatever. So I'm hesitant to fully dive
deep into anything just yet. Fair enough. Fair enough. I think I spent like several
committee meetings just being a tourist before I kind of wrote my first little paper.
Yep. And then I kind of got sucked in somehow because the little paper turned out to be way more
complicated than I thought.
But yeah, I don't know.
It can go either way, right?
Well, as long as we're talking about it, when is the next standards meeting in case any
of the listeners are interested in trying to attend themselves?
So the next standards meeting is actually in November in Kona, Hawaii,
in the US, taking place from the 6th to the 11th of November. So yeah, we had the meeting there
last year also in November, and we're going to be in Kona again. I'm actually this time not going to
be in Kona in person myself. This is going to be the first committee meeting that I'm going to miss since I joined. I'm going to
attend it virtually.
So you can dial in?
Yes. So since COVID,
everything's hybrid and you can
dial in. The only thing you have to deal with is
a pretty brutal 12-hour
time difference between where I live and
where Kona is.
But if you find a way to deal with that, then yes,
if you're a member of the committee,
you can dial in and participate in discussions. So I think that's something that is a lot better
than what it used to be before COVID. Because before COVID, it was like, you can't afford to
go there. Then basically, you're out, which is not a very inclusive way of standardizing a language.
So I'm very, very happy that we improved on that one. That's
cool. All right. So then I think we're nearing the end of our episode here, so we should
probably start wrapping up. But yeah, Mark, is there anything else you want to tell us before we
do that? Is there any way people can reach you if they want to contribute to soagen or just get in touch, ask you questions,
talk to you about reflection or whatever else?
Yeah, probably the easiest starting point
is just GitHub: github.com forward slash marzer.
My repositories all have contact information for me on them.
So you can use that as a jumping off point
and go from there.
I'm pretty active on Twitter
or the artist formerly known as Twitter.
And, you know, Discord and various things.
So I'm reachable.
Just GitHub's a starting point and go from there.
All right.
Well, then thank you so much for being our guest today, Mark.
It was a great discussion.
Thank you so much.
And thank you, Jason, for being my co-host today.
It was an honor and it was a lot of fun to have you back on the show.
And I hope this is not going to be the last time
and we're going to have you back at some point
in the future again.
Absolutely, Timur, let me know.
Thanks so much for listening in
as we chat about C++.
We'd love to hear what you think of the podcast.
Please let us know if we're discussing
the stuff you're interested in.
Or if you have a suggestion for a guest or a topic,
we'd love to hear about that too.
You can email all your thoughts to feedback at cppcast.com.
We'd also appreciate it if you can follow CppCast on Twitter or Mastodon.
You can also follow me and Phil individually on Twitter or Mastodon.
All those links, as well as the show notes, can be found on the podcast website at cppcast.com.
The theme music for this episode was provided by podcastthemes.com.