CppCast - Soagen
Episode Date: August 18, 2023

Mark Gillard joins Timur and guest co-host Jason Turner. Mark talks to us about reflection, SIMD, and his library soagen, a structure-of-arrays generator for C++.

News:
- What is Low Latency C++? C++Now 2023, part 1
- What is Low Latency C++? C++Now 2023, part 2
- Inside STL: The vector
- Inside STL: The string
- Experimenting with Modules in Flux
- pycmake
- cpptrace

Links:
- Soagen on GitHub
- Soagen documentation
- Mike Acton: Data-Oriented Design and C++ at CppCon 2014
- Bryce Adelstein Lelbach on SoA and reflection at ACCU 2023
- Data-Oriented Design and Modern C++ at C++Now 2023
- Godbolt's law
- toml++ on GitHub
- PVS-Studio: 60 terrible tips for a C++ developer
Transcript
Episode 367 of CppCast with guest Mark Gillard, recorded 9th of August 2023.
This episode is sponsored by the PVS-Studio team.
The team promotes regular usage of static code analysis and the PVS-Studio static analysis tool. In this episode, we talk about several new library releases
and about how standard containers are implemented in the Microsoft Standard Library.
Then we are joined by Mark Gillard.
Mark talks to us about his library soagen, a structure-of-arrays generator for C++.
Welcome to episode 367 of CppCast, the first podcast for C++ developers by C++ developers.
I'm your host, Timur Doumler.
Joined by my co-host for today, Jason Turner.
Jason, how are you doing today?
I'm excited today, Timo, actually.
Thank you for inviting me back on as a guest co-host.
It's been a little while. How many episodes have you all done without
me here, basically? 17, 18, depending on whether we count the one that we did together.
Right, right. Yeah, quite a lot. Yeah. It's been fun so far.
So everything's going well? Did I leave the podcast in good hands?
Yes, yes. It's been great fun. And yeah, thank you so much. As you can tell,
Phil is still on vacation. So I'm very, very excited to have you here. You're back on the
show for the first time in over a year, aren't you? If you don't count the episode, the special
Christmas one that we did together. Yeah, definitely over a year. Yeah,
if you don't count the Christmas episode. So how have you been since then? What are
you up to these days?
Well, not a whole lot has changed for me professionally, anyhow. I'm still doing training. I do have a C++ Best Practices workshop coming up at NDC TechTown at the end of September. I'm assuming this will air before then, right?
Oh yeah, this will air next week on Friday.
Oh, okay, then yes, definitely. And I'm also planning to do a Best Practices workshop post-conference at CppCon. So there's that. And I'm also sitting here like a fool staring at my YouTube subscriber count, because I'm currently at 99,807 subscribers. I just want to see that 100,000 tick over.
That's amazing. So if you haven't yet subscribed to Jason's YouTube channel,
please do so now to help him get that sixth digit in there.
I'm hoping to hit that by the end of the week. That's my plan right now.
So what about you? You've had some career things going on lately, right?
Yes. One thing I've wanted to do for a long time is to write a book about C++. But that's not really compatible with being full-time employed at a big tech company, unless you want to work weekends and evenings, which is not something I'm a particular fan of. So I kind of had that in my backlog, but always wanted to do it. And now I finally will have time to pursue it. So I'm very excited about that.
Do you have a title or a plan, something to get the listeners excited about?
So the tentative title is Low Latency C++. You might have seen my three-
hour talk at C++Now this year, which had the same title, where I was talking about all of the different techniques that you can use if you're into audio or finance or video games or any of those kinds of fields where you're optimizing for latency rather than just general performance. Where it really matters how many milliseconds or microseconds a particular piece of code takes, and it shouldn't be over a certain deadline; it should be as fast as possible. So you have this hyper-focus on latency, and that leads to particular ways of writing C++ that have some overlap with general performance optimization, but sometimes you also take very different approaches. And so I want to provide an overview of all these techniques that you might want to use in those industries. So it's kind of a summary of a bunch of talks
that I've given over the last five, six, seven years, and also other material that I have researched in parallel. I've been asked by people... it was kind of fun: when I first did the one-hour version of this low latency overview talk, people were like, yeah, this is really exciting, but one hour is nowhere near enough to give a proper overview, so can you do a longer one? So I did a three-hour version at C++Now. And then after the three-hour version, people were like, well, but that's nowhere near enough time to actually explain how this all works, so can you do a one-day workshop? And then I was like, yeah, but okay, let me just write it up, right? So that's what I want to do. And yeah, that's going to be another one of my side projects for the next few months.
Are you working with a publisher or self-publishing?
So I haven't decided that yet. I actually had a couple of publishers who approached me and said they were interested in doing something together, but
I haven't yet decided if I go for that or if I do self-publishing or how I'm going to approach this.
I want to first get like a kind of table of contents and this is what's going to be in this
book and have like a really good idea of that in my head.
And then think about how I'm going to make that happen and exactly with whom.
And yeah, that sounds like a pretty good approach to me. I've only self-published myself, but I have worked with publishers on other things, and it makes sense to get as much done as you can before you talk to a publisher, and then decide if you want to work with a publisher or if you want to go the self-publishing route. And just as an aside to listeners who might be thinking about the same kind of thing, there's nothing stopping you from self-publishing and then selling your book to a publisher. That is a thing.
I actually had someone offer to buy my book and I said,
no, I'm good. Thank you very much.
That is interesting. I did not know that was a thing. Thank you, Jason.
You own the copyright to your book, so no one can stop you from selling it.
Right. Well, so at the top of every episode, I'd like to read a piece of feedback. But actually, this week I didn't receive any feedback, neither by email, nor by Twitter, nor by Mastodon, nor on Reddit. So I actually don't have any new feedback this week. But if you have any, please let us know. If you'd like to share thoughts about the show, you can always reach out to us on Mastodon, or on Twitter, or is it now officially called X? I'm not sure. I saw someone write T-X-I-T-T-E-R.
All right.
Or you can email us,
and that's definitely going to still work.
Email us at feedback at cppcast.com.
Joining us today is Mark Gillard.
Mark is a soft body physics engine developer
and low-level tooling guy at Osgenic,
a surgical training company based in Helsinki, Finland.
Prior to his current role,
he was the chief architect of an internal graphics engine
used by the company in prototypes during their startup phase.
Before coming to Finland,
Mark was a teacher, researcher, and consultant
at Flinders University in South Australia,
working with haptic controllers to find novel ways
of modeling and teaching
different surgical interactions.
Mark first learned to code as a teenager, making mods in Unreal Script for Unreal Tournament
2004.
And these days, almost all of his work is C++.
Mark, welcome to the show.
Hi, thanks for having me.
Mark, I feel like your bio is backward, because so many people say they got into programming because they wanted to do gaming stuff. It sounds like you wanted to make tools, to make things that go in games.
So, you know, the path I've taken satisfies that quite nicely.
I can totally see that.
I can feel that.
I mean, I'm curious, though. You just casually throw soft body physics into your bio, versus hard body physics, rigid body physics. Like, what's the deal here?
So it's a platform for simulating the interactions between different tissues, right?
So it's not just rigid body physics as you might have in, say, NVIDIA's PhysX, where
you've got spheres and cubes and various things.
It's more about pretty much entirely focusing
on the interactions between the soft tissues themselves.
Okay.
Yeah.
And how we might drive, say, we have these three-dimensional
haptics controllers that we use to act as a proxy for, say,
a scalpel or a drill or something, and that's capable of rendering
some force feedback.
And so from the physics simulation, the interactions between the tool
and the soft tissue, we can pull forces out of that
and have the tool render some force as it would in real life
if you passed an instrument through some flesh during surgery.
So what does this practically look like, soft body physics modeling?
Is it like a bunch of particles connected by
springs, or am I overthinking this?
No, that's essentially the bare-bones description of what it is. It's not a mass-spring system, but it shares similarities with that. And of the people that work on the project, I'm not the physicist, so I don't want to get too much into the specifics, because I'm going to fudge the description. But I would say it would be fair for me to describe it as being a mass-spring system on uber steroids.
Okay.
And that's about as technical as I can be on the physics side of it, because, you know, the research and the physical principles that go into making it work aren't really my area of expertise. I wrap it up in a software engineering framework and make it fast. That's sort of where I live.
Interesting.
Right. So, Mark, we'll get more into your work in just a few minutes. But
before we do that, we have a couple of news articles to talk about. So,
feel free to comment on any of these, okay? So the first one I have this time is actually a whole series of blog posts called Inside STL by Raymond Chen that came out last week.
And probably there's going to be more coming out
about how the containers in the Microsoft Standard Library
are implemented under the hood.
So I thought that was really interesting.
There was one about the vector.
It's called Inside STL: The vector.
That was cool because I always thought that vectors,
like std vector is implemented as like three members,
like pointer size and capacity.
But it actually turns out that the Microsoft version
is implemented with three pointers,
first, last, and end.
So Raymond talks about that.
Then he has a blog post about the string,
the Microsoft string.
There's another one about the pair,
the lists, like the maps.
And there's like a blog post for each one of those.
So if you want to dig into Microsoft STL implementation
and see how things are done there,
I think that's a really cool kind of series of blog posts
that caught my attention.
Well, and the string article also does a comparison
on how the
other two standard libraries implement their small string optimization. So for anyone who's curious
about what the heck is small string or short string optimization, how does it work? This is
a super succinct overview of that, because, well, Raymond publishes an article literally every weekday, right? So they're never very long. So it's all compact in here. We interviewed him back in the day on CppCast.
Oh, that's interesting. I haven't listened to that one yet. I actually started listening to all the CppCast episodes, all the way from the very beginning when Rob was just doing them on his own, and then later and later. But I haven't caught up to this one yet.
I'm derailing the conversation now, but I think if we looked at this post's number, it might literally be post 108,532 or something ridiculous.
That's a lot of blog posts.
Maybe I'm wrong, but it's a lot. I think it's every weekday that he publishes one, right?
So there's another blog post that I also found really interesting this week, from Tristan Brindle, whom we had on the show a couple of episodes ago. He talked about his Flux library, kind of an alternative way to do iterators and ranges, which is really cool.
And he updated his library to support C++20 modules
and he wrote a blog post about it.
And that's really interesting,
not just because modules are great
and because the Flux library is great,
but also because the blog post
actually explains how it all works, right?
So he shows, first of all,
how to compile his library using modules on all three major compilers (Clang 16, GCC 13, MSVC 17.6), like what compiler flags you need to compile with modules, all of that stuff. He also talks about how you can try it out using CMake. He mentions that CMake has this new built-in module support, but he actually doesn't use it. He uses Victor Zverovich's modules.cmake thing for that.
He explains how that works.
And then also he talks about how to modularize a library.
So you can actually apply that to your own library.
If you want to make your library
compatible with C++20 modules,
he kind of goes through that as well.
So I thought that was like a really,
really cool and comprehensive blog post
for people who are interested
in actually using modules in practice. So I know we're going to get into Mark's library a little bit later on,
but I'm curious if you looked at modules at all yet so far for the library you've been working on,
Mark?
No. Admittedly, modules conceptually have been a bit of a black hole for me. Other things just keep coming up when I've set aside time to learn about them. So no, I haven't.
Yeah. Well, I mean, I'm in the same boat because i'm waiting for the tooling story to be complete yes so that i can just use them not have to figure out how to use them but anyhow
yeah so speaking of cmake someone actually made a branch of cmake that supports python scripting
in addition to regular CMake scripting.
There's a GitHub repository.
It's called PyCmake.
There's a delightful Reddit discussion
about whether that's a good idea or not.
Yeah, I thought that was another interesting project
that I wanted to mention that surfaced this week.
Let's do a show of hands.
Who thinks that Python and CMake is a good idea?
Mark, Timur, either one of you think it's a good idea?
I won't raise my hand, but I have a bit of a non-straightforward answer. I think replacing CMake's DSL with literally anything else is a goal worth pursuing. I don't think replacing it with a Turing-complete, full programming language is the right way to do it. On one of the former projects I was working on, we had CMake augmented with Lua scripts, which was really cool for that particular use case. You could do cool things that are a pain to do in CMake. But whether you really should have to do these things, probably my answer would be no.
Well, every single CMake best practices talk in the last five years has been: stick with a declarative style in your CMake, don't have a lot of ifs and branching and stuff. And so I'm leaning more towards the side of: making it too easy to program your CMake might not be a good idea. But I've also been in situations where I could have used that.
Yeah, and there's also situations like Mark's library
that we're going to get into later
where we actually have to generate code
at certain points.
And you have to somehow integrate that into your CMake.
Before we get to that, I want to mention one more library
that also popped up this time around.
So lots of interesting new libraries recently.
This library is called CppTrace.
And it's a lightweight stack trace library that we can use while we're waiting for the C++23 <stacktrace> header to actually be universally available. So currently, I think the Microsoft compiler is actually the only one that has a full implementation of C++23 std::stacktrace.
I think GCC has kind of a partial one.
Clang is lagging behind.
They don't have anything at the moment.
If you want to use that stuff cross-platform today,
it seems like this is a new library that you can use instead.
Cool.
And yeah, the last newsworthy library
that I want to mention on this episode
is a new library called soagen, a structure-of-arrays generator for C++,
and that's Mark's library.
As it so happens, the author of that library is our guest for today.
So hello again, Mark.
Greetings.
So first of all, how do I pronounce soagen? Is it "so-gen"?
Is it S-O-A-G-E-N?
How do I pronounce the name of your library correctly?
I don't think anybody had literally spoken it out loud until today.
So I'm happy with Sojin.
That's how it is in my head.
Okay.
I vaguely remember it's like a Japanese character from some kind of video game or something like that.
What?
Yeah, yeah.
Oh, I was just, I just took the words SOA generator and stuck them together.
That's as deep as any thought that went into it is
Is it actually an anime character, Timur, or are you just making stuff up? What's happening?
Um, I might be making stuff up. I'm going to research this. I have a vague memory, like I've heard this name before, but maybe I'm mistaken.
To me, that sounds like a shortening of sojourner or something.
Oh, yes.
So Sojin is a character in Ghost of Tsushima, which is a video game.
Oh, okay.
But it's spelled S-O-G-E-N without the A.
Okay.
Apparently, it's impossible to make up a new word today.
Right.
Anyway, so what is Sojin?
What problem does it solve? And what's this structure-of-arrays thing, and why do we need it in C++? What is this about?
Okay, so the problem that it's aiming to solve is essentially the cache locality problem inherent in large data sets. Say you have an array of many, many objects, those objects have quite a few fields, and you need to whip through that array and only do some processing on one particular field in each element of the array. You're going to essentially take a cache hit every single time, because one or two instances of your struct are going to fill a cache line, where really what you just want is that one field laid out side by side. So struct-of-arrays is saying: okay, instead of having one array with each of our individual objects, let's not have an explicit object anymore, and let's have many arrays, one for each field, and then whip through the particular array that we want.
So, that's not ideal for, say, most scenarios where you would have –
the example I use in the documentation for the library
is an employee database piece of software.
Now, obviously, in reality, you would use SQL or something for this,
but we'll just roll with this.
In that sort of application, you're going to want the array-of-structs model. You're going to want the objects to be self-contained, actual objects, because accessing one field across all employees without touching any of the others is pretty unusual there. But for the scenarios where you do need to do that, things like low latency applications, like collision detection in a game engine, for instance, or rendering applications, it's often worthwhile to restructure your data that way.
is that you now lose the explicit object that models your thing, and you instead have to
implicitly connect all these different arrays together and say, okay, all of the elements at
index seven are the one I'm interested in. And that can be quite annoying because if you add a new element to any one of those
arrays, you need to ensure that you do it to all of them.
If you shuffle them, sort them, whatever.
Otherwise, you end up with data going out of sync and all sorts of crazy bugs.
So this project is fundamentally two things. It's a library, a set of abstractions, for working with data like that in C++ as though it were one contiguous collection. And it's also a generator for solving some additional problems on top of that.
So when I was looking at your project before the interview, I was immediately reminded of Mike Acton's data-oriented design talk from CppCon like 2015 or whatever that was.
Is that a fair comparison?
This is a data-oriented design principle?
Yes, very much so.
And I thought maybe I linked to that talk somewhere in the description for the project.
Maybe I didn't.
But yes, certainly that sort of design is firmly in mind.
Yes.
Okay.
So it's a way of like formalizing that kind of design.
Yeah, and wrapping it up in, you know... okay, the vocabulary type everybody uses in C++ for containers is vector, right? Even when you don't want to use a vector, you want to use a vector; that's the whole joke. So that's an interface that we're all familiar with. I sort of wanted to have that same interface, but for this style of data.
Okay. All right.
And so how do I use Sogen?
Like what's the workflow like? Because the title suggests that it's a generator for C++,
so it's not actually just a C++ library,
so I need to generate code.
How does that work?
How do I use this?
Like do I define my struct and then I run a script over it to generate some other C++ code that I then compile into my program, something along those lines?
Yeah, I should be careful to clarify that you don't have to use the generator.
The features you get without the generator are about 90% of what the generator provides. So the generator is for fairly specific use cases on top,
which I'll explain in a bit, I suppose. But if you were to use the generator, yes, you would be,
you describe your struct in a configuration file that says what the members are, if you have any particular alignment requirements, and if you want to do any code injection stuff. You run the tool, and it spits out a header file for you with the code, which then uses the library as a dependency. But I feel like I've sort of buried the lede there. I should clarify what you might use the generator for on top of the base library. Perhaps that's worth me clarifying.
Yeah. So, because... my background is not game development,
but it's game dev adjacent.
And in those sorts of environments,
you have to do a lot of reflection-based tasks.
So whenever you need to deal with deserializing
and serializing assets, for instance,
there's all sorts of different assets in a game engine.
Even if you're not making games, my company's not making games,
we still use Unreal Engine, and that has its own built-in reflection system.
And these reflection systems, because C++ doesn't have reflection proper,
invariably end up being based on some combination of source code scanners,
stringification, magic macros, that sort of thing.
And indeed, Unreal is no exception.
And that works very well for the way they use it,
but it does mean it's a little bit hard to bring in, you know,
if you want to bring in third-party libraries and have them integrate
into whatever the reflection system is natively.
If they depend on magic macros and they depend on various injections that you need to do, you either need to maintain forks of these libraries, or you need to create wrappers for everything that you bring in, which is its own maintenance burden.
So the goal of the generator is for you to be able to just say,
okay, I need you to put in this magic macro
as part of my class definition and have these magic macros
as being part of the various functions
and expose it to whatever system and to do that quite simply.
And that way you don't have to maintain a bunch of wrapper classes, because that's essentially what the code generator is doing for you. Oh, and of course, the other thing you get is names, which was one of the questions: what do you get out of using a generator that you don't get from just template metaprogramming? For this particular application, you get names for everything.
Okay.
So if you have like a row abstraction, say – sorry, I should –
my mental model for how this works is that it's a table with rows and columns.
Right.
So each column being each member of your struct
and each row being the implicit data members that all share the same index.
If you want to address a row, let's say you have some std::tuple or something, a struct of references to each member of that row, you want to be able to do .id. You don't want to have to do .get angle-brackets zero. You know, we're humans; we like names. Now, you can do that with templates, if you're willing to use specialization macros, or do specialization tricks by injecting something into your type using, like, CRTP. But that always ends up being a very tedious thing to maintain that is easy to accidentally break. So by having everything in a nice little config file that has its own set of diagnostics associated with it, the generator creates all the named members for you, and you don't have to worry about maintaining any template specialization soup.
Okay.
Of course, it's there.
It's just that the generator is doing it for you.
All right.
I want to, if you don't mind, just make sure I understand,
without the generator, I have all of the tools
that let me have a bunch of these, this table, as you're describing it, rows and columns.
And if I want to sort based on the third element in my row, the third column in my row,
it will do that and keep everything nice and organized. It'll sort all of the other columns at the same time for me.
I can access the members by index.
I can do things with index.
But if I want names, then I want to use your generator.
Correct.
Okay.
Names and anything that might be integrated into a reflection system.
Yeah, that's what you're getting with a generator.
So what kind of reflection then do you provide?
Do I get like compile time tables of the names and members and that kind of thing?
Yes, you do.
There's a compile time.
There's a template variable for accessing the names of the columns.
There's the null terminated string for the names of each column.
There's an enum in each class that has the indices of each column.
And then there's the config file.
You can inject various annotations into your class so that if you want to say,
in Unreal Engine, if you want to expose a class to the visual programming,
the blueprint system, you need to use the U class magic macro,
and you can trivially inject that into your types without having to create
a bunch of wrapper classes for third-party libraries or whatever.
Okay.
Once we have the generated reflection-y thingy,
we can pretend like we have our old school structs
with all the members in one thing.
Yep, correct.
And I can just do like a ranged for loop over them
and just pretend like my world is what I thought it had been.
Yep.
Okay.
And so how does the generator know the members of the struct?
Do you have like a thing that actually scans the code
and somehow parses C++ class declarations?
Or do you have to put annotations on your members with magic macros?
Or do you have to duplicate the declaration in some kind of script
that you feed to the generator?
What is this magic?
It's a TOML config file.
So in the config file you have an entry for each SoA struct you want to create, and you run the tool over the config file.
Okay.
Is TOML, like, I don't know, related to JSON, or the YAML that we're familiar with? Why did you pick it?
It's related insofar as it's in the same class of config formats. I picked it because, well, okay, I don't think JSON is very human-friendly. And I think YAML is far too complex.
It's too easy to shoot yourself in the foot with YAML in what is ostensibly a config file format. So I like TOML because it's hard to get wrong.
It kind of looks like old-school Windows INI files, from what I'm looking at here.
Yep, that's a good way to describe it. It's INI files, but with some sort of standard applied to them.
Standard? Who needs a standard? So these reflection... I'm just curious, because before we started recording, you and I were bantering a little bit about constexpr, all the things.
Yes.
So I'm assuming that these reflection structures that you provide are like constexpr static tables of things that people can do anything at compile time they want to know about the types
that they are working with.
Yes.
Yep.
Nice.
All right.
So speaking of reflection,
I think actually this array of structs,
struct of arrays transformation
actually often comes up as a use case
for like proper reflection,
which we obviously don't have in the language yet.
Like I think Bryce mentioned this
in his ACCU keynote this year.
I sat in on a talk at this year's C++Now conference where the speaker was doing similar things, like, oh, let's make the data layout configurable, but let's still have the same class interface as we always do. And then he also very quickly reached a point where he said, we can only really do this with reflection. It's really cool that you can do this basically the way you do it. But if you actually had proper reflection in the language, how would you do it then? Would that make things a lot easier? What's your take on reflection and why we need it?
Okay, so I don't know that I really want to dare to say how I might do it, because I'm nowhere near enough of an expert to speculate what it might look like in the language. But I can tell you, well, I've already sort of touched on one thing that's very, very difficult without it: the names aspect of it, right? You know, you can do some complex template injection nonsense, but that's just it. It's complex nonsense.
There is one other thing that the generated code from soagen can do, which I would love reflection to be able to do,
but as I understand it, there's not really any elegant way of doing it in vanilla C++, and that is: say you have some static interface, a class of types that have all got a push_back method with some varying number of arguments. It might be the case that in some situations you want one or two of them at the end to have a sensible default, and otherwise there may not be any sensible defaults. And there's no way that I know of to have, as part of the function definition, some sort of conditional default-argument thing as part of the actual function. I can think of ways to do it: if, instead of taking all the arguments individually, you took them as a struct, you could have that struct be built up as a composite of base classes that each represent each member, and maybe they have an in-class member initializer, or maybe they don't, depending on some template specialization stuff. But again, just me trying to explain that off the top of my head, I've stressed myself out. That's the sort of thing where it would be nice to have a cool little syntactic thing to say: okay, maybe there'll be an equals-whatever for this function parameter, or maybe there won't, and to have that be syntactically valid, and to maybe source that from some consteval function or something, you know, who knows.
That's the example that comes to mind.
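The workaround Mark sketches verbally, a composite of per-parameter base classes whose in-class member initializers are toggled by template specialization, can be illustrated roughly like this. All the names here (`StiffnessParam`, `PushBackArgs`) are made up for illustration; this is not soagen's actual generated code.

```cpp
// Hypothetical sketch: each parameter becomes a base class, and a template
// specialization decides whether it carries an in-class member initializer
// (a "default") or must be supplied by the caller.
template <bool HasDefault>
struct StiffnessParam { float stiffness; };              // no sensible default

template <>
struct StiffnessParam<true> { float stiffness = 1.0f; }; // defaulted

// The "argument struct" composes the per-parameter bases (C++17 aggregates
// may have public base classes, so aggregate initialization still works).
template <bool StiffnessHasDefault>
struct PushBackArgs : StiffnessParam<StiffnessHasDefault>
{
    float mass; // always required, never defaulted
};
```

With the default enabled, `PushBackArgs<true>{}` yields `stiffness == 1.0f`; with it disabled, the caller must write something like `PushBackArgs<false>{ {0.5f}, 2.0f }`, which is exactly the kind of ceremony Mark would rather replace with a conditional `= whatever` in the signature itself.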
That's interesting because the enumerating,
being able to enumerate members of a struct is,
I think one of the first things that any reflection proposal always says,
you know, we need to do this, but yeah,
being able to basically express
that it's kind of variable whether or not a function parameter has a default value, and
you determine somewhere else in code whether that's the case. Yeah, that's really cool. I
don't even know if any of the reflection proposals that were discussed in the last few
years can actually do something like this.
I don't know.
Maybe they can, maybe they cannot.
I should say that it's a testament to my familiarity
with homebrew reflection systems in video games that the
idea of iterating through all the members of a structure didn't even occur
to me.
Because that's just like, duh.
So I was immediately thinking of something a lot more
niche. But yeah. Also, I think it's really interesting, because some of these
reflection papers, I think, were written from this more academic point of view. It's like, okay,
let's build this up from first principles: we need to do this and this and this and this.
And then I think it's actually really cool to come from the other end and say, well, these are the problems we need to solve.
You know, we need a library that can do this,
and we need a piece of code that can do that.
And what do we need to get that?
And I think coming at it from the other end,
I think that's a really cool approach to kind of figure out
how reflection should work.
But yeah,
I think there's not actually that much going on currently in the reflection
study group.
I think the work there has kind of stalled. I think they kind of ran out of funding to keep working
on these papers and compiler forks or something. I don't know. I don't think there's much going on
there at the moment. Unfortunate. Yeah. I'm curious if you can speak at all to how you've actually used soagen up to this point and
what kind of performance benefit perhaps you've seen by moving something from an array of structs
to structs of arrays layout. Yes. Okay. So I'll give you a little bit of context about the nature
of the data I work with at my job, for instance, which also depends pretty heavily on not this, but a thing very much like it.
We originally, in an earlier version of things, did work with conventional array of structs.
We've got particles in our physics simulation system, and they have things you might expect
them to have, position, mass, et cetera.
We've also got constraints which act on those particles, and they have different properties
depending on what type of constraint they are, for instance.
A bend constraint, for instance, is sat between two triangles,
and it's related to the bend angle between the two of them.
And so it'll have the indices of the triangles
and the various weights of the coefficients for the math, basically.
And that's just a whole bunch of floats.
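The two layouts being compared can be sketched briefly. The field names below (`px`, `mass`, etc.) are assumptions for illustration, not the actual fields from Mark's physics system.

```cpp
#include <cstddef>
#include <vector>

// Array-of-structs: one struct per particle; the fields of consecutive
// particles interleave in memory (px, py, pz, mass, px, py, pz, mass, ...).
struct ParticleAoS
{
    float px, py, pz;
    float mass;
};

// Struct-of-arrays: one contiguous array per field, so e.g. all the masses
// sit next to each other, which is what the cache (and later SIMD) wants.
struct ParticlesSoA
{
    std::vector<float> px, py, pz;
    std::vector<float> mass;

    std::size_t size() const noexcept { return px.size(); }

    // Appending a "row" means pushing onto every column.
    void push_back(float x, float y, float z, float m)
    {
        px.push_back(x);
        py.push_back(y);
        pz.push_back(z);
        mass.push_back(m);
    }
};
```

Hand-writing that second form (and keeping every column in sync) is exactly the boilerplate a generator like soagen exists to take off your hands.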
So when we transitioned that from array of structs to struct of arrays, immediately we saw a speedup of about 30-ish percent.
Oh, wow.
Which alone is pretty good. Like, that's how well suited our data and our sort of access patterns were to this, that we got such a marked speedup immediately. But then we got a secondary speedup, because the whole reason
we were investigating structure of arrays to begin with
was not the performance that it grants.
At the time, it was a surprise, but intuitively it makes sense.
The reason we were actually doing it is because we wanted
to SIMDify everything, where previously we'd only used it
in a few places because the data wasn't structured in a way
that made it easy to do.
If I might, for just a second, if you can clarify SIMDify.
Sure.
I can't think of off the top of my head what SIMD is short for.
Single instruction multiple data?
Right, exactly.
So we were transitioning from just basic scalar math
to using SIMD registers,
using SIMD compiler intrinsics to do it.
We didn't necessarily do it all raw.
We used a library for that,
but we still needed to change the layout of all our data to be able to do that.
And we transitioned from AOS to SOA.
We got the 30% speedup then because all our data was
contiguous and appropriately aligned, so we could swap out the math for SIMD math. And then we went
from doing scalar math to vector math on eight lanes or something. So
that was, wow, you know, now much, much faster. Like triple-digit percentage speed increases.
It's at most eight times faster
if you're now doing eight things
at the same time you were previously doing one.
So that's amazing.
And we would not have been able to do that
if not for changing the layout of our data
to SOA to begin with.
So it has sort of compounding impacts if you pair it with SIMD and things like that.
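Why contiguity enables this can be shown with a small sketch: once a field lives in one contiguous array, the inner loop can be strip-mined into fixed-width batches (eight lanes here, mirroring an 8-float-wide register). This is plain scalar code that a compiler can auto-vectorize, not the intrinsics library Mark's team actually used; the function and parameter names are invented for illustration.

```cpp
#include <cstddef>

// Advance particle positions by velocity * dt over contiguous SoA columns.
void integrate(float* px, const float* vx, float dt, std::size_t n)
{
    constexpr std::size_t lanes = 8; // assumed SIMD width in floats
    std::size_t i = 0;

    // Batch loop: each iteration touches a full register's worth of
    // contiguous elements, which is trivially vectorizable.
    for (; i + lanes <= n; i += lanes)
        for (std::size_t l = 0; l < lanes; ++l)
            px[i + l] += vx[i + l] * dt;

    // Scalar remainder for the last n % lanes elements.
    for (; i < n; ++i)
        px[i] += vx[i] * dt;
}
```

With an AoS layout the same loop would stride over interleaved fields, and neither the compiler nor hand-written intrinsics can load eight positions in one aligned load.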
It just reminded me of Godbolt's Law, which isn't very well known, I don't think, but it's from Matt Godbolt
of Compiler Explorer. Godbolt's Law: if any single optimization makes a routine run two or
more times faster, then you've broken the code. And you've just broken Godbolt's Law
with your assertions here.
And that is all.
I'm sorry, Matt.
That's interesting.
So I had another point about SIMD.
I actually talked about this with Matthias Kretz,
who's the author of std::simd,
which we are hopefully going to get in C++26.
And we were talking about this SoA, AoS, kind of like the speedups that you can get there. And he said something
interesting. He said that often you get the best speedup not from actually transforming
AoS to SoA, but from doing an in-between. So the fastest thing often is to have an array of structs
that then inside have arrays,
which are SIMD-register-width sized.
So you have an array of structs of SIMD-register-width-sized arrays.
Do you have any opinion on that?
Have you tried anything like this?
I do.
Or does soagen even support this stuff?
Yes, yes, and yes, I think in that order.
That's amazing.
Okay.
So, yes, it does support it.
We can do all of those things.
I am familiar with that workflow.
The main reason you do that sort of thing is because you need essentially the alignment of the batches to meet whatever your SIMD requirement is, so 16 bytes or 32 bytes or whatever.
And then, of course, that matches the size of the thing
so that you step forward by that amount
and it all stays aligned nicely
and you can do your load aligned and store aligned, et cetera,
as you move through into your calculations.
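That nested "array of structs of SIMD-width arrays" (AoSoA) layout can be sketched like this; the width of eight floats (32 bytes, e.g. an AVX register) and the field names are assumptions for illustration.

```cpp
#include <array>
#include <cstddef>
#include <vector>

constexpr std::size_t lanes = 8; // assumed SIMD register width in floats

// Each block holds one register's worth of every field, and the alignas
// keeps every block (and therefore every per-field batch) 32-byte aligned.
struct alignas(32) ParticleBlock
{
    std::array<float, lanes> px;
    std::array<float, lanes> mass;
};

// Two-level addressing: logical element i lives in block i / lanes,
// lane i % lanes -- the "blah divided by eight, blah mod eight" dance.
inline float& mass_at(std::vector<ParticleBlock>& blocks, std::size_t i)
{
    return blocks[i / lanes].mass[i % lanes];
}
```

The payoff is that every batch is register-sized and aligned for free; the cost is exactly the two-level indexing that the conversation turns to next.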
By nesting it in a second level of array like that,
you sort of get that out of the compiler
just by using a strategically placed alignas. But that does, of course, mean that you
now have two layers of data. You need two little square brackets everywhere if you want to address
those members individually, and if you've got a SIMD register
width of eight and you want to access element 17, you then have to do, like,
blah divided by eight, blah mod eight, which is pretty annoying. You have abstractions for that, but I think humans tend to prefer data being flat. So the way that I've addressed that problem
is to have it so that if you specify over-alignment for a column in one of the tables,
you'll get an aligned stride, which is a static constexpr size_t, just as a member of the class
description for each column, which is just calculated
based on the alignment that you've specified.
And that says that if you step through this collection
by this amount at a time, everything stays nice
and perfectly aligned.
So you can just have one level of floats,
and you can step through it by that value,
and you get the same benefit without having two sets
of square brackets everywhere.
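The arithmetic behind such an aligned stride can be sketched as follows. This is a plausible reconstruction of the idea Mark describes, not soagen's actual implementation: the smallest element count you can advance by so that, starting from a suitably over-aligned base pointer, every step lands on an equally aligned address.

```cpp
#include <cstddef>

// Smallest number of elements whose total byte size is a multiple of the
// requested alignment, i.e. lcm(alignment, elem_size) / elem_size.
constexpr std::size_t aligned_stride(std::size_t elem_size, std::size_t alignment)
{
    std::size_t bytes = elem_size;
    while (bytes % alignment != 0) // find the first multiple of elem_size
        bytes += elem_size;        // that is also a multiple of alignment
    return bytes / elem_size;
}

// 32-byte-aligned float columns: stepping 8 elements keeps 32-byte alignment,
// so one flat index plus a known stride replaces two sets of brackets.
static_assert(aligned_stride(sizeof(float), 32) == 8);
static_assert(aligned_stride(sizeof(double), 32) == 4);
```

Exposing that value as a `static constexpr` member lets user code loop over a single flat array in aligned batches without ever writing the divide-and-modulo indexing by hand.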
Cool.
Right.
So we'll be back in just one second,
but I would like to mention a few words from our sponsor.
This episode is supported by PVS Studio.
PVS Studio is a static code analyzer created to detect errors
and potential vulnerabilities in C, C++, C Sharp, and Java code.
Podcast listeners can get a one-month trial
of the analyzer with the CPP Cast 23 promo code. Besides, the PVS Studio team regularly writes
articles that are entertaining and educational at the same time on their website. For example,
they've recently published a mini book called 60 Terrible Tips for a C++ Developer. You can find
links to the book and to the trial license
in the notes for this episode.
And now we're back to Mark and Jason.
Hello again, both of you.
So I actually have a couple more soagen questions.
So what C++ standards and compilers does Sojin support?
And do you support CMake?
17, the big three, and no.
All right. Not because I don't want to support CMake. I'm a CMake novice. I tend to choose
Meson as a preference, and I've never gotten around to essentially learning enough
about CMake other than learning what I need to know to fix a problem in someone else's project,
then probably forgetting it immediately afterwards.
So I would not have any objection to someone adding CMake support,
but I currently haven't done it myself.
Right.
So you're open to contributions and pull requests and things like that? Yep, absolutely.
Yep.
All right.
Amazing.
So where can we find your code?
Is it on GitHub?
It is on GitHub.
marzer slash soagen, S-O-A-G-E-N.
All right.
We're going to post the link to that repository in the show notes.
And what license do you use?
Can people just go and use that code for their own projects?
Yep.
MIT.
MIT.
Amazing.
All right.
And do you have any kind of roadmap, what you want to do next with this library?
Is there going to be some kind of 1.0 release at some point?
There is a roadmap.
It's actually currently the only issue on the repository
is my own notes as a roadmap.
So yeah, I suppose eventually there'll be a 1.0.
So I mentioned earlier that predominantly the features you get
by using the generator are the reflection stuff
and the default arguments and names.
Apart from those, which I don't think I'll ever be able to close that gap
in the absence of C++ having actual reflection,
there are some other things currently that it does,
just some class interface stuff,
but I would like to bring the two to like at parity.
So that would be, I guess, my 1.0.
So you essentially get all of the features
you possibly could without using the generator. And it's not currently there. It's almost there.
Silly question. What language is the generator written in?
Python.
I had a suspicion it might be. I didn't actually look on the project.
So from the perspective of do you support CMake, the rest of the library, I'm assuming,
is header only, right?
Yes, correct.
Right.
So, yeah, so supporting CMake should be relatively easy to just have a custom build step that calls your Python script
that generates the thing and then have other things rely on the output of that.
So from our listeners' standpoint,
if you use CMake, it shouldn't be difficult to use this library.
All right.
So you actually have a few other libraries on your GitHub.
So you mentioned that the language that the user specifies
their struct layout in, in soagen, is TOML.
I also noticed you have a library called TOML++ on your repository,
and that has 1,200 stars on GitHub.
So that's not a low number.
So it seems like it's a popular library.
What's that one about?
Yeah, it's a TOML parsing and serializing library in C++, I guess.
Okay, so a bit of brief history as to why this library exists.
I needed to use TOML for a personal project a few years ago.
There were two options in C++.
One was abandonware.
Author hadn't touched it in three years, and it didn't support the current version of TOML.
And the other one was much more actively maintained, but it didn't really suit the programming
model that I wanted to use with it.
It was still being developed at the time, too, so I was still missing a few features.
And I thought, well, okay, how hard could this be? Which is, of course,
famous last words, because writing a parser, if you've never done that before, it was like,
oh, okay, actually this is kind of complex. And then, oh, okay, now I've published an open source
library that's hilariously gotten popular, and I wasn't expecting that. Now I've got to maintain
it. Oh, crap. So how's that going?
Oh, it's going okay.
Admittedly, my enthusiasm for the project is a bit lower than what it was
now that it's relatively mature, but I come back to it occasionally
and, you know, tinker.
It's on the back burner in terms of, like, new features and stuff,
but, yeah, it's still maintained in that I fix bugs.
I've been pondering an episode of C++ Weekly
titled something like, How to Responsibly Abandon Your Open Source Project. Yeah, I can
relate to that. Yeah. Right. Are there any other projects that you're working
on that you want to share with us? No. I had intended to... I was building a ray tracer to make use of
soagen as part of my, hey,
here's this thing, and I'm using a ray tracer.
But I realized that I could either
do one or the other and not both.
I might release that at some point.
But no, nothing that's maybe
worthy of discussion on the podcast.
Alright.
You also recently attended your first
C++ committee meeting in Varna.
That was in June, if I remember correctly. You were both there.
And so you joined the Finnish standardization body.
So you're now an official member of the committee as far as I know.
So what was that like going to your first committee meeting?
Are you kind of interested in the progress of standardization? Do you want to do this more?
Yes.
Okay.
So what was it like?
It was good.
I went in with about a million questions,
and I came out with not really any questions anymore.
It was a very good learning experience.
I don't feel like I went there with any intention of making waves.
I just wanted to learn how everything worked.
How the sausage is made, as they say.
Exactly, how the sausage is made, precisely.
So I feel like I got a good overview about that.
All of the misconceptions I had were dispelled
and all of the questions I had were answered, so that was good.
And would I like to participate further?
Yeah, I've got an idea for a relatively simple change,
I think, that I'm exploring.
I know, I know, famous last words. I've got two ideas. One is, I think, relatively simple,
and the other is not. And it'd be interesting to find if that guess holds up if I were to
write them both up. So it might turn out to be the other way around.
Are you going to participate in the reflection work? Because it sounds like you should.
Yeah, but if it's in a situation where it sort of needs people
to pick it up and take the lead on it, I don't want to put my hand up for that. I'll participate
in discussions, but I don't necessarily want to be the driving force behind them, we'll say.
I see. Yeah. No, I think I can understand why it's taken so long and how challenging
it actually is. Because when you really drill down into not only what reflection is,
but how do you express it programmatically, syntactically?
This is like a computer science.
This is a hardcore computer science thing, and I'm very much not that.
So I'll participate in discussions,
but I don't necessarily think I want to take any sort of lead
in designing that sort of thing.
Yeah, I think on top of designing,
you also have to implement it in a compiler.
So you kind of have to be a compiler engineer
or have a compiler engineer working with you as well.
And so, yeah, I think it's an enormous amount of work
to make progress on this.
And yeah, I do get that, you know,
it's very time-consuming, expensive to do this work.
It needs experienced people.
I'm very sorry that it's kind of stalled.
I hope to see progress there.
And I think one way you could contribute very well is to just provide
real-world experience of use cases.
Like, this is what we actually need reflection for.
These are actual things that pop up in my day-to-day work.
Can this or that syntax or proposal actually do that for us? And if it doesn't, you know, maybe you're missing something here, right?
He seems skeptical. You know, having only recently started being involved in the whole
proceedings, I haven't really got a good sense for, you know, how much time it would actually consume if I were to be
actively involved on particular proposals or whatever. So I'm hesitant to fully dive
deep into anything just yet. Fair enough. Fair enough. I think I spent like several
committee meetings just being a tourist before I kind of wrote my first little paper.
Yep. And then I kind of got sucked in somehow because the little paper turned out to be way more
complicated than I thought.
But yeah, I don't know.
It can go either way, right?
Well, as long as we're talking about it, when is the next standards meeting in case any
of the listeners are interested in trying to attend themselves?
So the next standards meeting is actually in November in Kona, Hawaii,
in the US, taking place from the 6th to the 11th of November. So yeah, we had the meeting there
last year also in November, and we're going to be in Kona again. I'm actually this time not going to
be in Kona in person myself. This is going to be the first committee meeting that I'm going to miss since I joined. I'm going to
attend it virtually.
So you can dial in?
Yes. So since COVID,
everything's hybrid and you can
dial in. The only thing you have to deal with is
a pretty brutal 12-hour
time difference between where I live and
where Kona is.
But if you find a way to deal with that, then yes,
if you're a member of the committee,
you can dial in and participate in discussions. So I think that's something that is a lot better
than what it used to be before COVID. Because before COVID, it was like, you can't afford to
go there. Then basically, you're out, which is not a very inclusive way of standardizing a language.
So I'm very, very happy that we improved on that one. That's
cool. All right. So then I think we're nearing the end of our episode here, so we should
probably start wrapping up. But yeah, Mark, is there anything else you want to tell us before we
do that? Is there any way people can reach you if they want to contribute to soagen or just get in touch, ask you questions,
talk to you about reflection or whatever else?
Yeah, probably the easiest starting point
is just GitHub: github.com forward slash marzer.
My repositories all have contact information for me on them.
So you can use that as a jumping off point
and go from there.
I'm pretty active on Twitter
or the artist formerly known as Twitter.
And, you know, Discord and various things.
So I'm reachable.
Just GitHub's a starting point and go from there.
All right.
Well, then thank you so much for being our guest today, Mark.
It was a great discussion.
Thank you so much.
And thank you, Jason, for being my co-host today.
It was an honor and it was a lot of fun to have you back on the show.
And I hope this is not going to be the last time
and we're going to have you back at some point
in the future again.
Absolutely, Timur, let me know.
Thanks so much for listening in
as we chat about C++.
We'd love to hear what you think of the podcast.
Please let us know if we're discussing
the stuff you're interested in.
Or if you have a suggestion for a guest or a topic,
we'd love to hear about that too.
You can email all your thoughts to feedback at cppcast.com.
We'd also appreciate it if you can follow CppCast on Twitter or Mastodon.
You can also follow me and Phil individually on Twitter or Mastodon.
All those links, as well as the show notes, can be found on the podcast website at cppcast.com.
The theme music for this episode was provided by podcastthemes.com.