CppCast - BrontoSource and Swiss Tables
Episode Date: July 3, 2025

Matt Kulukundis joins Timur and Phil. Matt talks to us about BrontoSource, his start-up focused on refactoring, updating or migrating large codebases, as well as his work on Swiss Tables.

News:
- Herb Sutter's WG21, Sofia, Bulgaria, trip report
- End of active development on jemalloc
- "Amortized O(1) complexity" - Andreas Weis' lightning talk
- Reddit discussion of filter view issue

Links:
- Acronyms on cppreference.com
- Arthur O'Dwyer's acronym glossary
- Matt's Swiss Tables talk at CppCon
- Example of BrontoSource integration in Compiler Explorer
Transcript
Discussion (0)
Episode 401 of CppCast, recorded 27th of June 2025.
In this episode, we talk about the latest news regarding C++26,
the retirement of jemalloc,
and the algorithmic complexity of FilterView.
Then we are joined by Matt Kulukundis.
Matt talks to us about BrontoSource,
an AI-powered tool to modernize large C++ code bases.
Welcome to episode 401 of CppCast, the first podcast for C++ developers by C++ developers. I'm your host, Timur Doumler, joined by my co-host, Phil Nash.
Phil, how are you doing today?
I'm good, Timur. A little bit tired, but how are you?
Yeah, I'm also quite tired.
For context, I just came out of a very intense one-week-long
C++ committee meeting. We're going to talk about that later. Going straight into a conference,
which was very good, but also very intense, where I was also keynoting, then had a bunch of meetings afterwards,
then got onto a plane back to Finland, arrived very late last night, and then had something like five hours of sleep and just got up.
So I think I'm okay. I think, Phil, you had even less sleep than I did.
Yeah, I did. And I am still at that conference that you mentioned that I happen to be running as well. So it's about 20 past five in the morning for me
here, which is not quite as bad as we had it for Matt Godbolt in the last episode, but I'm feeling
it a bit. So hopefully we'll make it through this episode and that's all I can say. Okay, so at the
top of every episode we'd like to read a piece of feedback. This time we received an email from Raghavendra.
I have been following CppCast for quite a while now.
The show is going amazing.
Would love to know how to navigate the jargon in the C++ space.
Better understand the concepts being talked about.
Cheers.
Hmm.
Interesting.
So are we too jargony?
I mean, I think if you think about like CRTP, CTAD, there is a lot of jargon in C++.
Yeah.
Right. So I remember somebody somewhere had a list of all of those things. I don't know if it was
somebody's blog. There's a list of all the acronyms, at least. Maybe that would help.
Yeah. If you're ever in a live context, I actively recommend people just ask.
If you're stuck on something, odds are someone else in the room is stuck
on it too, and is also not asking.
Okay.
So I just looked this up.
There is a long list of acronyms on cppreference.com. Actually, that's the first thing that pops up if you Google C++
acronyms. And then there is an even longer list with very detailed
explanations on Arthur O'Dwyer's blog. And that blog post is called a
C++ acronym glossary. And that's the one that I remembered somebody
did this. So it was Arthur. Yeah. So that is a very detailed
explanation of every single one.
So that's just the acronyms.
I don't know.
Maybe there's other jargon that we're using, which is not acronyms, but also confusing.
So, um, yeah, apologies for that.
And thank you so much for the feedback.
I think the worst ones are where we reuse a word that has a general meaning and then we
have a very specific meaning and you have to know which context you're in to know which
one we're talking about.
Yeah. To grasp the concept, if you will. There you go. Like, can you think of something?
As Matt just said, just the word concept. Very easy to be talking about the general concepts or...
Oh, just the word concept. Yeah, absolutely. Yeah. Yeah, it's funny. When I talk about C++ stuff,
I catch myself wanting to say concept if I don't mean concept, the feature.
And then I say something like idea or notion instead. But it's just extra seconds to think about and come up with a different word.
Just adds an extra constraint.
Well, thank you so much for the feedback, Raghavendra. We'd like to hear your thoughts about the show. If you're listening to this, you can always email us at feedback at cppcast.com.
So joining us today is Matt Kulukundis.
Matt is the CEO and co-founder of BrontoSource, a startup that builds tools to modernize legacy
code bases at scale, with a focus on the C and C++ space.
Prior to that, Matt spent 11 years at Google, where he led the software ecosystem organization
as a principal engineer.
During that time, he designed language
and library features for migration,
as well as directly planning and executing
multiple migrations across Google's entire code base.
Rust's std::collections hash map and Go's map
are based directly on his Swiss table work.
When he isn't trying to figure out
how to rewrite all of the world's code,
he scuba dives every chance he gets. Matt, welcome to the show.
Hey, it's great to be here.
I'm a long time listener, first time caller, if you will.
So, yeah, it's great that you join us.
Actually, last time we had, no, the episode before last time,
we had Kristin on the show and she made the
connection. She said, Matt would be a great guest. And I quickly Googled you and I thought, yes,
Matt would be a great guest. And then Phil agreed. And so here we are. So thank you again for joining
us.
My pleasure. Kristin is an amazing networker and also an incredible engineer. I worked with her for
a number of years at Google.
You talk about C++, Rust, Go, but scuba diving as well.
So I think we're missing deep sea from the list.
Yes.
Most of my scuba diving is relatively shallow.
I haven't done advanced open water, but generally speaking,
I only go to around sort of 15 to 22 meters depth.
Right.
Yeah. I think most of my C++ is quite shallow as well.
Okay.
Um, so we'll get, uh, back into talking about Matt's work in just a few minutes.
So Matt, hang in there.
Um, because we first have a couple of news articles to talk about, but feel
free to comment on any of these as well.
Um, so we got three news items for today.
The first one is the really big news.
So as I said earlier, we had a committee meeting,
which ended a week ago as we record this.
And we finished a C++ 26 committee draft,
which means that C++ 26 is now complete-ish,
except that now there's
two more meetings where we can address kind of wording bugs,
essentially, or other kinds of bugs that people report.
There's a process for this called the ballot.
But we are not any more procedurally going
to do any more design changes.
So this is now just about two more meetings to, like, iron out
the last little wrinkles. So that means C++26 is design
complete. And the big news there is that we managed to put
reflection in at the last minute, there was a bit of a
scramble to get the wording done. But it did happen. We
voted reflection in a week ago.
So, um, so C++26 will have reflection.
So that's pretty, pretty big news.
Yeah.
Huge congratulations to everyone who worked on that.
There were a bunch of people who put in a lot of hours there.
I'm sure.
And I'm sure we'll be hearing about it in all of our final questions.
Yes.
So, so there is already one trip report that I saw online from Herb.
There might be more coming out between now and when this episode is released,
but we're going to put definitely Herb's one in show notes.
We are not going to talk about what else happened at Sofia too much now,
because we're actually planning to do a special episode about that next time.
We have a very exciting guest planned for this, so
stay tuned for that. Yeah, maybe one more thing on that, which is that this is an awkward time
during the standardization process where for about two or three meetings we'll be saying,
and now it really is complete, because there's just different stages of completeness that we go
through. So it's always a little bit awkward until it's finally out there.
In fact, we only just got C++23 actually out at the end of last year.
Yeah.
So it's going to be two more milestones.
One is going to be two meetings from now, which is now, I think, officially
announced to be in March in London, right next year.
Phil, you might know something about that.
Yes, I know a little bit about that.
In fact, much more than I would like, because JGX, the company that I run, is actually hosting it next year.
Okay.
So that is going to be the meeting when C++26 is going
to be officially done, done. So that's when this ballot where we iron out any issues that people report is also done.
And we're going to have a finished document.
And then there's going to be another year plus until ISO actually approves it and it becomes a new standard.
So there's still a little bit of time to go, but we already now know what the feature set is going to be.
So it is very, I think, in a very exciting stage
of that process right now.
So we have one more news item, which
is a little bit of a sad one.
So the active development on the jemalloc memory allocator
has come to an end.
There is a blog post by the main author and maintainer
explaining like why and like describing the whole history
of the project.
And yeah, I think that's kind of interesting,
both in terms of like, you know, what it is and how it happened,
how it was developed, how it was used, and also that now they reached a stage where they're like, oh yeah, we can't really keep going with this
anymore. So I believe TCMalloc is like the Google allocator and jemalloc is kind of the Meta one,
right? Yeah. I actually think phase three, Meta, of this blog post is really interesting because it
talks very heavily about sort of the politics of large organizations and how that influences
these projects like this.
And sort of when Facebook shifted to Meta, the way that they rewarded individual contributors ended up being the
death knell for jemalloc; it just took time to play out. And so if you're a high placed
engineer, like really think about second order effects to how you incentivize the people
under you.
Yeah, that's interesting, because it's a very popular project, right? I think it's used in
a bunch of places. It's very efficient, very fast.
There's trade-offs between jemalloc and TCMalloc,
so you might choose one or the other.
But I think they're equally like marvels
of software engineering.
So yes, it's very interesting and slightly sad
to see that this is not going to be continuing as an actively
maintained project.
Thanks for the memories.
And I guess we're going to talk about allocators a little bit
more with Matt further down the line.
But before we get there, I have one more news item,
which is a lightning talk by Andreas Weis from last year's
CppCon, which has been released on YouTube.
And it's got a lot of views.
And it's a really, really interesting one.
So it's just five minutes, like all the
lightning talks, but it's quite fascinating, quite densely packed with information. There's a long
Reddit discussion about it as well. So it starts off by Andreas Weis explaining what amortized
complexity actually means. So we've got big O, like O(1) or O(N) or O(log N), and sometimes the standard
or some other description
of an algorithm says this is amortized O(N) complexity,
for example, right?
Or amortized O(1) complexity.
So amortized constant complexity.
And so he managed to figure out a way to really, really nicely
explain what that actually means without too much math
in a very intuitive way.
So I thought that was great.
And then he went on to show that for a C++ range where the standard says that begin is amortized O(1), that is actually unimplementable. So that's actually not true for filter views specifically,
which is a very popular view in the ranges library. Because what it actually does and how it
actually works and how you would implement it, even the most efficient case just kind of isn't compatible with any reasonable definition of amortized complexity.
So, and the kind of consequence of that is that it turns out we can't actually trust the C++ standard when it talks about algorithmic complexity guarantees.
So I think that's something that we should keep in mind.
Yeah, it's interesting. Iteration over hash tables is often O(capacity) instead of O(size), depending on how you implement it. Right. But is that actually
amortized, right? So I think the standard specifies that the hash table iteration should be O(size).
I should double check that.
Let's see.
Actually, next on an iterator in a hash table is isomorphic to
the complexity of iteration.
So when you're talking about something like std::unordered_map operator bracket, it says
average case constant, worst case linear.
So it's not that. It's going to be iterator plus plus on a
std::unordered_map iterator.
All right, so just one, like, find-the-next-item step.
Yeah, that can be O(capacity), depending on implementation.
All right.
So I'd have to dig a little bit deeper to find that.
So let's do that offline.
But yeah, you're probably right.
That is probably exactly what it says.
And you've done a lot of work on HashMaps,
so I think you know what you're talking about.
You're going to talk about that in a minute as well. But before we get there, actually first, Matt, again,
thanks for joining us at this late hour for you. It's kind of
a bit of a weird one. It's 7am for me, which is just a lot. So
okay, it's 5am for Phil, which is pretty rough. And then it's
like past midnight for you now, right? So thank you very much for, not getting up, but, like, staying up late, just for us. Really, really appreciate that. So what we want to talk to you about first is the main thing that you're working on right now. So that's called BrontoSource. Do you want to talk to us about what it is and what it's for and why it's awesome?
Absolutely. So BrontoSource is a startup that I co-founded with my co-founder Andy,
building out tools for large-scale code migrations and
code updating with a real focus on C++.
And this comes directly out of the work that we did at Google for a number
of years. Actually, Kristin also did a lot of this work. And it's how do you build out a toolkit
of sort of easy things to allow you to make sweeping changes to sort of over millions of lines of C++
with a very high correctness, right? You want it to be so correct that it can commit code,
basically compiles, passes tests, submit, the end.
And that's a very high bar of correctness
that you have to aim for.
And so the architecture of a system like that
is fun and interesting.
And so the catch here is, though, that, or not the catch, but the special
sauce here is that that's actually done with the help of AI.
Is that right?
So how does that work?
What does that mean?
It's actually a bit of a mixture.
So we use a lot of the sort of traditional static analysis techniques.
Like we do build on top of Clang and we have the AST and we manipulate it.
But whenever you're doing an analysis on code,
you always run like face first into the halting problem.
You're like, okay, can this pointer be null?
And like, without fail, your answer is gonna be like,
yes, no, and the vast majority of the time will be like,
I cannot prove that this is or is not null, right?
That's what the halting problem gives you
for all static analysis is the most common answer is, I don't know, I can't prove it in either direction. And
so what you can do with AI though, is you can get it to like extract a little bit more
information. You can read the comments, you can look at variable names and function names.
So you can augment the traditional static analyses with additional information gleaned from the text.
Then we go back to traditional static code generation techniques that are
correct by construction.
That is so cool.
So that means that the AI is going to give you just a little bit more data kind of that
you use as a heuristic basically?
Yeah, very much so.
One of the early products, one of the other products we're sort of looking at doing is
automatic conversion of C and C++ code to Rust. And so if we're talking strictly about C code for a second,
in C you've got a struct and then you have a function whose first argument is a pointer to the struct. When you convert that to Rust, obviously you can convert it to a function that takes a pointer or a reference to a struct.
But the idiomatic conversion would be to put it in an impl block so that it's a method.
And you can actually use AI a little bit to ask it, like, hey, which of these is more idiomatic?
Right.
But, okay, there's so many questions there.
First of all, how would you even convert C or C++ code to Rust, given that Rust
doesn't have pointers and references and you express everything with other things
and you don't have, you know, random access and into an array and you don't have,
just, you can't just pass references around. You'd have to rewrite everything that would run into the borrow checker and fail. Is that something that you do, or
do you just do a translation into unsafe Rust, basically? So we try to go to idiomatic Rust.
There is always a fallback of unsafe rust. But right, it's not like,
yeah, okay, so Rust doesn't call them references, it calls them, you know, borrows or things like that.
But in fact, most C++ code actually will, if sort of translated to idiomatic Rust, pass the borrow checker. Like aliasing for const references is surprisingly uncommon.
You know, it's not non-existent, but it's uncommon.
You usually can make a more idiomatic conversion than the pure semantically
correct, like we're going to throw unsafe on everything and do everything that way.
Huh. That is interesting on many levels, because one thing that people, including myself, kept saying when we were talking about things like safety and getting rid of UB in C++ is that
if you were to introduce something like the borrow checker into C++,
you would have to rewrite all of your code because none of it would work.
And you're saying, well, actually, there's a lot of code that technically has references and
pointers, but, you know, it doesn't actually have like multiple, you know, mutable references to
the same thing. And you can reason about it not having that locally. So you can actually make a
transformation to safe Rust or something like it. This is very new information for me.
This kind of changes the picture actually for me.
So that's really interesting.
Yeah.
And it's important to realize that you're always playing this numbers game of like,
how much of the code can you lift to safe idiomatic Rust?
And how much do you have to fall back to the unsafe versions of it
when you do these conversions?
All right. So when people say that, oh, we don't really want to do C++ anymore, it's
all unsafe, we want to port everything to Rust, then you have a tool
all unsafe, you want to import everything to Rust, then you have a tool
with which they can do that at scale.
That's the hope.
Right now that tool, like our C++ refactoring tool is available.
You can find it on our website and look at the docs.
I can't wait for Andy's C++Now talk about it to come out. I'm very excited for that. But the C2Rust converter
is much more alpha-level software, at least at this point in time, June 2025.
All right. So that's actually a good point. So what stage is the C++
refactoring tool at? Is it beta? Is it actually a product that you can now buy and use? What stage
is that at? It is a product. You could reach out to us and buy it and use it. It's available on
Godbolt to play around with. Oh, that is so cool. Yeah. If you go to our docs, all of the examples
in the docs end with a, like, hey, see it in action on Compiler Explorer link.
And so you can use it on Godbolt, play around with it and get a sense for how well it works.
I will own very openly.
It's early days.
There are bugs.
There are, you know, features yet to be implemented, but we're working at it and making progress.
That is very cool.
I did look at your webpage, but I missed that there were Godbolt links there.
So big thanks to you for letting me know that we will put a link to the
website in the show notes so other people can check it out too.
That is very cool.
So one thing that I found when I was doing somewhat similar stuff at JetBrains
where obviously the CLion IDE and all of their
other IDEs for other languages have lots of refactoring tools, which is really cool.
Like for example, for me, that was the killer feature of CLion, that I can just go on an
identifier, click whatever hotkey that is and say rename, and it's not going to just
rename it in a file, it's actually going to rename it properly.
And it will know that if the same identifier appears in a different scope,
it's probably a different variable, and not rename that one.
And so it would actually do that kind of very intelligently.
So that for me was a killer feature.
But I think whenever I tried to use it at like the code bases that were like larger
than let's say a million files, a million lines or something, it would get slow.
And I think nowadays there are improving it, but I think the difference, your product is
that you're explicitly targeting massive code bases, right? So what's the difference and challenge
there? And can you talk a little bit about that? Absolutely. So the way we think about it is,
for the in IDE kind of experience that you're talking about, you want the entire
change to be done and you want to submit sort of a whole logical change that contains everything.
And for a very large code base, you really can't. You know, relatively few code bases
can tolerate submitting 10,000 or 100,000 files in a single change. It just gets rough.
CI systems break down. You have
too many races with developers. And so the idea that we have is you submit into your
code base a declaration of intent. You sort of say, hey, here's a pattern for old code
and here's the pattern for new code that I want it to look like. And then the system
goes in the background, making those changes, breaking them up incrementally, submitting them bit by bit.
And this is, there are a lot of things like Open Rewrite or TreeSitter that give you the ability to do stuff like this in different languages.
TreeSitter kind of claims to support C++, but doesn't really, the C++ grammar is not great for that sort of thing.
And you can build custom tools in Clang doing it, but you really have to
deeply understand the internals, the Clang AST to build those tools.
And so what we have allows you to write actual C++ snippets that are,
here's my code before and here's my code after.
And then we do the sort of translation of that C++ snippet to the right thing to search
the AST and find the right nodes and then make the transformation.
So we sort of think about it as a declarative intent where you just submit these declarative
intents and the system will start to move your code base in the background for you.
And so how do you express that intent? Is that in English? Or is there some kind of
declarative special language in which you do that? It's actually C++ code where you say like,
okay, I have a struct or class, and it inherits from just a tag that says,
I'm a rewrite. And then you have a function that's annotated before
and a function that's annotated after. And then it looks for patterns in your
code base that are like the body of the before, and it changes them to be
like the after. And this is on Godbolt. You can play
with it. It's pretty fun.
So yeah, no, I encountered this quite a few times, I think, where there were
refactorings that were just not trivial.
Like, I don't know, some vendor, you know, updated their audio API or whatever
other API and said, you know, this function call that you've been using for
the last 10 years is no longer safe.
So now we have a different one, but it takes like this other parameter where
you now have to specify, you know, do you want to do like, I don't know, the safe mode or the unsafe mode or something.
And so then you have to just do a little bit more work. Typically, like the code bases where I was working on, like they weren't that big. So you could get away with doing it manually.
Yeah, absolutely.
And if you can get away with doing these things manually, your life is so much easier. But there's a question of scale:
once your code base is, you know, a million, 10 million lines of code,
the manual approach kind of breaks down.
So that's kind of what you're targeting, right?
Code bases that are sufficiently large like this.
Yeah.
That's really interesting.
And so that goes back to stuff you did at Google.
So can you tell us a little bit more about what you were doing
there and how that played into kind of where you are now
and how that transition process happened that you decided,
oh, I'm going to now do my own startup and do this differently or something. I'm just curious, how did that happen?
So I was at Google for 11 years. And of that, I think nine of those years, I was on C++
core libraries, which was the team that owns the sort of libraries like Abseil and the
internal libraries to Google. And we would do these migrations,
but every time we did them, it was a very bespoke tool.
So we would say like, okay,
we want to build a new error handling library
for use everywhere across Google.
And so we're gonna look at the existing set
of error handling libraries and build tools
to migrate everyone onto this sort of central standard one.
And we would do each migration like this
would be a bespoke singular tool.
And after like eight years of this,
we sort of started to see the pattern of like,
actually if we could lower the cost of this
and have something where you don't have to learn
clients internals,
because teams would come to us all the time
and they would say,
hey, I see the changes you guys are making and I really want to do something like that on my code base,
on my like corner of Google's code base. Can I do that?
We would say like, oh yeah, you can build your own tool over here using Clang AST matchers and all this.
And they would say, thanks. And we would never hear from them again. Because like, right. And eventually
we were like, maybe if we gave them something that really lowered the bar here, that made
it so that they didn't have to learn all of Clang's internals in order to do refactors.
And so we started down this path.
And about a year ago, right, I founded BrontoSource in September, and sort of the year
before that, I was working on figuring out how AI would fit into our larger code migration
strategy across Google.
And honestly, I think most people in the AI space are missing it.
There's a lot of focus on how do I put AI in the IDE and how do I have it be like autocomplete on steroids?
And no one is looking at how do I actually get it to have correctness at 99.9% correct.
And so I had this insight of like, oh, what if I use it to read the parts that my static
analysis can't do, like comments and variable names, but still use the traditional techniques
that give you that correctness bar?
And I got really excited about this idea and decided that I wanted to bring it outside
of Google because I don't know, I like rewriting code.
I like seeing things change.
So where does the name come in? Because BrontoSource sounds like Brontosaurus. Presumably that is the intention. But is that because you're taking old dinosaur code and
updating it? Or is it because you're dealing with huge dinosaur sized code bases? Or is it
the combination of the two? It's the combination of the two and the fact that the domain name was available.
When we were founding the company, we spent a bunch of time trying to figure
out what to name it. And like the first name I came up with was Verdigris. And Andy was like,
I hate that. I don't know how to spell it. I don't know what it means. Like, this is a terrible name.
Um, so we took a step back and we're like, what are the rules that we want for a good
name?
What should a name for a company be?
We said the dot com and dot dev have to both be available.
That way, if you get the wrong URL, it will go to the right place.
If you hear it pronounced, you should be able to guess how it is spelled. And if, uh, it should be three syllables or fewer, right? Cause we
want it to be kind of short.
And then we found out very quickly that dot com and dot dev both having to be available
is a deeply restricting constraint. And our fourth sort of soft rule was that we wanted it to be clever and a little playful
because I find it's really important to have a sense of play and whimsy.
Right.
So I can't help but notice that there is a tendency for dev tools like this to have like
cute colorful animals as their kind of mascot. So, you know, PVS-Studio has a bright blue unicorn
and you have a bright green brontosaurus, which also both are not actually extant animals,
which is also fun. Yeah. So a good friend of ours named Dan Zaloggi is a professional
graphic artist, and we hired him to design our logo. That is Charlotte Bronto, if you're curious. Also, it's not AI generated, the brontosaurus.
It is not AI generated. Dan Zaloggi did great work, sort of working with us to figure out
what the color palette was, sort of what the vibe was. Because we told him we wanted something sort of friendly like the Go gopher.
Right.
That approachable kind of a mascot.
I think you nailed it.
Um, Dan is amazing.
Yeah.
I'll put a plug out there.
Like if you're working in the video game space and you want a graphic designer,
reach out to him.
Also,
he's best known for a series called Creepy Pokemon, which is hilarious.
Okay. All right. So yeah, this is very exciting. I really hope you're going to succeed with this startup. You seem to have some quite unique and amazing tech there, which is going to be hopefully very valuable for anybody with a big code base
that needs to do stuff with it, basically.
So I think that's a pretty big market, no?
Yeah, I think so.
I'm hopeful.
If anyone is interested in trying it out, playing with it, feel free to go to our
website, play with it on Godbolt, or just shoot me an email, matt at brontosource.dev.
I think, I think I want to drill down into one last thing.
Like how would I use this in practice?
Is it a plugin for my IDE?
Is it like a cloud-based thing?
Does it hook into the kind of CI?
Like what level does it attach to?
Like how do I actually interact with it?
So it would hook into your CI, right?
GitHub actions sort of into your CI,
because that's how it will decide
when it reads from your code base,
it starts to make the changes,
it wants to test them, right?
It needs to verify the correctness of the changes
by running them through your CI and then sending them.
And based on configuration, you could say, okay,
I want to send it to specific reviewers, or I want it to just submit on green, things
like that. It could also, and the plan is, and now we're firmly in the world of
theory and where we're going, not what we have, run as part of code
review as well. So you can give it rules for how to look at code at code review time,
so it can suggest edits in that context.
So would that be basically competing with something like, you know,
other AI powered tools that are out there?
It would compete with all the sort of automated code review and linters and things like that.
You could think of it a bit like Clippy, like Rust has Clippy.
There are clang-tidy checks.
So it does compete with that.
I used to work at Sonar, as many people know, who make static analysis tools, but also
one of their main products will do a sort of automated code review and quality
gate on CI.
And I haven't been there for a while now, but I understand that they are
introducing AI into some of that process as well.
And I'm wondering if we're getting to a world where we'll have AI agents modifying
the code, submitting a PR and then a different AI agent reviewing it, and then
deciding whether to allow it to merge into the codebase or not.
In some senses, it depends on your definition of AI. At Google, we
already had it to the point where systems would generate large volumes of code, and secondary
systems would review them, and if the secondary system was okay with it, we would submit it
directly. And sometimes those secondary systems were just a pile of regexes doing
a sanity check on the tool.
And so you can already be in that space.
I think the real question is, organizations need to take a very SRE mindset to this: if
you have automated systems operating at scales like this, and
there is an issue, something slips by, you actually have to do the postmortem and understand
what could we have done that would have stopped this.
And the answer shouldn't be, oh, just don't run systems like that.
No, the answer needs to be: how do you ensure correctness at the levels you need?
Right? Much the same way, if I, as a developer, write a
bug that takes down prod, we shouldn't say, Matt, you're a bad developer, but ask how our
systems failed, to allow prod to come down from a simple mistake. Yeah, that's a good way of thinking
about it, actually. All right, Matt, so that sounds really cool. It sounds like you have some really interesting technology going on there. And I think there's a wider question of how AI will fit into the whole ecosystem in the long term; that remains to be seen, it's going to play out in some way or another. Like, everywhere I look these days, I mean, I'm not working on AI myself directly, but, you know, I go to conferences, I go to committee meetings, I talk to people, and
I get this vibe that we are still just in the like, in the baby steps stage, like people are just
kind of scratching the surface and trying to discover how to use these tools effectively, or,
like how to use it beyond like the really obvious things that are out there, right?
Yeah, I think we're also going to go into a very interesting space in the world of
Hyrum's Law with AI, where people are going to build a system, and they're going to deploy
it, and it's going to be working.
And then Anthropic is going to upgrade from, like, Claude Sonnet 3.7 to 3.8.
And that deployed system is suddenly going to break.
And then you're going to say like, no, no, leave the old one up.
They'll say like, okay, you're paying us enough money that we'll leave 3.7 up for you for
a while.
And then they'll say like, actually, it's been three years, you really need to upgrade
to 3.8.
And there's going to be this tension as a lot of the
players in the AI space relearn all of the difficult lessons of SREs and versioning APIs over time.
Right. So I'm going to now very deliberately change topic here, because I noticed that there's a
trend where when we get into the kind of depth of any discussion on the show lately, it always
degenerates into a discussion about either safety
or AI pretty much every time.
One of those two, which is kind of fun.
But there's just also other stuff to talk about.
So I just wanna use the remaining time
to cover other things,
but it's obviously a very fascinating discussion.
So thanks again for talking about BrontoSource,
very exciting.
Apart from that, and apart from
your work at Google on refactoring stuff, you also did a lot of very impactful work on
hash tables, right? So you, I think, came up with a particular type of hash table, which is now the
standard in two other programming languages. Is that right? Can you talk about that a little bit?
Yeah, I want to actually, like, I did not come up with it; I was part of a group of people.
Alkis Evlogimenos did most of the implementation. Sanjay Ghemawat and Jeff Dean had a couple of really key
insights. What I did was a lot of the politicking. I collected and gathered data, and I convinced
all of the stakeholders across Google that we should actually move all of our hash tables
to this. And then I did the work of actually moving the hash tables. But I don't want to
take credit for what is really a brilliant algorithm that was a mixture of Alkis's work
and Jeff Dean's and Sanjay's work. Okay.
So not only is Rust's and Go's default hash map based on that, but also
internally at Google, this is the hash map they use.
Yeah.
It's open source.
It's Abseil's flat hash map.
Oh, okay.
I actually know about that one.
So what's so cool about this algorithm compared to the stuff that
people were using before?
So right, the standard kind of hash table that you would write in college or whatever
has, you know, you have a number of buckets, you compute your hash, you do modulus by your size,
then you have like a linked list or a vector of things that fit in that hash bucket. Or you can
do a probing hash table where you sort of advance them one at a time when you hit collisions.
And that works, but each individual probe sort of inspects a single element.
And the way that Swiss tables work is that they have a metadata array at the front that contains seven bits of hash code.
And so the hash code is split into what's called H1 and H2. H1 is what you
take the modulus of, and you say, like, okay, what position am I in? And then H2 is just
those seven bits, and they're packed together with an eighth bit that is a presence bit.
And so you have a set of 16 one-byte objects that very compactly represent 16 different entries in the hash table,
with a lot of hash code bits. And then you use SSE instructions to compare all 16 entries simultaneously.
And so the analogy is, in a traditional hash table, you're probing elements one at a time to figure out which one it is,
and in this, you're probing them 16 at a time.
Well, that's obviously going to be more performant. So that is really cool.
Yeah, it was, it's a lot of fun.
It's really impressive.
And the other thing about it is because you're probing them 16 at a time, when
you do erases, you can often not leave tombstones behind.
The set of times you need to put in tombstones is smaller.
And so you can actually get that win as well.
Right.
And that's called a Swiss table.
Yeah.
That was the internal code name, because Alkis is in Zurich and...
So it's not something clever about holes in cheese or something like that.
Nope.
It's a common guess, but it's actually just named for Zurich and like Swiss
efficiency has a good spot in the American zeitgeist.
Oh, I'm really curious actually now about this because I mean, I'm not
originally from Germany, right?
I'm originally from Russia, but I grew up in Germany, so I have this German accent.
So I get this a lot sometimes when
I'm in countries like America, where people say, oh, you're German, Germans are so efficient.
And so now I'm curious, what's the stereotype? Like, what nation
is considered the kind of the most efficient one: the Swiss, the Germans, or somebody else?
Like, so the Swiss have an association with clockmakers in
the US. Yes, yes, yes, of course. Whereas Germans have an
association with rule followers in the US. Okay, that's
interesting. That's interesting. Huh, okay. Well, thanks.
Sorry, that was a bit of an aside, but I just wanted to know.
That's interesting. Okay, so you not only did a lot of work in, obviously, refactoring and large-scale codebase
management and hash tables, but you also did a lot of work with concurrency as well.
And I remember actually the first time I met you, that was CppCon, I believe 2021.
It was this weird one when the lockdowns, the COVID lockdowns had just ended.
And we had the very first in-person one after that, which was really, really small.
And it was impossible to travel there from Europe because there was this weird rule where if you had been physically in Europe,
for the last two weeks, you couldn't even travel to the US.
So I had to jump through quite a few hoops to even get there.
I think I was pretty much the only European there.
And there were just like 200 other people, which was a very weird edition of CppCon.
But I remember at that edition of CppCon, you gave a talk, which I obviously went to,
about building a lock-free, multi-producer, multi-consumer queue for TCMalloc, which is the other big,
famous, efficient allocator that's out there, which is the Google one. That was a really,
really cool talk because what I remember from it is that you said, okay, we have TCMalloc,
which is this really, really complex massively parallel allocator with many different stages of caching and all of that
stuff. And in the middle, there's this one mutex, which is probably the most highly contended
mutex in all of Google. Let's get rid of that and replace it with a lock-free data structure,
which is ambitious, to say the least. I don't quite remember how it then played out. I kind
of remember the premise, but then I think it kind of lost me halfway through because
of jet lag and things like that.
Can you talk a little bit about that work?
Because that sounds really, really interesting and also very impactful actually.
Yeah, it's a ton of fun.
I'm in the weird set of people that actually like concurrency. And my favorite memory ordering is relaxed.
Oh, mine too, mine too.
I think you should do sequential consistency.
No, never.
If you can't name your acquires and releases,
you shouldn't be in the world of atomics; just use a mutex.
That's so funny.
I just submitted a talk to the Audio Developer Conference
in November in the UK. So if that gets accepted, it will be there, about exactly this. So I agree
with everything you just said. And I have a talk scheduled hopefully for later this year about this
topic. So it resonates with me very strongly. Yeah, I'm very proud to have,
over my course of working on things at Google, discovered two bugs in TSan.
Wow, okay.
So I wanted to replace this particular data structure in the guts of TCMalloc. And one of
the things, right, for most data structures, there are sort of commonly known best-in-class things, right?
Like, cool, use a vector, use a B-tree. These are commonly known.
Multi-producer, multi-consumer queues are kind of an open
problem in that there's no single best one.
You should always have one that is tuned to the specifics of your system.
And I saw that the component I was replacing was actually a stack in TCMalloc,
but it didn't need to be a stack.
It actually just needed to be a thing
that you could put things into and take things out of,
and it didn't care about ordering.
And so I thought, okay, I'll do this
multi-producer, multi-consumer queue,
because I know how to implement one, and it's based on what's called the disruptor pattern
from LMAX, which was a hedge fund, I think, out of London, in the 2000s, 2007 or so. They
had this disruptor pattern that is a way to implement a multi-producer, multi-consumer
queue. And so I was like, oh, cool.
I'll build it up.
I'll base it on that.
And like the talk goes through doing it and then debugging it and rolling it out and testing
it and debugging it more.
As it turns out, by the way, if you're ever writing a concurrent data structure
and you think like, oh, I have some bug and I'm tracing down all my concurrency
things, pause for a second, just put a giant mutex on it, put the mutex
everywhere and see if your bug persists.
Because a lot of the time your bug is actually just in your indexes and your
bookkeeping and not in your concurrency.
And so like that's a very easy, like first step in debugging any of these.
Also have fuzz tests.
Fuzz tests are great for multi threaded things.
And thread fuzzing specifically.
Yeah.
Thread fuzzing specifically, right?
You want to have a test that brings up N different threads that just kind of
pound on it.
And you don't even need to assert exact results, right? Because if you run it in TSan,
ThreadSanitizer from LLVM, right? If you run it in TSan, it will tell you when you got your interleavings wrong. So it's great. And so then as I went through and I did all these benchmarks and
everything else, at the very end of the day, the punchline was after doing all of this work, as near as I could tell,
it was a very mild performance regression, but it was so hard to get statistical significance.
It was functionally equivalent performance.
Interesting.
Yeah.
But I actually, on the path to doing it, I refactored and cleaned up a bunch of the code,
and I added unit tests to a bunch of things in TCMalloc.
And so the code base actually got better, even though the final thing that I did those refactors for didn't land.
Okay, so we talked a lot about things from the world of C++.
Is there anything else that we haven't talked about
in the world of C++ that you find particularly interesting
or exciting that maybe we'll be hearing from you more
about in the future?
Yeah, so I was at C++Now recently,
and David Sankel gave a talk on some Rust binding stuff.
And I think actually there are going to be a lot
of interesting things coming out
as a side effect of reflection, around binding C++ to other languages, that I'm actually really
excited about.
I'm also mildly terrified of reflection from a Hyrum's law sense.
For those who don't know, Hyrum's law is this idea that any change whatsoever to your source
code can break some user.
And now with reflection, you change the order of private variables.
I mean, that could already change layout, which could break users, right?
Change the name of private variable could change reflection in some way and break users.
And so the tyranny of Hyrum's Law is going to increase with reflection
in a way I'm kind of curious about.
So for a moment there, I thought you were going to say something other than reflection,
but you still managed to sneak it in anyway.
Yep.
You know, I thought about trying to not say reflection, but it's really cool.
It is really cool.
Okay, then I think we will start to wrap up there. And I just want to take a moment to point out
that Timur has been doing most of the talking for this episode, because I've been having a lot of
latency issues here, which hopefully you won't hear too much of because of the editing.
But that's why I haven't said so much.
So as we do reach the final stretch, is there anything else you want to tell us,
or anywhere people can go to find out more about BrontoSource or any of the other things
that we've been talking about, Matt?
Yeah.
Like, first of all, thank you so much for having me.
This has been a fun conversation and thank you, Phil, for fighting through the connection issues and finding a time that
worked for all of us. Being spread across the world makes scheduling interesting. And
that, I think, is the big point. Oh, you can find us at brontosource.dev. If we manage to achieve our naming goals based on
the name Brontosaurus, you should be able to guess how it's spelled. But we will put that in the show
notes as well anyway. I figured. All right, so that wraps up our episode 401 with Matt Kulukundis
about BrontoSource. Thank you again, Matt, for coming on the show, and we will see you all again here in two weeks.
Bye.
Bye.
Thanks so much for listening in as we chat about C++.
We'd love to hear what you think of the podcast.
Please let us know if we're discussing the stuff that you're interested in
or if you have a suggestion for a topic we'd love to hear about that too.
You can email all your thoughts to feedback at cppcast.com.
We'd also appreciate it if you can follow CppCast at @cppcast on X, or at cppcast at cppcast.com on
Mastodon, and leave us a review on iTunes.
You can find all of that info and the show notes on the podcast website at cppcast.com.
The theme music for this episode was provided by podcastthemes.com.