CppCast - Going Cross Platform
Episode Date: February 12, 2021. Rob and Jason are joined by Sebastian Theophil from think-cell. They first discuss a blog post on building a 1 billion LOC project with the Threadripper 3990X and a browser extension for easily searching for C++ reference help. Then they talk to Sebastian about his team's efforts to port their Windows C++ codebase to macOS and some of the challenges they dealt with, as well as recent efforts to start porting some of the code to WebAssembly. News: Threadripper 3990X: The Quest to Compile 1 Billion Lines of C++ on 64 Cores; Looking for Approachable Open Source Projects to Contribute to; C++ Search Extension v0.2 released. Links: think-cell: Join us as a C++ developer; Windows, macOS and the Web: Lessons from cross-platform development at think-cell; tcjs library for generating type-safe JavaScript bindings for C++/Emscripten. Sponsors: Visual Assist
Transcript
Episode 286 of CppCast with guest Sebastian Theophil, recorded February 10th, 2021.
This episode of CppCast is sponsored by Visual Assist, the well-known productivity extensions for Visual Studio.
Visual Assist speeds up development with features like smart navigation, code inspection and suggestions, powerful refactoring commands, and a whole lot more.
Even spell checking in comments. Start your free trial at wholetomato.com.
In this episode, we discuss C++ search extension
and building a billion lines of code.
Then we talk to Sebastian Theophil from think-cell.
Sebastian talks to us about porting a Windows code base
to macOS and then the web.
Welcome to episode 286 of CppCast, the first podcast for C++ developers by C++ developers.
I'm your host, Rob Irving, joined by my co-host, Jason Turner.
Jason, how are you doing today?
I'm all right, Rob. How are you doing?
Doing okay.
Things are starting to get a little crazy around my house.
We are planning on moving again sometime in the next few months.
So we'll see the backdrop behind you change.
You might still have cat castles, whatever those are called.
I mean, that might disappear sometime soon, because we're going to have to show and sell this house.
It's not going to be a very long move. Five or six years ago we came from New Jersey down to North Carolina; this time we're moving like 10 minutes away, to a different house in the same area.
But everything's all picked out and you know what your plans are.
Yeah, making all those plans, getting ready, and starting to pack stuff up.
Exciting.
Very exciting.
Any news from you?
No.
Okay.
Quiet time of year.
Yeah.
Okay, well, at the top of every episode, I'd like to read a piece of feedback.
This week, I got an email from Eric writing, Hi, guys. I'm really enjoying your podcast. I started listening about a month ago. I always learn something I did not know. Anyways, it might be an interesting episode to talk with someone from Rocky Linux. Thanks from Eric. So I'm not up on all the new Linux distros. Have you heard of Rocky? I looked it up simply because of this comment here.
It is apparently a fork of Red Hat Enterprise Linux,
although they don't actually say that on their own description because they don't like the direction
that Red Hat Enterprise Linux is moving.
I don't know the details there.
But they want to be bug-for-bug compatible
with those last versions of RHEL.
So an interesting topic,
although I'm not sure what it would have to do with C++.
Right, because I'm assuming it's all going to be written in C.
Well, I mean, it's just a Linux distro, right?
It's going to be mostly scripting tools, right?
As far as the distro actually goes.
Yeah.
Okay.
Well, it's something to consider.
I guess we did talk about other,
we had a couple episodes on operating system stuff,
including the one with Andreas about Serenity,
which was very interesting.
Yes.
Okay.
Well, we'd love to hear your thoughts about the show.
You can always reach out to us on Facebook,
Twitter, or email us at feedback at cppcast.com.
And don't forget to leave us a review on iTunes or subscribe on YouTube.
Joining us today is Sebastian Teofil.
Sebastian studied computer science in Berlin, Germany, and in France. He met think-cell CEO Markus at university as an undergrad research assistant and has been working at think-cell ever since.
He's currently a senior software engineer there.
Over the past few years, he worked on a macOS port and more recently built tools to let
WebAssembly applications interact with JavaScript through type-safe interfaces.
Sebastian, welcome to the show.
Thank you.
Thank you for having me.
So are you currently in Berlin now?
Yes.
Yes, I am.
You said Berlin and France, you know, it says Berlin and France in your bio there, I guess. Have you attended Meeting C++ in the past?
I have, yes. Every time, I think.
Every time that you were allowed to?
I think I was allowed to every time. No, no, I mean as far as COVID goes.
Oh, yeah, yeah, of course.
Yes, yes. I hope so, yeah.
And I saw you there, Jason.
I think you gave a talk, right?
I think.
Was that Berlin or was it some other conference?
Oh, it might have been some other conference, but I did speak in Berlin.
I'm just trying to remember what year it was now.
Was it 2017?
18?
No, it wasn't 18.
I don't know.
And I feel like every time I go to say, right, what year was that?
I'm always off by one right now, because of the missing year. I want to be like, oh well, last year when I went to those four conferences. No, that was two years ago now, the year when things still happened.
Yes. Last year was a year. It was just a year that was weird, and a lot of things didn't happen.
Yeah. Okay.
Well, Sebastian, we got a couple news articles to discuss.
Feel free to comment on any of these.
And we'll start talking more about the work we mentioned in your bio that you've been doing.
I think so.
Okay.
Yeah.
All right.
So this first one we have is a blog post on the Embarcadero blog.
And this is Threadripper 3990X: The Quest to Compile 1 Billion Lines of C++ on 64 Cores.
Interesting blog post.
You know, I had not heard of trying to set something up to use so many cores.
And the author of this post kind of went through a lot of issues just trying to utilize all those cores, and also kind of ran into issues with using such a large code base.
What do you think about this post, Jason?
Well, I would say that the title is slightly understated.
It's 64 cores, but it's 128 thread machine.
So he is trying to keep all 128 threads busy.
Right.
It's an interesting read. There are several interesting tidbits, but I just want to point out that as far as I can tell, it's actually compiling C the whole time.
That's true. With his sample code base, it is just C and not C++.
I'm not sure how much of a difference that would make for the purposes of this post.
I would argue a lot, because that was the point I would make.
It would definitely make a difference in RAM usage,
right?
Like if these were like a bunch of heavily templated C++ versus a bunch of straight line C or something.
Yeah.
Well, it does look like the code base he used was just kind of generated code in order to get to that billion lines of code target that he was looking for.
He didn't just find an actual, you know, open source project that was close to a billion lines of code.
He generated a bunch of code to do it.
Correct.
Yeah.
A script interestingly written in, what is it? Object Pascal? No, what did he write it in? Delphi. It was in Delphi, that's what it was.
Well, it's the Embarcadero blog, right? So this is their bread and butter.
I read it more as an interesting test suite for his whole toolchain than as an exercise of the C++ compiler, because the points where he failed were the linker executable size and the length of the command line that he can pass in his make script.
And then I checked what our current
Windows developed machines are.
And actually we are doing this experiment every day, 10 times in practice, because the
latest machines we are buying have 54 cores and 384 gigabytes of RAM.
Wow.
And then I checked our C++ files, some of which include the Windows headers, and then
maybe Boost and then a little bit of the STL.
And preprocessed, Clang says, they have 800,000 lines of code.
Per C++ file?
Per C++ file.
So I think in practice,
we will exceed 1 billion lines
if we have maybe 500,000 real lines of code
plus all the includes.
But then look at the performance.
Xcode tells you how long it spent compiling each file.
And there was a basic, essentially C-with-classes file from some external library, not our code.
And that was, preprocessed, only 75,000 lines of code.
And that took two seconds to compile.
And then I looked at our code.
And I think the far bigger influence on our compile time is
actually C++ features. I mean,
all our preprocessed
files are similarly
large, a few hundred thousand
lines of code. But you have a factor of
10 difference between some files.
And that
difference is C++,
I think. So when one file,
which is 800,000 lines of preprocessed code,
takes 60 seconds, and another one takes 160 seconds,
then I suppose the difference is not the size of the includes,
but some template instantiation stuff
that takes an enormous amount of time.
And possibly that we include too many headers.
Well, that's hard to know.
That's the problem with C++.
And the Windows headers are, yeah, big.
Yes, yes.
And greedy.
There's a small one that we include,
but it's also essentially all of the Windows headers.
Out of curiosity, is that using any of those tricks
like #define WIN32_LEAN_AND_MEAN or whatever?
And it's still
800,000 lines of preprocessed code.
I didn't check
who's responsible for the 800,000 lines
of code. It could be Windows, it could be
Boost, it could be anything.
Oh yeah, if you've got Boost and Windows
and the STL in there
I'm surprised
it's only 800,000.
Okay, I did find it interesting though, before we move past this, that Dev-C++ has been around forever. That's an old project. Bloodshed Software, I think, is who originally controlled it.
But it's an interesting aside in here,
that Embarcadero is now the maintainer of Dev-C++,
which is an open-source IDE with a MinGW compiler.
And a lot of these experiments were around Dev C++,
which I was like, wait a minute, I know this name.
So I had to do a little bit of research here.
They took over maintenance of it last year in 2020.
Oh, very cool.
Okay, next article we have is a post on the CPP subreddit, and this one is looking for approachable open source projects to contribute to.
And I thought there were some interesting things in the comments here.
They did mention our recent interview, which I actually just mentioned a few moments ago, with
Andreas Kling about Serenity.
But then there's also, I wanted to
highlight these two links,
which are really great answers
for this person's question on
where to find a new open source project
to contribute to. One is
goodfirstissue.dev,
and the other is firsttimersonly.
And it looks like both of these websites are devoted to
kind of curating possible open source
bugs or issues to go and work on if you're a first time
open source dev looking to contribute. And there were two awesome
looking games in there that I was really surprised to find. Did you see that?
One was called 0 A.D. I think it started
as an Age of Empires mod, but that looked
absolutely professional and fantastic.
No, I don't know that one.
I have to try that out if I have the time.
Oh yeah, 0 A.D.
Which one is the other one then?
Pioneer Space Sim,
but I didn't look into that that much. But that also
sounded fun.
Very cool.
There's a comment here which kind of implies you should be careful to very carefully type the firsttimersonly.com. That's potentially a URL that might take you somewhere you don't want to be.
Yeah, it's one of those URLs that kind of just sounds suspect, but that's not really what it was intended to sound like, I guess.
Oh, that does. I've just pulled up 0 A.D. That actually does look pretty sweet.
Okay. And then the last thing we have is, this is actually just a changelog for this C++ search extension.
But I don't think we've ever mentioned this search extension before.
So it's an extension that you can add to the browser
for Chrome, Firefox, or Edge,
and it lets you just type in, like, cc and then your search term,
and it'll bring you straight to CPP Reference.
Yeah, I totally installed it.
Yeah, seems pretty nice.
It does an awesome job of taking you to cppreference.com.
I kind of hope at some point to get it to also be able to search
or see if they can search, like, eel.is for searching the standard draft.
That would be really awesome for me.
Yeah.
Cool little tool, though.
Okay.
So, Sebastian, we mentioned a couple things in your bio.
Could you maybe start off by telling us a little bit about what it is you work on at think-cell in some more detail,
and a little bit about this project of porting the application to macOS?
Yeah.
Shall I tell you a little bit about what think-cell does, actually?
Yeah.
Absolutely, yeah.
Right.
So we are developing an add-in for PowerPoint
that was originally used mostly by consulting companies.
They work a lot in PowerPoint.
They create a lot of PowerPoint slides.
And they needed a lot of features to create nice-looking charts that PowerPoint didn't support.
They wanted to have arrows showing the percentage difference between, I don't know,
your revenue in different years or something like that. And they used to do that by hand,
or they had maybe a few macros. And they didn't have an interactive software that would create
these things on demand and update them on demand and a software that would be easy to use.
So very often, and this is 15 years ago, the highly paid consultants would scribble a chart on paper and fax that somewhere to the back office.
And then the back office would create the chart, and they would fax it back, or send the slide back via email, and then he would make edits.
And that was a pretty inefficient process.
Yeah.
And we wanted to make software that was easy to use, so the consultant could actually do it himself.
And that's what we did.
And then we built on that software.
And by now, I think we have 800,000 users worldwide
in most big companies.
And we built on this initial project
to make nice looking charts. And now we have
algorithms that try to help you during your slide layout. So you can just assemble your
shapes on the slide. And then when you change content, we update the layout and you don't have
to move all the text boxes around again so that your slide looks good. So there's a lot of algorithms to
automatically create nice-looking charts or
slides. There's a lot of work we do to make
a good, easy-to-use, intuitive user interface.
So the challenges we have are quite different.
There are a lot of them, in a lot of different areas.
Okay, so you wrote this original application and it's a PowerPoint plugin on Windows, because, you know, 15 years ago Microsoft only cared about Windows.
But more recently they've obviously, you know, embraced the open source ecosystem and made a lot of their applications cross-platform.
So now you need to run this on a Mac too, I guess.
Exactly, exactly.
We wanted to move to the Mac.
That's also something I personally wanted.
That was something I was very fond of.
I was privately a Mac user for a long time.
And so this project to port our software to the Mac,
to Microsoft Office on the Mac,
became a little bit of a passion of mine as well. And that was also the first time we actually ported our software
to anything at all. So in the beginning, we had maybe 500,000, maybe a million lines of code. So
it's definitely a larger project. And this had never been ported to anything else. So you can imagine there was a lot of Windows API usage throughout our code base,
Windows data types all the way throughout our code base.
So there was a lot of cleaning up we had to do in the beginning,
getting stuff to compile at all, removing Windows specific things
before we could actually move to the interesting part of really re-implementing things on a new operating system.
And the special, like I said, the special case is that we are an add-in.
So we have little control over the main application.
And that means, well, we have to be very, very flexible.
We have to do whatever the main application does. So that means we had to have a flexible rendering engine
that could render into whatever the host application provides us
using OpenGL on the Mac, using DirectX on Windows.
We have to be very quick in supporting platform-specific features.
So if PowerPoint, for example,
is a sandboxed application on macOS, or it supports some macOS-specific feature, maybe the triple
click to select an entire paragraph of text, then we have to support that as well. Otherwise,
there's a certain friction between our add-in and PowerPoint. And that means, in the end,
that means we have to do a lot of things ourselves.
We can't depend on existing cross-platform toolkits
that would take these decisions away from us.
We have to say, okay,
we have to do this all by ourselves, essentially.
Right.
Yeah.
And so the only cross-platform toolkit
we do use actually is Boost.
Yeah, Boost doesn't make any UI decisions for you, right?
Yeah, exactly.
I'm curious to go back a moment when you said that this entire project had been built for Windows originally.
I can't even imagine just how much Windows idioms and function calls would have just infiltrated the entire
code base.
Yeah, how much time did you spend just de-Windows-ifying the code base before you could even try to take it to the Mac?
Quite a lot of time, because it can be little things.
It's your file handling, where you just pass handles to files around. It's your geometry rendering library,
where in some place you have a cast to some Windows API rect struct.
And yeah, these all had to go out.
Yeah, that was a lot.
That was the boring initial part.
And you said at that time you were...
How much? How big was the code base?
Between 500,000 and a million.
It depends a bit on what you include in the count, I think.
Yeah.
I couldn't help but note, by the way, you said, I believe you said that you have 800,000 users,
which is coincidentally how many lines of code your pre-processed files were earlier.
And about how many lines of code our code has, on average.
You're going to have more users as your code base grows.
Yeah. So when you finished kind of, you know, de-Windows-ifying it, making the code base more generic, what were some of the first steps to actually bring it to the Mac and start compiling and building it there?
Well, I think the first one was rendering. You want to render something, so you see that something is working.
But porting the rendering, that wasn't the most interesting
part because, well, we had an internal implementation that was well designed.
I mean, you assemble triangles and textures that you want to render.
And then at some point, you go over them and you pass them on to DirectX.
And that mapped pretty directly to what you would do with OpenGL.
You just issue OpenGL commands.
I think there were other challenges, and I think the biggest overall challenge is that when you want to make your software cross-platform, then you have to find the right abstraction level where you put in this cross-platform interface.
So you want to have this nice abstraction: here I call some function, and then it does something different on Windows and macOS. And that turned out to be quite difficult to figure out how to do, and I think a lot of cross-platform toolkits actually don't do this correctly.
And so look, as a motivating example, at the many ways you have to rename a file.
Then you have, I don't know, on Windows, you have MoveFileEx.
On POSIX, you have the basic rename function.
On macOS, there are at least two extensions of that POSIX rename that take more options and let you specify the behavior.
And then you have boost::filesystem::rename, which is essentially the POSIX rename, and you have QFile::rename.
And you cannot really say how these map to the capabilities of the operating system.
So which of these cross-platform implementations would allow you to say, I want the rename to fail if the target file name already exists?
What do they do with the access control lists on your disk? Do they take them over to the new location? Do they inherit the access control lists of your target directory?
It's not specified, but your operating system has these things.
And if you are developing a desktop application
where your application shares the computer
with other applications,
and you have to be kind of a nice citizen,
then you have to answer these questions.
And I think this rename example is an example
where the cross-platform functions that Boost or Qt give you, you can't really use in any meaningful way; at least we couldn't in our application.
You would want an interface that is much higher level. Maybe you say, I want to have a function that creates a settings file that is user specific.
Or, staying with the rename example, maybe you have the scenario that you want to fill a cache of files. So you download files in different processes, and they want to cache these files, so you download them to some random file name, and then you want to rename them to the target file name of your cache.
And you want to do that atomically, so that you don't partially overwrite the same file, or so that nobody reads a partial file.
So this is a function that you want to have, and macOS coincidentally even has it: say, download this file, give it some name I don't care about, and then replace it and put it in this location so that I can fill my cache
and not have any race conditions.
This is something you can implement very differently on different operating systems, but then you need the full power that, I don't know, the Windows MoveFileEx function has, or that the macOS rename extension functions have.
The cross-platform functions that we have, they're all too underpowered to even implement this.
And so they kind of unified the least common denominator of all the implementations they could find.
And I'm not saying you can actually do better in Boost or the standard, you probably can't, but you can't really use these functions in all scenarios either.
You have to take a step back and say, okay, what are the semantics that we need, and then how do we implement them on each of our systems?
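To make that concrete, here is a minimal sketch of a per-platform "rename, but fail if the target already exists" helper. This is not think-cell's actual code; the helper name is hypothetical and error handling is reduced to a bool. Only MoveFileExW, renamex_np with RENAME_EXCL, and the POSIX link/unlink calls are real APIs:

```cpp
#include <filesystem>

#ifdef _WIN32
#include <windows.h>
#elif defined(__APPLE__)
#include <cstdio>      // declares renamex_np and RENAME_EXCL on macOS
#else
#include <unistd.h>    // link, unlink
#endif

// Hypothetical helper: rename `from` to `to`, but fail if `to` already exists.
bool RenameNoReplace(std::filesystem::path const& from, std::filesystem::path const& to) {
#ifdef _WIN32
    // No MOVEFILE_REPLACE_EXISTING flag, so an existing target makes the call fail.
    return ::MoveFileExW(from.c_str(), to.c_str(), 0) != 0;
#elif defined(__APPLE__)
    // RENAME_EXCL: fail with EEXIST if the target already exists.
    return ::renamex_np(from.c_str(), to.c_str(), RENAME_EXCL) == 0;
#else
    // Plain POSIX rename() silently replaces the target, so emulate "no replace"
    // with link() + unlink(), which fails if `to` already exists.
    if (::link(from.c_str(), to.c_str()) != 0) return false;
    return ::unlink(from.c_str()) == 0;
#endif
}
```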
And there were other challenges. File handling was pretty challenging, because Windows is relatively special.
So, for example, and I mean this in a good way, my respect for the Windows internals grew while I was porting the software to the Mac, because they have made a lot of very good and very clever decisions.
So, for example, you can have temporary files that you can mark as: please delete this file when every process that has this file opened has died. The kernel takes care of this, and you don't leak temporary files somewhere that you have to take care of deleting later or whatever. You can just say, please clean up after me, even if I crash.
And this, for example, is something that's pretty hard to emulate on macOS.
Right.
Because we tried. We want to clean up after ourselves. So we tried to do some trickery, but in the end, it's probably not as reliable as doing it on Windows, the simple way that Windows provides.
So there were interesting differences
in semantics.
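As a minimal sketch of the Windows mechanism being described (hypothetical helper name, error handling omitted), a temp file can be created so the kernel deletes it once every handle to it is gone, including handles closed because a process crashed:

```cpp
#include <windows.h>

// Hypothetical helper: create a temp file that the kernel removes automatically
// once the last handle to it is closed -- no explicit cleanup, even after a crash.
HANDLE CreateSelfCleaningTempFile(wchar_t const* path) {
    return ::CreateFileW(
        path,
        GENERIC_READ | GENERIC_WRITE,
        FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
        nullptr,                 // default security
        CREATE_ALWAYS,
        FILE_ATTRIBUTE_TEMPORARY | FILE_FLAG_DELETE_ON_CLOSE,
        nullptr);
}
```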
I'm curious now,
ultimately, how you solved this, or what your internal code looks like.
Do you have, like, super high level functions that are like, please create a temporary uniquely named file in this directory and delete it when the process is gone, or something like that?
Yes, often we create functions like that. So let's say we have a function: create a temporary file that only I will ever read, so nobody else will ever read it, and delete it when I'm done.
Because if you know so specifically that you are the only one reading it, then you can use the simple Unix trick of opening the file and deleting it right away and just keeping the file handle open.
Oh, okay.
So that's the simple solution.
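Here is a minimal POSIX sketch of that Unix trick; the helper name and the /tmp path are just for illustration, and error handling is omitted:

```cpp
#include <stdlib.h>    // mkstemp
#include <unistd.h>    // unlink

// Create a scratch file that only this process will ever read. The kernel frees
// the storage when the last descriptor is closed -- even if the process crashes.
int OpenPrivateTempFile() {
    char name[] = "/tmp/scratch-XXXXXX";  // mkstemp replaces the Xs with a unique suffix
    int fd = ::mkstemp(name);             // create and open the file
    if (fd != -1)
        ::unlink(name);                   // remove the directory entry right away
    return fd;                            // the data stays reachable through fd only
}
```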
The hard problem occurs when you have multiple readers to the same file
or a reader and a writer to the same file,
but you still want to make sure that this file is eventually deleted
when everybody who has it opened closes the handle, maybe by crashing. So that's the hard case.
And in the end, we actually tried to circumvent this problem on macOS and said, let's do something else. Let's create one file where we put our temp files in, like a single big file where we write the shared
bytes in.
So maybe we use a lot of shared memory.
That's what we use it for.
We have different processes.
So in the end, we're not only an add-in to PowerPoint.
We are an add-in to Excel too.
And then there are a lot of different processes involved because we develop more features. So there is a utility that lets you
specify in a JSON file that you want to take a PowerPoint presentation as a template.
And then in JSON, you can specify the data that you want to write into some chart,
into some chart that you specify by name. So there's a tool that can open this JSON file,
open the PowerPoint presentation, pass all the data onto PowerPoint to our add-in so that we can process this data,
update all the charts, and save it again.
So for scenarios like these, we implement inter-process communication using shared memory. And in some cases, there's quite a lot of data
that we pass around. So we pass it around in temporary files.
And to solve this problem of leaking temporary files on macOS, we actually decided in the end,
it's much easier to create one big file and write the shared data in there and then pass a handle to that offset, essentially.
And say, OK, you find the data at this offset.
And then there's only one file, and you can clean that up later. You know, okay, you've crashed, you start again, you find there's an old file, you can delete this one file instead of the many files.
Exactly, exactly.
Although it does sound like you've effectively had to implement your own file system on top of this.
No, it's much easier than that, because the files are
only written once. So it's write once: write the data, then pass a handle to that, and then it becomes read-only, and it's much easier.
Do you have the problem though where maybe your temp
file continues to grow or do you always get
to restart back at zero? We compact
it because of course
there might be holes in it and at some point
we throw the holes away.
I guess one thing I'm curious about is if you had started this project fresh
and you knew you were going to be aiming for both Windows and Mac,
do you think you would still have wound up using all these kinds of platform APIs?
Or would you have tried to use something like boost::filesystem or std::filesystem and kind of maybe worked around any possible limitations of it?
No, I think we would have, because in the end you need those operating-system-specific functions to answer the questions that the operating system asks you, like: what shall the access control list be? Shall I overwrite this file or shall I not?
And this is something that you have to say as a programmer.
And this boost::filesystem or POSIX rename function,
it works well, of course, if you're just starting programming
and you have a toy problem.
It will also work okay if you are writing a server application
or maybe an embedded application
and you're essentially the only user of the entire computer.
In those cases, I think it's good to have this function which is easy to use, does something very simple, and works in a lot of cases. But it doesn't work in all cases, and maybe we are the exception now that we are writing good old desktop software.
That could be.
Nobody else has this problem anymore.
Just ship a web app.
I think you commented earlier, you questioned whether or not the standard could do better
if it could give you more of the lower level.
Obviously, I wasn't around for the standardization of std::filesystem, but my understanding is that they intentionally aimed for those things that are common across all platforms.
Yeah.
So you don't have a bunch of OS-specific flags and whatever in there.
Yeah, so that they didn't have to try to have a bunch of those, exactly.
And I think Qt tried to do a little bit better, and they have this weird permissions API. I've never used Qt, but I looked at the code when we were answering these questions for ourselves, and they have a permissions API.
And then they also had to wonder, okay, what can a cross-platform permissions API look like? And what they decided in the end was that they used the Unix permission flags, user/group/other readable, writable, executable, and then they implement them on Windows.
Because the Windows ACL model is so much more complicated, of course you can implement those flags using the Windows ACLs. But again, if you are really on Windows, then you have to specify some security descriptor, some ACLs.
Again, this is probably good for the typical, simplest use cases, but not for all of them.
Yeah, it's been a long time since I've actively programmed in Qt, but I believe there are a few cases, like you're talking about with file system and process management, where instead of using QProcess, I would have to dig down and use, like, the Windows-specific process class or whatever, or even go and grab the source code for the Windows one and pull it into my project and modify it to what I needed it to be.
Yeah, it should just do the right thing.
Yeah, because I mean, yeah, it's hard otherwise.
How should you standardize this?
And there were different, I mean, I spoke about file systems and temp files and the different lifetimes, and this is something that happened over and over again.
Because in Windows in general, when you create kernel objects, that can be temporary files, but it can also be shared memory objects or mutexes, they're all reference counted, which is extremely
handy if you're programming. You don't have to worry about this resource. Windows will clean
this up for you. And on POSIX systems, they typically aren't. They have kernel persistence,
which just means they exist until the computer is restarted. So for example, we used Boost Interprocess
to implement the real shared memory,
like a real shared memory segment that you can access
using pointers from different processes.
And that had the same problem,
that they had a Windows implementation
where on Windows, the shared memory
would automatically be cleaned up for you
if all the participating processes had closed down.
So when they restarted, they would get a clean memory segment.
And on macOS, Boost implemented this using backing files.
And the backing file would stick around
even if your processes all crash.
And you restart your application, you get the old shared memory.
And this is something we had to work around, and we made a patch for Boost.Interprocess to find a cleaner implementation, to unify the semantics.
I think this was, a lot of the time, the challenge when you're writing for different operating systems: you want them to behave the same way, if at all possible.
And for the shared memory, that meant making a patch for Boost.Interprocess.
The only resource that we could find on Unix that has process lifetime are file locks.
I think that's the only thing that is cleaned up if your process dies.
So we made an implementation that has proven robust.
I think it works okay.
Where we use file locks to synchronize access to this backing file.
So you have some backing file for your shared memory.
And when the first process comes along and tries to access your shared memory file, you try to open it using an exclusive file lock.
And if that succeeds, then you know, okay, I'm the first one to open this shared memory segment, so I can delete all its contents and truncate the file. And if that doesn't succeed, then I try to get a shared lock.
Well, in both cases, either after truncating or if the exclusive lock doesn't succeed, everybody tries to get a shared lock in the end. And this way you can synchronize and make sure that you know which process
is responsible for initializing the shared memory.
So that was easier than the temp file, I think.
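A minimal sketch of that locking protocol using POSIX flock(); this is not the actual think-cell or Boost.Interprocess patch, the function name and parameters are hypothetical, and error handling is mostly omitted:

```cpp
#include <sys/types.h>  // off_t
#include <fcntl.h>      // open
#include <sys/file.h>   // flock
#include <unistd.h>     // ftruncate

// Open the shared-memory backing file; whichever process wins the exclusive
// lock wipes and re-initializes it, then everybody downgrades to a shared lock.
int OpenSharedMemoryBackingFile(char const* path, off_t size) {
    int fd = ::open(path, O_RDWR | O_CREAT, 0600);
    if (fd == -1) return -1;

    if (::flock(fd, LOCK_EX | LOCK_NB) == 0) {
        // We are the first process: reset the segment to a clean state.
        ::ftruncate(fd, 0);
        ::ftruncate(fd, size);
    }
    // In both cases everybody ends up holding a shared lock. flock() locks are
    // released by the kernel when the process dies, which is the process-lifetime
    // property that plain POSIX shared memory lacks.
    ::flock(fd, LOCK_SH);
    return fd;
}
```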
Almost exactly the same process that's used
for initializing function-local statics in C++.
Is that so? I didn't know how that was implemented, actually.
Because it has to be thread-safe as of C++11.
As soon as the function is entered, it tries to get an exclusive lock,
and then it checks to see if the data has been initialized.
If it has been initialized, it releases the exclusive lock,
continues on with the function.
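For reference, the C++11 "magic static" being described, in its simplest form; the names here are purely illustrative:

```cpp
#include <map>
#include <string>

std::map<std::string, int> const& Registry() {
    // Since C++11, this initialization runs exactly once even if several threads
    // call Registry() concurrently; the compiler emits the guard and locking.
    static std::map<std::string, int> const s_registry = {{"answer", 42}};
    return s_registry;
}
```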
Well, I thought the idea itself was simple enough,
so I was pretty confident in the idea.
I think the problem is rather, do file locks on Unix actually work the way you think they work? Or is there some hidden...
I have no idea about that.
I know there are problems. You just Google for it; there are a lot of problems with file locks.
So if you have a home directory that is mounted from some server via NFS, you have to mount it with the right options for the file locks to work at all.
But I thought that was
still a good enough solution.
Different challenges. So at this point, have you centralized
all this cross-platform code, or do you still have it sprinkled throughout
the code base like the Windows idioms were?
The cross-platform, you mean in the sense of,
do we have some cross-platform library that implements all these things?
Right, yeah, something that's abstracted away.
No, they are implemented where they are needed.
We have some CPP files that would be Windows-specific or Mac-specific.
That's like the simplest case.
You define the function in the header,
and then you have two different implementations.
We do have the
occasional ifdef. We
allow that.
We try not to let it get out of hand.
But sure.
Sure. I just read
the comment maybe on Twitter that
code that has a lot of ifdefs is not portable
code. It's just code that has been ported a lot.
Maybe that's true. I might agree with that statement. It's certainly true, but we have, so now we have Windows, we have 32-bit Windows, we have 64-bit Windows, so that's already two different ifdefs. And then we have Mac, and on Mac we have Intel and ARM. So that's quite a lot of
sometimes different code bases,
sometimes different code paths.
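As a minimal sketch of that layout (the file and function names are hypothetical, not think-cell's real code): a platform-neutral declaration lives in a header, one implementation file per platform is selected by the build, and the occasional #ifdef covers small divergences.

```cpp
// show_open_file_dialog.h -- shared declaration
#pragma once
#include <filesystem>
#include <optional>

std::optional<std::filesystem::path> ShowOpenFileDialog();

// show_open_file_dialog_win.cpp -- only part of the Windows build:
//   std::optional<std::filesystem::path> ShowOpenFileDialog() { /* IFileOpenDialog ... */ }
//
// show_open_file_dialog_mac.cpp -- only part of the macOS build:
//   std::optional<std::filesystem::path> ShowOpenFileDialog() { /* NSOpenPanel ... */ }

// And the occasional #ifdef for small differences:
#if defined(_WIN32)
inline constexpr char const* kPlatformName = "Windows";
#elif defined(__APPLE__)
inline constexpr char const* kPlatformName = "macOS";
#endif
```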
Okay.
I want to interrupt the discussion for just a moment to bring a word from our sponsor, Visual Assist.
Visual Assist is used by serious C++ developers across the world.
It's got great code generation.
Do you need to implement methods from an interface?
What about changing a pointer to a smart pointer,
even an Unreal Engine smart pointer?
Adding a symbol you've typed but haven't declared?
Visual Assist will do these and much more.
Plus refactorings, more powerful than the ones included in Visual C++.
Or detecting errors in code and suggesting useful corrections.
Or navigation, helping you move anywhere in your code and open or locate what you need.
Or even the debug extensions.
Visual Assist is written by C++ developers for C++
developers. It includes everything you need and nothing you don't. It has a low UI philosophy.
It won't take over your IDE, but will show up when useful. It's there to help, not to advertise
itself. Visual Assist is relied on by the developers building software you've used. Whether
that's office suites, operating systems, or games, software you use was built with Visual Assist.
Get the same tooling for your own development.
Visual Assist supports Unreal Engine 4
and many versions of Visual Studio,
including VS 2019 and Community.
Get it at wholetomato.com.
We haven't talked at all about build systems.
Was that a big headache in converting over to the Mac?
It was.
So we came from Windows.
We were using Visual Studio.
And we were used to an IDE.
We are used to an interactive debugger.
We don't do printf debugging.
And so it was clear in some ways,
probably given that we have to use Xcode on macOS.
And also, I don't think I can see that we are moving to purely make or something as a build system that's not going to happen
and we we looked at cmake um i looked at it first a bit and then we also had a quite a big project
somebody a new colleague coming in who had experience with cmake, who tried to build our own build system to CMake.
And so what we started with and what we're still using now is we have this idea,
and of course we have Visual Studio.
And in Visual Studio, we have our Windows-specific build settings.
And they are defined in settings files.
So if you have those interactive IDEs, there are two
different ways to set up your builds, right? You can have the messy way where you have all your
files in those project files, and you specify your preprocessor defines and your include paths
all over the projects, for individual files. So that's not the way to do it.
Visual Studio and Xcode,
they both have supported a good way
where you have like a single settings file
or maybe a layer of settings file
that can include each other
where you can say,
okay, these are my common compiler definitions.
These are the preprocessor defines
I use for every single source file.
These are the include paths
I use for every single source file.
Maybe with differences for debug and release builds.
So in that sense, our build setup was
already very simple. It was
a simple settings file, and
it was a lot of project
files for Visual Studio that
only contain file names.
And the build settings
are very, very different for
Clang and Windows.
There's maybe semantic overlap that you want to optimize for size or you don't want to optimize at all.
But the practical build systems are completely different.
So initially I said, OK, I don't really see what CMake gives us here since we are only building for two IDEs instead of introducing a third build system that we then also have to understand. And what we did is actually write a simple,
relatively simple Python script
that would spider through our source tree
and look at all the,
enumerate all the C++ and header files
and just write them into the Xcode
and the Visual Studio files.
And we did learn more about our native build system.
So we did understand,
we tried to understand more
about how MSBuild works,
which is very powerful
in its own right.
And at first seems a bit weird
because all these settings files
and sometimes actual scripts
are defined in XML format,
which is weird for imperative programming.
But it is very powerful.
And what we then did was to make our programming,
to make the setup for developers easier.
We didn't want to have developers adding their files to build lists
like you would in CMake.
Compile this file for that target.
The script spiders and takes care of that.
And then we have, by now, a relatively powerful name matching setup.
So maybe you have a file for a shader.
Maybe it's a Windows shader.
Maybe it's a Mac shader.
We have resources
that we compile into our binary.
We have
platform-specific files that
just have a file-name suffix,
underscore Mac or underscore Win.
And the spider script takes
care of all of this and matches
file names and can discover, okay,
this file is Windows 32
specific. This file is Mac specific.
That's a shader.
And then it can create the correct build steps and add them in the correct way, in the MS
build way to Visual Studio and in the Xcode way.
And I think in that way, we have separated the platform specific parts in the settings
files and the common part, which is really just spidering
and the definition of name suffixes.
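The real generator is a roughly 600-line Python script; purely as an illustration in C++17, the core classification step it describes might look something like this, with hypothetical suffix rules:

```cpp
#include <filesystem>
#include <string>
#include <vector>

enum class Platform { Common, Win, Mac };

struct SourceFile {
    std::filesystem::path path;
    Platform platform;
};

// Walk the source tree and classify every .cpp file by its name suffix, so a
// generator can place it into the right Visual Studio / Xcode target.
std::vector<SourceFile> SpiderSources(std::filesystem::path const& root) {
    std::vector<SourceFile> result;
    for (auto const& entry : std::filesystem::recursive_directory_iterator(root)) {
        if (!entry.is_regular_file() || entry.path().extension() != ".cpp") continue;
        std::string const stem = entry.path().stem().string();
        auto const endsWith = [&](std::string const& suffix) {
            return stem.size() >= suffix.size()
                && stem.compare(stem.size() - suffix.size(), suffix.size(), suffix) == 0;
        };
        Platform const p = endsWith("_win") ? Platform::Win
                         : endsWith("_mac") ? Platform::Mac
                         : Platform::Common;
        result.push_back({entry.path(), p});
    }
    return result;
}
```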
And I think in spirit, it's pretty similar
to what GYP, G-Y-P, from Google does.
It looks a bit like it.
I think the GYP project doesn't really work outside Google.
I think last time I checked, they had some Google specifics, I think.
But in spirit, it was like that.
I'm not sure if anyone's used GYP
outside of a Google project.
The documentation didn't look like it.
Shouldn't trash on it.
And we're laughing about it because of the...
We've talked about it recently.
Yeah, recent interaction with Patricia.
Okay, because she was building Chrome.
Yeah, and I don't even remember if GYP came up there, but I just started laughing, because when you said GYP is a Google build system, I'm like, not Bazel, the other one.
Yeah, there are like six of them. Everybody has their own.
So yeah, our build system is a bit idiosyncratic, but it's like a 600-line Python script, so the complexity is manageable, and that has been stable.
Somebody commented at some time,
said to us, okay, yeah, that's not a good idea because everybody tries to do that.
And then in the end, you start redeveloping CMake.
But that hasn't happened for the last few years.
And in another life,
I wrote and maintained a very similar script
for the exact same purpose
where my team started on Visual Studio and then eventually moved to iOS and Android.
And we had a Ruby script that would parse out from the MS build scripts and update or create Xcode project files from it.
So it's been done a couple of times, but it works.
It works, right?
It's not so, I mean, the times I have to touch this script
are far and few in between.
I don't know.
I think it's possible.
I wrote automake code,
or a script that took automake code
and made a Visual Studio project from it.
So, yeah.
I wonder if, I mean, I guess
if anyone wanted to make their own script
in the open source community that did this,
it would just wind up being CMake.
It would, yeah.
Okay.
Yeah.
So one thing we haven't talked about yet, and I want to make sure we don't run out of
time, is you're also now targeting WebAssembly with this code base as well.
Can you tell us a little bit about that?
Yes, we can.
We started with small web projects.
So we don't build the entire code base for WebAssembly.
But we thought about it.
We thought about Office, a lot of different Office platforms, Google Office or Microsoft Office, moving to the web.
So the idea came up: okay, what happens if customers do this? Then we have to have some browser extension or something that lets you edit or create better charts in Google Slides or in PowerPoint on the web.
And then we wrote a little Google extension just for trying things out.
It lets you connect to this big data platform, tableau.com,
where you can create data visualizations.
And we wanted to let users connect
their PowerPoint charts to this data source.
And for that purpose, we built
a little extension. And we started
building it in JavaScript.
Somebody used...
We tried experimenting. Somebody used React.
Then we re-implemented
it in TypeScript and thought, okay,
this is definitely much nicer
to work with, especially
when you're coming from C++.
But then we ran into the problem that there was some code sharing that we couldn't do.
Maybe it was just simple enums, usually, or a simple struct, where you say, okay, now I'm building this data structure here in TypeScript, and then I send it over as a JSON string, probably, to our application, and then I have to redeclare the same data structure and make sure the enums are in sync.
So we thought, okay, how can we do this in WebAssembly? And when you do this in WebAssembly, then it's very easy, and Emscripten is great for that, if you just have a C++ code base that you're compiling for WebAssembly. But we have to interface with JavaScript libraries.
So this Tableau.com web app has a native
JavaScript library that we want to use to get the data out of the system. And when you're
interfacing with JavaScript libraries from WebAssembly, then all the type safety is gone again. And you send the method name essentially by string, which is very slow.
And then Emscripten parses the 8-bit characters in memory and reassembles a JavaScript string from that, and then tries to dispatch that JavaScript function.
And we knew from development in TypeScript that there's this awesome type repository on NPM, somebody hosted on GitHub, where they have type interface definitions in TypeScript for a lot of JavaScript libraries, for the entire DOM tree, for the standard JavaScript libraries, but also for your special libraries
like the Tableau API,
the Google Extension API, etc.
So you have a type-safe interface description.
And we thought, okay,
it would be cool if we could
take these two things together
and use the TypeScript interface definition language
from WebAssembly.
And one of my colleagues,
he did his master's thesis on this,
and he developed the system, and then I took it over.
And what we did is just say, okay, let's take this TypeScript compiler
that Microsoft provides and parse this interface definition in TypeScript
using this TypeScript compiler,
and then generate a C++ header for that, and implementation stubs.
So let's say you have a very simple API, something in the DOM tree. I don't know,
you want to access the window. What is it? I don't know, window.location, maybe, the location property on your global window.
From Emscripten, ignoring that Emscripten provides some APIs to access the standard DOM, you could make a call that passes the "window" string. So you get the window object,
and then you get an opaque object back. And then you can call the location property again as a
string on that window object, and then you get something back. And instead, we would take this
TypeScript definition, say, okay, there's a global window object, that global window object has a
property location, which has a specific type. And then we declare a header file that re-declares
essentially this type relation in C++. We have a window object, the window object is a property
location, and that location has this type. And then the
stub that we currently generate
is the Emscripten call.
That's the type-unsafe Emscripten call that passes a string for the property name. But we have wrapped it in something type-safe.
it in something typesafe.
And this project is on
GitHub and it is currently
strong enough. It's almost
self-hosting. So it's powerful enough to
parse the TypeScript. Let me phrase that correctly. It's powerful enough to parse the
interface definition for the TypeScript compiler itself that it then includes. So this is the API
we use, and the project already compiles to WebAssembly.
And it's essentially a pre-compiler.
So you can take your own TypeScript code.
You can take anybody's type definition, or you can take your own JavaScript library or TypeScript library,
generate this interface definition, generate the C++ stub for it, and then use it from WebAssembly in a type-safe manner.
Sounds really useful.
So this is TCJS on GitHub, right?
TCJS, yeah. I thought about renaming it to TypeScripten. That would be a much better name.
TypeScripten?
TypeScripten, yeah.
That would be good, actually. Yeah, you should rename
it.
So how far along
is the WebAssembly project now?
Are you able to use this now?
It is on the stack of things on my computer
that I haven't yet pushed to the repository.
So I think it's not quite the top of stack.
There's something above that.
And then I hope to get back to it.
But the project as it is on GitHub
is already a pretty
good compiler for those
interface definition things.
It's interesting.
First of all, it's always a nice thing if you're
starting to develop on a new code
base like this compiler instead of the
800,000 lines of code code base.
And
then there are interesting questions
like TypeScript supports generics,
a bit like templates in C++, but not quite.
But now you have to translate that to C++
in a meaningful way.
I mean, you don't have to do a perfect match
as long as it solves the problem.
There are naming collisions that TypeScript supports
that C++ cannot support,
and we have to maybe just introduce some suffix
to disambiguate those names.
You encounter those mismatches between the languages.
Or interesting choices that TypeScript made
to represent JavaScript in some type-safe manner.
I should at some point actually spend some time with TypeScript.
Have you used it at all, Rob?
No, I haven't. I just know about it from
discussions like this one.
Yeah.
Okay.
You don't need to anymore.
Using our tool, you can just use C++.
Yeah. It does sound very powerful.
That's awesome.
Okay.
Well, Sebastian, is there anything else you want to plug
or tell us about before we let you go today?
I think somebody is always happy when I'm plugging: we're always looking for developers, of course.
For that, you can just write to HR, or look at our website.
No, other than that, you just mentioned you're looking for developers, and we've got a couple minutes left, so, because this comes up on the show: are you looking for remote developers? We noticed in the video here that there was one other person in the office with you.
True, but socially distanced.
People who can relocate to Berlin, or...?
Yes, practically, yes. Okay. I mean, now most of my colleagues are at home, of course. But in general we are a pretty small company, 20 to 30 developers now, I would guess. And the way we work, we still work together, and we still value this walking over to somebody else's office and asking a question.
Okay.
Well, it's been great having you on the show again today, Sebastian.
Thank you for having me.
It was great being on the show.
Thanks.
Thanks so much for listening in as we chat about C++.
We'd love to hear what you think of the podcast.
Please let us know if we're discussing the stuff you're interested in,
or if you have a suggestion for a topic, we'd love to hear about that too. You can
email all your thoughts to feedback at cppcast.com. We'd also appreciate if you can like CppCast on
Facebook and follow CppCast on Twitter. You can also follow me at Rob W. Irving and Jason at
Lefticus on Twitter. We'd also like to thank all our patrons who help support the show through Patreon.
If you'd like to support us on Patreon, you can do so at patreon.com slash cppcast.
And of course, you can find all that info and the show notes on the podcast website
at cppcast.com.
Theme music for this episode was provided by podcastthemes.com.