CppCast - Going Cross Platform
Episode Date: February 12, 2021. Rob and Jason are joined by Sebastian Theophil from think-cell. They first discuss a blog post on building a 1 billion LOC project with the Threadripper 3990X and a browser extension for easily searching for C++ reference help. Then they talk to Sebastian about his team's efforts to port their Windows C++ codebase to macOS and some of the challenges they dealt with, as well as recent efforts to start porting some of the code to WebAssembly. News: Threadripper 3990X: The Quest to Compile 1 Billion Lines of C++ on 64 Cores; Looking for Approachable Open Source Projects to Contribute to; C++ Search Extension v0.2 released. Links: think-cell: Join us as a C++ developer; Windows, macOS and the Web: Lessons from cross-platform development at think-cell; tcjs library for generating type-safe JavaScript bindings for C++/Emscripten. Sponsors: Visual Assist
Transcript
Episode 286 of CppCast with guest Sebastian Theophil, recorded February 10th, 2021.
This episode of CppCast is sponsored by Visual Assist, the well-known productivity extensions for Visual Studio.
Visual Assist speeds up development with features like smart navigation, code inspection and suggestions, powerful refactoring commands, and a whole lot more.
Even spell checking in comments. Start your free trial at wholetomato.com.
In this episode, we discuss C++ search extension
and building a billion lines of code.
Then we talk to Sebastian Theophil from think-cell.
Sebastian talks to us about porting a Windows code base
to macOS and then the web.
Welcome to episode 286 of CppCast, the first podcast for C++ developers by C++ developers.
I'm your host, Rob Irving, joined by my co-host, Jason Turner.
Jason, how are you doing today?
I'm all right, Rob. How are you doing?
Doing okay.
Things are starting to get a little crazy around my house.
We are planning on moving again sometime in the next few months.
So we'll see the backdrop behind you change.
You might still have cat castles, whatever those are called.
I mean, that might disappear sometime soon, because we're going to have to show and sell this house.
It's not going to be a very long move. Five or six years ago we came from New Jersey down to North Carolina; this time we're moving like 10 minutes away, to a different house in the same area.
But everything's all picked out and you know what your plans are.
Yeah, making all those plans, getting ready, and starting to pack stuff up.
Exciting.
Very exciting.
Any news from you?
No.
Okay.
Quiet time of year.
Yeah.
Okay, well, at the top of every episode, I'd like to read a piece of feedback.
This week, I got an email from Eric writing, Hi, guys. I'm really enjoying your podcast. I started listening about a month ago. I always learn something I did not know. Anyways, it might be an interesting episode to talk with someone from Rocky Linux. Thanks from Eric. So I'm not up on all the new Linux distros. Have you heard of Rocky? I looked it up simply because of this comment here.
It is apparently a fork of Red Hat Enterprise Linux,
although they don't actually say that on their own description because they don't like the direction
that Red Hat Enterprise Linux is moving.
I don't know the details there.
But they want to be bug-for-bug compatible
with those last versions of RHEL.
So an interesting topic,
although I'm not sure what it would have to do with C++.
Right, because I'm assuming it's all going to be written in C.
Well, I mean, it's just a Linux distro, right?
It's going to be mostly scripting tools, right?
As far as the distro actually goes.
Yeah.
Okay.
Well, it's something to consider.
I guess we did talk about other,
we had a couple episodes on operating system stuff,
including the one with Andreas about Serenity,
which was very interesting.
Yes.
Okay.
Well, we'd love to hear your thoughts about the show.
You can always reach out to us on Facebook,
Twitter, or email us at feedback at cppcast.com.
And don't forget to leave us a review on iTunes or subscribe on YouTube.
Joining us today is Sebastian Teofil.
Sebastian studied computer science in Berlin, Germany, and in France. He met think-cell CEO Markus at university as an undergrad research assistant and has been working at think-cell ever since.
He's currently a senior software engineer there.
Over the past few years, he worked on a macOS port and more recently built tools to let
WebAssembly applications interact with JavaScript through type-safe interfaces.
Sebastian, welcome to the show.
Thank you.
Thank you for having me.
So are you currently in Berlin now?
Yes.
Yes, I am.
You said Berlin and France, you know, it says Berlin and France in your bio there, I guess. Have you attended Meeting C++ in the past?
I have, yes. Every time, I think.
Every time that you were allowed to?
I think I was allowed to every time. No, no, I mean as far as COVID goes.
Oh, yeah, yeah, of course.
Yes, yes. I hope so, yeah.
And I saw you there, Jason.
I think you gave a talk, right?
I think.
Was that Berlin or was it some other conference?
Oh, it might have been some other conference, but I did speak in Berlin.
I'm just trying to remember what year it was now.
Was it 2017?
18?
No, it wasn't 18.
I don't know.
And I feel like every time I go to say, right, what year was that?
I'm always off by one right now, because of the missing year. I want to be like, oh well, last year when I went to those four conferences. No, that was two years ago now, the year when things still happened.
Yes. Last year was a year. It was just a year that was weird, and a lot of things didn't happen.
Yeah. Okay.
Well, Sebastian, we got a couple news articles to discuss.
Feel free to comment on any of these.
And we'll start talking more about the work we mentioned in your bio that you've been doing.
I think so.
Okay.
Yeah.
All right.
So this first one we have is a blog post on the Embarcadero blog.
And this is Threadripper 3990X: The Quest to Compile 1 Billion Lines of C++ on 64 Cores.
Interesting blog post.
You know, I had not heard of trying to set something up to use so many cores.
And the author of this post kind of went through a lot of issues just trying to utilize all those cores, and also kind of ran into issues with using such a large code base.
What do you think about this post, Jason?
Well, I would say that the title is slightly understated.
It's 64 cores, but it's 128 thread machine.
So he is trying to keep all 128 threads busy.
Right.
It's an interesting read. There are several interesting tidbits, but I just want to point out that as far as I can tell, it's actually compiling C the whole time.
That's true. With his sample code base, it is just C and not C++.
I'm not sure how much of a difference that would make for the purposes of this post.
I would argue a lot, because that was the point I would make.
It would definitely make a difference in RAM usage,
right?
Like if these were like a bunch of heavily templated C++ versus a bunch of straight line C or something.
Yeah.
Well, it does look like the code base he used was just kind of generated code in order to get to that billion lines of code target that he was looking for.
He didn't just find an actual, you know, open source project that was close to a billion lines of code.
He generated a bunch of code to do it.
Correct.
Yeah.
A script interestingly written in, what is it? Object Pascal? No, what did he write it in? Delphi. It was in Delphi, that's what it was.
Well, it's the Embarcadero blog, right? So this is their bread and butter.
I read it more as an interesting test suite for his whole toolchain than as an exercise of the C++ compiler, because the points where he failed were the linker executable size and the length of the command line that he can pass in his make script.
And then I checked what our current
Windows developed machines are.
And actually we are doing this experiment every day, 10 times in practice, because the
latest machines we are buying have 54 cores and 384 gigabytes of RAM.
Wow.
And then I checked our C++ files, some of which include the Windows headers, and then
maybe Boost and then a little bit of the STL.
And preprocessed, Clang says, they have 800,000 lines of code.
Per C++ file?
Per C++ file.
So I think in practice,
we will exceed 1 billion lines
if we have maybe 500,000 real lines of code
plus all the includes.
But then look at the performance.
Xcode tells you how long it spent compiling each file.
And there was a basic, essentially C-with-classes file from some external library, not our code.
And that was, preprocessed, only 75,000 lines of code.
And that took two seconds to compile.
And then I looked at our code.
And I think the far bigger influence on our compile time is
actually C++ features. I mean,
all our preprocessed
files are similarly
large, a few hundred thousand
lines of code. But you have a factor of
10 difference between some files.
And that
difference is C++,
I think. So when one file,
which is 800,000 lines of preprocessed code,
takes 60 seconds, and another one takes 160 seconds,
then I suppose the difference is not the size of the includes,
but some template instantiation stuff
that takes an enormous amount of time.
And possibly that we include too many headers.
Well, that's hard to know.
That's the problem with C++.
And the Windows headers are, yeah, big.
Yes, yes.
And greedy.
There's a small one that we include,
but it's also essentially all of the Windows headers.
Out of curiosity, is that using any of those tricks
like #define WIN32_LEAN_AND_MEAN or whatever?
And it's still
800,000 lines of preprocessed code.
I didn't check
who's responsible for the 800,000 lines
of code. It could be Windows, it could be
Boost, it could be anything.
Oh yeah, if you've got Boost and Windows
and the STL in there
I'm surprised
it's only 800,000.
Okay, I did find it interesting though, before we move past this, that Dev-C++ has been around forever. That's an old project. Bloodshed Software, I think, is who originally controlled it.
But it's an interesting aside in here,
that Embarcadero is now the maintainer of Dev-C++,
which is an open-source IDE with a MinGW compiler.
And a lot of these experiments were around Dev C++,
which I was like, wait a minute, I know this name.
So I had to do a little bit of research here.
They took over maintenance of it last year in 2020.
Oh, very cool.
Okay, next article we have is a post on the CPP subreddit, and this one is looking for approachable open source projects to contribute to.
And I thought there were some interesting things in the comments here.
They did mention our recent interview, which I actually just mentioned a few moments ago, with
Andreas Kling about Serenity.
But then there's also, I wanted to
highlight these two links,
which are really great answers
for this person's question on
where to find a new open source project
to contribute to. One is
goodfirstissue.dev,
and the other is firsttimersonly.
And it looks like both of these websites are devoted to
kind of curating possible open source
bugs or issues to go and work on if you're a first time
open source dev looking to contribute. And there were two awesome
looking games in there that I was really surprised to find. Did you see that?
One was called 0 A.D. I think it started
as an Age of Empires mod, but that looked
absolutely professional and fantastic.
No, I don't know that one.
I have to try that out if I have the time.
Oh yeah, 0 A.D.
Which one is the other one then?
Pioneer Space Sim,
but I didn't look into that that much. But that also
sounded fun.
Very cool.
There's a comment here which kind of implies you should be careful to very carefully type the firsttimersonly.com. That's potentially a URL that might take you somewhere you don't want to be.
Yeah, it's one of those URLs that kind of just sounds suspect, but that's not really what it was intended to sound like, I guess.
Oh, that does. I've just pulled up 0 A.D. That actually does look pretty sweet.
Okay. And then the last thing we have is, this is actually just a changelog for this C++ search extension.
But I don't think we've ever mentioned this search extension before.
So it's an extension that you can add to the browser
for Chrome, Firefox, or Edge,
and it lets you just type in, like, cc and then your search term,
and it'll bring you straight to CPP Reference.
Yeah, I totally installed it.
Yeah, seems pretty nice.
It does an awesome job of taking you to cppreference.com.
I kind of hope at some point to get it to also be able to search
or see if they can search, like, eel.is for searching the standard draft.
That would be really awesome for me.
Yeah.
Cool little tool, though.
Okay.
So, Sebastian, we mentioned a couple things in your bio.
Could you maybe start off by telling us a little bit about what it is you work on at think-cell in some more detail,
and a little bit about this project of porting the application to macOS?
Yeah.
Shall I tell you a little bit about what think-cell does, actually?
Yeah.
Absolutely, yeah.
Right.
So we are developing an add-in for PowerPoint
that was originally used mostly by consulting companies.
They work a lot in PowerPoint.
They create a lot of PowerPoint slides.
And they needed a lot of features to create nice-looking charts that PowerPoint didn't support.
They wanted to have arrows showing the percentage difference between, I don't know,
your revenue in different years or something like that. And they used to do that by hand,
or they had maybe a few macros. And they didn't have an interactive software that would create
these things on demand and update them on demand and a software that would be easy to use.
So very often, and this is 15 years ago, the highly paid consultants would scribble a chart on paper and fax that somewhere to the back office.
And then the back office would create the chart, and they would fax it back, or send the slide back via email, and then he would make edits.
And that was a pretty inefficient process.
Yeah.
And we wanted to make software that was easy to use, so the consultant could actually do it himself.
And that's what we did.
And then we built on that software.
And by now, I think we have 800,000 users worldwide
in most big companies.
And we built on this initial project
to make nice looking charts. And now we have
algorithms that try to help you during your slide layout. So you can just assemble your
shapes on the slide. And then when you change content, we update the layout and you don't have
to move all the text boxes around again so that your slide looks good. So there's a lot of algorithms to
automatically create nice-looking charts or
slides. There's a lot of work we do to make
a good, easy-to-use, intuitive user interface.
So the challenges we have are quite different.
There are a lot of them, in a lot of different areas.
Okay, so you wrote this original application and it's a PowerPoint plugin on Windows, because, you know, 15 years ago Microsoft only cared about Windows.
But more recently they've obviously, you know, embraced the open source ecosystem and made a lot of their applications cross-platform.
So now you need to run this on a Mac too, I guess.
Exactly, exactly.
We wanted to move to the Mac.
That's also something I personally wanted.
That was something I was very fond of.
I was privately a Mac user for a long time.
And so this project to port our software to the Mac,
to Microsoft Office on the Mac,
became a little bit of a passion of mine as well. And that was also the first time we actually ported our software
to anything at all. So in the beginning, we had maybe 500,000, maybe a million lines of code. So
it's definitely a larger project. And this had never been ported to anything else. So you can imagine there was a lot of Windows API usage throughout our code base,
Windows data types all the way throughout our code base.
So there was a lot of cleaning up we had to do in the beginning,
getting stuff to compile at all, removing Windows specific things
before we could actually move to the interesting part of really re-implementing things on a new operating system.
And the special, like I said, the special case is that we are an add-in.
So we have little control over the main application.
And that means, well, we have to be very, very flexible.
We have to do whatever the main application does. So that means we had to have a flexible rendering engine
that could render into whatever the host application provides us
using OpenGL on the Mac, using DirectX on Windows.
We have to be very quick in supporting platform-specific features.
So if PowerPoint, for example,
is a sandboxed application on macOS, or it supports some macOS-specific feature, maybe the triple
click to select an entire paragraph of text, then we have to support that as well. Otherwise,
there's a certain friction between our add-in and PowerPoint. And that means, in the end,
that means we have to do a lot of things ourselves.
We can't depend on existing cross-platform toolkits
that would take these decisions away from us.
We have to say, okay,
we have to do this all by ourselves, essentially.
Right.
Yeah.
And so the only cross-platform toolkit
we do use actually is Boost.
Yeah, Boost doesn't make any UI decisions for you, right?
Yeah, exactly.
I'm curious to go back a moment when you said that this entire project had been built for Windows originally.
I can't even imagine just how much Windows idioms and function calls would have just infiltrated the entire
code base.
Yeah, how much time did you spend just de-Windows-ifying the code base before you could even try to take it to the Mac?
Quite a lot of time, because it can be little things.
It's your file handling, where you just pass handles to files around. It's your geometry rendering library,
where in some place you have a cast to some Windows API rect struct.
And yeah, these all had to go out.
Yeah, that was a lot.
That was the boring initial part.
And you said at that time you were...
How much? How big was the code base?
Between 500,000 and a million.
It depends a bit on what you include in the count, I think.
Yeah.
I couldn't help but note, by the way, you said, I believe you said that you have 800,000 users,
which is coincidentally how many lines of code your pre-processed files were earlier.
And about how many lines of code our code has, on average.
You're going to have more users as your code base grows.
Yeah. So when you finished kind of, you know, de-Windows-ifying it, making the code base more generic, what were some of the first steps to actually bring it to the Mac and start compiling and building it there?
Well, I think the first one was rendering. You want to render something, so you see that something is working.
But porting the rendering, that wasn't the most interesting
part because, well, we had an internal implementation that was well designed.
I mean, you assemble triangles and textures that you want to render.
And then at some point, you go over them and you pass them on to DirectX.
And that mapped pretty directly to what you would do with OpenGL.
You just issue OpenGL commands.
I think there were other challenges, and I think the biggest overall challenge is that when you want to make your software cross-platform, then you have to find the right abstraction level where you put in this cross-platform interface.
So you want to have this nice abstraction: here I call some function, and then it does something different on Windows and macOS. And that turned out to be quite difficult to figure out how to do, and I think a lot of cross-platform toolkits actually don't do this correctly.
And so look, as a motivating example, at the many ways you have to rename a file.
Then you have, I don't know, on Windows, you have MoveFileEx.
On POSIX, you have the basic rename function.
On macOS, there are at least two extensions of that POSIX rename that take more options and let you specify the behavior.
And then you have boost::filesystem::rename, which is essentially the POSIX rename, and you have QFile::rename.
And you cannot really say how these map to the capabilities of the operating system.
So which of these cross-platform implementations would allow you to say, I want the rename to fail if the target file name already exists?
What do they do with the access control lists on your disk? Do they take them over to the new location? Do they inherit the access control lists of your target directory?
It's not specified, but your operating system has these things.
And if you are developing a desktop application
where your application shares the computer
with other applications,
and you have to be kind of a nice citizen,
then you have to answer these questions.
And I think this rename example is an example
where the cross-platform functions that Boost or Qt give you, you can't really use in any meaningful way; at least we couldn't in our application.
You would want an interface that is much higher level. Maybe you say, I want to have a function that creates a settings file that is user specific.
Or, staying with the rename example, maybe you have the scenario that you want to fill a cache of files. So you download files in different processes, and they want to cache these files, so you download them to some random file name, and then you want to rename them to the target file name of your cache.
And you want to do that atomically, so that you don't partially overwrite the same file, or so that nobody reads a partial file.
So this is a function that you want to have, and macOS coincidentally even has it: say, download this file, give it some name I don't care about, and then replace it and put it in this location so that I can fill my cache
and not have any race conditions.
This is something you can implement very differently on different operating systems, but then you need the full power that, I don't know, the Windows MoveFileEx function has, or that the macOS rename extension functions have.
The cross-platform functions that we have, they're all too underpowered to even implement this.
And so they kind of unified the least common denominator of all the implementations they could find.
And I'm not saying you can actually do better in Boost or the standard, you probably can't, but you can't really use these functions in all scenarios either.
You have to take a step back and say, okay, what are the semantics that we need, and then how do we implement them on each of our systems?
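To make that concrete, here is a minimal sketch of a per-platform "rename, but fail if the target already exists" helper. This is not think-cell's actual code; the helper name is hypothetical and error handling is reduced to a bool. Only MoveFileExW, renamex_np with RENAME_EXCL, and the POSIX link/unlink calls are real APIs:

```cpp
#include <filesystem>

#ifdef _WIN32
#include <windows.h>
#elif defined(__APPLE__)
#include <cstdio>      // declares renamex_np and RENAME_EXCL on macOS
#else
#include <unistd.h>    // link, unlink
#endif

// Hypothetical helper: rename `from` to `to`, but fail if `to` already exists.
bool RenameNoReplace(std::filesystem::path const& from, std::filesystem::path const& to) {
#ifdef _WIN32
    // No MOVEFILE_REPLACE_EXISTING flag, so an existing target makes the call fail.
    return ::MoveFileExW(from.c_str(), to.c_str(), 0) != 0;
#elif defined(__APPLE__)
    // RENAME_EXCL: fail with EEXIST if the target already exists.
    return ::renamex_np(from.c_str(), to.c_str(), RENAME_EXCL) == 0;
#else
    // Plain POSIX rename() silently replaces the target, so emulate "no replace"
    // with link() + unlink(), which fails if `to` already exists.
    if (::link(from.c_str(), to.c_str()) != 0) return false;
    return ::unlink(from.c_str()) == 0;
#endif
}
```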
And there were other challenges. File handling was pretty challenging, because Windows is relatively special.
So, for example, and I mean this in a good way, my respect for the Windows internals grew while I was porting the software to the Mac, because they have made a lot of very good and very clever decisions.
So, for example, you can have temporary files that you can mark as: please delete this file when every process that has this file opened has died. The kernel takes care of this, and you don't leak temporary files somewhere that you have to take care of deleting later or whatever. You can just say, please clean up after me, even if I crash.
And this, for example, is something that's pretty hard to emulate on macOS.
Right.
Because we tried. We want to clean up after ourselves. So we tried to do some trickery, but in the end, it's probably not as reliable as doing it on Windows, the simple way that Windows provides.
So there were interesting differences
in semantics.
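As a minimal sketch of the Windows mechanism being described (hypothetical helper name, error handling omitted), a temp file can be created so the kernel deletes it once every handle to it is gone, including handles closed because a process crashed:

```cpp
#include <windows.h>

// Hypothetical helper: create a temp file that the kernel removes automatically
// once the last handle to it is closed -- no explicit cleanup, even after a crash.
HANDLE CreateSelfCleaningTempFile(wchar_t const* path) {
    return ::CreateFileW(
        path,
        GENERIC_READ | GENERIC_WRITE,
        FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
        nullptr,                 // default security
        CREATE_ALWAYS,
        FILE_ATTRIBUTE_TEMPORARY | FILE_FLAG_DELETE_ON_CLOSE,
        nullptr);
}
```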
I'm curious now,
ultimately, how you solved this, or what your internal code looks like.
Do you have, like, super high level functions that are like, please create a temporary uniquely named file in this directory and delete it when the process is gone, or something like that?
Yes, often we create functions like that. So let's say we have a function: create a temporary file that only I will ever read, so nobody else will ever read it, and delete it when I'm done.
Because if you know so specifically that you are the only one reading it, then you can use the simple Unix trick of opening the file and deleting it right away and just keeping the file handle open.
Oh, okay.
So that's the simple solution.
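Here is a minimal POSIX sketch of that Unix trick; the helper name and the /tmp path are just for illustration, and error handling is omitted:

```cpp
#include <stdlib.h>    // mkstemp
#include <unistd.h>    // unlink

// Create a scratch file that only this process will ever read. The kernel frees
// the storage when the last descriptor is closed -- even if the process crashes.
int OpenPrivateTempFile() {
    char name[] = "/tmp/scratch-XXXXXX";  // mkstemp replaces the Xs with a unique suffix
    int fd = ::mkstemp(name);             // create and open the file
    if (fd != -1)
        ::unlink(name);                   // remove the directory entry right away
    return fd;                            // the data stays reachable through fd only
}
```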
The hard problem occurs when you have multiple readers to the same file
or a reader and a writer to the same file,
but you still want to make sure that this file is eventually deleted
when everybody who has it opened closes the handle, maybe by crashing. So that's the hard case.
And in the end, we actually tried to circumvent this problem on macOS and said, let's do something else. Let's create one file where we put our temp files in, like a single big file where we write the shared
bytes in.
So maybe we use a lot of shared memory.
That's what we use it for.
We have different processes.
So in the end, we're not only an add-in to PowerPoint.
We are an add-in to Excel too.
And then there are a lot of different processes involved because we develop more features. So there is a utility that lets you
specify in a JSON file that you want to take a PowerPoint presentation as a template.
And then in JSON, you can specify the data that you want to write into some chart,
into some chart that you specify by name. So there's a tool that can open this JSON file,
open the PowerPoint presentation, pass all the data onto PowerPoint to our add-in so that we can process this data,
update all the charts, and save it again.
So for scenarios like these, we implement inter-process communication using shared memory. And in some cases, there's quite a lot of data
that we pass around. So we pass it around in temporary files.
And to solve this problem of leaking temporary files on macOS, we actually decided in the end,
it's much easier to create one big file and write the shared data in there and then pass a handle to that offset, essentially.
And say, OK, you find the data at this offset.
And then there's only one file, and you can clean that up later. You know, okay, you've crashed, you start again, you find there's an old file, you can delete this one file instead of the many files.
Exactly, exactly.
Although it does sound like you've effectively had to implement your own file system on top of this.
No, it's much easier than that, because the files are
only written once. So it's write once: write the data, then pass a handle to that, and then it becomes read-only, and it's much easier.
Do you have the problem though where maybe your temp
file continues to grow or do you always get
to restart back at zero? We compact
it because of course
there might be holes in it and at some point
we throw the holes away.
I guess one thing I'm curious about is if you had started this project fresh
and you knew you were going to be aiming for both Windows and Mac,
do you think you would still have wound up using all these kinds of platform APIs?
Or would you have tried to use something like boost::filesystem or std::filesystem and kind of maybe worked around any possible limitations of it?
No, I think we would have, because in the end you need those operating-system-specific functions to answer the questions that the operating system asks you, like: what shall the access control list be? Shall I overwrite this file or shall I not?
And this is something that you have to say as a programmer.
And this boost::filesystem or POSIX rename function,
it works well, of course, if you're just starting programming
and you have a toy problem.
It will also work okay if you are writing a server application
or maybe an embedded application
and you're essentially the only user of the entire computer.
In those cases, I think it's good to have this function which is easy to use, does something very simple, and works in a lot of cases. But it doesn't work in all cases, and maybe we are the exception now that we are writing good old desktop software.
That could be.
Nobody else has this problem anymore.
Just ship a web app.
I think you commented earlier, you questioned whether or not the standard could do better
if it could give you more of the lower level.
Obviously, I wasn't around for the standardization of std::filesystem, but my understanding is that they intentionally aimed for those things that are common across all platforms.
Yeah.
So you don't have a bunch of OS-specific flags and whatever in there.
Yeah, so that they didn't have to try to have a bunch of those, exactly.
And I think Qt tried to do a little bit better, and they have this weird permissions API. I've never used Qt, but I looked at the code when we were answering these questions for ourselves, and they have a permissions API.
And then they also had to wonder, okay, what can a cross-platform permissions API look like? And what they decided in the end was that they used the Unix permission flags, user/group/other readable, writable, executable, and then they implement them on Windows.
Because the Windows ACL model is so much more complicated, of course you can implement those flags using the Windows ACLs. But again, if you are really on Windows, then you have to specify some security descriptor, some ACLs.
Again, this is probably good for the typical, simplest use cases, but not for all of them.
Yeah, it's been a long time since I've actively programmed in Qt, but I believe there are a few cases, like you're talking about with file system and process management, where instead of using QProcess, I would have to dig down and use, like, the Windows-specific process class or whatever, or even go and grab the source code for the Windows one and pull it into my project and modify it to what I needed it to be.
Yeah, it should just do the right thing.
Yeah, because I mean, yeah, it's hard otherwise.
How should you standardize this?
And there were different, I mean, I spoke about file systems and temp files and the different lifetimes, and this is something that happened over and over again.
Because in Windows in general, when you create kernel objects, that can be temporary files, but it can also be shared memory objects or mutexes, they're all reference counted, which is extremely
handy if you're programming. You don't have to worry about this resource. Windows will clean
this up for you. And on POSIX systems, they typically aren't. They have kernel persistence,
which just means they exist until the computer is restarted. So for example, we used Boost Interprocess
to implement the real shared memory,
like a real shared memory segment that you can access
using pointers from different processes.
And that had the same problem,
that they had a Windows implementation
where on Windows, the shared memory
would automatically be cleaned up for you
if all the participating processes had closed down.
So when they restarted, they would get a clean memory segment.
And on macOS, Boost implemented this using backing files.
And the backing file would stick around
even if your processes all crash.
And you restart your application, you get the old shared memory.
And this is something we had to work around, and we made a patch for Boost.Interprocess to find a cleaner implementation, to unify the semantics.
I think this was, a lot of the time, the challenge when you're writing for different operating systems: you want them to behave the same way, if at all possible.
And for the shared memory, that meant making a patch for Boost.Interprocess.
The only resource that we could find on Unix that has process lifetime are file locks.
I think that's the only thing that is cleaned up if your process dies.
So we made an implementation that has proven robust.
I think it works okay.
Where we use file locks to synchronize access to this backing file.
So you have some backing file for your shared memory.
And when the first process comes along and tries to access your shared memory file, you try to open it using an exclusive file lock.
And if that succeeds, then you know, okay, I'm the first one to open this shared memory segment, so I can delete all its contents and truncate the file. And if that doesn't succeed, then I try to get a shared lock.
Well, in both cases, either after truncating or if the exclusive lock doesn't succeed, everybody tries to get a shared lock in the end. And this way you can synchronize and make sure that you know which process
is responsible for initializing the shared memory.
So that was easier than the temp file, I think.
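A minimal sketch of that locking protocol using POSIX flock(); this is not the actual think-cell or Boost.Interprocess patch, the function name and parameters are hypothetical, and error handling is mostly omitted:

```cpp
#include <sys/types.h>  // off_t
#include <fcntl.h>      // open
#include <sys/file.h>   // flock
#include <unistd.h>     // ftruncate

// Open the shared-memory backing file; whichever process wins the exclusive
// lock wipes and re-initializes it, then everybody downgrades to a shared lock.
int OpenSharedMemoryBackingFile(char const* path, off_t size) {
    int fd = ::open(path, O_RDWR | O_CREAT, 0600);
    if (fd == -1) return -1;

    if (::flock(fd, LOCK_EX | LOCK_NB) == 0) {
        // We are the first process: reset the segment to a clean state.
        ::ftruncate(fd, 0);
        ::ftruncate(fd, size);
    }
    // In both cases everybody ends up holding a shared lock. flock() locks are
    // released by the kernel when the process dies, which is the process-lifetime
    // property that plain POSIX shared memory lacks.
    ::flock(fd, LOCK_SH);
    return fd;
}
```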
Almost exactly the same process that's used
for initializing function-local statics in C++.
Is that so? I didn't know how that was implemented, actually.
Because it has to be thread-safe as of C++11.
As soon as the function is entered, it tries to get an exclusive lock,
and then it checks to see if the data has been initialized.
If it has been initialized, it releases the exclusive lock,
continues on with the function.
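For reference, the C++11 "magic static" being described, in its simplest form; the names here are purely illustrative:

```cpp
#include <map>
#include <string>

std::map<std::string, int> const& Registry() {
    // Since C++11, this initialization runs exactly once even if several threads
    // call Registry() concurrently; the compiler emits the guard and locking.
    static std::map<std::string, int> const s_registry = {{"answer", 42}};
    return s_registry;
}
```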
Well, I thought the idea itself was simple enough,
so I was pretty confident in the idea.
I think the problem is rather, do file locks on Unix actually work the way you think they work? Or is there some hidden...
I have no idea about that.
I know there are problems. You just Google for it; there are a lot of problems with file locks.
So if you have a home directory that is mounted from some server via NFS, you have to mount it with the right options for the file locks to work at all.
But I thought that was
still a good enough solution.
Different challenges. So at this point, have you centralized
all this cross-platform code, or do you still have it sprinkled throughout
the code base like the Windows idioms were?
The cross-platform, you mean in the sense of,
do we have some cross-platform library that implements all these things?
Right, yeah, something that's abstracted away.
No, they are implemented where they are needed.
We have some CPP files that would be Windows-specific or Mac-specific.
That's like the simplest case.
You define the function in the header,
and then you have two different implementations.
We do have the
occasional ifdef. We
allow that.
We try not to let it get out of hand.
But sure.
Sure. I just read
the comment maybe on Twitter that
code that has a lot of ifdefs is not portable
code. It's just code that has been ported a lot.
Maybe that's true. I might agree with that statement. It's certainly true, but we have, so now we have Windows, we have 32-bit Windows, we have 64-bit Windows, so that's already two different ifdefs. And then we have Mac, and on Mac we have Intel and ARM. So that's quite a lot of
sometimes different code bases,
sometimes different code paths.
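As a minimal sketch of that layout (the file and function names are hypothetical, not think-cell's real code): a platform-neutral declaration lives in a header, one implementation file per platform is selected by the build, and the occasional #ifdef covers small divergences.

```cpp
// show_open_file_dialog.h -- shared declaration
#pragma once
#include <filesystem>
#include <optional>

std::optional<std::filesystem::path> ShowOpenFileDialog();

// show_open_file_dialog_win.cpp -- only part of the Windows build:
//   std::optional<std::filesystem::path> ShowOpenFileDialog() { /* IFileOpenDialog ... */ }
//
// show_open_file_dialog_mac.cpp -- only part of the macOS build:
//   std::optional<std::filesystem::path> ShowOpenFileDialog() { /* NSOpenPanel ... */ }

// And the occasional #ifdef for small differences:
#if defined(_WIN32)
inline constexpr char const* kPlatformName = "Windows";
#elif defined(__APPLE__)
inline constexpr char const* kPlatformName = "macOS";
#endif
```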
Okay.
I want to interrupt the discussion for just a moment to bring a word from our sponsor, Visual Assist.
Visual Assist is used by serious C++ developers across the world.
It's got great code generation.
Do you need to implement methods from an interface?
What about changing a pointer to a smart pointer,
even an Unreal Engine smart pointer?
Adding a symbol you've typed but haven't declared?
Visual Assist will do these and much more.
Plus refactorings, more powerful than the ones included in Visual C++.
Or detecting errors in code and suggesting useful corrections.
Or navigation, helping you move anywhere in your code and open or locate what you need.
Or even the debug extensions.
Visual Assist is written by C++ developers for C++
developers. It includes everything you need and nothing you don't. It has a low UI philosophy.
It won't take over your IDE, but will show up when useful. It's there to help, not to advertise
itself. Visual Assist is relied on by the developers building software you've used. Whether
that's office suites, operating systems, or games, software you use was built with Visual Assist.
Get the same tooling for your own development.
Visual Assist supports Unreal Engine 4
and many versions of Visual Studio,
including VS 2019 and Community.
Get it at wholetomato.com.
We haven't talked at all about build systems.
Was that a big headache in converting over to the Mac?
It was.
So we came from Windows.
We were using Visual Studio.
And we were used to an IDE.
We are used to an interactive debugger.
We don't do printf debugging.
And so it was clear in some ways,
probably given that we have to use Xcode on macOS.
And also, I don't think I can see that we are moving to purely make or something as a build system that's not going to happen
and we we looked at cmake um i looked at it first a bit and then we also had a quite a big project
somebody a new colleague coming in who had experience with cmake, who tried to build our own build system to CMake.
And so what we started with and what we're still using now is we have this idea,
and of course we have Visual Studio.
And in Visual Studio, we have our Windows-specific build settings.
And they are defined in settings files.
So if you have those interactive IDEs, there are two
different ways to set up your builds, right? You can have the messy way where you have all your
files in those project files, and you specify your preprocessor defines and your include paths
all over the projects, for individual files. So that's not the way to do it.
Visual Studio and Xcode,
they both have supported a good way
where you have like a single settings file
or maybe a layer of settings file
that can include each other
where you can say,
okay, these are my common compiler definitions.
These are the preprocessor defines
I use for every single source file.
These are the include paths
I use for every single source file.
Maybe with differences for debug and release builds.
So in that sense, our build setup was
already very simple. It was
a simple settings file, and
it was a lot of project
files for Visual Studio that
only contain file names.
And the build settings
are very, very different for
Clang and Windows.
There's maybe semantic overlap that you want to optimize for size or you don't want to optimize at all.
But the practical build systems are completely different.
So initially I said, OK, I don't really see what CMake gives us here since we are only building for two IDEs instead of introducing a third build system that we then also have to understand. And what we did is actually write a simple,
relatively simple Python script
that would spider through our source tree
and look at all the,
enumerate all the C++ and header files
and just write them into the Xcode
and the Visual Studio files.
And we did learn more about our native build system.
So we did understand,
we tried to understand more
about how MSBuild works,
which is very powerful
in its own right.
And at first seems a bit weird
because all these settings files
and sometimes actual scripts
are defined in XML format,
which is weird for imperative programming.
But it is very powerful.
And what we then did was to make our programming,
to make the setup for developers easier.
We didn't want to have developers adding their files to build lists
like you would in CMake.
Compile this file for that target.
The script spiders and takes care of that.
And then we have, by now, a relatively powerful name matching setup.
So maybe you have a file for a shader.
Maybe it's a Windows shader.
Maybe it's a Mac shader.
We have resources
that we compile into our binary.
We have
platform-specific files that
just have a file-name suffix,
underscore Mac or underscore Win.
And the spider script takes
care of all of this and matches
file names and can discover, okay,
this file is Windows 32
specific. This file is Mac specific.
That's a shader.
And then it can create the correct build steps and add them in the correct way, in the MS
build way to Visual Studio and in the Xcode way.
And I think in that way, we have separated the platform specific parts in the settings
files and the common part, which is really just spidering
and the definition of name suffixes.
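The real generator is a roughly 600-line Python script; purely as an illustration in C++17, the core classification step it describes might look something like this, with hypothetical suffix rules:

```cpp
#include <filesystem>
#include <string>
#include <vector>

enum class Platform { Common, Win, Mac };

struct SourceFile {
    std::filesystem::path path;
    Platform platform;
};

// Walk the source tree and classify every .cpp file by its name suffix, so a
// generator can place it into the right Visual Studio / Xcode target.
std::vector<SourceFile> SpiderSources(std::filesystem::path const& root) {
    std::vector<SourceFile> result;
    for (auto const& entry : std::filesystem::recursive_directory_iterator(root)) {
        if (!entry.is_regular_file() || entry.path().extension() != ".cpp") continue;
        std::string const stem = entry.path().stem().string();
        auto const endsWith = [&](std::string const& suffix) {
            return stem.size() >= suffix.size()
                && stem.compare(stem.size() - suffix.size(), suffix.size(), suffix) == 0;
        };
        Platform const p = endsWith("_win") ? Platform::Win
                         : endsWith("_mac") ? Platform::Mac
                         : Platform::Common;
        result.push_back({entry.path(), p});
    }
    return result;
}
```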
And I think in spirit, it's pretty similar
to what GYP, G-Y-P, from Google does.
It looks a bit like it.
I think the GYP project doesn't really work outside Google.
I think last time I checked, they had some Google specifics, I think.
But in spirit, it was like that.
I'm not sure if anyone's used GYP
outside of a Google project.
The documentation didn't look like it.
Shouldn't trash on it.
And we're laughing about it because of the...
We've talked about it recently.
Yeah, recent interaction with Patricia.
Okay, because she was building Chrome.
Yeah, and I don't even remember if GYP came up there, but I just started laughing, because when you said GYP is a Google build system, I'm like, not Bazel, the other one.
Yeah, there are like six of them. Everybody has their own.
So yeah, our build system is a bit idiosyncratic, but it's like a 600-line Python script, so the complexity is manageable, and that has been stable.
Somebody commented at some time,
said to us, okay, yeah, that's not a good idea because everybody tries to do that.
And then in the end, you start redeveloping CMake.
But that hasn't happened for the last few years.
And in another life,
I wrote and maintained a very similar script
for the exact same purpose
where my team started on Visual Studio and then eventually moved to iOS and Android.
And we had a Ruby script that would parse out from the MS build scripts and update or create Xcode project files from it.
So it's been done a couple of times, but it works.
It works, right?
It's not so, I mean, the times I have to touch this script
are far and few in between.
I don't know.
I think it's possible.
I wrote automake code,
or a script that took automake code
and made a Visual Studio project from it.
So, yeah.
I wonder if, I mean, I guess
if anyone wanted to make their own script
in the open source community that did this,
it would just wind up being CMake.
It would, yeah.
Okay.
Yeah.
So one thing we haven't talked about yet, and I want to make sure we don't run out of
time, is you're also now targeting WebAssembly with this code base as well.
Can you tell us a little bit about that?
Yes, we can.
We started with small web projects.
So we don't build the entire code base for WebAssembly.
But we thought about it.
We thought about Office, a lot of different Office platforms, Google Office or Microsoft Office, moving to the web.
So the idea came up: okay, what happens if customers do this? Then we have to have some browser extension or something that lets you edit or create better charts in Google Slides or in PowerPoint on the web.
And then we wrote a little Google extension just for trying things out.
It lets you connect to this big data platform, tableau.com,
where you can create data visualizations.
And we wanted to let users connect
their PowerPoint charts to this data source.
And for that purpose, we built
a little extension. And we started
building it in JavaScript.
Somebody used...
We tried experimenting. Somebody used React.
Then we re-implemented
it in TypeScript and thought, okay,
this is definitely much nicer
to work with, especially
when you're coming from C++.
But then we ran into the problem that there was some code sharing that we couldn't do.
Maybe it was just simple enums, usually, or a simple struct, where you say, okay, now I'm building this data structure here in TypeScript, and then I send it over as a JSON string, probably, to our application, and then I have to redeclare the same data structure and make sure the enums are in sync.
So we thought, okay, how can we do this in WebAssembly? And when you do this in WebAssembly, then it's very easy, and Emscripten is great for that, if you just have a C++ code base that you're compiling for WebAssembly. But we have to interface with JavaScript libraries.
So this Tableau.com web app has a native
JavaScript library that we want to use to get the data out of the system. And when you're
interfacing with JavaScript libraries from WebAssembly, then all the type safety is gone again. And you send the method name essentially by string, which is very slow.
And then Emscripten parses the 8-bit characters in memory and reassembles a JavaScript string from that, and then tries to dispatch that JavaScript function.
And we knew from development in TypeScript that there's this awesome type repository on NPM, somebody hosted on GitHub, where they have type interface definitions in TypeScript for a lot of JavaScript libraries, for the entire DOM tree, for the standard JavaScript libraries, but also for your special libraries
like the Tableau API,
the Google Extension API, etc.
So you have a type-safe interface description.
And we thought, okay,
it would be cool if we could
take these two things together
and use the TypeScript interface definition language
from WebAssembly.
And one of my colleagues,
he did his master's thesis on this,
and he developed the system, and then I took it over.
And what we did is just say, okay, let's take this TypeScript compiler
that Microsoft provides and parse this interface definition in TypeScript
using this TypeScript compiler,
and then generate a C++ header for that, and implementation stubs.
So let's say you have a very simple API, something in the DOM tree. I don't know,
you want to access the window. What is it? I don't know, window.location, maybe, the location property on your global window.
From Emscripten, ignoring that Emscripten provides some APIs to access the standard DOM, you could make a call that passes the "window" string. So you get the window object,
and then you get an opaque object back. And then you can call the location property again as a
string on that window object, and then you get something back. And instead, we would take this
TypeScript definition, say, okay, there's a global window object, that global window object has a
property location, which has a specific type. And then we declare a header file that re-declares
essentially this type relation in C++. We have a window object, the window object is a property
location, and that location has this type. And then the
stub that we currently generate
is the Emscripten call.
That's the type-unsafe Emscripten call that passes a string for the property name. But we have wrapped it in something type-safe.
it in something typesafe.
And this project is on
GitHub and it is currently
strong enough. It's almost
self-hosting. So it's powerful enough to
parse the TypeScript. Let me phrase that correctly. It's powerful enough to parse the
interface definition for the TypeScript compiler itself that it then includes. So this is the API
we use, and the project already compiles to WebAssembly.
And it's essentially a pre-compiler.
So you can take your own TypeScript code.
You can take anybody's type definition, or you can take your own JavaScript library or TypeScript library,
generate this interface definition, generate the C++ stub for it, and then use it from WebAssembly in a type-safe manner.
Sounds really useful.
So this is TCJS on GitHub, right?
TCJS, yeah. I thought about renaming it to TypeScripten. That would be a much better name.
TypeScripten?
TypeScripten, yeah.
That would be good, actually. Yeah, you should rename
it.
So how far along
is the WebAssembly project now?
Are you able to use this now?
It is on the stack of things on my computer
that I haven't yet pushed to the repository.
So I think it's not quite the top of stack.
There's something above that.
And then I hope to get back to it.
But the project as it is on GitHub
is already a pretty
good compiler for those
interface definition things.
It's interesting.
First of all, it's always a nice thing if you're
starting to develop on a new code
base like this compiler instead of the
800,000 lines of code code base.
And
then there are interesting questions
like TypeScript supports generics,
a bit like templates in C++, but not quite.
But now you have to translate that to C++
in a meaningful way.
I mean, you don't have to do a perfect match
as long as it solves the problem.
There are naming collisions that TypeScript supports
that C++ cannot support,
and we have to maybe just introduce some suffix
to disambiguate those names.
You encounter those mismatches between the languages.
Or interesting choices that TypeScript made
to represent JavaScript in some type-safe manner.
I should at some point actually spend some time with TypeScript.
Have you used it at all, Rob?
No, I haven't. I just know about it from
discussions like this one.
Yeah.
Okay.
You don't need to anymore.
Using our tool, you can just use C++.
Yeah. It does sound very powerful.
That's awesome.
Okay.
Well, Sebastian, is there anything else you want to plug
or tell us about before we let you go today?
I think somebody is always happy when I'm plugging: we're always looking for developers, of course.
For that, you can just write to HR, or look at our website.
No, other than that, you just mentioned you're looking for developers, and we've got a couple minutes left, so, because this comes up on the show: are you looking for remote developers? We noticed in the video here that there was one other person in the office with you.
True, but socially distanced.
People who can relocate to Berlin, or...?
Yes, practically, yes. Okay. I mean, now most of my colleagues are at home, of course. But in general we are a pretty small company, 20 to 30 developers now, I would guess. And the way we work, we still work together, and we still value this walking over to somebody else's office and asking a question.
Okay.
Well, it's been great having you on the show again today, Sebastian.
Thank you for having me.
It was great being on the show.
Thanks.
Thanks so much for listening in as we chat about C++.
We'd love to hear what you think of the podcast.
Please let us know if we're discussing the stuff you're interested in,
or if you have a suggestion for a topic, we'd love to hear about that too. You can
email all your thoughts to feedback at cppcast.com. We'd also appreciate if you can like CppCast on
Facebook and follow CppCast on Twitter. You can also follow me at Rob W. Irving and Jason at
Lefticus on Twitter. We'd also like to thank all our patrons who help support the show through Patreon.
If you'd like to support us on Patreon, you can do so at patreon.com slash cppcast.
And of course, you can find all that info and the show notes on the podcast website
at cppcast.com.
Theme music for this episode was provided by podcastthemes.com.