CppCast - SYCL

Episode Date: August 24, 2018

Rob and Jason are joined by Gordon Brown to discuss his work on SYCL, the OpenCL abstraction layer for C++.

Gordon is a senior software engineer at Codeplay Software in Edinburgh, specialising in designing and implementing heterogeneous programming models for C++. Gordon spends his days working on ComputeCpp, Codeplay's implementation of SYCL, and contributing to various standards bodies including the Khronos Group and ISO C++. Gordon also co-organises the Edinburgh C++ user group and occasionally blogs about C++. In his spare time, Gordon enjoys dabbling in game development, board games and walking with his two dogs.

News
- CppCon 2018 Poster Program Announced
- A bug in the C++ Standard
- Synapse submitted for Boost review
- New C++ London Uni Course Sept 18

Gordon Brown
- @AerialMantis
- Gordon Brown's blog

Links
- SYCL
- ComputeCpp
- Parallel Programming with Modern C++: from CPU to GPU
- P0443r7: A Unified Executors Proposal for C++
- CppCon 2017: Gordon Brown "Designing a Unified Interface for Execution"
- SYCL building blocks for C++ libraries - Gordon Brown - Meeting C++ 2016

Sponsors
- PVS-Studio
- February 31
- Patreon
- CppCast Patreon

Hosts
- @robwirving
- @lefticus

Transcript
Starting point is 00:00:00 Episode 164 of CppCast with guest Gordon Brown, recorded August 22nd, 2018. In this episode, we discuss CppCon posters and the new Boost library. Then we talk to Gordon Brown from Codeplay. Gordon talks to us about SYCL and his contributions to the OpenCL abstraction layer. Welcome to episode 164 of CppCast, the first podcast for C++ developers by C++ developers. I'm your host, Rob Irving.
Starting point is 00:01:21 Joined by my co-host, Jason Turner. Jason, how are you doing today? I'm all right, Rob. How are you doing? I'm doing good. Down to just over one month until CppCon, right? Yes. Wow.
Starting point is 00:01:33 That's about right. So I figured I should probably go ahead and mention, because I haven't spent too much time talking about it, that, well, I have said I am giving two talks. The first one, I mean, the schedule is officially out now. On Monday, Surprises in Object Lifetime. And then on Friday, so I kind of get the chance to bookend the conference, is Applied Best Practices. The concept is I just started a project from scratch
Starting point is 00:02:02 and I want to use all the best practices that I teach. What happens? Right. And so then on that note, I am doing two days of best practices training at the end of the conference as well. And there is still time to sign up for classes. And we'll talk about more about classes with our guests today. Yeah, definitely. Okay.
Starting point is 00:02:21 Well, yeah, you should definitely sign up for Jason's class and check out all the CPPCon classes that are available. There is a large set of offerings this year, for sure. Yeah. Well, top of our episode, I'd like to read a piece of feedback. This week we got an email from Henrik. He says, Hi from Norway. Love your podcast.
Starting point is 00:02:41 Keep up the good work. As an ex-Troll, I was hoping to promote pronouncing Qt the right way, which he phonetically spelled out as cute. I'm reading that the right way, right, Jason? Yeah, as far as I know. And for the record, I think most people do know that it's cute, but we all feel a little weird saying it because we're not sure if other people will know what we're talking about. Right. It's sometimes just easier to say the letter, so people know exactly what you're saying,
Starting point is 00:03:08 but, like, I'm using a Qt GUI toolkit. Okay. Which one? Like, it becomes ambiguous, at least in English. Anyhow. Yeah. Yeah.
Starting point is 00:03:17 It can be, can be confusing, but I will try to remember this note and I will try to remember to pronounce it as cute whenever we, uh, talk about the GUI framework on the show. We'd love to hear your thoughts about the show.
Starting point is 00:03:30 You can always reach out to us on Facebook, Twitter, or email us at feedback at cppcast.com, and don't forget to leave us a review on iTunes. Joining us today is Gordon Brown. Gordon is a senior software engineer at Codeplay Software in Edinburgh, specializing in designing and implementing heterogeneous programming models for C++.
Starting point is 00:03:48 Gordon spends his days working on ComputeCpp, Codeplay's implementation of SYCL, and contributing to various standards bodies, including the Khronos Group and ISO C++. Gordon also co-organizes the Edinburgh C++ User Group and occasionally blogs about C++. In his spare time, Gordon enjoys dabbling in game development, board games, and walking with his two dogs. Gordon, welcome to the show. Hi there. Thanks very much for having me on. So this is, if our listeners are not paying attention here, keeping track, the third guest from Codeplay
Starting point is 00:04:21 that we have had on in as many months. Now, this was not planned from our perspective, but we have to know if you all secretly planned this and didn't tell us. It wasn't intentional. It just kind of happened that way. So I've been
Starting point is 00:04:37 meaning to come on for a while, but work kept getting in the way. So it all just coincidentally lined up like this, basically. Yeah. But when I sent the email to ask about coming on, I think I didn't realize that Chris and Simon were also planning to come on in the same month. Yeah, it does seem like there's a lot of interesting things going on there at Codeplay, which of course we'll talk about more, what you are doing there, when we get into the interview.
Starting point is 00:05:07 But it seems like you all are having a good time, anyhow. Yeah, there's a lot of really interesting things happening, a lot of interesting projects, and there's a lot of people that are very involved in C++ and the standards in the community. That's cool. Okay, well, we've got a couple of news articles to discuss, and feel free to come
Starting point is 00:05:25 out to any of these, and then we'll talk more about the work you're doing at Codeplay and SYCL. Okay. Okay, so this first one, we're mentioning CppCon already: they just announced the poster program, and they have chosen, I think, 16 different entries for this year. Yeah. Wow, that's a lot. I mean, it seems like, I remember two years ago when I went, I think there were only like five or six maybe. Yeah, two years ago, I was a judge, and I think there were like six, and it was impossibly difficult
Starting point is 00:05:54 to choose who the winners were. I cannot even imagine what it would be like judging these 15 or 16 or whatever it is. And they do have all the poster titles and the authors of those posters. Um, I see at least one previous CppCast guest on here.
Starting point is 00:06:13 Uh, Elena Sagalaeva. I'm probably not saying that right, but she's the one who made the C++ Lands map. And I guess she's making a poster based on that. Yeah, that looks like fun.
Starting point is 00:06:23 Uh, I recognized her, and then the only other project I recognize is Cling Power Tools, I think. Everything else looks like it's going to be interesting, pretty unique, as far as stuff that I'm aware of.
Starting point is 00:06:36 Yeah, there's a lot of variety there. There's a lot of really interesting posters. I think there's one I noticed on memory tagging for improving safety; that sounds quite interesting. Hmm. I'd also like to know
Starting point is 00:06:51 what Funky Pools is about. It's a very abstract poster title, and it's in all caps, too. It's not just Funky Pools, it's FUNKY POOLS. Yeah. Okay, next thing we have is this blog post from Borislav Stanimirov,
Starting point is 00:07:10 and he writes about a bug in the C++ standard. Although, after his initial post, he got a lot of comments and feedback and realized that it's not a bug in the C++ standard itself; it was a bug in, I think, Clang and GCC's implementation.
Starting point is 00:07:28 Spoilers, Rob. Spoilers. I apologize. No, I'm just kidding. It's not actually technically a bug in the implementation. It's a missing feature in the Itanium C++ ABI that Clang and GCC follow for 64-bit builds. Yeah. But it was, I think, a really good example of the community kind of helping someone out because he went into this whole thing of what he thought was a bug and got all this nice feedback.
Starting point is 00:07:56 And people also pointed out some really nice ways you can completely avoid this using C++17. Yes, as usual, Lambdas come to the rescue. I think I have like 12 episodes of C++ Weekly on lambdas now and like three more in the queue getting ready to come up. There's a lot that you can talk about
Starting point is 00:08:17 with lambdas and a lot of ways you can use them that people don't necessarily think about. Did you look through this post, Gordon? Yeah, I had to look at it. It's very interesting, actually. I think I had to look at it quite carefully. I think member function pointer template parameters
Starting point is 00:08:36 isn't something I've used a great deal. But yeah, it's a very, very interesting use case. Yeah, I think it's really good that people post blog posts like this, because especially for quite niche use cases in the standard, like this kind of thing, it's good to kind of spread the knowledge and help identify if there is a defect in the standard or a bug in an implementation or something like that.
Starting point is 00:09:09 Well, it seemed like... I'm sorry, go ahead. No, that's okay. It seemed like a great way to really get people's attention and find out what the actual answer is. A bug in the standard! It's kind of like saying, I bet no one can solve this problem,
Starting point is 00:09:23 and then waiting to see who solves the problem for you. Yeah, that's true. I'm not sure if that was his intention, but it worked out quite well for him. Yeah, I don't think it was, but it definitely worked for him. Yeah. Okay, next thing we have is Synapse, which is a new library that was submitted for Boost review. And this is like a signals library, I think, similar
Starting point is 00:09:48 to Qt signals and slots, right? As far as I can tell. And also similar to Boost.Signals, yeah, or Boost slots, what's it called? That's an interesting point. I forgot that there was a Boost signals library. So how does this one differentiate from the other one? Do you know? There is, um... and I've completely lost it now, but I had it up before we started the interview. There is, at the bottom of the documentation, an Alternatives to Synapse section. There it is: compared to Signals2. Um, honestly, I'm having a hard time really differentiating the use case differences between the two of them, having even read the chart. I don't know, Gordon, do you have any insight on this? To be honest, I've not actually used the Boost Signals library before.
Starting point is 00:10:34 I've not really used signaling libraries like this. It looks really interesting, though. One thing I thought was quite nice is that it seems to register the signals at compile time. I'm not sure if that's something that Boost Signal does. I've not really looked into that. I don't think you can do that with Boost Signals. A couple of things stood out
Starting point is 00:10:58 to me. It does seem like if you really want to do static initialization of things, then they have macros to help with that. I don't love macros, but there's uses for them sometimes. And it seems to require that your signal and slot mechanism use shared_ptr, which I'm still a little bit lost on also. Hopefully I'm reading that wrong. Yeah, I noticed that the examples used shared_ptr.
Starting point is 00:11:23 I wasn't sure if that was a requirement or just a typical example. It says, by default, Synapse uses the following Boost smart pointer components: shared_ptr, weak_ptr, make_shared, and get_deleter. So it kind of implies that it needs some sort of shared pointer functionality, I guess. I'm looking through the comparison with Boost.Signals2 right now,
Starting point is 00:11:46 and one comparison that they first point out is that Boost.Signals2 uses a signal of type T, whereas Synapse uses a C function pointer typedef. And maybe the reason for that is that they could work with C callback APIs, because that is another difference they highlight. That's a good point, yeah. That's probably the main driver for that. So if you want to work with a C library,
Starting point is 00:12:12 then you would want to use Synapse, because you couldn't use Signals2, I guess. I think the last time I had to directly deal with that was when I was using pthreads, which I strongly avoid using these days since std::thread got standardized. Yeah. Okay, and the last thing I wanted to mention
Starting point is 00:12:32 was we had several guests on a couple months ago from C++ London Uni, including Tom Breza, and he reached out to me to say that they're going to be starting a new set of courses. And I believe the first course will be September 18th.
Starting point is 00:12:49 So if you heard about that and you want to get in on the beginning of a new cycle of courses, then September 18th is the day to do it. Okay. So, Gordon, why don't we start off by talking about SYCL? Can you give us an intro to what SYCL is? Yeah, sure. So, SYCL is an open standard from the Khronos Group, which essentially defines a single-source C++ programming model for programming heterogeneous systems. So, that's systems with non-CPU devices like a GPU,
Starting point is 00:13:26 FPGA, DSP, other kinds of accelerators. So for those who aren't familiar with the Khronos Group, they're essentially a consortium of companies, similar to the ISO C++ committee, which essentially work together to publish these open
Starting point is 00:13:42 standards that programmers can develop to and companies can essentially implement and claim conformance to. So most people will be familiar with OpenGL and Vulkan. They're both standards from the Khronos Group. Another standard from the Khronos Group is OpenCL. This is similar to OpenGL,
Starting point is 00:14:12 but it's a programming model for heterogeneous systems for doing general-purpose compute on GPUs, FPGAs, etc. However, OpenCL is a C API with a C-based kernel language. It's also separate source. So you have your host application code, and then your device code, which is run on a GPU or another device, separate. And then the host API is then used to load in that source, whether it be a string or another external file, and compile it and lower it down to some instruction set for that device online as part of your application.
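To make the separate-source model concrete, here is a heavily trimmed sketch of the OpenCL style Gordon describes. This is not complete host code: the real API also needs platform, device, context, and queue setup, plus error checking at every call.

```cpp
// The kernel lives in a string, compiled at runtime by the host API.
const char* source =
    "__kernel void vec_add(__global const float* a,\n"
    "                      __global const float* b,\n"
    "                      __global float* c) {\n"
    "    int i = get_global_id(0);\n"
    "    c[i] = a[i] + b[i];\n"
    "}\n";

// ...followed by clCreateProgramWithSource, clBuildProgram, clCreateKernel,
// clSetKernelArg, clEnqueueNDRangeKernel, and buffer copies managed by hand.
```

Because the kernel is just a string to the host compiler, there is no type checking between host and device code, which is one of the gaps single-source SYCL closes.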
Starting point is 00:14:54 So it's a very low-level API. One of the things people find with OpenCL is that it has quite a high barrier to entry. Even the Hello World application can take quite a lot of code to get going, and iterating over different approaches to a problem can take quite a lot of effort,
Starting point is 00:15:17 because there's a lot of API calls, a lot of things you have to change if you want to do something different. So, the aim of SYCL... it's another standard from the Khronos Group, again. So SYCL aims to sort of bring together the portability of OpenCL, so this ability to run code on different architectures like GPUs and FPGAs, et cetera,
Starting point is 00:15:41 but with a more modern C++ programming model. So effectively, in SYCL, when you write your code, it essentially allows you to write the code that will execute on a GPU or another device as a lambda or a function object within the same application. So this is the idea of single source.
Starting point is 00:15:59 So rather than having your separate host application code and your device code, it's all within the same code, your same application code. This gives you a lot of nice features, such as type safety across host and device code. You can template instantiate across host and device. So, yeah, SYCL provides a more high-level C++ interface
Starting point is 00:16:28 on top of the portability of OpenCL, but also provides some additional features. So it has this concept of separating out the storage and access of data, allowing the SYCL runtime to essentially perform data dependency analysis in order to more effectively schedule work and optimize data locality and movement. So, for example, when you describe the functions you want to run on a GPU, for example, you can say whether you want different functions to read or write from different memory,
Starting point is 00:17:08 and it can do a lot of clever optimizations through there. So a lot of the kind of... In traditional OpenCL, you'd have to write a lot of boilerplate code to do event management and data movement. So it kind of automatically handles a lot of that for you, which makes the iterative process of creating an application for a GPU or another device a lot quicker. So you said it provides a higher-level API compared to OpenCL, right?
Starting point is 00:17:43 Does it require OpenCL? Does it build upon OpenCL? Yes, so SYCL is based on top of OpenCL. So it requires... essentially, it can run on any platform that supports OpenCL. Okay. And you also mentioned, like, you said in OpenCL, Hello World has a lot of boilerplate.
Starting point is 00:18:06 And that made me wonder, what in the world exactly does a Hello World on a project like this look like? What are you actually doing in a Hello World program? The typical Hello World on something like a GPU is usually something like a vector add. So it's sort of like taking two input arrays and adding the values together and assigning them to some output array. But even for an application like that, there's a lot of
Starting point is 00:18:35 work in essentially discovering the topology of your system, figuring out what devices you have, configuring a context and a queue for executing work. And then you have to create the buffer objects and call APIs for copying them to and from the device, and compiling your kernel. So the device functions in OpenCL are referred to as kernels.
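For a sense of scale, the whole vector-add hello world comes down to roughly the following in SYCL 1.2.1. This is a hedged sketch rather than an official sample: it needs a conforming SYCL implementation such as ComputeCpp to build, and the kernel name `vec_add` is just an illustrative label.

```cpp
#include <CL/sycl.hpp>  // requires a SYCL implementation, e.g. ComputeCpp
#include <vector>

int main() {
    std::vector<float> a(64, 1.0f), b(64, 2.0f), c(64, 0.0f);
    {
        // Buffers describe storage; the runtime manages host/device copies.
        cl::sycl::buffer<float, 1> bufA(a.data(), cl::sycl::range<1>(a.size()));
        cl::sycl::buffer<float, 1> bufB(b.data(), cl::sycl::range<1>(b.size()));
        cl::sycl::buffer<float, 1> bufC(c.data(), cl::sycl::range<1>(c.size()));

        // One line gets a queue; a selector (e.g. gpu_selector) can pick a device.
        cl::sycl::queue q{cl::sycl::default_selector{}};

        q.submit([&](cl::sycl::handler& cgh) {
            // Accessors declare how each buffer is used, which is what lets
            // the runtime build its data dependency graph.
            auto A = bufA.get_access<cl::sycl::access::mode::read>(cgh);
            auto B = bufB.get_access<cl::sycl::access::mode::read>(cgh);
            auto C = bufC.get_access<cl::sycl::access::mode::write>(cgh);

            // The kernel is an ordinary lambda in the same source file.
            cgh.parallel_for<class vec_add>(
                cl::sycl::range<1>(a.size()),
                [=](cl::sycl::id<1> i) { C[i] = A[i] + B[i]; });
        });
    }  // buffer destructors wait for the kernel and copy results back
}
```

Note there is no explicit event management or copy call anywhere: the accessors carry enough information for the runtime to schedule all of that.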
Starting point is 00:19:01 Okay. So how does that compare with SYCL then? So with SYCL, what we aim to do is create a more high-level interface where all of the functionality that you'd have with OpenCL is still available, but we have a lot of the defaults for the most common configurations. So for example, you can go into a lot of detail querying the different devices available on your system, configuring them in a very specific way if that's what you want to do. We also have simple mechanisms for quickly picking a GPU. So for example, if you're running on a system which only has a CPU and a GPU, you can create these function objects called selectors, which effectively say, I want a GPU and I want a CPU. And they can be effectively used just to create, effectively in one line,
Starting point is 00:19:56 create a queue that can run work on that device. Okay. But we have the more flexible options for the more complex configurations if that's what you need. So what kind of platforms are supported by SYCL and OpenCL? Is it basically anything that has a modern GPU, or do you need specialized hardware? So essentially SYCL can run on any platform that supports OpenCL. The list of different platforms that support OpenCL
Starting point is 00:20:28 is very long now. Everything from desktop, mobile, embedded platforms, HPC. The list of platforms that don't support OpenCL is a lot shorter. Generally, for portability,
Starting point is 00:20:48 there's another standard from the Khronos Group called SPIR. It's the Standard Portable Intermediate Representation. It's essentially a portable IR: what you do is you compile your kernels to this IR, which can make things a lot easier. So you'd ship that IR rather than shipping source. It's a lot easier for platforms to support a standard intermediate representation, because that's then just lowered down to some platform-specific instruction set. But SYCL doesn't mandate that you have to use SPIR. Essentially, SYCL will work on any binary format that is supported by an OpenCL implementation.
Starting point is 00:21:29 So, for example, NVIDIA's OpenCL implementation supports PTX, their binary format. So you can have a SYCL implementation which compiles to PTX and then runs on OpenCL through that. So it's very flexible. So what is your specific involvement with this SYCL project? So I've actually been involved in SYCL
Starting point is 00:21:53 from almost the very beginning. So that's almost six years ago now. Wow. Back then it was actually going by a different name. It was the OpenCL high-level model. Okay. So then, yeah, so the group for what would become SYCL was formed about six years ago. At the time, Codeplay was working on a version of our offload compiler,
Starting point is 00:22:19 which was essentially... That's not the name of it. Offload compiler? Yeah, that was the name of our compiler at the time. Awful Compiler? No, Offload. Oh, Offload. Oh, okay. I also heard Awful.
Starting point is 00:22:38 Yeah, so we were working on a version of our Offload compiler which would essentially do single-source compilation for OpenCL. So this was very suitable to what the Khronos Group wanted. So we proposed this as a way of achieving this kind of high-level programming model for OpenCL. So over the years, I've been working on Codeplay's implementation. It's called ComputeCpp. And contributing to the standards document and the
Starting point is 00:23:10 conformance test suite up until now. I often, well, I wouldn't say often, it does come up where the C++ Standards Committee will approve something and then the implementers that we know and tweet often will say,
Starting point is 00:23:27 I just tried to implement this thing and found out it's impossible. Have you done that? Have you put something in the spec yourself that you then realized was impossible to implement? So we try to avoid that by prototyping everything. So everything that we propose for the standard, we tend to prototype and test and make sure that
Starting point is 00:23:51 it works well, although it's impossible to catch everything; there are always some issues that kind of sneak through. So there have been times where we've had to kind of rethink things slightly to improve something that didn't work as well as we intended.
Starting point is 00:24:16 It was more difficult in the beginning because when you're creating a standard from scratch, you have to implement the entire thing, so it's very fast-moving. In the beginning, things did change quite a lot. Before the first version of SYCL, things were changing a lot because we were iterating over it, trying to find the best solutions. But now that
Starting point is 00:24:37 the standards is out, it's not changing. We're introducing new features to improve on it, but everything's been prototyped to make sure that it's implementable. Okay. I wanted to interrupt the
Starting point is 00:24:56 discussion for just a moment to bring you a word from our sponsors. PVS-Studio is a static code analyzer that can find bugs in the source code of C, C++, and C# programs. The analyzer is able to classify errors according to the Common Weakness Enumeration, so it can help remove many defects in code before they become vulnerabilities. The analyzer uses many special techniques to identify even the most complex and non-obvious
Starting point is 00:25:18 bugs. For example, the data flow analysis mechanism, which is implemented in the tool, detected a very interesting bug in the Protobuf project. You can read about this bug in the article February 31st. The link is in the description of this podcast episode. The analyzer works under Windows, Linux, and macOS environments. PVS-Studio supports analysis of projects that are intended to be built by such common compilers as Visual C++, GCC, Clang, ARM Compiler, and so on. The analyzer supports different usage scenarios. For example, you can integrate it with Visual Studio and with SonarQube. The BlameNotifier tool can notify developers by email about errors that PVS-Studio detects during nightly runs.
Starting point is 00:26:02 You can get acquainted with PVS-Studio's features in detail on viva64.com. So we had, as we mentioned, Simon Brand on recently, but we also had him on about a year ago, back in May. I think we talked about SYCL a little bit then. What's changed over the past year with SYCL? So the biggest change since then is that there's a new version of SYCL,
Starting point is 00:26:22 SYCL 1.2.1. So this introduces some new features to essentially better support some of the ecosystem projects that we have with SYCL, things like Eigen and TensorFlow, and things like BLAS libraries. Because while implementing ComputeCpp, our implementation, we effectively had an evaluator program
Starting point is 00:26:46 where selected people would get early access to our pre-alpha version of ComputeCpp so that they could give feedback. And a lot of that feedback helped add improvements and additions to the standard. So SYCL 1.2.1 came out, I think, shortly after Simon was on. It wasn't too long
Starting point is 00:27:10 ago, only a couple of months ago. So what is the future direction for SYCL that you're working on? So future direction, we're aiming to integrate SYCL even closer with C++.
Starting point is 00:27:26 While we're standardizing SYCL, C++ is moving forward. We want to bring SYCL up to date with some more modern C++ features, C++17 and eventually C++20. And we'll also continue to look to introduce new features to help improve the way people create their workflows and how they can be applied to different architectures. So we're always looking for ways to try and improve things and make things easier for developers to use. We want to aim to have a very close alignment with the C++ standards. So if there's something that
Starting point is 00:28:09 the C++ standard is introducing that we could use in SYCL, then rather than standardizing something entirely new that does effectively the same thing, we would use the features of C++. We don't want to reinvent the wheel. When we see the C++ standard moving in a particular direction,
Starting point is 00:28:27 we'll try to move along with that, so we don't want to be going against the direction of the standard for C++. Since you mentioned that, then: what features from C++17 or C++20 are currently affecting SYCL?
Starting point is 00:28:45 How do you mean, affecting? Well, things that you want to take into account, you want to use instead of standardizing yourself, whatever. So some features are useful just in terms of improving what you can do in device code. So things like generic lambdas, new things like templated lambdas, that would be quite interesting. More support for different forms of templates.
Starting point is 00:29:17 But also, we want to align quite closely with the executors, and also introduce things like futures and promises. I'll say that the C++ committee is still debating over what different kinds of futures will be used in C++. So that's something we're watching very closely, to see how SYCL will interact with that. What kinds of futures? That sounds very, I don't know, metaphysical or something at that rate. Well, we've mentioned executors on the show, I don't know, several times, Rob,
Starting point is 00:29:58 but I feel like I don't personally have a very clear picture of what that will actually mean to the standard and how it would affect a project like SYCL. Can you go into that? Yeah. So there's quite a few different domains that executors are aiming to support. It's one of the reasons that standardizing them has proven so difficult.
Starting point is 00:30:19 So from our perspective, our domain is open-standard heterogeneous systems, so things like OpenCL and SYCL. There are a lot of other domains, like networking, distributed systems, parallel algorithms,
Starting point is 00:30:36 asynchronous tasking. So all of these different ways of executing work and different requirements have to be unified into a single approach so everyone can write their tasks and the work they want to perform in a unified way but still be able to take advantage of something like
Starting point is 00:31:00 an OpenCL platform, but also network devices and that sort of thing, and standard thread pools. So you have some sort of tasks you want to perform, and if I understand correctly, you have some sort of requirements for what kind of hardware needs to be used
Starting point is 00:31:25 for this task to be executed, and then you pass that off to an executor and you say, go make this thing happen? Yeah, effectively, yeah. So the current design focuses around this idea of properties where effectively you can require different properties of an executor. So things like whether it does one-way or two-way execution, so whether it returns futures or not,
Starting point is 00:31:54 whether it's blocking or non-blocking, how it maps to different threads or potentially GPU threads or some other device. All of these kind of properties are properties of an executor. So you can request different kind of properties and then essentially if the implementation supports those, that configuration will return an executor that can do that kind of work. Okay. So an implementation would provide sort of different types of executors, and those executors can be customized and adapted to work in different kinds of ways.
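In the terms of P0443r7 (the proposal linked in the show notes), that property mechanism looks roughly like the sketch below. This is proposal-era code, not standard C++, and the property names changed between revisions, so treat every identifier here as approximate.

```cpp
// 'ex0' is some implementation-provided starting executor, assumed here
// purely for illustration.
auto ex = std::require(ex0,
                       execution::twoway,           // submissions return futures
                       execution::blocking.never);  // must not block the caller

// If the implementation can satisfy those properties, 'ex' can now run work:
auto fut = ex.twoway_execute([] { return 42; });    // some future-like<int>
```

If a requested property cannot be satisfied, `require` fails to compile rather than silently degrading, which is the mechanism that lets one generic interface span GPUs, thread pools, and networking.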
Starting point is 00:32:34 So now you said you contribute to both the Khronos Group and ISO C++. Are you involved with the executors work for C++? Yes, so I've been involved in the executors work for the last two years. So when I joined, I think there had already been work going on for about four years before that. So there were quite a few proposals already in the works. But when I joined, essentially the committee was looking to bring all the different proposals together and come up with a unified design
Starting point is 00:33:10 that satisfied everyone's requirements. So I was invited in for the perspective of the open-standard heterogeneous domain. Okay. So, yeah. So how is that unification work looking? Do you think it'll
Starting point is 00:33:30 make it? It's been going very well. So whether or not it would make it into C++ 20, you mean? Yeah. Or ever. So that's a very interesting question.
Starting point is 00:33:46 The short answer is I don't really know. Okay. So at the last meeting in Rapperswil, the Direction Group made executors a priority for C++20, and there have been some special joint LEWG and SG1 meetings scheduled at CppCon to work on this. So everyone's working very hard to try and get to a final paper that could potentially be introduced to C++20. But then the alternative is, if it's not ready in time, it would likely go into a TS, which would be targeted for C++23. It depends on if we can make enough progress by the next meetings.
Starting point is 00:34:36 Right. Do you plan on attending that special meeting that's going to be held during CppCon? Yes, I'll be at the meeting at CppCon. I'm not sure if I'll be at the following meeting at San Diego. Okay. So, on the topic of CppCon, you have a training
Starting point is 00:34:56 that you are going to be giving also at the end of the conference, is that correct? Yeah. Michael Wong and I will be doing a two-day class at CppCon on parallel programming in modern C++ from CPU to GPU. The aim of the class is essentially to teach people the fundamentals of parallelism and how they can apply them to CPU and GPU architectures and how to take advantage of different kinds of parallelism
Starting point is 00:35:26 and how to solve the kind of challenges you encounter along the way. So it'll likely take a pattern-focused approach, so we'll be looking at different parallel algorithms and how you would implement them, how you'd solve certain challenges when trying to parallelize algorithms. So the idea is to have people leave with essentially the set of tools you need to be able to go and parallelize code in your own projects. So what kind of tools or C++ standards will you be teaching to?
Starting point is 00:36:06 Is it safe to assume SYCL will be used? Yes, so SYCL will be used as part of it. We're trying to focus mainly on existing standards, so not using anything that's still in development. So we'll be looking at C++ threads and also SYCL. We'll probably touch on SIMD parallelism to some extent. So there is a proposal for SIMD vectors in the upcoming Parallelism TS 2, but that's still in development, so we probably won't go into that.
Starting point is 00:36:52 There are no implementations of... Well, there's an implementation of it, but there are no standard implementations yet. So we probably won't go into that in too much detail. Okay. And then in addition to the class, you're also going to be doing a talk with Michael? Yes, so we'll be doing a talk on SYCL. So the aim of the talk will be essentially covering
Starting point is 00:37:18 a lot of what I've been talking about today, but in more detail, looking at how SYCL works and the different features of it and how you can use the different features to take advantage of GPUs and other
Starting point is 00:37:34 heterogeneous devices and C++ applications. Is this going to be a live coding kind of demonstration? I'm still working through exactly how the talk will go, but I'm hoping to have the talk kind of follow implementing some kind of parallel algorithm. So I might try to do a kind of live demo
Starting point is 00:38:02 of running the final application, but I probably won't do the whole thing live. Okay. I'm not sure if I'm that brave. And I also wanted to ask, you're working on a C++ standards proposal. Is that correct? Yes.
Starting point is 00:38:25 So one of the other proposals that I'm working on at the moment is to introduce affinity to C++. So this is essentially on the back of the Unified Executors proposal.
Starting point is 00:38:43 But it's kind of expanding on it a bit more to provide support for essentially querying the topology of the system that you're on and kind of information about different processors, different memory regions, and being able to essentially tie execution and memory allocation
Starting point is 00:39:05 to those different processors and memory regions. So it will allow you to do things like take advantage of NUMA systems or distributed systems, but also it will be looking more towards scaling up to heterogeneous systems as well. So would you mind going into some more detail for our listeners on what the advantage of tying affinity to a particular processor in a NUMA environment would be? If you write an application that, say, runs across multiple processors, across a sort of a NUMA domain, so a memory that's a single address space but distributed across multiple nodes, if the system sort of randomly allocates memory,
Starting point is 00:39:58 you won't necessarily always be allocating memory that is close to where the computation is being performed. So you won't get the best performance because the latency in reading the memory will not always be as fast as it can be. So if you have an algorithm that sort of modifies data in a particular pattern, you can tie the memory allocation to that pattern so that you're always accessing memory as close to the process as possible. And also you can bind the processes, the different executions to those processes so that you're always executing in the same place effectively
Starting point is 00:40:38 so you don't sort of jump over to another thread on a different core or different CPU node, and then have to recache or copy the memory over to another node and then recache. So generally we try to write code that is cache-friendly and stays in the local CPU cache, but you're talking about architectures where if the data you need isn't in your local CPU cache, and it might not even be in your local memory, it might actually be in a completely different computer across the network. Yeah, that's right. Which is like, we're talking orders of magnitude slower than if it were right here next to you. Yeah, exactly. Okay. Yeah, so if you have to copy over across a bus, for example, from one CPU node to another,
Starting point is 00:41:28 then that's going to be a huge amount slower than accessing something on the local device. Okay. Okay, Gordon, is there anything else you wanted to go over before we let you go? There was one thing that I was going to mention about SYCL that I kind of went past without mentioning. One of the other
Starting point is 00:41:50 benefits of SYCL that we like to highlight is that there are some other heterogeneous programming models things like CUDA or C++ AMP or OpenMP, OpenACC they all allow you to do some sort of heterogeneous execution in C++ code.
Starting point is 00:42:11 One of the things that we really aim to do with SYCL is that all of these programming models that I mentioned have some form of language extension or pragmas or attributes or macros in order to define the block of code that you want to execute on the device. It means you have to introduce these changes to code so it's not standard C++ anymore. Also these are generally tied to a particular platform. For example, CUDA is tied to NVIDIA. C++ AMP was only Microsoft
Starting point is 00:42:46 and AMD, I think. So one of the things we wanted to do with SYCL was make it truly cross-platform, truly standard. So all SYCL code is entirely standard C++. So any SYCL applications you run, even if you
Starting point is 00:43:02 don't have an OpenCL platform available, all OpenCL, all SYCL applications you run, even if you don't have an OpenCL platform available, all SYCL implementations are required to have what's called a host device. So it will compile down to standard C++ code and run on essentially an emulated device that matches the same execution and memory requirements of OpenCL as you would have if you were running on an actual OpenCL device. So this is very useful, particularly for debugging.
Starting point is 00:43:33 So if you're on a system where you don't have a debugger for your GPU or your FPGA, for example, it's very useful to be able to run in standard C++ code the same way as you would on the device in order to sort of debug your code. It debugs your application, essentially. So it's not running on the actual device, but it debugs your application.
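As an illustration of that claim, here is a sketch of a small vector add written against the SYCL 1.2.1 interface (as implemented by, for example, ComputeCpp; SYCL 2020 later moved to the `sycl::` namespace). It needs a SYCL implementation to build, but selecting the host device means the kernel runs as plain C++ on the CPU, debuggable with an ordinary debugger, even with no OpenCL driver installed:

```cpp
#include <CL/sycl.hpp>  // SYCL 1.2.1 header, e.g. from ComputeCpp
#include <iostream>
#include <vector>

int main() {
    std::vector<float> a(64, 1.0f), b(64, 2.0f), c(64, 0.0f);
    {
        // host_selector picks the mandatory host device: the kernel runs as
        // ordinary C++ on the CPU, so this works with no OpenCL platform.
        cl::sycl::queue q{cl::sycl::host_selector{}};

        cl::sycl::buffer<float, 1> bufA(a.data(), cl::sycl::range<1>(a.size()));
        cl::sycl::buffer<float, 1> bufB(b.data(), cl::sycl::range<1>(b.size()));
        cl::sycl::buffer<float, 1> bufC(c.data(), cl::sycl::range<1>(c.size()));

        q.submit([&](cl::sycl::handler& cgh) {
            auto A = bufA.get_access<cl::sycl::access::mode::read>(cgh);
            auto B = bufB.get_access<cl::sycl::access::mode::read>(cgh);
            auto C = bufC.get_access<cl::sycl::access::mode::write>(cgh);
            cgh.parallel_for<class vec_add>(
                cl::sycl::range<1>(a.size()),
                [=](cl::sycl::id<1> i) { C[i] = A[i] + B[i]; });
        });
    }  // buffer destructors synchronise results back into the vectors here
    std::cout << c[0] << "\n";  // expect 3
}
```

Note there are no pragmas, attributes, or language extensions anywhere in this file; swapping `host_selector` for `default_selector` would let the same code target a real OpenCL device.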
Starting point is 00:43:56 It sounds like even just for debugging, this would also be good for R&D, like you could keep continuing to work on your slow laptop on the airplane or whatever while you're trying to test out how this code would look. Yep, exactly, yeah. Okay, great. Well, it's been great having you on the show
Starting point is 00:44:12 today, Gordon. So there's one other thing I'd like to mention, if that's possible. Yeah, go ahead. So as of, so not as of today, but as of when this episode airs, Codeplay will have released ComputeCPP 1.0, so the full implementation. So this is actually the first fully conformant implementation of SYCL that is available.
Starting point is 00:44:40 And Codeplay also provides a community edition of ComputeCBP that can be downloaded for free. So for that, we support multiple Windows and Linux platforms, and that also supports Intel CPU, Intel GPU, ARM CPU, ARM Mali GPU, and also we have experimental support for NVIDIA GPU through BTX. And when is this actually going to be released? So this will be released on Thursday, so tomorrow. Okay. Yeah, we'll be releasing this on Friday, so yeah, it'll be out.
Starting point is 00:45:18 So it'll be out by the time this airs, yeah. Very good. Okay. As I was saying, Codeplay is recruiting, so if anyone's interested, we have a lot of openings in compilers, debuggers, runtimes, tools, etc. It's a really great place to work, and there's a lot of really interesting
Starting point is 00:45:36 and great people to work with. It does sound like there's a lot of interesting going on there. And definitely great people to work with, because we've talked to at least three of them. They're all great. Okay, thanks so much for being on the show today, Gordon. Okay. Thanks very much for having me. Thanks. Thanks. Bye. Thanks so much for listening in as we chat about C++. I'd love to hear what you think of the
Starting point is 00:45:57 podcast. Please let me know if we're discussing the stuff you're interested in, or if you have a suggestion for a topic, I'd love to hear about that too. You can email all your thoughts to feedback at cppcast.com. I'd also appreciate if you like CppCast on Facebook and follow CppCast on Twitter. You can also follow me at Rob W. Irving and Jason at Leftkiss on Twitter. And of course, you can find all that info and the show notes on the podcast website at cppcast.com. Theme music for this episode is provided by podcastthemes.com.
