CppCast - SYCL 2020

Episode Date: July 2, 2020

Rob and Jason are joined by Michael Wong from Codeplay. They first discuss GCC 11 changing its default dialect to C++17 and polymorphic allocators. Then Michael shares an announcement of a new version of SYCL that was just released, and shares information about the multiple standards groups he is a member or chair of.

News:
- GCC 11: Change the default dialect to C++17
- Build Bench
- Polymorphic Allocators, std::vector Growth and Hacking

Links:
- SYCL
- P2000
- Michael Wong "Writing Safety Critical Automotive C++ Software for High Performance AI Hardware"
- CppCon 2016: Gordon Brown & Michael Wong "Towards Heterogeneous Programming in C++"

Sponsors:
- PVS-Studio. Write #cppcast in the message field on the download page and get a one month license. Read the article "Checking the GCC 10 Compiler with PVS-Studio", covering 10 heroically found errors despite the great number of macros in the GCC code.

Transcript
Starting point is 00:00:00 Episode 254 of CppCast with guest Michael Wong recorded July 1st, 2020. Sponsor of this episode of CppCast is the PVS-Studio team. The team promotes regular usage of static code analysis and the PVS-Studio static analysis tool. In this episode, we discuss GCC 11 switching its default dialect. Then we talk to Michael Wong from Codeplay. Michael shares an announcement about SYCL and much more. Welcome to episode 254 of CppCast, the first podcast for C++ developers by C++ developers. I'm your host, Rob Irving, joined by my co-host, Jason Turner. Jason, how are you doing today? I'm okay, Rob.
Starting point is 00:01:21 I just got the email telling me that I am approved for a 2021 MVP. Oh, did you? Congratulate me. Literally just this moment. So I think that's year five for me now, something like that. Very cool. I don't think I've gotten mine yet, but yeah, I know those emails are going out today. I forgot that today was the day, in fact. An IoT library for the C++ SDK as well. And I thought that there were both C++ and C SDKs that we talked about last week, but maybe it is a pure C SDK. Maybe it is, but it works with C++. Yeah, obviously you can use it with C++.
Starting point is 00:02:17 Well, we'd love to hear your thoughts about the show. You can always reach out to us on Facebook, Twitter, or email us at feedback at speakass.com. And don't forget to leave us a review on iTunes or subscribe on YouTube. Joining us today is Michael Wong. Michael is the Vice President of Research and Development at CodePlay Software. He is now a member of the Open Consortium Group known as Kronos, MISRA, and AUTOSAR and is Chair of the Kronos C++ Heterogeneous Programming Language, CICL,
Starting point is 00:02:41 used for GPU dispatch in native modern C++, OpenCL, as well as guiding the research and development teams of ComputeSuite, ComputeAorta, ComputeCPP. For 20 years, he was the senior technical strategy architect for IBM compilers. He is the Canadian head of delegation to the ISO C++ standard and a past CEO of OpenMP. He is also a founding member of the ISO C++ Directions Group and a director and VP of ISOCPP.org and chair of all programming languages for Canada's Standard Council. He also participates in ISO SC42 on AI and machine learning. He has so many titles, it's a wonder he can get anything done. Michael, welcome to the show. Thank you. I'm glad you're able to read that without taking
Starting point is 00:03:22 a breath. And That's pretty amazing. And that's a modified version of your full bio that's on your website. There's a lot more that could be mentioned. Well done. Well done. Hats off to you, sir. So if you don't mind, we were talking about this just for a few minutes before we got started. But with all of those hats you wear, you're used to normally traveling like five weeks a month, right? If I could, I probably would end up doing that, yes. There's a lot of committees I'm part of.
Starting point is 00:03:52 But my first love has always been C++. Coming out of my experience in C++, I've been asked or prodded to join and lead other groups and other committees. And I think that's how that cross-pollination between different committees, because many committees often go through similar problems. And it was very useful helping each other, you know, from OpenMP to C++, now from C++ to Kronos under the SQL Working Group. And then from there, I'm now actually on three other ISO committees, one on machine learning and AI. You didn't know that there was a committee for that. Well, yes, there is.
Starting point is 00:04:31 And these guys are just talking feverishly, having daily meetings now because of the lockdown. And the other one is about safety, which is my other huge research goal and trying to drive for more safety in autonomous vehicles. So there's a big group right now of all the top safety engineers of the world, people from General Motors, Bosch, a totally different area of my previous life that are constantly online talking about how to make self-driving cars safe. Wow. So are you just literally in meetings all the time right now? I'm often on, because of the lockdown, I'm actually working three times harder.
Starting point is 00:05:09 Lockdown will be over soon so I can work less because at least when I'm in an airport or in a lounge in a hotel, I can avoid going to calling into a meeting. I have a good excuse. I'm traveling. But now I'm on calls seven, eight hours a day, often triple booked. So I have a setup here in my house where I have three computers, three screens and three cameras and three speakers. And often there are three voices talking in my ear with my Bluetooth headset. So it's become quite fun juggling all these things at the same time. I just got off another call so I could focus totally on you guys. So I'm totally not on any other call right now. I'll have you guys know.
Starting point is 00:05:49 I'm only on your call. Okay, that's good to know you're not a double or tripling up podcast interviews. I was tempted because the other call was getting into an interesting area and their call was on MISRA, which is one of the safety standards that has to do with making C++ safe.
Starting point is 00:06:05 And they were on their face-to-face meeting, or their so-called virtual face-to-face meeting today, which started at 5am this morning Eastern Time. But it was going to end anyway by about 2pm, which is roughly 7pm in the UK, because they're following UK times. But I'm used to that. Ever since I joined Coldplay, I kind of live British times. I'm generally up at 5am in the morning because that's when my office
Starting point is 00:06:29 wakes up and people start going in. And by 1 or 2pm things are slowing down and then I stop for a moment before I choose whether I should continue working or go out and play some tennis. But actually, yeah, that's about the best you could possibly get huh eastern uh north america to uk is only only five hours difference you got it yeah
Starting point is 00:06:56 jason you must know this stuff by heart yeah yeah you you got it exactly right there's a period of time you can get a lot of things done for europe uk yeah early in the morning and then before the west coast coast wakes up and when the west coast wakes up they always demand time starting around noon to later on day right so so there's a brief period of time when i can actually do real things between before all the calls ramp up again. And ISO calls nowadays are just totally, they've just given up at trying to make it work for any one person. They would now be called at 1am, 3am, 5am in the morning because sometimes they have to favor Asian times.
Starting point is 00:07:37 You know, I'd be in bed with my Bluetooth headset, my wife would be sleeping beside me and I'd be listening to a call coming from Japan somewhere. I did that with SG-14 last month. What I did with SG-14 was I inverted our regular call from being 2 p.m. Eastern time to 2 a.m. Eastern time, so that it would be like 2 p.m. in New Zealand. We wanted to make sure that we could get our New Zealand friends and our Australian friends on SG14 on the call at least once so that they could be comfortable because they have never been able to call on the SG14. All the ISO call tends to be very European, US-centric, as you
Starting point is 00:08:19 guys know. I am curious, though, if you don't mind, you said that there's a lot of valuable cross-pollination. And I'm thinking, like, Misra, I mean, we've had other people on the show that have talked about, like, Misra and safety standards. And I'm just kind of curious if there's anything specific you could point to where you see, like, an overlap between Kronos and Misra because they feel like completely different topics. They are, but they are not anymore in our world. You could also say C++, to some extent, is not about safety because it generally could be more about performance and portability. But our world is definitely merging. A lot of workloads out there are now merging in terms of performance, being portable, but also needing to be safe because they're getting the consumer hands. I mean, what is a self-driving car but a supercomputer on wheel doing very high-performance stuff, potentially on portable things, on portable devices, at least when you get away from Tesla-type stuff?
Starting point is 00:09:25 Because people want to have things that can be reused the next year instead of keeping it from the ground up. So speaking about Kronos and MISRA, what's going on is a lot of Kronos standards are being tested with MISRA. So Kronos things like Kronos Vulcan has a safety-critical group, and what they do is they take the Vulcan API and run it through MISRA. So Kronos things like Kronos Vulcan has a safety critical group and what they do is they take the Vulcan API and run it through MISRA. Get it through all the
Starting point is 00:09:51 warnings and the defects that comes out of MISRA on it and then refix the API so that it can make Vulcan safe. They do that with OpenGL. We plan to do that with OpenCL. SICKO plans to build such a thing so that it can be appropriately used in a self-driving car.
Starting point is 00:10:09 So this is why there's now tighter and tighter integration between the groups. Kronos is using MISRA. C++ in some domains is definitely interested in having MISRA. We actually have MISRA coming to C++ standard meetings, attending SG12, which is the vulnerabilities optimization group. And we cross-pollinate the two groups so that, you know, there's a new MISRA rule we want to check with C++ to see that the experts agree that this is a good rule. Have we missed something? Do we not know about something about template argumenttype deductions that
Starting point is 00:10:45 we didn't know about that we should protect against? Or the opposite, which is, are we protecting too much? One of the big complaints in the past about MISRA is it's too blanket in its restrictions. Don't use template. Don't use exception. Well, that's just not going to fly in modern C++.
Starting point is 00:11:02 Right? And it's not flying with self-driving cars either. They're not saying that you should never allocate dynamic memory. They're saying you should not allocate dynamic memory after the car starts up. And you could allocate as much as you want before the car starts up. But after the car starts up, you can still allocate dynamic memory, but you have to make sure that it doesn't throw an exception. You have to make sure that it's got enough space. It doesn't fragment so that it can be deterministic so that you don't get a 10 second breaking warning when the requirement in the law is that it has to be two microseconds.
Starting point is 00:11:39 So these are very, very tightly connected fields that needs those cross-pollination of ideas. It needs the experts who are the safety experts, but it also needs the experts who are C++ experts, who are the heterogeneous experts, who are the concurrency experts. Yes, the world is definitely converging now. That is very interesting. Now, I apologize for the sake of our listeners because I definitely skipped ahead. We haven't gotten to the news yet. And some of these things like MISRA and Kronos, we definitely will have to come back and make sure we define them so that our listeners know what we're talking about.
Starting point is 00:12:14 Right. So let's quickly get through the news. And then, Michael, feel free to comment on any of these, and we'll start talking more about Sickle and your work in iso and maybe more about mizor and chronos sound good sounds good okay so this first one is an announcement uh that gcc 11 i believe is going to now target c++ 17 as its default dialect which is great news about time yes thank you it's not easy for them to do it, I can imagine. I was very close to following GCC when I was the Excel compiler lead, and I had to make sure that I had to duplicate almost every GCC feature back then. Everything, pretty much.
Starting point is 00:12:58 Because back then, before Clang, they were the 800-pound gorilla, and you had to follow them very closely. And it was painful for them to switch to c++ 11 because um c++ 11 broke a binary compatibility with sso and so to transition from but since then they've learned and so the transition to 14 was very quick and 17 you know it was much quicker but now we're talking we're now also talking about that the support build is also built with 17, right? Right. And they did point out they are still missing two C++17
Starting point is 00:13:30 library features, but they said they're not going to let that hold them up, which is good. Oh, okay. Yeah, to be honest, it was actually, one of them was one I didn't even know existed and I thought that I had covered almost every C++17 feature on my YouTube channel.
Starting point is 00:13:46 Which one is that? The standard hardware constructive, destructive interference. Yeah, yeah, yeah. What is that? A little known C++ feature we put in and it's actually really important in parallel programming. When I was the CEO of OpenMP, one of the biggest problems with parallel programming is false sharing. So this is the idea where you have a cash line and you touch the cash line with one thread and then a different thread, now because of the
Starting point is 00:14:18 locality, because it's so close, a different thread might keep pinging that cash line. And now you're unnecessarily thinking that the next bite that you need access is actually on that same cash line but it's not okay so what this does is it gives you the ability to create a distance of separation so to know that this is the size of the cash line that's called the destructive interference it means that you won't need to ping this cash line for the next bite because it's not anywhere close to it. The opposite is called constructive interference, where it is actually on that cash line, and you actually want it to be in that. You actually don't mind it pinging that particular cash line over and over again. So, yeah, it is actually really important, but it's a very rarely known.
Starting point is 00:15:02 I actually use that with many of my talks at an opening to let people know, hey, did you guys know that C++17 now supports false sharing or restricting false sharing? Well, theoretically, if GCC doesn't support it yet, then... That's true. Well, I just had this conversation about false sharing with someone the other day, so I believe they listened to the podcast. Yeah, if you are involved anywhere with parallel programming, that has always been a problem with every single parallel language, not just OpenMP. Every single parallel language, and now C++11 has introduced parallelism and concurrency, has this exact issue. It's not a huge problem to get around.
Starting point is 00:15:44 You just have to have a way to get around it. Before this, in order to get around it, you probably had to use the vendor hardware. The vendor will sometimes give you some way of getting around that for sharing because they know the cache size of their hardware. So it was kind of just vendor-specific always. So you have to use a different one for every machine,
Starting point is 00:16:04 from IBM to ARM to whatever to Intel. But now this gives you a standard way of doing it, and you don't have to keep changing it. It says, if I'm on ARM, then use this particular command to reduce the pinging on the cache line, things like that. It's very important, yeah. It is interesting, though, that it's constexpr, which means it has to be generalized a little bit
Starting point is 00:16:24 or tuned to a specific architecture or something right right right okay because it doesn't work for a different architecture well no but i mean even between like an eighth gen i5 versus a ninth gen i5 right that might change the cache line size right that's right that's right that's okay yeah yeah okay uh next Okay. Yeah. Yeah. Okay. Next thing we have here is a new web tool, and this is build-bench.com. And Jason, you said this is put together by Fred Tengau, who's been on the show before, right? Yeah. Yeah. So it's just like his quick bench to quickly test two different bits of code against each other.
Starting point is 00:17:01 This one lets you quickly test the compile time differences between two different bits of code against each other this one lets you quickly uh test the compile time differences between two different bits of code right so if you're playing with different options yeah right so and the option or the um example he has here if you just go to the website is you know a simple hello world program one using cstdio and one using iostream and it shows the pretty drastic difference between uh how long it takes to build the iostream version it shows the pretty drastic difference between how long it takes to build the iostream version but you could you know toggle different settings and then you know try out more things in your little code samples there it's pretty i i was actually really curious about this specific example so i commented out the body of main and it turns out that the entire compile
Starting point is 00:17:40 time difference is just in a processing iostream header file you don't have to actually use that yeah so so compile time differences has has not always been the most popular thing people are gonna was looking at it was always runtime differences but more recently compile time has has re-emerged as the big candidate. And indeed, this is why we have modules, right? In order to try to reduce the constant compilation problem, where every time you touch some include file, this is the second biggest problem in C++. The first biggest problem in C++ was the error novel problem, as you guys know, and that's what concept solves, right? So it doesn't give you the error novel on, as you guys know, and that's what concept solves, right? So it doesn't give you the error novel on every template instantiation.
Starting point is 00:18:28 The constant compilation, recompilation problem has always been an issue. And now with modules, theoretically that can bring it down to a more constant time. And having some sort of benchmark that can allow you to be able to test that is extremely important.
Starting point is 00:18:43 And it's definitely a welcome component for C++, I think. Yeah, definitely. Okay, and then the last thing we have is a post on Bartek's coding blog. And this is about polymorphic allocators, which is a C++17 feature that I don't think we've really talked about on the show before, right, Jason? It was briefly mentioned when... So, okay, a little bit of a timeline here if you don't mind it was briefly mentioned when we had uh john lake osama and he mentioned a talk
Starting point is 00:19:12 from some of his colleagues uh alistair meredith and pablo pablo i can't remember his last name pablo help and they're all good friends of mine john pablo and alistair meredith they're all good friends of mine so uh he mentioned a talk from them from CppCon from last year, which I went and watched, and that inspired me to make a C++ Weekly episode, which then he caught on his blog here and mentions my C++ Weekly episode and then does a little bit more of a digging into PMR. Right. So PMR is an important C++17 feature,
Starting point is 00:19:45 and I imagine it's going to get even more important. Polymorphic memory allocation is a key component that can control the growth and the regrowth of things like suit vector. And, you know, I talked about early on that in the self-driving car domain. One of the things that requirements for using dynamic memory after the car engine has started is that you've got to be able to use a memory allocator that is guaranteed to not throw exceptions. That's that is safe. That can that is not that will not fragment.
Starting point is 00:20:19 Well, this is something that polymorphic allocators can serve. OK, instead of the standard allocator that C++ offers you. And this is why I think it's an important, important, important addition. I've actually talked to John directly about how we can make an allocator that fits the self-driving car domain. And he's actually promised me that this actually can work. I mean, although I have to examine it closely, I think there's definitely usability there to give you something that is a safe allocator. Ultimately, at some point, memory allocation has always been the big crux of the problem for real-time safety-critical code, things that you use in a self-driving car, things that you use in a pacemaker, things that you use that are medical or extremely safety-critical systems.
Starting point is 00:21:10 It's the fact that memory allocation is non-deterministic. You don't know when it might fail. You don't know when it might suddenly take too long. You don't because of fragmentation. These are critical issues. And in a way, exception handling, the whole problem with exception handling, or one of the problems, not the only problem, is because it seeks memory allocation during the exceptions. Right. For the exception object and the hierarchies and all that stuff.
Starting point is 00:21:38 And solving memory allocation is critical in the safety domain for both just memory allocation itself and the issue with exception handling. So one thing I noticed, which is off topic from this particular article, but on topic from what you just said, is my understanding, at least from what I've seen so far, is that none of the standard container member functions are, let's say like pushback, is not conditionally no except on whether or not the allocator would throw an exception. And it seems like that's a hole for what you would need to prove to say that, no, we know that we can safely use this standard container with this memory allocator because we know the memory allocator is no except and that propagates through to the
Starting point is 00:22:22 container. Yeah, we would like them to be noexcept. Yeah, yeah. Okay. Yeah, we would like them to be, but they're not. Right. So there's no easy way around it. This obviously is suitable for a very specific domain, and we might have to...
Starting point is 00:22:37 There's been talk about creating a safety version of the C++ standard library, okay? Because right now, it's not specifically for that in some ways. It's good for performance, but it might not be the best for safety critical, but it can be if we set our mind to doing something like that. Right. All right. So, Michael, we had Gordon Brown on maybe two years ago talking to us about SICKL.
Starting point is 00:23:04 Could you start off by reminding our listeners of what SICKL is? Right. So SICKL is part of the Kronos portfolio of open specifications and belongs in the parallel computing group. It's a single-source C++ parallel programming language that takes standard ISO C++ applications, like even TensorFlow, and then compiles them with a whole CPU compiler and a device SQL compiler to generate code for many kinds of devices.
Starting point is 00:23:29 In a more simple way, let's just put it this way. ISO C++ doesn't support GPU programming or heterogeneous programming directly. What SQL does is takes ISO C++, adds the heterogeneous programming layer right now, today, and then makes it possible for you to dispatch the GPUs to artificial intelligent machine learning devices that are being used in self-driving cars, for FPGAs, for DSP, for any kind of offloaded devices. So that's really the only, you know, forget the words I said, they're important words, but at the end of the day, it's really just about making ISO going away, going towards a direction which we think ISO wants to go anyway. And certainly there's a large group of us within
Starting point is 00:24:14 ISO who are slowly driving ISO towards C++, towards that direction. But the reason they don't have it yet is because it takes about 10 years probably to get that done properly, because there's a lot of legacy code, millions of lines of code that might not care about heterogeneous or GPU programming, or they might in the future, but not yet. So we have to make sure that it works within that whole framework so that the memory model, the data model, the concurrency model works within the current framework.
Starting point is 00:24:42 So do you need to put pragmas or anything in the code? Or is it just kind of out of magic? Oh, God, no. That is totally anathematized. That is what OpenMP does. Right. OpenMP was a really great way of getting you to parallel and X parallel and then heterogeneous acceleration by adding pragmas.
Starting point is 00:25:04 I know because I was the CEO for about five years and actually led them through the transformation to accelerate the programming. We know that pragma, the problem with pragma, other than your dislike of it, I can see your face grimacing every time I use that word. Totally understandable. Cool, guys. No worries. The real technical problem is that it does not allow you to separate concerns. In a single pragma directive, you would encapsulate where you're going to do the offloading,
Starting point is 00:25:36 how you plan to do the offloading, when you do it, in what manner of splitting the threads you're going to do it. It makes it very hard in a C++ style to be able to say, wait a minute, I don't need to do it. It makes it very hard for, in a C++ style, to be able to say, wait a minute, I don't want to do it exactly like that. I want to do it this way. The other problem is, you know, I used to write the OpenMP compiler for IBM and added those things.
Starting point is 00:25:56 The other problem I had a huge problem with, and even today it's still not easy to solve, was where to put the error message. Because the pragmat directive encapsulates so much, but it often comes after, sorry, it often comes before the actual action. You have to set the directive, then you have to set the loop.
Starting point is 00:26:14 In the loop, you often can't figure out which directive is enforced and what error message should I issue. So most of the time, you just give up on the error message. Just don't say anything. I mean, that's just not going to fly in a commercial system that just can't give any reasonable error message. I mean, I'm not proving OpenMP. OpenMP is fantastic
Starting point is 00:26:35 and it works over three programming languages, C, C++, and Fortran. But the problem with not being able to generate good error message wasn't totally pervasive. It was there for about 30% of the cases where it was hard to generate error messages, but it was enough to be bothersome. The other problem is how you can put a pragma on a template argument. You can't. Can I make this template argument parallel? Well, you can't.
Starting point is 00:27:01 You can't put it anywhere. There's no grammar space to put that in so it is a real it is a difficult problem but now we don't need that sickle can do it all in all natural c++ it looks just like c++ okay but we're not the only one um many other programming languages have adapted to it so while i really enjoy sickle and i took over the leadership of it and the chairing of it. I also learned a lot from all the other programming languages that have come before us along the way to learn how to add heterogeneity to C++. HPX from Hartmut Kaiser does a great job of it and it's pure C++. CUDA 8, 9, the later CUDA is much more adaptable to C++. The early ones were more C-like, but definitely the later ones are definitely more C++.
Starting point is 00:27:48 NVIDIA has another thing called Agency that does it. Beyond that, there's obviously Boost Compute, which is compute, which effectively adds a C++ layer on top of OpenGL. There is the U.S. national labs at Cocos and Raja. They have also done an admirable job of adding a framework on top of C++. So all these other languages have learned how to add heterogeneous programming to C++ along with SQL. And they all have valuable learnings. And interestingly enough, they've all solved problems in similar ways. So I say it's time to add it to ISO because it's not like we're digging for gold in Alaska here.
Starting point is 00:28:30 We are not exploring Proxima Centauri. We are literally going through well-charted ground. We know where the problems lie, where the demons are, most of it. I gave a LLVM keynote in 2018 where I said that the four major problems of heterogeneous computing is data, data movement, data locality, data affinity. It's just all data. And I forget the last one anyway, but you can look it up. But the point is that we now know where the problems are, and it's time for us to try to add it to C++.
Starting point is 00:29:06 But that process will take time. But in the meantime, we have all these other languages that are available for you to try to use it. And, of course, one of them is standardized across an open community of many companies, and that's what Senko is. Sorry for the long-winded answer. No, that's great. And on that note, is there already an ISO study group that is working towards standardizing something like SICKL into ISO?
Starting point is 00:29:32 So there is movement within ISO trying to add heterogeneous programming. In my keynote from LLVM 2018, I called it the quiet revolution. There's no official study group, but a lot of people are trying to help it along. The reason is, this is not just the perennial any one study group. It crosses several study groups. It started when SG14
Starting point is 00:29:56 did a survey. SG14 is one of the groups I chair that has to do with low latency but with games programming. The games programmers came back and said, we really would like you guys to add heterogeneous programming. And so from SG14, because it's a lot of parallelism and concurrency, rightfully, it landed a lot of features in SG1, as well as the library group. So I can delineate some of the efforts. So within SG1, we've opened up the specification to make it possible to add heterogeneous programming. Part of that has to do with fixing things like
Starting point is 00:30:31 std thread, which only talks about heavyweight CPU thread, and changing that word to execution agents. By that change, by that very subtle change in the language, it means now you can add very, very lightweight threads that are typical GPU threads. Okay. There were other things that had to do with forward progress that we had to change the language for. That was imperative for GPU because GPUs typically have very weak forward progress. Okay.
Starting point is 00:30:58 Then we also did other things like add span and empty span through the library group, so that now you can have ways to pop up a little layout. It turns out that in GPU programming, it's extremely important to know how your data is laid out. Because in CPU, you guys have all heard about contiguous layout, things have to lay out contiguously. Well, that doesn't work
Starting point is 00:31:20 when you're in a GPU. GPUs do not like things contiguously because it's got thousands of threads working on it at the same time. So every thread that comes in needs to skip a certain amount of space. So that's why they call it coalesce. They actually have to be skipped.
Starting point is 00:31:35 They have to have a certain stride, and then they have to be re-coalesced as a result. So this is one of the key things: you have to be able to control the data layout. My team and I are working on another aspect, data affinity, which is the idea that certain memory and execution agents have an affinity to each other. They could be closely placed or spread out.
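To make the coalescing idea concrete, here is a minimal plain-C++ sketch — not SYCL itself, just the two indexing schemes being discussed. Each function maps a (thread id, step) pair to an array index; the names and the chunk/group sizes are made up for illustration.

```cpp
#include <cstddef>

// "CPU-style" strided scheme: each thread owns its own contiguous chunk.
// At any given instant, thread 0 reads index 0, thread 1 reads index
// `chunk`, thread 2 reads 2 * chunk -- the simultaneous accesses are
// scattered across memory, which GPU memory controllers penalize.
std::size_t strided_index(std::size_t tid, std::size_t step, std::size_t chunk) {
    return tid * chunk + step;
}

// Coalesced scheme: at each step the whole thread group reads one
// contiguous run (thread t reads step * num_threads + t), then every
// thread skips ahead by the group width together.
std::size_t coalesced_index(std::size_t tid, std::size_t step, std::size_t num_threads) {
    return step * num_threads + tid;
}
```

With 1024 threads at step 0, the coalesced scheme has threads 0 through 1023 touching elements 0 through 1023 in one contiguous burst, which is exactly the access pattern that makes GPU memory bandwidth usable.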
Starting point is 00:31:56 That feature is working through SG1. In order to make it all work, we can't just have one study group doing it. We kind of have to do it through everybody. And that's why I call it the quiet revolution. It's a revolution happening with a lot of people's help, but no single group controls it. And I actually like it that way. I don't want any one company or group to dominate the design, because it turns out that everybody has good ideas. Okay.
Starting point is 00:32:24 I want to interrupt the discussion for just a moment to bring you a word from our sponsor, PVS-Studio, the company behind the PVS-Studio static code analyzer, which has proven itself in the search for errors, typos, and potential vulnerabilities. The tool supports the analysis of C, C++, C#, and Java code. The PVS-Studio analyzer is not only about diagnostic rules, but also about integration with systems such as SonarQube, PlatformIO, Azure DevOps, Travis CI, CircleCI, GitLab CI/CD, Jenkins, Visual Studio, and more.
Starting point is 00:32:54 However, the question still remains: what can the analyzer do compared to compilers? Therefore, the PVS-Studio team occasionally checks compilers and writes notes about the errors found in them. Recently, another article of this type was posted, about checking the GCC 10 compiler. You can find the link in the description of the podcast. Also, follow the link to the PVS-Studio download page. When requesting a license, write the hashtag #cppcast and receive a trial license not for one week, but for a full month.
Starting point is 00:33:29 So this is actually a great time to come to it. SYCL has been evolving: we had SYCL 1.2 in 2015, and then SYCL 1.2.1 was released in 2017, and that was aligned with C++11. At the most recent IWOCL — the International Workshop on OpenCL, normally held in Oxford, which this year was online and also called SYCLcon 2020 — I unveiled a possible future roadmap of SYCL, where in 2020 we planned to release, and we did yesterday — June 30th, actually — the first provisional
Starting point is 00:34:09 SYCL 2020 release, which is going to be based on C++17. Because it's based on C++17, this means we are enabling exciting C++ features like class template argument deduction and deduction guides. That makes code less verbose.
Starting point is 00:34:25 There's also a number of other great features, but the whole point is to make it simpler to write SYCL — to make it possible to work with very complicated templated applications like TensorFlow. Okay. That means you can have things like specialization constants. It also means we have now made SYCL more generalized, so that it doesn't depend purely on OpenCL. It still has OpenCL as a main component, but it could now certainly be ported to go on top of Vulkan, or PTX for NVIDIA CUDA, as well as AMD's ROCm, so that it can target other kinds of backends as well, because it's really just a language
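Class template argument deduction plus deduction guides is the C++17 feature Michael credits for the verbosity reduction. Here is a small plain-C++ illustration of the mechanism — the `device_buffer` wrapper is a hypothetical stand-in written for this example, not the real SYCL buffer API.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// A toy wrapper in the spirit of an accelerator buffer (hypothetical
// type, not the actual sycl::buffer interface).
template <typename T>
struct device_buffer {
    std::vector<T> data;
    explicit device_buffer(std::vector<T> v) : data(std::move(v)) {}
    std::size_t size() const { return data.size(); }
};

// A deduction guide: given a std::vector<T>, deduce device_buffer<T>.
template <typename T>
device_buffer(std::vector<T>) -> device_buffer<T>;

std::size_t make_and_measure() {
    std::vector<float> host{1.0f, 2.0f, 3.0f};
    // Pre-C++17, the caller had to spell the element type:
    //     device_buffer<float> buf{host};
    // With CTAD and the guide above, the compiler deduces it:
    device_buffer buf{host};  // deduces device_buffer<float>
    return buf.size();
}
```

In real SYCL code the same mechanism lets buffers and similar templates deduce their element type and dimensionality from the constructor arguments, which is where the "less verbose" claim comes from.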
Starting point is 00:35:11 framework. It's really not anything specific to the backend. So that's why it's a huge thing. So yes, you can now download the SYCL 2020 provisional at the Khronos website, and also give feedback on the Khronos community website channel. We're going to be going through about two or three months of review, a public comment period,
Starting point is 00:35:32 and then we will be going to final in September, aiming for Supercomputing 2020. That's a big thing, because we think this makes SYCL much more usable: less verbose, simpler code, and more closely adapted to C++17. My understanding is that there are multiple implementations of SYCL. Is that correct? Yes. Okay. You can probably find any of my talks on SYCL 2020 online right now.
Starting point is 00:35:58 But SYCL 2020 right now has about four or five implementations, all based on slightly different backends. And I can go through some of these with you guys. Sure, sure. Part of it has to do with the fact that Intel recently really grasped onto SYCL, because they were tapped to supply the next exascale supercomputer at Argonne National Lab. Okay. And they have developed an open-source Clang compiler under their oneAPI framework, and it is called Data Parallel C++.
Starting point is 00:36:39 Okay. It's an open, cross-company collaboration. A lot of companies collaborate on it, but Intel has supplied the lion's share of the manpower making it. And there are a lot of SYCL 2020 features already in there. The other one I wanted to add is that Codeplay, obviously the company I work for, has a commercial implementation as well as a free download version
Starting point is 00:37:03 that has — so we have already implemented various forms of SYCL 2020 within our compiler, and it's called ComputeCpp. You guys mentioned ComputeSuite; ComputeCpp is the SYCL component. There's also —
Starting point is 00:37:19 you mentioned ComputeAorta; that is actually the one that supports OpenCL. ComputeCpp is in use in things like the Renesas R-Car for self-driving cars and advanced driver-assistance systems, as well as a number of other commercial implementations and industries. There is also the Xilinx FPGA implementation, which used to be based on OpenMP — the backend actually uses OpenMP constructs. So you can see that some of these were based on OpenCL, but these other implementations showed us that we could easily make it work for other backends like OpenMP, or, as you'll see later, ROCm. There's a guy at Xilinx named Ronan Keryell;
Starting point is 00:38:10 he's my editor for SYCL. He's done an amazing job in the last couple of weeks making sure the spec goes out properly. What they're trying to do is make it work for FPGAs, which I believe is a burgeoning segment in the mobile and edge device domain. Then there's also the University of Heidelberg — a gentleman named Aksel Alpay. I think he's a graduate student. And what he's done is build something called hipSYCL,
Starting point is 00:38:39 which was originally built as an implementation that uses OpenMP for any CPU, CUDA for NVIDIA GPUs, and ROCm for AMD GPUs. So that was very, very useful. All these implementers are on the working group call, which is often twice a week. Yes, that adds to my other calls. I also want to mention that the Xilinx one, I believe, has now switched to the Clang base as well, because of the work that Intel has done. So obviously the Clang one supports CPUs — multiple CPUs — GPUs, and FPGAs through OpenCL with SPIR-V,
Starting point is 00:39:18 and NVIDIA GPUs through CUDA. The Codeplay one, ComputeCpp, can obviously support a host of GPUs and FPGAs — we tested it on ARM and AMD devices — and specialized accelerators through OpenCL with SPIR or SPIR-V, as well as NVIDIA GPUs through PTX ingested through OpenCL.
Starting point is 00:39:37 And it's the first conforming implementation for SYCL 1.2.1. Khronos has this thing where, unlike ISO C++, which couldn't have a conformance test, Khronos has a conformance test you have to pass. And a compiler that passes this conformance test
Starting point is 00:39:54 gets branded as conformant to that specification. So we're going to do the same thing right now for SYCL 2020: create a conformance test that people can work through. If you're familiar with the SPEC CPU benchmarks, it's kind of like that — you have to get through the benchmark to be conformant, and the same thing happens here. So the Xilinx one is called triSYCL, and it used to use an OpenMP backend for any CPU and OpenCL with SPIR via LLVM for Xilinx FPGAs, but I believe — don't quote me on that — it's heading towards Clang,
Starting point is 00:40:25 using the Intel Clang distribution as well. So yeah, this is good ecosystem growth, because it's like C++, right? C++ has GCC, has Clang,
Starting point is 00:40:41 and it used to have a lot more distributions. That has actually gone kind of bad — a lot of distributions are converging on just Clang. But at least there's still GCC to keep Clang honest. And honestly, both of them have done each other good. Before then — and I was deeply involved with it — there was the Sun compiler, the Microsoft compiler,
Starting point is 00:41:01 the IBM XL compiler, and of course the EDG compiler. I think HP used EDG, and Intel used EDG. So really, way back when, before it consolidated to basically only three compilers, there were about seven compilers that compiled C++. So SYCL is the language — it's like C++. And these implementations — ComputeCpp, Intel's DPC++, Xilinx's, and Heidelberg's — are like GCC, Clang, XL, Sun, and the EDG compilers. So in the C++ domain, we are now down essentially to three major compilers, right? Most of the other compilers have migrated to these three. I'm actually not a fan of that direction myself.
Starting point is 00:41:52 I think EDG is still being used in the IDE for Visual Studio, so that's... I believe so. Yes. So there's at least one more — maybe there are four — but yeah. Visual Studio is still kind of unique in that it's still a surviving, standalone implementation of its own. It hasn't gone totally to Clang, although I think they're using Clang for the expression evaluator for the debugger. I don't know. Yeah, they're definitely doing a lot of Clang integration, I know. Yeah. So I'm curious, because you mentioned Xilinx and FPGAs a couple of times:
Starting point is 00:42:26 is the goal to make it so that SYCL C++ code actually compiles into hardware, effectively? So I'm not an FPGA expert, so I can't really speak too intelligently about that. I'm going to punt all those questions to my friends at Xilinx, like Ronan Keryell, as well as my friends at Intel Altera, like Michael Kinsner, who was actually at Altera before it joined with Intel and all that.
Starting point is 00:42:55 These are two major FPGA vendors that are in the SYCL group, and they're working hard on this. At Supercomputing last year, there was a major paper by a couple of the Altera people showing how to use SYCL in an FPGA format. So honestly, I don't know whether it's trying to push it into Verilog or VHDL
Starting point is 00:43:15 or directly to hardware. I actually think it's mostly through SPIR-V to the hardware. So I think it's a shortcut to prevent you from having to use Verilog and VHDL, which take days and days to compile. Right. So I actually think that's the way it's going. We actually recently had a talk at my meetup about FPGA development and C++, and the speaker and others made the argument that when we're programming, we generally think in linear flow. Hardware is inherently parallel,
Starting point is 00:43:48 and therefore those two things don't map together. But — now I'm just hypothesizing here — with SYCL, you're thinking effectively inherently parallel, right? Because that's the point: you can offload to a massively parallel GPU. Then maybe that makes that mapping easier. It could be, yeah. I wonder if that's what they're doing. Maybe we'll have to get someone on.
Starting point is 00:44:11 If I'm on a CppCast with my friends from Altera and Xilinx, I really hope we explore this area. Can we talk more about some examples of what kinds of workloads, what kinds of applications benefit the most from something like SYCL? Right. So typically, with Codeplay's implementation, we focus a lot on specific vendors' GPUs, or things they want to put out in about two or three years' time, and they want us to build a specialized toolchain for those GPUs or AI processors. And lately it's obviously focused a lot more on the type of processors that go into self-driving cars, like the Renesas R-Car. But Intel with Codeplay has now focused
Starting point is 00:45:00 a lot of their efforts on making this workload work in the high-performance computing domain. And interestingly enough, high-performance computing and the machine learning/AI domain are very similar, actually. And so what happens there is that SYCL is being taken and tested on all the computational fluid dynamics and quantum chromodynamics workloads that typical supercomputing centers and applications run — things like weather forecasting, genome calculations and mapping, and nuclear reactor safety calculations, which require massive GPUs. So the typical supercomputing clusters now, aiming for exascale computing, are essentially thousands and thousands of CPUs with thousands of GPUs.
Starting point is 00:45:57 Wow. Of the next three supercomputers coming online, the first one is called Aurora, and it has essentially Intel Xeon CPUs with Intel Xe GPUs. And so what do you program that with? Well, you can use OpenMP, right, if it's C or Fortran. But if it's a heavy C++ workload, you probably don't want to use OpenMP.
Starting point is 00:46:22 OpenMP has C++ support and the capability for acceleration to GPUs, but it's not that great from a C++-centric point of view, especially if the code has a lot of templates. So they chose SYCL. And for good reason: they can participate in the open development of SYCL. It's wide open.
Starting point is 00:46:43 It's part of the Khronos Group. Intel is part of Khronos, so they joined the SYCL group. The SYCL group started out with something like 10 people and now routinely has 25 to 30 people calling in. ANL, Argonne National Lab, is there,
Starting point is 00:46:57 and Intel, and obviously a whole bunch of other groups are there, like Xilinx and Qualcomm, AMD, and obviously Codeplay, as well as a number of national labs and universities. But the point is: with that kind of workload, how do you program it in a standard way? Now, I don't know if you guys know about supercomputing evolution in the last 10, 20, 30 years. I lived it for about 20 years when I was at IBM, because we were trying hard to push through to the petaflop domain. That's 10 to the 15th floating point operations per second. Now, we've achieved that. In fact, the first computer to break
Starting point is 00:47:34 a petaflop was Roadrunner, which used the Cell processor — incidentally the same processor as in the PlayStation. Sony, IBM, and Toshiba combined to create that specification. But it uses a computation model that essentially relies on DMA access, with separate host and separate device code. And it was difficult, really hard to program. Now, the thing you want to know is that with the DOE and the exascale computing workload, they tend to have very stable workloads that live for 20, 30 years. A codebase will live for 20 or 30 years. But their machine is always the latest.
Starting point is 00:48:16 Every five years, they buy a brand new machine. It used to be $30 to $50 million, then it was $100 million. Now it's $600 million. Yes, the U.S. government has a lot of money. Well, maybe not anymore after this whole thing. But the point of it is that they have the latest hardware to run these things. So what do they want? They don't want any proprietary language. They want an open language — a standard language — that lives for 20, 30 years, and that they can at least have some say in how it's developed.
Starting point is 00:48:47 So that's why they love things like C++ — generalized, standardized, general-purpose languages like standard C++, standard C, standard Fortran. Every DOE contract has those three listed right at the top. And as a vendor, you have to checkmark every single one of them. So SYCL works in that domain, because SYCL is an open standard language. That's why they're looking at using SYCL to program these kinds of workloads. So I talked about the domain that has to do with machine learning chips and HPC. There are other ones, obviously. There's also high-performance computing —
Starting point is 00:49:27 sorry, I mentioned that. There's also FPGA and embedded systems. SYCL is particularly good at working with things like embedded systems, as well as AI and machine learning chips, like the tensor processing units that are out there. So those are the kinds of domains that SYCL can work in.
Starting point is 00:49:43 We haven't gained leadership in every single one of those domains, but that's basically what the group is working towards. We've gained surprising leadership in the HPC domain, which is a very, very good win for us. But we're not resting on our laurels. We think there are a lot more places for SYCL. I'm sorry, just to clarify: did you just say that there are specific processors designed just to run tensor workflows?
Starting point is 00:50:10 Oh, yeah. That has been the big chip revolution in the last three, four, five years. Okay. They're called tensor processing units, loosely grouped as AI/ML processors. And what they do is optimize things like matrix calculations into machine hardware. Because most of a machine learning
Starting point is 00:50:36 algorithm is nothing more than a really giant regression that uses matrix calculations and does a lot of backward and forward propagation across these matrices. So if you put those in hardware, and then set up the memory so that it works well with those constant retrievals of little bits of data, it theoretically makes your tensor algorithm much faster than doing it all in software. So there are tensor processing units now in operation at Google, as well as at a number of other large companies.
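The "giant regression" Michael describes boils down to repeated matrix-vector products. Here is a minimal plain-C++ sketch of the core operation a tensor processing unit bakes into silicon; the sizes and values in the usage note are illustrative only.

```cpp
#include <cstddef>
#include <vector>

// One step of the kind of computation an AI/ML accelerator hard-wires:
// y = W * x, a dense matrix-vector product. Training repeats products
// like this (and their transposed counterparts, for back-propagation)
// enormous numbers of times, which is why fixing this dataflow in
// hardware pays off.
std::vector<double> matvec(const std::vector<std::vector<double>>& w,
                           const std::vector<double>& x) {
    std::vector<double> y(w.size(), 0.0);
    for (std::size_t i = 0; i < w.size(); ++i)       // one output per row
        for (std::size_t j = 0; j < x.size(); ++j)   // dot row with input
            y[i] += w[i][j] * x[j];
    return y;
}
```

For example, with `w = {{1, 2}, {3, 4}}` and `x = {1, 1}`, the result is `{3, 7}` — and a TPU arranges its memory so those row-by-row retrievals stream through the multiply-accumulate units without stalling, which is the memory point Michael makes above.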
Starting point is 00:51:11 Now, you and I can't actually buy these things, mostly. Okay, so that's why most people aren't aware of them — totally understandable. But yeah,
Starting point is 00:51:19 that's the world where I live now. The other part of my research is machine learning and AI. That's the other group that I lead, SG19, in ISO C++.
Starting point is 00:51:36 I mean, to be honest, I don't know if trying to build all of these things into hardware is a really good solution. You know, my career has always been about the fact that a general-purpose language, written right, can actually do a lot of these things at comparable speed to hardware. And indeed, that's part of the philosophy of the C++ Direction Group that I'm on: a general-purpose language can build abstractions that are just as fast as any specialized hardware. And sometimes it takes a long period of time to prove that out. You know, we're still trying to convince the embedded people that C++ can be just as fast in the embedded domain. But it does take work, right? I mean, we think that many of these things
Starting point is 00:52:25 can be done away with by a really good optimizing compiler. Right. So the same thing can happen with TensorFlow, in the machine learning domain as well. Since you just mentioned the ISO Direction Group, and in your bio you mentioned that you were one of the founding members of it, is there any news coming out of that group about the direction of
Starting point is 00:52:45 C++2x, 23? I want to correct one thing. I'm not a founding member; I'm one of the originally invited members. Okay. Although I'm chairing it this year, that doesn't mean anything — we rotate the chairs every year to make it equitable and spread the
Starting point is 00:53:02 load. So yeah, in a way, I guess you're right — I am part of the founding membership. I was in the original founding; I just didn't want to make it sound that important. It isn't. So the C++ Direction Group was set up — we think we were invited because we have been involved with C++ for a long time,
Starting point is 00:53:21 and because we have shown particular impartiality in our work in C++. So the members there are, obviously, Bjarne Stroustrup, the inventor of C++. There's Howard Hinnant, who used to be chair of the Library Working Group. There is David Vandevoorde, who is vice president of EDG. There's Roger Orr, a long-time ISO committee member and the head of delegation for the UK. And I am there as well —
Starting point is 00:53:53 well, I guess I have been with C++ for over 20 — almost 25 years now. I've put in a number of C++11, 14, and 17 features and have been chairing a number of different groups. So the Direction Group was trying to do something that was sorely needed, because before this, believe it or not, C++ direction was more like a Ouija board.
Starting point is 00:54:20 It was more of a Brownian motion — you don't actually know where the molecules are going to go. Because the direction it takes could be entirely surprising, depending on who is actually present, who the new people who just joined are, who is particularly vocal, who is the loudest in the room.
Starting point is 00:54:38 You know what happens a lot. We've seen this with C++20, where there were features that had wide support initially, and then, depending on who showed up three meetings later, that feature is now removed from C++20. We've been fortunate that this style of leadership hasn't totally made C++ ineffective. Part of the problem is that it also lacks coherency, as you just pointed out, and lacks consistency. Some features are not coherent with other features, and there's no consistency from
Starting point is 00:55:15 feature to feature, or even within a single feature — as you said, it might be there one day and gone the next. So we attempted to address that by writing a coherent document that lays out what direction we think C++ should go: in the short term, the next one to three years; in the medium term, three to five; and in the long term, five to ten years — so that we can lay out what areas we should focus on. It's especially important when we have over 200 people attending — or at least we did before the shutdown — and some of them are brand new. And so we figured it would be helpful to set directions for many of the new people, and even the old hands, who can now understand: this is the direction we want to focus on right now.
Starting point is 00:55:56 And there's a document called P2000. You can do a WG21.link search on it, and you'll get the latest version of the document. That's another call I go on every two weeks — I have a call with the Direction Group to ask: what's happening right now in C++? Is there anything we're concerned about? Are there features that need more coherence because they're traveling separately through evolution
Starting point is 00:56:21 and separately through library, but are actually similar features, or depend on each other? Then we really need to bring those groups together. So we talk about things like that. We talk about what direction we should set for C++20 — what are the key likely features we should put in and focus on for C++20? What are the features we should relegate to C++23?
Starting point is 00:56:44 So in a way, we hope people listen to us. We don't make them listen to us — you can't, because every company and person is there, potentially, for their own reasons or their company's reasons. But we hope that by doing that — and maybe because people
Starting point is 00:56:59 respect that we actually have some history with C++ — we would be able to help narrow the focus of the committee, especially given that there are so many people. So have a look at the document, P2000. I'm about to publish another iteration — the second iteration — of it. And, you know, it's a pretty big document now. It's — I can't remember now — maybe 20 or 30 pages.
Starting point is 00:57:25 But it does talk about what we would like to see in future C++. It also talks about what process we would like people to follow, given that so many people are there. Part of that is because we think that the committee, with its large numbers, is getting somewhat more fragmented, and we would like a more coherent attitude, a friendlier attitude between people. You know, I was there when only 30 or 40 people came to C++ meetings, and you pretty much knew what everyone was working on. Nowadays, with 200 people and 21 study groups,
Starting point is 00:58:02 most people don't know what someone else is working on. And we felt there was a bit of a lack of trust in the committee — not trusting a colleague's work and expertise. So we wanted this document to assure people that, yes, you need to trust the process — the iterative process of the committee, where one group double-checks the work of another group,
Starting point is 00:58:24 then they triple-check — that's how the committee works, and you should not vote something down just because you yourself have not personally checked over the work. Right. Okay, that makes sense. It won't work if everybody does that, right? Right. Okay. Well, Michael, I feel like we could go on for a lot longer with all the stuff you've been working on — we will definitely have to have you on again. I appreciate the work you guys have done, and I really admire the work you're doing at educating and spreading the C++ lore.
Starting point is 00:59:13 The only thing I wanted to close with is that SYCL 2020 is coming. It's in provisional now, and it's going to be finalized later this year. We have a number of ways for people to help with the specification. Right now it's
Starting point is 00:59:36 public, so you can go read it, get updates, and actually comment on it as you need. But you can also join as an advisory panel member. Our advisory panels have launched; there are now almost 10 people who are privy to the specification before the public, and they're all people from C++ and across the heterogeneous computing domain — all experts that I've known and worked with closely in the past. SYCL is all about creating a cutting-edge, royalty-free, open standard for heterogeneous C++ — for compute, for vision and inference acceleration, and for high-performance computing.
Starting point is 01:00:09 And SYCL 2020 features are now available in Intel's DPC++ and Codeplay's ComputeCpp. We certainly encourage people to give us feedback, or join the committee in one form or another to participate. Thank you very much, guys. Sure. Yeah, thank you so much for coming on the show. Cheers. And stay safe and healthy, everybody.
Starting point is 01:00:29 Yes. Thanks, you too. Bye-bye. Thanks so much for listening in as we chat about C++. We'd love to hear what you think of the podcast. Please let us know if we're discussing the stuff you're interested in or if you have a suggestion for a topic. We'd love to hear about that, too. You can email all your thoughts to feedback
Starting point is 01:00:44 at cppcast.com We'd also appreciate if you can like CppCast on Facebook and follow CppCast on Twitter. You can also follow me at RobWIrving and Jason at Lefticus on Twitter. We'd also like to thank all our patrons who help support the show through Patreon.
Starting point is 01:01:00 If you'd like to support us on Patreon, you can do so at patreon.com slash cppcast. And of course, you can find all that info and the show notes on the podcast website at cppcast.com. Theme music for this episode was provided by podcastthemes.com.
