CppCast - SYCL 2020
Episode Date: July 2, 2020

Rob and Jason are joined by Michael Wong from Codeplay. They first discuss GCC 11 changing its default dialect to C++17, and polymorphic allocators. Then Michael shares an announcement of a new version of SYCL that was just released, and shares information about the multiple standards groups he is a member or chair of.

News: GCC 11: Change the default dialect to C++17; Build Bench; Polymorphic Allocators, std::vector Growth and Hacking

Links: SYCL; P2000; Michael Wong, "Writing Safety Critical Automotive C++ Software for High Performance AI Hardware"; CppCon 2016: Gordon Brown & Michael Wong, "Towards Heterogeneous Programming in C++"

Sponsors: PVS-Studio. Write #cppcast in the message field on the download page and get a one-month license. Read the article "Checking the GCC 10 Compiler with PVS-Studio", covering 10 heroically found errors despite the great number of macros in the GCC code.
Transcript
Episode 254 of CppCast with guest Michael Wong recorded July 1st, 2020.
Sponsor of this episode of CppCast is the PVS Studio team.
The team promotes regular usage of static code analysis and the PVS Studio static analysis tool. In this episode, we discuss GCC 11 switching its default dialect.
Then we talk to Michael Wong from Codeplay.
Michael shares an announcement about SYCL and much more. Welcome to episode 254 of CppCast, the first podcast for C++ developers by C++ developers.
I'm your host, Rob Irving, joined by my co-host, Jason Turner.
Jason, how are you doing today?
I'm okay, Rob.
I just got the email telling me that I am approved for a 2021 MVP.
Oh, did you? Congratulations!
Literally just this moment. So I think that's year five for me now.
Something like that. Very cool. I don't think I've gotten mine yet, but yeah, I know those emails are going out today. I forgot that today was the day, in fact. ...an IoT library for the C++ SDK as well. And I thought that there were both C++ and C SDKs
that we talked about last week,
but maybe it is a pure C SDK.
Maybe it is, but it works with C++.
Yeah, obviously you can use it with C++.
Well, we'd love to hear your thoughts about the show.
You can always reach out to us on Facebook, Twitter,
or email us at feedback@cppcast.com.
And don't forget to leave us a review on iTunes or subscribe on YouTube.
Joining us today is Michael Wong.
Michael is the Vice President of Research and Development at CodePlay Software.
He is now a member of the open consortium groups known as Khronos, MISRA, and AUTOSAR,
and is chair of the Khronos C++ heterogeneous programming language, SYCL,
used for GPU dispatch in native modern C++ and OpenCL,
as well as guiding the research and development teams of ComputeSuite, ComputeAorta, and ComputeCpp.
For 20 years, he was the senior technical strategy architect for IBM compilers.
He is the Canadian head of delegation to the ISO C++ standard and a past CEO of OpenMP.
He is also a founding member of the ISO C++ Directions Group and a director and VP of
ISOCPP.org, and chair of all programming languages for the Standards Council of Canada. He also participates
in ISO SC42 on AI and machine learning. He has so many titles, it's a wonder he can get anything
done. Michael, welcome to the show. Thank you. I'm glad you're able to read that without taking
a breath. That's pretty amazing.
And that's a modified version of your full bio that's on your website.
There's a lot more that could be mentioned.
Well done.
Well done. Hats off to you, sir.
So if you don't mind, we were talking about this just for a few minutes before we got started. But with all of those hats you wear, you're used to normally traveling like five weeks a month, right?
If I could, I probably would end up doing that, yes.
There's a lot of committees I'm part of.
But my first love has always been C++.
Coming out of my experience in C++, I've been asked or prodded to join and lead other groups and other committees.
And I think that's how that cross-pollination between different committees, because many
committees often go through similar problems.
And it was very useful helping each other, you know, from OpenMP to C++, now from C++
to Khronos under the SYCL working group.
And then from there, I'm now actually on three other ISO committees, one on machine
learning and AI. You didn't know that there was a committee for that. Well, yes, there is.
And these guys are just talking feverishly, having daily meetings now because of the lockdown. And
the other one is about safety, which is my other huge research goal and trying to drive for more
safety in autonomous vehicles. So there's a big group right now of all the top safety engineers of the world, people
from General Motors, Bosch, a totally different area of my previous life that are constantly
online talking about how to make self-driving cars safe.
Wow.
So are you just literally in meetings all the time right now?
I'm often on, because of the lockdown, I'm actually working three times harder.
Lockdown will be over soon so I can work less, because at least when I'm in an airport or in a lounge in a hotel, I can avoid calling into a meeting.
I have a good excuse. I'm traveling. But now I'm on calls seven, eight hours a day, often triple booked.
So I have a setup here in my house where I have three computers, three screens and three cameras
and three speakers. And often there are three voices talking in my ear with my Bluetooth headset.
So it's become quite fun juggling all these things at the same time. I just got off another
call so I could focus totally on you guys.
So I'm totally not on any other call right now.
I'll have you guys know.
I'm only on your call.
Okay, that's good to know
you're not a double or tripling up podcast interviews.
I was tempted
because the other call was getting into an interesting area
and their call was on MISRA,
which is one of the safety standards
that has to do with making C++ safe.
And they were on their face-to-face meeting, or their so-called virtual face-to-face meeting
today, which started at 5am this morning Eastern Time.
But it was going to end anyway by about 2pm, which is roughly 7pm in the UK, because they're
following UK times.
But I'm used to that.
Ever since I joined Codeplay, I've kind of lived on British time.
I'm generally up at 5am in the
morning because that's when my office
wakes up and people start going in.
And by 1 or 2pm
things are slowing down and then
I stop for a moment before I choose
whether I should continue working
or go
out and play some tennis.
But actually, yeah, that's about the best you could possibly get, huh? Eastern North America to the UK is only five hours' difference.
You got it, yeah. Jason, you must know this stuff by heart.
Yeah, yeah, you got it exactly right. There's a period of time when you can get a lot of things done for Europe and the UK, early in the morning, before the West Coast wakes up. And when the West Coast wakes up, they always demand time starting around noon or later in the day. So there's a brief period of time when I can actually do real things, before all the calls ramp up again. And ISO calls nowadays, they've just given up at trying to make it work for
any one person.
They would now be called at 1am, 3am, 5am in the morning because sometimes they have
to favor Asian times.
You know, I'd be in bed with my Bluetooth headset, my wife would be sleeping beside
me and I'd be listening to a call coming from Japan somewhere.
I did that with SG-14 last month.
What I did with SG-14 was I inverted our regular call from being 2 p.m. Eastern time to 2 a.m. Eastern time, so that it would be like 2 p.m. in New Zealand.
We wanted to make sure that we could get our New Zealand friends and our Australian friends on SG14
on the call at least once so that they could be comfortable
because they have never been able to call into SG14 before.
All the ISO calls tend to be very European- and US-centric, as you
guys know. I am curious, though, if you don't
mind, you said that there's a lot of valuable cross-pollination.
And I'm thinking, like, Misra, I mean, we've had other people on the show that have talked about, like, Misra and safety standards.
And I'm just kind of curious if there's anything specific you could point to where you see, like, an overlap between Kronos and Misra because they feel like completely different topics.
They are, but they are not anymore in our world. You could also say C++, to some extent,
is not about safety because it generally could be more about performance and portability. But our world is definitely merging. A lot of workloads out there are now merging in terms
of performance, being portable, but also needing to be safe because they're getting the consumer hands.
I mean, what is a self-driving car but a supercomputer on wheels, doing very high-performance stuff, potentially on portable devices, at least when you get away from Tesla-type stuff?
Because people want to have things that can be reused the next year
instead of redesigning from the ground up.
So speaking about Khronos and MISRA,
what's going on is a lot of Khronos standards are being tested with MISRA.
Khronos Vulkan, for example, has a safety-critical group, and what they do is take the Vulkan API and run it through MISRA, get through all the warnings and the defects that come out of MISRA on it, and then fix the API so that it can make Vulkan safe. They do that with OpenGL. We plan to do that with OpenCL.
SYCL plans to build such a thing so that it can be appropriately used in a self-driving car.
So this is why there's now tighter and tighter integration between the groups.
Khronos is using MISRA.
C++ in some domains is definitely interested in having MISRA.
We actually have MISRA coming to C++ standard meetings,
attending SG12, which is the undefined behavior and vulnerabilities study group. And we cross-pollinate the two groups so that, you know, when there's a new MISRA rule, we want to check with C++ to see that the experts agree that this is a good rule. Have we missed something? Is there something about template argument type deduction that we didn't know about that we should protect against? Or the opposite, which is: are we protecting too much?
One of the big complaints in
the past about MISRA is it's too
blanket in its restrictions. Don't use
template. Don't use exception.
Well, that's just not going to fly in modern C++.
Right? And it's not
flying with self-driving cars either.
They're not saying that you should never allocate dynamic memory.
They're saying you should not allocate dynamic memory after the car starts up.
And you could allocate as much as you want before the car starts up.
But after the car starts up, you can still allocate dynamic memory, but you have to make sure that it doesn't throw an exception.
You have to make sure that it's got enough space.
It doesn't fragment, so that it can be deterministic, so that you don't get a 10-second braking delay when the requirement in the law is that it has to be two microseconds.
So these are very, very tightly connected fields that need that cross-pollination of ideas.
It needs the experts who are the safety experts, but it also needs the experts who are C++ experts,
who are the heterogeneous experts, who are the concurrency experts.
Yes, the world is definitely converging now.
That is very interesting.
Now, I apologize for the sake of our listeners because I definitely skipped ahead.
We haven't gotten to the news yet. And some of these things like MISRA and Kronos, we definitely will
have to come back and make sure we define them so that our listeners know what we're talking about.
Right. So let's quickly get through the news. And then, Michael, feel free to comment on any of these, and we'll start talking more about SYCL and your work in ISO, and maybe more about MISRA and Khronos. Sound good?
Sounds good.
Okay, so this first one is an announcement that GCC 11, I believe, is going to now target C++17 as its default dialect, which is great news.
About time, yes. Thank you.
it's not easy for them to do it, I can imagine.
I was very close to following GCC when I was the IBM XL compiler lead,
and I had to make sure that I had to duplicate almost every GCC feature back then.
Everything, pretty much.
Because back then, before Clang, they were the 800-pound gorilla,
and you had to follow them very closely.
And it was painful for them to switch to C++11, because C++11 broke binary compatibility with SSO, the std::string small-string-optimization change. But since then they've learned, and so the transition to 14 was very quick, and 17 was much quicker. But now we're also talking about the build itself also being done with 17, right?
Right. And they did point out they
are still missing two C++17
library features, but
they said they're not going to let that hold them up,
which is good. Oh, okay. Yeah, to be
honest, it was actually, one of
them was one I didn't even know existed
and I thought that I had covered almost
every C++17 feature on
my YouTube channel.
Which one is that?
The std::hardware_constructive_interference_size and std::hardware_destructive_interference_size constants.
Yeah, yeah, yeah.
What is that?
A little known C++ feature we put in and it's actually really important in parallel programming.
When I was the CEO of OpenMP, one of the biggest problems
with parallel programming is false sharing. So this is the idea where you have a cache line,
and you touch the cache line with one thread, and then, because of the locality, because it's so close, a different thread might keep pinging that cache line.
And now you're unnecessarily thinking that the next byte that you need to access is actually on that same cache line, but it's not.
So what this does is it gives you the ability to create a distance of separation. Knowing the size of the cache line, that's called the destructive interference size: it means that you won't need to ping this cache line for the next byte, because it's not anywhere close to it.
The opposite is called constructive interference, where it is actually on that cache line, and you actually want it to be there. You actually don't mind it pinging that particular cache line over and over again.
So, yeah, it is actually really important, but it's very rarely known.
I actually use that in many of my talks as an opening, to let people know: hey, did you guys know that C++17 now supports restricting false sharing? Well, theoretically. If GCC doesn't support it yet, then...
That's true.
Well, I just had this conversation about false sharing with someone the other day,
so I believe they listened to the podcast.
Yeah, if you are involved anywhere with parallel programming, that has always been a problem with every single parallel language, not just OpenMP.
Every single parallel language, and now C++11 has introduced parallelism and concurrency, has this exact issue.
It's not a huge problem to get around.
You just have to have a way to get around it.
Before this, in order to get around it,
you probably had to use the vendor hardware.
The vendor will sometimes give you some way
of getting around that false sharing,
because they know the cache size of their hardware.
So it was kind of just vendor-specific always.
So you have to use a different one for every machine,
from IBM to ARM to whatever to Intel.
But now this gives you a standard way of doing it,
and you don't have to keep changing it.
It says, if I'm on ARM, then use this particular command
to reduce the pinging on the cache line, things like that.
It's very important, yeah.
It is interesting, though, that it's constexpr,
which means it has to be generalized a little bit, or tuned to a specific architecture or something, right?
Right, right, because it doesn't work for a different architecture.
Well, no, but I mean even between, like, an eighth-gen i5 versus a ninth-gen i5, right? That might change the cache line size.
Right, that's right. Okay.
Next thing we have here is a new web tool, and this is build-bench.com.
And Jason, you said this is put together by Fred Tingaud, who's been on the show before, right?
Yeah.
Yeah.
So it's just like his Quick Bench, to quickly test two different bits of code against each other.
This one lets you quickly test the compile-time differences between two different bits of code.
Right, so if you're playing with different options...
Yeah, right. And the example he has here, if you just go to the website, is, you know, a simple
hello-world program, one using cstdio and one using iostream, and it shows the pretty drastic
difference in how long it takes to build the iostream version. But you could, you know, toggle different settings and then try out more
things in your little code samples there.
I was actually really curious about
this specific example, so I commented out the body of main, and it turns out that the entire compile-time
difference is just in processing the iostream header file. You don't have to actually use it.
Yeah. So compile-time differences have not always been the most
popular thing people were looking at; it was always runtime differences. But more recently,
compile time has re-emerged as the big candidate.
And indeed, this is why we have modules, right?
In order to try to reduce the constant recompilation problem, where every time you touch some include file... this is the second biggest problem in C++. The first biggest problem in C++ was the error-novel problem, as you guys know, and that's what concepts solve, right? So it doesn't give you the error novel on every template instantiation.
The constant compilation,
recompilation problem has always been an issue.
And now with modules,
theoretically that can bring it down
to a more constant time.
And having some sort of benchmark
that can allow you to be able to test that
is extremely important.
And it's definitely a welcome component for C++, I think.
Yeah, definitely.
Okay, and then the last thing we have is a post on Bartek's coding blog.
And this is about polymorphic allocators, which is a C++17 feature that I don't think
we've really talked about on the show before, right, Jason?
It was briefly mentioned when... so, okay, a little bit of a timeline here, if
you don't mind. It was briefly mentioned when we had John Lakos on, and he mentioned a talk
from some of his colleagues, Alisdair Meredith and Pablo... Pablo, I can't remember his last name.
Pablo Halpern.
And they're all good friends of mine, John, Pablo, and Alisdair Meredith, they're all good
friends of mine. So he mentioned a talk from them from CppCon from last year, which
I went and watched, and that inspired me to make a C++ Weekly episode,
which he then caught on
his blog here and mentions my C++ Weekly episode, and then
does a little bit more digging into PMR. Right. So PMR is an
important C++17 feature,
important C++17 feature,
and I imagine it's going to get even more important.
Polymorphic memory allocation is a key component
that can control the growth and the regrowth
of things like std::vector.
And, you know, I talked early on about the self-driving car domain.
One of the requirements for using dynamic memory after the car engine has started is that you've got to be able to use a memory allocator that is guaranteed to not throw exceptions,
that is safe, that will not fragment.
Well, this is something that polymorphic allocators can serve.
OK, instead of the standard allocator that C++
offers you. And this is why I think it's an important, important, important addition.
I've actually talked to John directly about how we can make an allocator that fits the
self-driving car domain. And he's actually promised me that this actually can work. I
mean, although I have to examine it closely, I think there's definitely usability there to give you something that is a safe allocator. Ultimately, at some point, memory
allocation has always been the big crux of the problem for real-time safety-critical code,
things that you use in a self-driving car, things that you use in a pacemaker, things that you use that are medical or extremely safety-critical systems.
It's the fact that memory allocation is non-deterministic.
You don't know when it might fail.
You don't know when it might suddenly take too long.
You don't because of fragmentation.
These are critical issues.
And in a way, the whole problem with exception handling, or one of the problems, not the only problem, is that it requires memory allocation during the exception.
Right.
For the exception object and the hierarchies and all that stuff.
And solving memory allocation is critical in the safety domain for both just memory allocation itself
and the issue with exception handling. So one thing I noticed, which is off topic from
this particular article, but on topic from what you just said, is my understanding,
at least from what I've seen so far, is that none of the standard container member functions,
let's say push_back, are conditionally noexcept on whether or not the
allocator would throw an exception. And it seems like that's a hole for what you would need to
prove to say that, no, we know that we can safely use this standard container with this memory
allocator, because we know the memory allocator is noexcept and that propagates through to the
container. Yeah, we would like them to be noexcept.
Yeah, yeah.
Okay.
Yeah, we would like them to be, but they're not.
Right.
So there's no easy way around it.
This obviously is suitable for a very specific domain,
and we might have to...
There's been talk about creating a safety version
of the C++ standard library, okay?
Because right now, it's not specifically for that in some ways.
It's good for performance, but it might not be the best for safety critical,
but it can be if we set our mind to doing something like that.
Right.
All right.
So, Michael, we had Gordon Brown on maybe two years ago talking to us about SYCL.
Could you start off by reminding our listeners of what SYCL is?
Right.
So SYCL is part of the Khronos portfolio of open specifications
and belongs in the parallel computing group.
It's a single-source C++ parallel programming language
that takes standard ISO C++ applications, like even TensorFlow,
and then compiles them with a host CPU compiler
and a device SYCL compiler to generate code for many kinds of devices.
In a more simple way, let's just put it this way.
ISO C++ doesn't support GPU programming or heterogeneous programming directly.
What SYCL does is take ISO C++, add the heterogeneous programming layer right now, today, and then
make it possible for you to dispatch to GPUs, to the artificial intelligence and machine learning devices
that are being used in self-driving cars, to FPGAs, to DSPs, to any kind of offload device.
So that's really, you know, forget the words I said, they're important words,
but at the end of the day, it's really just about making ISO C++ go towards
a direction which we think it wants to go anyway. And certainly there's a large group of us within
ISO who are slowly driving C++ towards that direction. But the reason they
don't have it yet is because it takes about 10 years probably to get that done properly,
because there's a lot of legacy code, millions of lines of code
that might not care about heterogeneous or GPU programming,
or they might in the future, but not yet.
So we have to make sure that it works within that whole framework
so that the memory model, the data model, the concurrency model
works within the current framework.
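To give a flavor of that "single-source, plain C++" claim, here is a minimal vector-add sketch in the SYCL 1.2.1 style. This is an illustration, not code from the episode, and it assumes a SYCL toolchain (e.g. ComputeCpp) with its device compiler; exact API spellings vary between SYCL versions.

```cpp
// Illustrative only: requires a SYCL implementation and device compiler.
#include <CL/sycl.hpp>
#include <vector>

int main() {
  constexpr size_t N = 1024;
  std::vector<float> a(N, 1.0f), b(N, 2.0f), c(N, 0.0f);
  {
    cl::sycl::queue q;  // selects a default device (GPU, CPU, ...)
    cl::sycl::buffer<float, 1> A(a.data(), cl::sycl::range<1>(N));
    cl::sycl::buffer<float, 1> B(b.data(), cl::sycl::range<1>(N));
    cl::sycl::buffer<float, 1> C(c.data(), cl::sycl::range<1>(N));
    q.submit([&](cl::sycl::handler& h) {
      auto ra = A.get_access<cl::sycl::access::mode::read>(h);
      auto rb = B.get_access<cl::sycl::access::mode::read>(h);
      auto wc = C.get_access<cl::sycl::access::mode::write>(h);
      // An ordinary C++ lambda: the device compiler turns it into a kernel.
      h.parallel_for<class vadd>(cl::sycl::range<1>(N),
          [=](cl::sycl::id<1> i) { wc[i] = ra[i] + rb[i]; });
    });
  }  // buffers destruct here and copy results back into c
  // each c[i] should now hold 3.0f
}
```

No pragmas anywhere: the offload is expressed with ordinary C++ objects and lambdas, which is the point Michael makes next.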
So do you need to put pragmas or anything in the code?
Or does it just kind of work like magic?
Oh, God, no.
That is total anathema.
That is what OpenMP does.
Right.
OpenMP was a really great way of getting you to parallelism
and then heterogeneous acceleration by adding pragmas.
I know because I was the CEO for about five years and actually led them through the transformation
to accelerator programming.
We know that pragma, the problem with pragma, other than your dislike of it, I can see your
face grimacing every time I use that word.
Totally understandable. Cool, guys.
No worries.
The real technical problem is that it does not allow you to separate concerns.
In a single pragma directive, you would encapsulate where you're going to do the offloading,
how you plan to do the offloading, when you do it, in what manner of splitting the threads you're going to do it.
It makes it very hard, in a C++ style,
to be able to say, wait a minute,
I don't want to do it exactly like that.
I want to do it this way.
The other problem is, you know,
I used to write the OpenMP compiler for IBM
and added those things.
The other problem I had a huge problem with,
and even today it's still not easy to solve,
was where to put the error message.
Because the pragma directive encapsulates so much, but it
often comes after...
sorry, it often comes before the actual action.
You have to set the directive, then you
have to set the loop.
In the loop, you often can't figure out
which directive is enforced
and what error message should I issue.
So most of the time, you just give up on the
error message. Just don't say anything.
I mean, that's just not going to fly
in a commercial system that just can't give any reasonable error
message. I mean, I'm not disapproving of OpenMP. OpenMP is fantastic,
and it works over three programming languages, C, C++, and Fortran.
But the problem with not being able to generate
good error message wasn't totally pervasive.
It was there for about 30% of the cases where it was hard to generate error messages, but it was enough to be bothersome.
The other problem is how you can put a pragma on a template argument.
You can't.
Can I make this template argument parallel?
Well, you can't.
You can't put it anywhere.
There's no grammar space to put that in. So it is a difficult problem. But now we don't need that. SYCL can do it all in natural C++.
It looks just like C++. But we're not the only one. Many other programming languages have
adapted to this. So while I really enjoy SYCL, and I took over the leadership of it and the chairing of it, I also learned a
lot from all the other programming languages that have come before us along the way, to learn how to
add heterogeneity to C++. HPX from Hartmut Kaiser does a great job of it, and it's pure C++. CUDA 8,
9, the later CUDA, is much more adaptable to C++. The early ones were more C-like, but definitely the later ones are more C++.
NVIDIA has another thing called Agency that does it.
Beyond that, there's obviously Boost.Compute,
which effectively adds a C++ layer on top of OpenCL.
There are the US national labs' Kokkos and RAJA.
They have also done an admirable job of adding a framework on top of C++.
So all these other languages have learned how to add heterogeneous programming to C++ along with SYCL.
And they all have valuable learnings.
And interestingly enough, they've all solved problems in similar ways. So I say it's time to add it to ISO because it's not like we're digging for gold in Alaska here.
We are not exploring Proxima Centauri.
We are literally going through well-charted ground.
We know where the problems lie, where the demons are, most of it. I gave an LLVM keynote in 2018 where I said that the
four major problems of heterogeneous computing are data movement,
data locality, data affinity... it's just all
data. And I forget the last one anyway, but
you can look it up. But the point is that we now know where the problems are, and
it's time for us to try to add it to C++.
But that process will take time.
But in the meantime, we have all these other languages
that are available for you to try to use it.
And, of course, one of them is standardized across an open community
of many companies, and that's what SYCL is.
Sorry for the long-winded answer.
No, that's great.
And on that note, is there already an ISO study group that is working towards standardizing something like SYCL into ISO?
So there is movement within ISO trying to add heterogeneous programming.
In my keynote from LLVM 2018, I called it the quiet revolution.
There's no official study group, but a lot of people
are trying to help it along.
The reason is, this is not the purview of
any one study group.
It crosses several study groups.
It started when SG14
did a survey. SG14 is one
of the groups I chair that has to do with low latency
as well as games programming. The games
programmers came back and said, we really
would like you guys to add heterogeneous programming. And so from SG14, because it's a lot of parallelism and
concurrency, rightfully, it landed a lot of features in SG1, as well as the library group.
So I can delineate some of the efforts. So within SG1, we've opened up the specification to make it possible to add heterogeneous programming.
Part of that has to do with fixing things like
std::thread, which only talks about heavyweight CPU threads, and changing
that word to execution agents. By that change,
by that very subtle change in the language, it means now you can add very, very lightweight
threads that are typical GPU threads.
Okay.
There were other things that had to do with forward progress that we had to change the language for.
That was imperative for GPU because GPUs typically have very weak forward progress.
Okay.
Then we also did other things, like adding span and mdspan through the library group, so that now
you can have ways to control the data layout.
It turns out that in GPU
programming, it's extremely important to know how
your data is laid out. Because
in CPU, you guys have all
heard about contiguous layout, things have to
lay out contiguously. Well, that doesn't work
when you're in a GPU.
GPUs do not like things contiguously
because it's got thousands of threads
working on it at the same time. So every
thread that comes in needs to skip a certain
amount of space. So that's why
they call it coalesce.
They actually have to be skipped.
They have to have a certain span and then they have to
be re-coalesced as a result.
So this is one of the key things that
has to do with the fact that you have to be able to control
the data layout.
I and my team are working on another aspect, on data affinity,
which is the ability to express that certain memory and execution agents have an affinity to each other.
They could be closely placed or spread out.
That feature is working through SG1.
In order to make it all work, we can't just have one study group that does it.
We kind of have to do it through everybody.
And that's why I call it the quiet revolution.
It's a revolution happening with a lot of people's help, but not any one single group controls it.
And I actually like it that way.
I don't want any one company or one group to dominate the design, because it turns out that everybody has good ideas.
Okay.
I want to interrupt the discussion for just a moment
to bring you a word from our sponsor, PVS Studio.
The company behind the PVS Studio Static Code Analyzer,
which has proven itself in the search for errors, typos,
and potential vulnerabilities.
The tool supports the analysis of C, C++, C#, and Java code.
The PVS-Studio analyzer is not only about diagnostic rules,
but also about integration with such systems as SonarQube, PlatformIO, Azure DevOps, Travis CI, CircleCI, GitLab CI/CD, Jenkins, Visual Studio, and more.
However, the issue still remains, what can the analyzer do as compared to compilers?
Therefore, the PVS Studio team occasionally checks compilers and writes notes about errors found in them.
Recently, another article of this type was posted about checking the GCC 10 compiler.
You can check out the link in the description of the podcast. Also, follow the link to the
PVS Studio download page. When requesting a license, write the hashtag CppCast and receive
a trial license not for one week, but for a full month. So I believe you have an announcement to make about a new version of SYCL.
Is that right?
Yes.
So this is actually a great time to come on. We've been at this for a while; back in 2018 we were at IWOCL, the workshop for OpenCL, in Oxford. We had SYCL 1.2 in 2015, and then SYCL 1.2.1 was released in 2017, and that was aligned with C++11. At the most recent online IWOCL, which was also called SYCLcon 2020, I unveiled a possible future roadmap of SYCL, where in 2020 we planned to release, and we did yesterday, June 30th, actually,
The first provisional
SYCL 2020 release
that's going to be based on C++17. Because it's based on C++17, this means
that we are enabling exciting C++
features like class template argument deduction
and deduction guides.
That makes code less verbose. There are also a number of other great features, but the whole point is to make it simpler to write SYCL, to make it possible to work with very complicated templated applications, like TensorFlow. That means you can have things like specialization constants.
We have also made SYCL more generalized, so that it doesn't purely depend on OpenCL. It still has OpenCL as a main component, but it could now certainly be ported to go on top of Vulkan, or PTX for NVIDIA CUDA, as well as AMD's ROCm, so that it can be on any other kind of backend as well, because it's really just a language framework. It's really not anything specific to the backend. So that's why it's
a huge thing. So yes, you can now download the SYCL 2020 provisional at the Khronos website, and also give feedback on the Khronos community website channel. We're going to be going through about a two- or three-month review and public comment period, and then we will be going to final in September, aiming for Supercomputing 2020. That's a big thing, because we think this makes SYCL much more usable: less verbose, simpler code, and more closely adapted to C++17.
My understanding is that there's multiple implementations of SYCL. Is that correct?
Yes.
Okay.
You can probably find any of my talks on SYCL 2020 online right now.
But SYCL 2020 right now has about four or five implementations, all based on slightly different back ends.
And I can go through some of these with you guys.
Sure, sure.
Part of it has to do with the fact that Intel recently really grasped onto SYCL because they were tapped to supply the next exascale supercomputer at Argonne National Lab.
Okay.
And they have developed an open-source Clang compiler
under their oneAPI framework,
and it is called Data Parallel C++.
Okay.
It's an open company collaboration.
A lot of companies collaborate on it,
but Intel has supplied a lion's share of the manpower at making it.
And there's a lot of SYCL 2020 features that are already in there.
The other one that I wanted to add is that Codeplay, obviously the company I work for, has a commercial implementation as well as a free download version. We have already implemented various forms of SYCL 2020 within our compiler, and it's called ComputeCpp. You guys mentioned ComputeSuite; ComputeCpp is the SYCL part of it. There's also, you mentioned, ComputeAorta, and that is actually the one that supports OpenCL. So ComputeCpp is in use in things like the Renesas R-Car for self-driving cars, advanced driver-assist systems, as well as a number of other commercial implementations and industries. There is also the Xilinx FPGA implementation, which used to be based on OpenMP; the backend
actually uses OpenMP constructs. So you can see that, for some of these, it was based on OpenCL, but these other implementations showed us that we could easily make it work for other backends like OpenMP or, as you'll see later on, ROCm. There's a guy at Xilinx named Ronan Keryell;
he's my editor for SYCL. He's done an amazing job in the last couple of weeks at making sure
the spec goes out properly. What they're trying to do is make it work for FPGAs, which I believe is an important segment coming up in the mobile and edge device domain.
Then there's also the University of Heidelberg, a gentleman named Aksel Alpay. I think he's a graduate student. What he's done is he's built something called hipSYCL, an implementation that uses OpenMP for any CPU, CUDA for NVIDIA GPUs, and ROCm for AMD GPUs. So that was very, very useful. All these
guys, all these implementers are on the call, on the working group call, which is
like twice a week often. I also want to mention that Xilinx, yes, that adds to my other calls.
So the one from Xilinx, I believe, now has switched to the Clang base as well, too,
because of the work that Intel has done.
So obviously the Clang one supports multiple CPUs, GPUs, and FPGAs through OpenCL with SPIR-V, and NVIDIA GPUs through CUDA.
The Codeplay one, ComputeCpp, can obviously support a host of GPUs and FPGAs. We tested it on ARM and AMD devices, and specialized accelerators, through OpenCL with SPIR or SPIR-V, as well as NVIDIA GPUs through PTX ingested through OpenCL. And it's the first conforming implementation for SYCL 1.2.1.
Khronos has this thing where, unlike ISO C++, which couldn't have a conformance test, Khronos has a conformance test you have to pass. And the compiler that passes this conformance test gets branded as being conformant to that specification. So we're going to do the same thing right now for SYCL 2020: create a conformance test that people can work through.
If you're familiar with the SPEC CPU benchmarks, it's kind of like that: you have to get through that benchmark to be conformant, and the same thing happens here.
So the Xilinx one is called triSYCL, and it used to use an OpenMP backend for any CPU and OpenCL with SPIR LLVM for Xilinx FPGAs. But, and don't quote me on that, I believe it's heading towards Clang, using the Intel Clang distribution as well.
So yeah,
this is good
as an ecosystem growth
because it's like C++, right?
C++ has GCC,
has Clang,
and it used to have
a lot more distributions.
That has actually gone downhill a bit, in that a lot of distributions have converged back to just Clang. But at least there's still GCC to keep Clang honest.
And honestly, both of them have done each other good.
Before then, and I was deeply involved with it,
there was the Sun compiler, the Microsoft compiler,
the IBM XL compiler, and of course, the EDG front end. I think HP used EDG and Intel used EDG. So really, way back when, before it consolidated to basically only three compilers, there were about seven compilers that compiled C++. So SYCL is the
language; it's like C++. And these implementations, you know, ComputeCpp, Intel's DPC++, Xilinx, and Heidelberg, are like GCC, Clang, XL, Sun, the EDG compilers.
So you know, in the C++ domain, we now are down essentially to three major compilers,
right? Most of the other compilers have mostly floated to these three.
I'm actually not a fan of that direction myself.
I think EDG is still being used in the IDE
for Visual Studio, so that's... I believe so. Yes.
So it is at least one, maybe there's four, but yeah.
Visual Studio is still kind of unique in that it's still a surviving standalone implementation. It hasn't gone totally to Clang, although I think they're using Clang for the expression evaluator for the debugger.
I don't know. Yeah, they're definitely doing a lot of Clang integration, I know.
Yeah, so I'm curious, because you mentioned Xilinx and FPGAs a couple of times: is the goal to make it so that SYCL C++ code actually compiles into hardware, effectively?
So I'm not an FPGA expert, so I can't really speak too intelligently about that. I'm going to punt all those questions to my friends at Xilinx, Ronan Keryell, as well as my friends at Intel Altera, like Michael Kinsner, who was actually at Altera before it joined with Intel and all that.
These are two major FPGA vendors
that are in the SYCL group, who are working hard on this. At Supercomputing last year, there was a
major paper
by a couple of the Altera people
showing how to use SYCL in an FPGA format.
So honestly, I don't know whether it's trying to push it into a Verilog or VHDL
or directly to hardware.
I actually think it's mostly through SPIR-V to the hardware.
So I think it's a shortcut to prevent you from having to use Verilog and VHDL, which takes days and days to compile.
Right.
So I actually think that that's the way it's going.
We actually recently had a talk at my meetup about FPGA development and C++,
and the speaker and others made the argument that when we're programming, we generally think in linear flow.
Hardware is inherently parallel,
and therefore those two things don't map together.
But now I'm just hypothesizing here, thinking,
oh, well, with SYCL, you're thinking effectively inherently parallel, right?
Because that's the point,
is that you can offload it to a massively parallel GPU.
Then maybe that makes that mapping easier.
It could be, yeah.
I wonder if that's what they're doing. Maybe we'll have to get someone on a future CppCast, my friends from Altera and Xilinx. I really hope we explore this area.
Can we talk more about just some examples of what
kinds of workloads, what kinds of applications benefit the
most from something like SYCL?
Right. So typically, with Codeplay's implementation, we focus a lot on specific vendors' GPUs,
or things that they want to put out in about two or three years' time, and they want us to build a specialized tool chain for those
GPUs or AI processors. And lately it's obviously focused a lot more into the type of processors
that go into self-driving cars, like the Renesas R-Car. But Intel with Codeplay has now focused
a lot of their efforts on making this workload now work in a high-performance computing domain.
And interestingly enough, high-performance computing and the machine learning AI domain
is very similar, actually.
And so what happens there is that SYCL is being taken to be tested on all the computational fluid dynamics and quantum chromodynamics workloads that the typical supercomputing centers and applications run, for things like weather forecasting, genome calculations, and atomic pile safety calculations, which require massive GPUs. So the typical supercomputing clusters now, aiming for exascale computing, are
essentially thousands and thousands of CPUs with thousands and millions of GPUs.
Wow.
Of the next three supercomputers coming online, the first one is called Aurora, and it has essentially Intel Xeon CPUs with Intel Xe GPUs.
And so what do you program that with?
Well, you can either use OpenMP, right,
if it's C or Fortran.
But if it's a heavy C++ workload,
you probably don't want to use OpenMP.
OpenMP has C++ support capability
for acceleration to GPU,
but it's not that great from a C++-centric point of view,
especially if it has a lot of templates.
So they chose SYCL. And it's for a good reason, because they can participate
in the open development of SYCL.
It's wide open.
It's part of the Khronos Group. Intel is part of Khronos, so they joined the SYCL group.
The SYCL group started out
with something like 10 people
and now has routinely 25 to 30 people
calling in to listen to this.
ANL at Argonne National Lab is there
and Intel and obviously
a whole bunch of other groups are there
like Xilinx and Qualcomm,
AMD and obviously Codeplay as well as a number of national labs and universities.
But the point is that with that kind of workload, how do you program it in a standard way?
Now, I don't know if you guys know about supercomputing evolutions in the last 10, 20, 30 years.
I lived it for about 20 years when I was at IBM because we were trying hard to push through to the petaflop domain. That's 10 to the 15 floating point
operations per second. Now, we've achieved that. In fact, the first computer to break
petaflop was Roadrunner, which used a Cell processor, which incidentally is also the same processor as in the PlayStation. Sony, IBM, and Toshiba combined to create that specification.
But it uses a computation model that is essentially using DMA access
with a separate host and separate device code.
And it was difficult, really hard to program.
Now, the thing that you want to know is that with the DOE and the Exascale computing workload, they tend to have very stable workloads that live for 20, 30 years.
Codebase will live for 20 or 30 years.
But their machine is always the latest.
Every five years, they buy a brand new machine.
It used to be $30 to $50 million. Then it was $100 million. Now it's $600 million.
Yes, the U.S. government has a lot of money. Well, maybe not anymore after this whole thing. But the point of
it is that they have the latest hardware to run these things. So what do they want? They don't
want anything proprietary language. They don't want any proprietary language. They want an open
language that is a standard language that lives for 20, 30 years, that they can at least have some say in how that language is developed.
So that's why they love things like C++,
generalized, standardized, general purpose language like standard C++,
standard C, standard Fortran.
Every DOE contract has those three listed right on the top.
And as a vendor, you have to checkmark every single one of them. So SYCL works in that domain because SYCL is an open standard language.
So that's why they're looking at using SYCL to program these kinds of workloads.
So I talked about truenum, which has to do with machine learning chips and HPC. There are other
ones, obviously. There's also high-performance computing.
Sorry, I mentioned that. There's also FPGA,
embedded systems.
SYCL is particularly good to work with things like embedded systems, as well as AI and machine learning chips, like the tensor processing units that are out there. So those are the kinds of domains that SYCL can work in.
We haven't gained leadership in every single one of those domains,
but that's basically what the group is working towards.
We've gained surprising leadership in the HPC domain,
which is a very, very good win for us.
But we're not resting on our laurels.
We think that there's a lot more places for SYCL.
I'm sorry, just to clarify,
did you just say that there are specific processors that are just designed to run tensor workflows?
Oh, yeah.
That has been the big chip revolution in the last three, four, five years.
Okay.
It's because they've been looking at things that mix.
They're called tensor processing units.
They're loosely grouped as AI ML processors. And what they are, they
optimize into machine hardware
things like matrix calculations. Because most of machine learning
algorithm is nothing more than just a really giant regression algorithm
that uses matrix calculations that does a lot of
back and forward propagations across these matrices.
So if you put those in hardware, and then set up the memory so that it works well with those constant retrievals of little bits of data,
it theoretically could make your tensor algorithm much faster than doing it all in software. So there are tensor processing units now in
operations at Google
as well as a number
of large other companies.
Now you and I
can't actually buy
these things mostly.
Okay, so that's why most people aren't aware of them, but that's totally understandable.
But yeah, that world is where I live now. The other part of my research is machine learning and AI. That's the other group that I lead, SG19 in ISO C++, which is totally living in that world.
I mean, to be honest, I honestly don't know if this is a really good solution, trying to build all of these things into hardware. You know, my life has always been about the fact that a general-purpose language, when written right, can actually do a lot of these things at a comparable speed to hardware. And indeed, that's part of the philosophy of the C++ Direction Group that I'm on: that a general-purpose language can build abstractions that are just as fast as any specialized hardware.
And sometimes it takes a long period of time to prove that out.
You know, we're still trying to convince the embedded people that C++ can be just as fast in the embedded domain.
But it does take work, right?
I mean, we think that many of these things
can be done away with by a really good optimizing compiler.
Right.
So the same thing can happen in TensorFlow,
in the machine learning domain as well, too.
Since you just mentioned the ISO directions group,
and in your bio you mentioned that you were one of the founding members of it,
is there any news coming out of that group
about the direction of
C++2x, C++23?
I want to correct one thing. I'm not a founding
member. I'm one of the
originally invited members.
Okay.
Although I'm chairing it this year, but it doesn't
mean anything. We rotate the chairs
every year to make it equitable and spread the
load. So yeah, in a way, I guess you're right. I am one of the founding members. I was in the original founding, but I just didn't want to make it sound that important.
It isn't.
So the C++ Direction Group was set up, and we think we were invited because we have been involved with C++ for a long time, and because we have shown particular impartiality in our work in C++. So the members there are obviously Bjarne Stroustrup, the inventor of C++. There's Howard Hinnant, who used to be a chair of the Library Working Group. There is David Vandevoorde; he is vice president of EDG. There's Roger
Orr. He's been a long-time
ISO committee member and the head of delegation
for the UK. And I am
there as well, representing
well, I guess I have
been with C++ for over 20,
25 years, almost 25 years now.
I've put in a
number of C++ 11, 14, 17
features, been chairing a number of different groups.
So the direction group was trying to do something that was sorely needed,
because what it was doing was that before this, believe it or not, C++ direction was more like a Ouija board.
It was more of a Brownian motion.
You don't actually know where the molecules are going to go.
Because it could be
entirely surprising the direction it takes
depending on who actually is present,
who are the new people who just joined,
who's particularly vocal,
who's the loudest in the room.
You know what happens
a lot. We've seen this
in the last few years with C++20, where there were features that had wide support initially. And then, depending on who showed up three meetings later, that feature is now removed from C++20. We've been fortunate that this style of leadership hasn't totally made C++ ineffective.
Part of the problem is it also lacks coherency, as you just pointed out, and lacks consistency.
Some features are not coherent with other features, and there's no consistency from
feature to feature, or even within a single feature, as you said, it might be there one
day and gone the next day.
So we attempted to do that by writing a coherent document that lays out what direction we think C++ should go in the short term,
which is next one to three years, in the medium term, three to five, and then in the long term, five to ten years,
so that we can lay out what area we should focus on.
It's especially important when we have over 200 people now attending, or at least we did before the shutdown,
and some of them are brand new.
And so we figured that it would be helpful for us to set directions for many of the new people, even the old hands, who can now understand this is the direction we want to focus on right now.
And there's a document called P2000.
You can do a WG21.link search on it, and you'll get the latest version of the document. That's another call I go on every two weeks: I get on a call with the Direction Group,
and say, you know,
what's happening right now in C++?
Is there anything we're concerned about?
Are there features that need more coherence
that are traveling separately through evolution
and separately through library?
But they're actually similar features,
or they're dependent on each other features.
So we really need to bring those groups together.
So we talk about things like that.
We talk about what direction we should set for C++20.
What are the key likely features that we should put in and focus on for C++20?
What are the features we should relegate to C++23?
So in a way,
we hope people listen to us.
We don't make them listen to us.
You can't because, you know, every company's
there or people are there potentially for their own
reasons or for the company's reasons.
But we hope that by doing that
and maybe people
respect that we actually have
some history on C++,
that we would be able to help narrow the focus of the committee,
especially given the fact that there's so many people.
So have a look at the document P2000. I'm about to publish another iteration, the second iteration of it.
And, you know, so it's a pretty big document now.
It's almost, I can't remember now, maybe 20 or 30 pages.
But it does talk about what we would like to see in future C++.
But it also talks about what process we would like people to follow, given so many people are there.
Part of that is because we think that the committee, with its large number, is getting somewhat more fragmented.
And we would like a more coherent attitude,
a more friendly attitude between people.
You know, I was there when C++ was only 30 people, 40 people coming.
And you pretty much knew what everyone was working on.
Nowadays, with 200 people, 21 study groups,
most people don't know what someone else is working on.
And we felt that there was a bit of a lack of trust in the committee, in a colleague's work, in their expertise.
So we wanted this document to assure people that, yes,
you need to trust the process,
the iterative process of the committee,
where one group double-checks the work of another group,
they triple-check, and that's how the committee works. You should not vote down something just because you yourself have not checked over the work personally.
Right, okay, that makes sense. It won't work if everybody does that.
Right.
Okay. Well, Michael, I feel like we could go on for a lot longer with all the stuff you've been working on. We will definitely have to have you on again. Thank you for the work you guys have done, and I really admire the work you're doing at educating and spreading the C++ lore.
The only thing I wanted to close with is that SYCL 2020 is coming. It's in provisional now, and it's going to be finalized later this year. We have a number of ways for people to help with the specification. Right now it's public, so you can go read it and get updates, and you can actually comment on it as you need. But you can also join the advisory panel committee. Our advisory panels have launched now; it's almost 10 people that are privy to the specification before the public. And they're all people from C++ and across the domain of heterogeneous computing, all experts that I've known and worked with closely in the past.
SYCL is all about creating a cutting-edge, royalty-free, open standard for heterogeneous C++, for compute, for vision and inference acceleration, and for high-performance computing. And SYCL 2020 features are now available in Intel's DPC++ and Codeplay's ComputeCpp. We certainly encourage people to give us feedback, or join the committee in one form or another to participate.
Thank you very much, guys.
Sure.
Yeah, thank you so much for coming on the show. Cheers.
And stay safe and healthy, everybody.
Yes. Thanks, you too. Bye-bye.
Thanks so much for listening in as we chat
about C++. We'd love to hear
what you think of the podcast. Please let us
know if we're discussing the stuff you're interested in
or if you have a suggestion for a topic.
We'd love to hear about that, too.
You can email all your thoughts to feedback
at cppcast.com
We'd also appreciate if you can like
CppCast on Facebook and follow
CppCast on Twitter.
You can also follow me at RobWIrving
and Jason at Lefticus on Twitter.
We'd also like to thank all our patrons
who help support the show through Patreon.
If you'd like to support us on Patreon,
you can do so at patreon.com
slash cppcast.
And of course, you can find all that info and the show notes on the podcast website at cppcast.com.
Theme music for this episode was provided by podcastthemes.com.