CppCast - SYCL
Episode Date: August 24, 2018. Rob and Jason are joined by Gordon Brown to discuss his work on SYCL, the OpenCL abstraction layer for C++. Gordon is a senior software engineer at Codeplay Software in Edinburgh, specialising in designing and implementing heterogeneous programming models for C++. Gordon spends his days working on ComputeCpp, Codeplay's implementation of SYCL, and contributing to various standards bodies including the Khronos Group and ISO C++. Gordon also co-organises the Edinburgh C++ user group and occasionally blogs about C++. In his spare time, Gordon enjoys dabbling in game development, board games and walking with his two dogs.
News: CppCon 2018 Poster Program Announced; A bug in the C++ Standard; Synapse submitted for Boost review; New C++ London Uni Course Sept 18.
Gordon Brown: @AerialMantis; Gordon Brown's blog.
Links: SYCL; ComputeCpp; Parallel Programming with Modern C++: from CPU to GPU; P0443r7: A Unified Executors Proposal for C++; CppCon 2017: Gordon Brown "Designing a Unified Interface for Execution"; SYCL building blocks for C++ libraries - Gordon Brown - Meeting C++ 2016.
Sponsors: PVS-Studio; February 31.
Patreon: CppCast Patreon.
Hosts: @robwirving @lefticus
Transcript
Discussion (0)
Episode 164 of CppCast with guest Gordon Brown, recorded August 22, 2018.
In this episode,
we discuss CppCon posters and the new Boost library.
Then we talk to Gordon Brown
from Codeplay.
Gordon talks to us about SYCL
and his contributions to the OpenCL abstraction layer. Welcome to episode 164 of CppCast, the first podcast for C++ developers by C++ developers.
I'm your host, Rob Irving.
Joined by my co-host, Jason Turner.
Jason, how are you doing today?
I'm all right, Rob. How are you doing?
I'm doing good.
Down to just over one month until CppCon, right?
Yes.
Wow.
That's about right.
So I figured I should probably go ahead and mention, because I haven't spent too much
time talking about it, that, well, I have said I am giving two talks.
The first one, I mean, the schedule is officially out now,
on Monday, is Surprises in Object Lifetime.
And then on Friday, so I kind of get the chance to bookend the conference,
is Applied Best Practices.
The concept is I just started a project from scratch
and I want to use all the best practices that I teach.
What happens?
Right.
And so then on that note, I am doing two days of best practices training at the end of the conference as well.
And there is still time to sign up for classes.
And we'll talk more about classes with our guest today.
Yeah, definitely.
Okay.
Well, yeah, you should definitely sign up for Jason's class and check out all the CppCon classes that are available.
There is a large set of offerings this year, for sure.
Yeah.
Well, top of our episode, I'd like to read a piece of feedback.
This week we got an email from Henrik.
He says,
Hi from Norway.
Love your podcast.
Keep up the good work.
As an ex-Troll, I was hoping to promote pronouncing Qt the right way, which he phonetically spelled out as cute.
I'm reading that the right way, right, Jason?
Yeah, as far as I know.
And for the record, I think most people do know that it's pronounced cute, but we all feel a little weird saying it because we're not sure if other people will know what we're talking about.
Right.
It's sometimes just easier to say the letter
So people know exactly what you're saying,
but like, I'm using a cute GUI toolkit.
Okay.
Which one?
Like it becomes like a,
at least in English.
Anyhow.
Yeah.
Yeah.
It can be,
can be confusing,
but I will try to remember this note and I will try to remember to pronounce
it as cute whenever we,
uh,
talk about the GUI framework
on the show.
We'd love to hear your thoughts about the show.
You can always reach out to us on Facebook,
Twitter, or email us at feedback
at cppcast.com, and don't forget to leave
us a review on iTunes.
Joining us today is Gordon Brown.
Gordon is a senior software engineer
at Codeplay Software in Edinburgh,
specializing in designing and implementing heterogeneous programming models for C++.
Gordon spends his days working on ComputeCpp, Codeplay's implementation of SYCL, and contributing to various standards bodies, including the Khronos Group and ISO C++.
Gordon also co-organizes the Edinburgh C++ User Group and occasionally blogs about C++.
In his spare time, Gordon enjoys dabbling in game development,
board games, and walking with his two dogs.
Gordon, welcome to the show.
Hi there. Thanks very much for having me on.
So this is, if our listeners are not paying attention here,
keeping track, the third guest from Codeplay
that we have had on in as many months.
Now, this was not planned
from our perspective, but
we have to know if you all secretly
planned this and didn't tell us.
It wasn't intentional.
It just kind of happened that way.
So I've been
meaning to
come on for a while, but
work
kept getting in the way. So it all just coincidentally lined
up like this, basically. Yeah. But when I sent the email to ask about coming on, I think
I didn't realize that Chris and Simon were also planning to come on in the same month. Yeah,
it does seem like there's a lot of interesting things going on there at Codeplay, and of
course we'll talk more about what you are doing there when we get into the interview.
But it seems like you all are having a good time, anyhow.
Yeah, there's a lot of really interesting things happening,
a lot of interesting projects,
and there's a lot of people that are very involved in C++
and the standards in the community.
That's cool.
Okay, well, we've got a couple of news articles to discuss,
and feel free to comment
on any of these. Then we'll talk more about the work you're doing at Codeplay and SYCL. Okay.
Okay. So this first one, we're mentioning CppCon already. They just announced the poster program,
and they have chosen, I think, 16 different entries for this year. Yeah. Wow, that's a lot.
i mean it seems like, I remember
two years ago when I went, I think there were only like five or six
maybe. Yeah, two years ago, I
was a judge, and I think there was
like six, and it was impossibly difficult
to choose who the winners were. I cannot
even imagine what it would
be like doing these 15 or 16
or whatever it is.
And they do have
all the poster titles and the authors of those posters.
Um,
I see at least one previous CppCast guest on here.
Uh,
Elena Sagalaeva.
I'm probably not saying that right,
but,
she's the one who made the C++ Lands map.
And I guess she's making a poster based on that.
Yeah,
that looks like fun.
Uh,
I recognized her.
And then the only other
project I recognize is Cling Power Tools,
I think. Everything else looks
like it's going to be interesting,
pretty unique,
as far as stuff that I'm aware of.
Yeah, there's a lot of variety there.
There's a lot of really interesting posters.
I think there's
one I noticed on memory
tagging for
improving safety, that sounds quite interesting
hmm
I'd also like to know
what Funky Pools is about
it's a very abstract poster title
and it's in all caps too
it's not just Funky Pools, it's FUNKY POOLS
yeah
okay, next thing we have
is this blog post from
Borislav Stanimirov
and he writes
about a bug in the C++ standard
although after his
initial post he got a lot of comments
and feedback and realized
that it's not a bug in the C++ standard itself
it was a bug in
I think Clang and GCC's implementation.
Spoilers, Rob.
Spoilers. I apologize.
No, I'm just kidding. It's not actually technically a bug in the implementation.
It's a missing feature in the Itanium C++ ABI that Clang and GCC follow for 64-bit builds.
Yeah.
But it was, I think, a really good example of the community
kind of helping someone out because he went into this whole thing
of what he thought was a bug and got all this nice feedback.
And people also pointed out some really nice ways
you can completely avoid this using C++17.
Yes, as usual, Lambdas come to the rescue.
I think I have like 12 episodes
of C++ Weekly on lambdas now
and like three more in the queue getting ready
to come up.
There's a lot that you can talk about
with lambdas and a lot of ways you can use them
that people don't necessarily think about.
Did you look
through this post, Gordon?
Yeah, I had to look at it.
It's very interesting, actually.
I think I had to look at it quite carefully.
I think member function pointer template parameters
isn't something I've used a great deal.
But yeah, it's a very, very interesting use case.
Yeah, I think it's really good that people post blog posts like this,
because especially for quite niche use cases in the standard,
like this kind of thing,
it's good to kind of spread the knowledge
and help identify if there is a defect in the standard
or a bug in an implementation or something like that.
Well, it seemed like...
I'm sorry, go ahead.
No, that's okay.
It seemed like a great way to really get people's attention
and find out what the actual answer is.
A bug in the standard!
It's kind of like saying,
I bet no one can solve this problem,
and then waiting to see who solves the problem for you.
Yeah, that's true.
I'm not sure if that was his intention, but it worked out quite well for him.
Yeah, I don't think it was, but it definitely worked for him.
Yeah.
Okay, next thing we have is Synapse,
which is a new library that was submitted for boost review.
And this is like a signals library, I think, similar
to Qt signals and slots, right? As far as I can tell, and also similar to Boost.Signals, yeah, or
Boost.Signals2, what's it called? That's an interesting point. I forgot that there was a Boost
signals library. So how does this one differentiate from the other one? Do you know? There is, um, and I've completely lost it now,
but I had it up before we started the interview. There is, at the bottom of the documentation,
a list of alternatives to Synapse. There it is, compared to Signals2. Um, honestly, I'm having a hard
time really differentiating the use case differences between the two of them, having even read the chart.
I don't know, Gordon, do you have any insight on this?
To be honest, I've not actually used the Boost Signal library before.
I've not really used signaling libraries like this.
It looks really interesting, though.
One thing I thought was quite nice is that it seems to register the signals at compile time.
I'm not sure if that's something that Boost Signal
does. I've not
really looked into that.
I don't think you can do that with Boost Signals.
A couple of things stood out
to me. It does seem like if you really
want to do static initialization of things
then they have macros to help with that.
I don't love macros, but there's uses for them sometimes.
And it seems to require that your signal and slot mechanism
use shared pointer, which I'm still a little bit lost on also.
Hopefully I'm reading that wrong.
Yeah, I noticed that the examples used shared pointer.
I wasn't sure if that was a requirement or just a typical example
it says by default Synapse uses the following
Boost smart pointer components:
shared_ptr, weak_ptr, make_shared, and get_deleter,
so it kind of implies that it needs some sort of
shared pointer functionality
I guess I'm looking through the comparison
with Boost.Signals2 right now
and one comparison that they first point out is
with Boost.Signals2 they use a signal of type T,
whereas Synapse uses a C function pointer typedef,
and maybe the reason for that is that they could work with C callback APIs
because that is another difference they highlight.
That's a good point, yeah.
That's probably the main driver for that.
So if you want to work with a C library,
then you would want to use Synapse
because you couldn't use Signals2, I guess.
I think the last time I had to directly deal with that
was when I was using pthreads,
which I strongly avoid using these days since
std::thread got standardized.
Yeah.
Okay, and the last thing I wanted to mention
was we had
several guests
on a couple months ago from
C++ London Uni, including
Tom Breza, and he reached out
to me that they're going to be starting
a new set of courses.
And I believe the first course will be September 18th.
So if you heard about that and you want to get in on the beginning of a new cycle of courses,
then September 18th is the day to do it.
Okay.
So, Gordon, why don't we start off by talking about SYCL.
Can you give us an intro to what SYCL is?
Yeah, sure. So, SYCL is an open standard from the Khronos Group,
which essentially defines a single-source C++ programming model for programming heterogeneous systems.
So, that's systems with non-CPU devices like a GPU,
FPGAs, DSPs, and other kinds
of accelerators.
So for those who
aren't familiar with the Khronos Group,
they're essentially a consortium
of companies similar to ISO
C++ Committee, which
essentially work together to publish these open
standards that programmers
can develop to
and companies can essentially implement
and gain conformance to the standard.
So most people will be familiar with OpenGL and Vulkan.
They're both standards from the Khronos Group.
Another standard from the Khronos Group is OpenCL.
This is similar to OpenGL,
but it's a programming model for heterogeneous systems for doing general-purpose compute on GPUs, FPGAs, etc.
However, OpenCL is a C API with a C programming language.
It's also a separate source.
So you have your host application code and then your device code,
which is run on a GPU or another device, separate.
And then the host API is then used to load in that source,
whether it be a string or another external file, and compile it and lower it
down to some instruction set for that device online as part of your application.
So it's a very low-level API.
One of the things people find with OpenCL is that it has quite a high barrier to entry. Even the
Hello World application can
take quite a lot of
code to get going, and
iterating over different approaches to
a problem can
take quite a lot of effort,
because there's a lot of API
calls, a lot of things you have to change if you want to do
something different.
So, the aim of SYCL is to...
It's another standard from the Khronos Group, again.
So SYCL aims to sort of bring together the portability of OpenCL,
so this ability to run code on different architectures
like GPUs and FPGAs, et cetera,
but with a more modern C++ programming model.
So effectively, in SYCL,
when you write your code,
it essentially allows you to write your code
that will execute on a GPU or another device
as a lambda or a function object
within the same application.
So this is the idea of single source.
So rather than having your separate host application code
and your device code,
it's all within the same code, your same application code.
This gives you a lot of nice features,
such as type safety across host and device code.
You can template instantiate across host and device.
And you can also have a much higher level C++ interface.
So, yeah, so SYCL provides a more high-level C++ interface
on top of the portability of OpenCL,
but also provides some additional features.
So it has this concept of separating out the storage and access of data,
allowing the SYCL runtime to essentially perform data dependency analysis
in order to more effectively schedule work and optimize data locality and movement.
So, for example, when you describe the functions you want to run on a GPU, for example,
you can say whether you want different functions
to read or write from different memory,
and it can do a lot of clever optimizations through there.
So a lot of the kind of...
In traditional OpenCL, you'd have to spend a lot of boilerplate code
to do event management and data movement.
So it kind of automatically handles a lot of that for you,
which makes the iterative process of creating an application
for a GPU or another device a lot quicker.
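To make the storage-versus-access separation concrete, here is a minimal sketch in the SYCL 1.2.1 style discussed in this episode. It needs a conformant SYCL implementation (such as ComputeCpp) to build, and the kernel names are invented for illustration:

```cpp
#include <CL/sycl.hpp>
#include <vector>
namespace sycl = cl::sycl;

int main() {
    std::vector<int> data(1024);
    {
        sycl::queue q{sycl::default_selector{}};
        sycl::buffer<int, 1> buf{data.data(), sycl::range<1>{data.size()}};

        // First command group: declares write access to buf.
        q.submit([&](sycl::handler& cgh) {
            auto acc = buf.get_access<sycl::access::mode::write>(cgh);
            cgh.parallel_for<class init_kernel>(sycl::range<1>{1024},
                [=](sycl::id<1> i) { acc[i] = static_cast<int>(i[0]); });
        });

        // Second command group: declares read-write access. Because the
        // accessors state their modes, the runtime sees the dependency on
        // the first kernel and orders the work and the data movement itself,
        // with no manual OpenCL event management or explicit copies.
        q.submit([&](sycl::handler& cgh) {
            auto acc = buf.get_access<sycl::access::mode::read_write>(cgh);
            cgh.parallel_for<class double_kernel>(sycl::range<1>{1024},
                [=](sycl::id<1> i) { acc[i] *= 2; });
        });
    }   // buffer destruction waits and copies results back into data
    return 0;
}
```

Note how both kernels are ordinary lambdas in the same source file as the host code, which is the single-source model described above.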
So you said it provides a higher-level API compared to OpenCL, right?
Does it require OpenCL?
Does it build upon OpenCL?
Yes, so SYCL is built on top of OpenCL.
So it requires, essentially,
it can run on any platform that supports OpenCL.
Okay.
And you also mentioned, like, you said in OpenCL,
Hello World has a lot of boilerplate.
And that made me wonder, what in the world exactly does a Hello World on a project like this look like?
What are you actually doing in a Hello World program?
The typical Hello World on something like a GPU is usually something like a vector add.
So it's sort of like taking two input arrays
and adding the values together
and assigning them to some output array.
But even for an application
like that, there's a lot of
work in essentially discovering
the topology of your systems or figuring out
what devices you have, configuring
a context and a queue for
executing work.
And then you have to create the buffer objects and call APIs for copying them to and from the device,
compiling your kernel.
So the device functions in OpenCL are referred to as kernels.
Okay. So how does that compare with SYCL then?
So with SYCL, what we aim to do is create a more high-level interface where all of the functionality that you'd have with OpenCL is still available,
but we have a lot of the defaults for the most common configurations.
So for example, you can go into a lot of detail querying the different
devices available on your system, configuring them in a very specific way if that's what
you want to do. We also have simple mechanisms for quickly picking a GPU. So for example,
if you're running on a system which only has a CPU and a GPU, you can create these function objects called selectors, which effectively say, I want a
GPU and I want a CPU. And they can be effectively used just to create, effectively in one line,
create a queue that can run work on that device.
Okay.
But we have the more flexible options for the more complex configurations if that's what you need.
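The one-line selector mechanism described here looks roughly like the following in SYCL 1.2.1 terms; this is a sketch that assumes a SYCL implementation is installed:

```cpp
#include <CL/sycl.hpp>
namespace sycl = cl::sycl;

int main() {
    // A selector is a function object that scores the available devices,
    // so one line gets you a queue targeting a GPU...
    sycl::queue gpu_queue{sycl::gpu_selector{}};

    // ...or a CPU, or whatever the implementation considers the best
    // default when you don't care which device you get.
    sycl::queue cpu_queue{sycl::cpu_selector{}};
    sycl::queue default_queue{sycl::default_selector{}};
    return 0;
}
```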
So what kind of platforms are supported by SYCL and OpenCL?
Is it basically anything that has a modern GPU or do you need specialized hardware?
So essentially SYCL can run on any platform that supports OpenCL.
The list of different platforms that support OpenCL
is very long now.
Everything from desktop, mobile,
embedded platforms,
HPC.
The list of platforms
that don't support OpenCL is a lot
shorter.
Generally, for portability,
there's another standard from the Khronos Group called SPIR, the Standard Portable Intermediate Representation. It's essentially a portable IR: you compile your kernels to this IR, which makes things a lot easier, and you'd ship that IR rather than shipping source.
So that could be used.
It's a lot easier for platforms to support the standard intermediate representation
because that's then just lowered down to some platform-specific instruction set.
But SYCL doesn't mandate that you have to use SPIR.
Essentially, SYCL will work on any binary format
that is supported by an OpenCL implementation.
So, for example, NVIDIA's OpenCL implementation
supports PTX, their binary format.
So you can have a SYCL implementation
which compiles to PTX and then runs on OpenCL through that.
So it's very flexible.
So what is your specific involvement
with this SYCL project?
So I've actually been involved in SYCL
from almost the very beginning.
So that's almost six years ago now.
Wow.
Back then it was actually going by a different name.
It was the OpenCL high-level model.
Okay.
So then, yeah, so the group for what would become SYCL was formed about six years ago.
At the time, Codeplay was working on a version of our offload compiler,
which was essentially...
That's not the name of it. Offload compiler?
Yeah, that was the name of our compiler at the time.
Awful Compiler?
No, Offload.
Oh, Offload.
Oh, okay.
I also heard Awful.
Yeah, so we were working on a version of our Offload compiler,
which would essentially do single-source compilation
for OpenCL.
So this was very suitable to what the Khronos Group wanted.
So we proposed this as a way of achieving this kind of high-level programming model for OpenCL.
So over the years, I've been working on implementing Codeplay's implementation. It's called ComputeCpp.
And contributing to the
standards document and the
conformance test suite
up until now.
I often,
well, I wouldn't say often,
it does come up where
the C++ Standards Committee will
approve something and then the implementers
that we know and tweet often will say,
I just tried to implement this thing and found out it's impossible.
Have you done that?
Have you put something in the spec yourself
that you then realized was impossible to implement?
So we try to avoid that by prototyping everything
so everything that we propose for the standard
we tend to
prototype and test and make sure that
it works well
although it's impossible to catch everything.
There are always some issues that kind of sneak through.
So there have been times where we've had to kind of rethink things slightly to improve something that didn't work as well as we intended.
It was more difficult in the beginning because when you're creating a standard from scratch,
you have to implement the entire thing, so it's very fast-moving.
In the beginning, things did change
quite a lot. Before the
first version of SYCL, things were changing a lot
because we were iterating over it, trying to find
the best solutions.
But now that
the standard is
out, it's not changing.
We're introducing new features to
improve on it, but
everything's been prototyped
to make sure that it's implementable.
Okay.
I wanted to interrupt the
discussion for just a moment to bring you a word from our
sponsors. PVS-Studio
is a static code analyzer that can find bugs
in the source code of C, C++,
and C# programs.
The analyzer is able to classify errors according to the common weakness enumeration,
so it can help remove many defects in code before they become vulnerabilities.
The analyzer uses many special techniques to identify even the most complex and non-obvious
bugs. For example, the data flow analysis mechanism, which is implemented in the tool,
detected a very interesting bug in the Protobuf project. You can read about this bug in the article February 31. The link is under the
description of this podcast episode. The analyzer works under Windows, Linux, and macOS environments.
PVS-Studio supports analysis of projects that are intended to be built by such common compilers as
Visual C++, GCC, Clang, ARM compiler, and so on.
The analyzer supports different usage scenarios.
For example, you can integrate it with Visual Studio and with SonarQube.
The Blame Notifier tool can notify developers about errors that PVS-Studio detects during nightly runs by email.
You can get acquainted with PVS-Studio features in detail at viva64.com.
So we had, as we mentioned,
Simon Brand on recently,
but we also had him on about a year ago,
back in May.
I think we talked about SYCL a little bit then.
What's changed over the past year with SYCL?
So the biggest change since then is that there's a new version of SYCL,
SYCL 1.2.1.
So this introduces some new features to essentially better support
some of the ecosystem projects that we have with SYCL,
things like Eigen and TensorFlow,
and things like BLAS libraries.
Because when we were,
while implementing ComputeCPP, our implementation,
we effectively had an evaluator program
where selected people would get early access
to our pre-alpha version of ComputeCPP
so that they could give feedback.
And a lot of that feedback helped add improvements
and additions to the standard.
So SYCL 1.2.1 came out
I think shortly after Simon was on.
It wasn't too long
ago, only a couple of months ago.
So what is the
future direction for SYCL
that you're working on?
So future direction,
we're aiming to
integrate SYCL
even closer with C++.
While we're standardizing SYCL, C++ is moving forward.
We want to bring SYCL up to date with some more modern C++ features, C++17 and eventually C++20. And we'll also continue to look to introduce new features
to help improve the way people create their workflows
and how they can be applied to different architectures.
So we're always looking for ways to try and improve things
and make things easier for developers to use.
We want to aim to have a very close alignment with the C++ standards.
So if there's something that
the C++ standard is introducing
that we would use in SYCL
rather than standardizing something entirely new
that does effectively the same thing
we would use the features of C++.
We don't want to reinvent the wheel.
When we see the C++ standard
moving in a particular direction
we'll try to
move along with that so we don't want to be going
against
the direction of the standard
for C++.
Since you mentioned that then, what
features from C++ 17 or C++
20 are currently affecting SYCL?
How do you mean affecting?
Well, things that you want to take into account,
you want to use instead of standardizing yourself, whatever.
So some features are useful just in terms of
improving what you can do in device code.
So things like generic lambdas,
new things like templating lambdas, that would be quite interesting.
More support for different forms of templates.
But also, we want to align quite closely with the executors,
and also introducing things like futures and promises. I'll say that the C++ committee is still debating over
what different kind of futures will be used in C++.
So that's something we're watching very closely
to see how SYCL will interact with that.
What kinds of futures?
That sounds very, I don't know, metaphysical or something at that rate.
Well, we've mentioned executors on the show, I don't know, several times, Rob,
but I feel like I don't personally have a very clear picture for what that will actually mean to the standard
and how it would affect a project like Sickle.
Can you go into that?
Yeah.
So there's quite a few different domains
that executors are aiming to support.
It's one of the reasons that standardizing them
has proven so difficult.
So from our perspective,
our domain is open-standard heterogeneous systems,
so things like OpenCL
and SYCL.
There are a lot of other domains,
like networking, distributed
systems,
parallel algorithms,
asynchronous tasking.
So all of these different
ways of executing work and different
requirements have to be
unified into a single approach
so everyone can write their tasks
and the work they want to perform in a unified
way but still be able to take advantage of something like
an OpenCL platform but also
network devices
and that sort of thing,
and standard thread pools.
So you have some sort of tasks you want to perform,
and if I understand correctly,
you have some sort of requirements
for what kind of hardware needs to be used
for this task to be executed,
and then you pass that off to an executor
and you say, go make this thing happen?
Yeah, effectively, yeah.
So the current design focuses around this idea of properties
where effectively you can require different properties of an executor.
So things like whether it does one-way or two-way execution,
so whether it returns futures or not,
whether it's blocking or non-blocking,
how it maps to different threads or potentially GPU threads or some other device.
All of these kind of properties are properties of an executor.
So you can request different kind of properties
and then essentially if the implementation supports those,
that configuration will return an executor that can do that kind of work.
Okay.
So an implementation would provide sort of different types of executors, and those executors can be customized and adapted to work in different kinds of ways.
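For a rough feel of the property mechanism being described, the P0443r7 draft linked in the show notes used require and prefer along these lines. This is an illustrative sketch of a proposal that was still changing at the time, with no standard library implementation, so none of these names should be taken as final:

```cpp
// Illustrative only: follows the P0443r7 draft's require/prefer style and
// will not compile against any shipping standard library.
template <class Executor>
void submit_work(Executor base) {
    using namespace std::execution;  // proposed namespace in the draft

    // Properties the executor must support (the call fails to compile
    // if the implementation cannot provide them)...
    auto ex = require(base, twoway, blocking.never);
    // ...and properties it may honor if it can.
    auto ex2 = prefer(ex, continuation);

    // Two-way execution hands back a future for the submitted task.
    auto fut = ex2.twoway_execute([] { return 42; });
}
```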
So now you said you contribute to both the Khronos Group and ISO C++. Are you involved with the executors work for C++? Yes, so I've been involved in the executors work
for the last two years.
So when I joined, I think there had already been work going on
for about four years before that.
So there were quite a few proposals already in the works.
But when I joined, essentially the committee were looking for
to bring all the different proposals together
and come up with a unified design
that satisfied everyone's requirements.
So I was invited in
for the perspective of
open standard heterogeneous domain.
Okay.
So, yeah.
So how is that unification
work looking? Do you think it'll
make it?
It's been going very well.
So
whether or not it would make it into
C++ 20, you mean?
Yeah. Or ever.
So that's a very
interesting question.
The short answer is I don't really know.
Okay.
So the last meeting at Rapperswil,
the new directions group made executors a priority for C++20,
and there's been some special joint LEWG and SG1 meetings scheduled at CppCon to work on this.
So everyone's working very hard to try and get to a final paper that could potentially be introduced to C++20.
But then the alternative is if it's not ready in time, it would likely go into TS, which would be targeted for C++23.
It depends on if we can make enough progress by the next meetings.
Right.
Do you plan on attending that special meeting that's going to be held during CppCon?
Yes, I'll be at the meeting at CppCon.
I'm not sure if I'll be at the following meeting at San Diego.
Okay.
So, on the topic of
CppCon, you have a training
that you are going to be giving also
at the end of the conference, is that correct?
Yeah.
Myself and Michael Wong will be doing
a two-day class at CppCon on Parallel Programming
with Modern C++: from CPU to GPU.
The aim of the course, the class is to essentially teach people the fundamentals of parallelism
and how they can apply them to CPU and GPU architectures and how to take advantage of different kinds of parallelism
and how to solve the kind of challenges you encounter along the way.
So it'll likely take a pattern-focused approach, so we'll be looking at different parallel
algorithms and how you would implement them, how you'd solve certain challenges
when trying to parallelize algorithms.
So the idea is to have people leave
with essentially the set of tools you need
to be able to go and parallelize code in your own projects.
So what kind of tools or C++ standards will you be teaching to?
Is it safe to assume SYCL will be used?
Yes, so SYCL will be used as part of it.
We're trying to focus mainly on existing standards,
so not using anything that's still in development.
So we'll be looking at C++ threads and also SYCL.
We'll probably touch on SIMD parallelism to some extent.
So there is a proposal for SIMD vectors in the upcoming Parallelism TS 2,
but that's still in development, so we probably won't go into that.
There's no implementations of...
Well, there's an implementation of it, but there's no standard implementations yet.
So we probably won't go into that in too much detail.
Okay.
And then in addition to the class,
you're also going to be doing a talk with Michael?
Yes, so we'll be doing a talk on SYCL.
So the aim of the talk will be essentially covering
a lot of what I've been talking about today,
but in more detail,
looking at how SYCL works
and the different features of it
and how you can use the different
features to
take advantage of
GPUs and other
heterogeneous devices and C++
applications.
Is this going to be a live coding
kind of demonstration?
I'm still working through exactly how the talk will go,
but I'm hoping to have the talk kind of follow
implementing some kind of parallel algorithm.
So I might try to do a kind of live demo
of running the final application,
but I probably won't do the whole thing live.
Okay.
I'm not sure if I'm that brave.
And I also wanted to ask,
you're working on a C++ standards proposal.
Is that correct?
Yes.
So one of the other proposals that I'm working on at the moment
is to introduce affinity to C++.
So this is
essentially on the back of the
Unified Executors proposal.
But it's
kind of expanding on a bit more
to provide support for essentially
querying the topology of the system that you're on
and kind of information about different processors,
different memory regions,
and being able to essentially tie execution
and memory allocation to those different processors and memory regions.
So it will allow you to do things like take advantage of NUMA systems
or distributed systems,
but also it will be looking more towards scaling up to heterogeneous systems as well.
So would you mind going into some more detail for our listeners on what the advantage of tying execution affinity to a particular processor in a NUMA environment would be?
If you write an application that, say, runs across multiple processors,
across a sort of a NUMA domain,
so a memory that's a single address space but distributed across multiple nodes,
if the system sort of randomly allocates memory,
you won't necessarily always be allocating memory
that is close to where the computation is being performed.
So you won't get the best performance because the latency in reading the memory
will not always be as fast as it can be.
So if you have an algorithm that sort of modifies data in a particular pattern,
you can tie the memory allocation to that pattern so that you're always accessing memory as
close to the processor as possible. And also you can bind the different
executions to those processors so that you're always executing in the same place, effectively,
so you don't sort of jump over to another thread on a different core or a different CPU node and then have to copy the memory over to another node and recache.
So generally we try to write code that is cache-friendly and stays in the local CPU cache,
but you're talking about architectures where if the data you need isn't in your local CPU cache, and it might not even be in your local memory, it might actually be in a completely different computer across the network.
Yeah, that's right.
Which is like, we're talking orders of magnitude slower than if it were right here next to you.
Yeah, exactly.
Okay. Yeah, so if you have to copy over across a bus, for example,
from one CPU node to another,
then that's going to be a huge amount slower
than accessing something on the local device.
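The direction Gordon describes, querying the system's topology and then tying allocation and execution to what you find, can be sketched roughly as follows. Note that every name here is invented for illustration; it is not the actual API from the affinity proposal:

```
// Illustrative pseudocode only -- all names are hypothetical.
resources = system.topology()           // enumerate processors and memory regions
node      = resources.numa_nodes[0]     // pick one NUMA node
mem       = allocate_on(node, size)     // tie allocation to that node's memory
exec      = executor_on(node)           // tie execution to that node's cores
exec.run(kernel, mem)                   // compute stays close to its data
```

The point is the pairing: binding execution alone or allocation alone is not enough; both must land on the same node for the locality win.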
Okay.
Okay, Gordon, is there anything else you wanted to go over
before we let you go?
There was one thing that I was going to mention about SYCL that
I kind of went past without
mentioning. One of the other
benefits of SYCL
that we like to
highlight is that
there are some other
heterogeneous programming models
things like CUDA or C++ AMP
or OpenMP, OpenACC
they all allow you to do some sort of heterogeneous execution in C++ code.
One of the things that we really aim to do with SYCL
is that all of these programming models that I mentioned
have some form of language extension or pragmas or attributes or macros
in order to define
the block of code that you want to execute on the device. It means you have to
introduce these changes to code so it's not standard C++ anymore.
Also these are generally tied to a particular platform. For example,
CUDA is tied to NVIDIA. C++ AMP was only Microsoft
and AMD, I think.
So one of the
things we wanted to do with SYCL was make it
truly cross-platform, truly
standard. So all SYCL
code is entirely standard C++.
So
for any SYCL application, even if you
don't have an OpenCL platform available,
all SYCL implementations are required to have what's called a host device.
So it will compile down to standard C++ code
and run on essentially an emulated device
that matches the same execution and memory requirements of OpenCL
as you would have if you were running on an actual OpenCL device.
So this is very useful, particularly for debugging.
So if you're on a system where you don't have a debugger
for your GPU or your FPGA, for example,
it's very useful to be able to run in standard C++ code
the same way as you would on the device
in order to sort of debug your code.
So it's not running on the actual device,
but you can debug your application, essentially.
It sounds like even just for debugging,
this would also be good for R&D,
like you could keep continuing to work on your slow laptop
on the airplane or whatever
while you're trying to test out
how this code would look.
Yep, exactly, yeah.
Okay, great. Well, it's been great having you on the show
today, Gordon.
So there's one other thing I'd like to
mention, if that's possible.
Yeah, go ahead.
So as of,
so not as of today, but as of when
this episode airs, Codeplay will have released ComputeCpp 1.0, so the full implementation.
So this is actually the first fully conformant implementation of SYCL that is available.
And Codeplay also provides a community edition of ComputeCpp that can be downloaded for free.
So for that, we support multiple Windows and Linux platforms,
and that also supports Intel CPU, Intel GPU, ARM CPU, ARM Mali GPU,
and also we have experimental support for NVIDIA GPU through PTX.
And when is this actually going to be released?
So this will be released on Thursday, so tomorrow.
Okay.
Yeah, we'll be releasing this on Friday, so yeah, it'll be out.
So it'll be out by the time this airs, yeah.
Very good.
Okay.
As I was saying, Codeplay is recruiting, so if anyone's interested,
we have a lot of openings in compilers,
debuggers, runtimes, tools, etc.
It's a really great place to work,
and there's a lot of really interesting
and great people to work with.
It does sound like there's a lot of interesting stuff
going on there.
And definitely great people to work with,
because we've talked to at least three of them.
They're all great. Okay, thanks so much for being on the show today, Gordon.
Okay. Thanks very much for having me. Bye.
Thanks so much for listening in as we chat about C++. I'd love to hear what you think of the
podcast. Please let me know if we're discussing the stuff you're interested in, or if you have
a suggestion for a topic, I'd love to hear about that too. You can email all your thoughts to feedback at cppcast.com. I'd also appreciate if you like
CppCast on Facebook and follow CppCast on Twitter. You can also follow me at Rob W. Irving and Jason
at lefticus on Twitter. And of course, you can find all that info and the show notes on the
podcast website at cppcast.com. Theme music for this episode is provided by podcastthemes.com.