Hardware-Conscious Data Processing (ST 2023) - tele-TASK - Task 1: SIMD Delta Scan
Episode Date: May 10, 2023...
Transcript
Yeah, I guess we can get going.
I'm not sure if anyone else is going to show up.
So, for all of those who don't know me,
I'm Lawrence. Together with Marcel,
I'm leading the exercises here.
So, today I'm going to give you
a quick introduction to the general setup
that we have for the programming exercise or programming tasks.
At the end of the session,
I'll give you a quick intro to the current task that is now
online.
Right about now, there should also be an email coming around from Moodle saying that it's online, so you can click on it later and fiddle around with it.
Yeah, this session will also probably not last the full 90 minutes.
Okay, so the general task setup that we have, which we've now used for a few iterations and which works quite well, is that we use GitHub Classroom. For those of you who don't know this, it's basically a small wrapper that GitHub provides around its own infrastructure.
So what you'll need for this definitely
is you need a GitHub account.
If you don't have a GitHub account
or would like to create a new one or whatever,
then feel free to do this.
But we will be linking your name to a GitHub account.
So for whatever privacy reasons you would not like to do that,
then please create a new one.
We don't care what account you use.
In Moodle, there will be a link.
You can click on that.
That will take you to GitHub Classroom.
The first time you do this, it will
ask you to select the student name that you are, basically.
Because unfortunately, the Moodle integration with GitHub Classroom is broken right now.
We couldn't fix it.
So for now, you manually click and say, this is me.
Please make sure you're using the correct GitHub user
for this, because that's a lot easier than having to un-link and re-link everything afterwards.
So then once you do that once, then we
have the mapping from GitHub user to your real name.
And we need that for grading for obvious reasons.
Because otherwise, if your GitHub user is xx123,
then it's very hard for us to figure out who's
behind that.
Once you accept that, you will get a private repository in our teaching organization.
And then basically from there on the flow is a regular Git workflow that you would know
within the GitHub ecosystem.
You work on the repository.
You develop locally, you can test locally, and when you have something, you push it.
And then we have an automated test cycle that runs.
So we have a GitHub runner that will take your code.
It will test it.
We'll do some basic tests, some hidden tests,
and some benchmarks.
And then when that's done, we'll upload the results to a leaderboard that I'll show in a second.
And then you can kind of rank yourselves against that.
But I'll get to that in a second.
Okay, so the evaluation in general for this class.
So, as I've mentioned before, there's no final exam or anything.
So 100% of your grade is determined based on the programming exercises and stuff around it.
So we have four programming tasks.
Each of them will have 20 points.
And then 20 points in the end will be awarded for like a small oral discussion we have with
you together, maybe like 15, 20 minutes.
We've randomly assigned each of you to one task that you can present, or should present.
And we'll reach out to the individual people after the task is completed. And in this discussion, it's just more about you
explaining what you did in your code.
It's kind of for us to figure out
if you did this yourself or if ChatGPT did this for you.
And if you're kind of understanding
the theoretical concepts behind it.
So why maybe is something better or worse than something else?
And kind of just getting understanding.
It's just a discussion.
Within each task that you can submit,
we have the basic structure as follows.
We have basic tests.
We have advanced tests.
We have baseline performance and optimized baseline performance.
So for passing the basic tests, you get five points out of 20, which we consider the bare minimum. But as you can see from the points, technically this is not enough to pass the single task. Once you pass the advanced tests, that's a passing grade; it means you've implemented it in a way that is correct. But we're also looking at hardware and performance here, so we want to encourage you to also spend some time on performance optimization. And then there are the final five points. So if you pass all tests, you have 75% of the points. And the last 25% will then be given if you manage to beat some baselines. So we also want to tease you a bit to implement and tweak here.
And this worked out very well last year.
So, we're doing it again.
It's the same as last year, but this year I think it's going to be a bit more interesting.
We have a leaderboard bonus.
So, the three fastest solutions get bonus points, and fastest in this sense will mainly mean runtime.
For one or the other task, we might
include something like memory consumption.
We did that last year once.
But basically, there's one metric.
And the three students with the best results in this metric
will receive bonus points.
The first person will get three, second two,
and the third will get one point.
And now, as I think there are 21 active participants this year,
there's actually going to be a bit more of a challenge.
Last year, there were eight.
So then it was a lot easier to get bonus points.
So this year, we're hoping for a bit of a challenge here, and that students feel like digging into this.
Oh, yeah, OK, I kind of gave this away already. Each of you will present or discuss one task with us. This is just going to be like 15, 20 minutes; we need to figure out the exact schedule and these kinds of things. And that will be the remaining 20% of the grade. You can see the grouping in Moodle already. It's random, so I hope no one can complain.
If we need to shift something, that's also possible.
But this should not be a case of, I'd rather present this task than that task.
There should be valid reasons for changing.
Yes.
You can also see the list here
again chosen at random.
Yes, just a bit about expectations; I gave this away at the beginning a bit.
We work under the assumption, and hope, that all of you will do well in the oral discussion.
It's more of a sanity check that you understand what you did, so we expect you to get high grades there.
If someone fails on the oral discussion, that's probably a very bad sign.
So basically, passing the advanced tests will get you something in the 2.x grade range.
Having the baseline performance will get you somewhere in the lower part of the 1.x grade range, around 1.7-ish.
And then we're basically saying, if you pass all the optimized baselines and you have a good oral discussion, then that's how you get a 1.0.
Because then that means you've basically written code that does what we ask of you very efficiently. So, yeah, that's the target here for the grades.
We just want to be upfront about this so people can see what the expectations are, where you want to see yourself in here, and what the path to that might be.
Then obviously, there's bonus points so you can shift
a few things a bit if you're really into that.
Yes. Okay. So, as I said, there's no exam, so we are kind of treating the programming tasks like an exam. Because your grade is based on this, there are some basic rules that apply. Basically, every task is an individual solution. They're not group assignments, so you may not submit the same code twice.
It kind of says at the bottom, you can discuss the task,
and there's no way we're going to stop you
from doing that anyway.
But please just don't share your code.
We have some checks that we run on the code
to see if it's the same code.
And these are smarter than just checking whether you have the same variable names.
And we've run into issues with this
in previous iterations of other courses.
It's not a lot of fun if we can see people have obviously
copied or submitted the same code
and just changed a few names.
That's kind of awkward for everyone.
And we're going to be pretty strict on this,
purely for the fact that this is how your grade is determined.
So basically, anything that we consider cheating
will lead you to fail the course,
just as cheating would fail you an exam.
So, as I said, we have
a somewhat elaborate setup
for the tests and hidden tests and benchmarks.
And basically because Git
doesn't really allow us to do this any differently,
there's a CMakeLists.txt in your repository at all times.
You may not modify this.
Because in it, we're basically setting up the environment and there should be no need
for you to ever modify any of this.
We kind of have constraints in there.
For the first task,
for example, we're setting specific compile flags for different files. If you modify this, then we have to consider that cheating. The same also holds for the GitHub Actions YAML file in the repository. And even more so for that one, there should be exactly zero reasons for you to touch it.
And this has secrets in it and certain commands and whatnot.
We cannot hide them from you.
We don't need to hide them from you.
Our setup is kind of built in a way
that unless you really try to break something, you won't.
So please just don't break anything.
Because if you do, then we consider this cheating.
Because this setup only works if we can trust you.
So obviously, if it's an honest mistake,
then that's not an issue.
But there's a big distinction between an honest mistake
and trying to break something on purpose.
Please don't try to extract any hidden information, anything
like this.
Again, we have a basic setup that
prevents you from doing this.
If we figure out that you're doing this, then that's bad.
One very important thing is that you're only going to be working on the main branch.
For you, there's basically no reason to do anything else.
You don't need to create another branch.
We don't really care what you're doing in the setup.
For you, it probably even is better
if we can see a somewhat random history,
because then we know you've been working on this, there's progress, and it's not suddenly one commit with a perfect solution. That's always tricky. But very important: please do not
force push any changes to the main branch, because we track the commit hashes for the
benchmark runs. And if you force push something and the commit hashes change, then we can't see the history
that gave you the results.
And that's bad because then, again, this is what we're giving you a grade on.
If we can't see where this grade is coming from, then that's a problem.
Yeah.
So just there's no need to force push.
Just don't.
You can just resubmit.
It doesn't matter.
For the first programming task especially, the GitHub Actions setup is so fast that it doesn't really matter.
For some tasks, it was a bit more complex, and a run would take maybe a minute.
Now I think we're in the order of less than 10 seconds.
So basically, it doesn't matter.
So if you change something, push it.
But on the other hand, only push something if you know you're actually going to pass the basic tests.
Because if you fail the basic tests locally, you're going to fail them remotely, and then it's just going to build and fail.
Yes.
Remote development.
This part is a bit from last year: not all students will have access to the required hardware.
GPU and PMEM will probably not happen this year, for setup reasons that didn't work out too well last year.
But especially for the first task on SIMD, not all of you will have access to Intel's full SIMD suite, that is, SSE, AVX2, and AVX-512 instructions. Some of you might, in which case good for you, because then you can work locally, which is a bit nicer.
But we provide you access to one of our group servers on which you have all of this set
up.
So basically if we require special hardware, we'll make sure you have access to it.
This is exactly the same hardware as the evaluation server.
So the development server is a dual socket system,
and the evaluation server only has one socket,
but the CPU and memory and everything is kind of the same.
And for the first task, most importantly, the CPU is exactly the same.
Memory is not going to be an issue here.
Again, it's a shared resource, so please don't break it.
We changed the setup a bit from last year.
That means this year,
every student will have their own Docker container.
After this task is online,
I'll sit down and I'll send each of you
an initial combination of username,
random password, and the fixed SSH port.
You can then log into that server.
Then, for example, you can use CLion or VS Code to do remote development on there, and it's your own environment, which makes things a bit easier. We ran into some file permission issues with the shared setup from last year, so this year everyone will have their own container. Maybe something will break there, as it's the first time we're trying this, so please just let us know if something does break. We tried it manually a few times. It looked okay. But you never really know
until something scales to more than one person. Yes. So, again, after the session here, after
lunch, I'll send you the information. Then you can log into the server. And then this
is a standard SSH connection into an Ubuntu Docker container.
And then you can fiddle around.
You don't have sudo access in the Docker containers.
You probably also don't need it.
We have some basic stuff installed.
So vim, neovim, and emacs are installed.
Nano is installed.
You can do VS Code.
You can do CLion for development
or whatever you like.
If there's something that a few people are interested in setup-wise, we can also just
add that to the Docker container for everyone.
But if it gets too complicated or esoteric, then we probably just won't do it because
it's not worth it.
But again, that really depends on how you develop and how your setup is.
One very important thing.
So this is something new we're going to test.
Hopefully this is not going to happen.
It didn't happen last year.
The Docker containers, should they crash for whatever reason,
anything that is just in this Docker container will be lost.
So, for example, if you work in CLion or VS Code,
then you always have your code locally on your laptop
and it's just pushed to the remote server,
so then you're fine.
If you, for example, just develop with Vim
in the Docker container,
then make sure you do regular pushes to GitHub
because again, from the setup, it's
quite tricky to persist all of these images
across all volumes and whatnot.
So I've opted to not do this.
If we need to, we'll maybe have to figure out something.
But for now, hopefully this works.
Again, there shouldn't be any reason for these Docker images
to ever stop.
And again, most people will probably have their code locally somewhere anyway.
But again, we need to figure this out; it's the first time we're trying this setup.
But yeah, hopefully this works.
That should be fine.
Okay, yes, the evaluation server, just for your information. Basically, this is the CPU we have in there; it's the same as in the development server.
Important for the very first task: it's a Cascade Lake Intel CPU, which tells you about its AVX-512 capabilities, so its SIMD capabilities.
The server is running Ubuntu 20.
We wanted to update this, but because of how GitHub Actions runners work with Docker, it was a bit tricky; GitHub Actions only somewhat supports it.
We're running everything with GCC 11, because that's the standard install we can get in Ubuntu 20, so we don't have to do anything complex there.
So that's the version we have.
Feel free to use whatever you need in it.
But again, you're not going to need super fancy C++ constructs for anything.
Everything's run in Docker.
Exactly.
There's only a single instance on the evaluation server. So we're not going
to run into any issues of like, oh, my benchmark numbers are bad because someone else was also
benchmarking at the same time. This benchmark run or the GitHub run is the only thing running
on this server for the entire time. Again, for the first task, not super important. But
really only push changes when you feel like you need to.
So again, you can push a lot for the task.
It doesn't really matter because the queue is quick.
But some tasks, again, if they take a bit longer,
there's a bit more setup, then this
gets expensive very quickly.
And then you have an hour-long backlog, which in itself
is also fine, but it might be a bit annoying for you
to wait forever to get feedback on what's happening.
OK, fun part.
There is a leaderboard.
So this is kind of a gamification aspect.
I don't know if any of you have participated
in something like competitive programming
at the Algorithm Engineering Chair.
When I did this many years ago,
they also had something like this and we've also copied this
from a course at the TU Darmstadt and adapted it a bit.
So, last year this worked out quite well.
It's a nice little challenge
for you to beat the baseline and each other.
I'll quickly just go through the layout.
This is the URL.
You can only reach this from the HPI network.
VPN is also fine.
But this obviously will not work from outside.
OK.
So we have anonymous usernames.
There will always be a user called baseline, which is the baseline; I'll get to that in a second.
And then the other ones are, I don't know, hardworking lobster, willing bird, whatever.
It's just a random combination of adjectives and animals.
This will be assigned to you once, the very first time your GitHub user enters the leaderboard, and it will stay the same for the entire semester.
I've tried to think of a nicer way to do this, but unfortunately, I haven't really figured one out.
The first time you pass the basic tests, the GitHub Actions log will say: this is your user, this is your anonymous name.
All of this is also documented in the task description,
so you don't have to memorize this.
Then basically, once you're hardworking lobster,
you're hardworking lobster for the semester.
If there's a very unfortunate combination
that people are not comfortable with, we can change this.
I think we've removed a few names.
I think there was something with
weird adjectives and rats and stuff,
that's not an ideal case.
So we've removed that.
But other than that, they should be rather harmless.
OK, yes, then we have a few columns here.
Oh, let's start at the top.
This is the name, obviously.
Here are the start and end date, so you can always check what's going on. If there are no active tasks, the first page will be empty, but at the top, you don't see this here, there's a little archive box, and you can click through to the past tasks.
The first column, basic tests, is just an indication of whether you've passed the basic tests, and it will basically always be set to true, because you're not on the leaderboard until you've passed the basic tests.
The advanced tests column, again, will indicate whether you've passed the advanced tests.
Then the baseline indicates if you've passed the baseline performance.
And the optimized baseline indicates if you've passed the optimized baseline performance.
And then the next two columns kind of mean two things.
So for you, as a student submitting something, the last runtime column will tell you the runtime or performance value determined for your last run.
And then the best run time will be the best time you've achieved.
So here you can see this is the best time.
The last time was a bit slower, but it's still better than the baseline.
So yeah, all is good.
And once you've passed the baseline, you'll never go back, because then you have one solution that passes it, and this is the one we track internally.
Basically, we store the commit that achieved this.
And if we can't retrieve that afterwards because you've done something weird to your repository, that's bad.
For the baseline user, and this is why these columns have a double meaning, this is the time you need to beat.
So, for the baseline value here,
in this case, it would need to be 4,400,
and the optimized baseline would be 2,050.
Those are the values that you should beat.
In this case, we did, because I think hardworking lobster was me running my own code.
So you can also see a bit of how the optimized baseline is set: it's roughly the performance that I've achieved, minus a bit of leeway.
So we kind of discuss internally what we think is appropriate here.
And this in theory can also be changed.
If we realize this doesn't work out,
or we've miscalculated something,
then we can adapt the baseline,
and then we will need to see.
So, if we have the optimized baseline
and exactly zero people achieve that,
then we've probably done something wrong on our end.
Meaning that maybe it
was a bit unclear, the expectations were
a bit off.
So far, this has all worked out very well.
So far, we've always had a healthy number of people passing the optimized baseline.
But again, just from the expected grade range, we do not expect everyone to pass the optimized
baseline.
Okay.
Any questions to the general setup?
Yes?
How do we know what the advanced tests check?
Okay.
So good question.
So we have basically three test files in each setup.
We have a basic test file, which is what you get. This is a very small set of tests; you get it in the repository, and you can see it.
For the first task, it's a few tests with very small examples.
You can run them, and you can also modify them locally, because on the CI server we always use a fresh version.
So here, you can add your own tests.
And we actually encourage you to add your own tests
because there's a high chance that just passing
the simple tests will not mean you're
going to pass the advanced tests because, again, they're
basic and advanced.
Then we have hidden advanced tests.
You will not see those.
They are only available on the CI server.
When we run them, you'll see standard Google Test output; we use Google Test for this.
So you have the standard pass, pass, fail, whatever.
In the basic tests, you get the full output and what's expected.
In the hidden test, this is hidden because it's hidden.
But we try to provide you with
error messages that indicate what went wrong.
So in the very first iteration, we had something like 'test failed', and people said, I don't really know why.
So we're trying to give you some guidance: okay, maybe think about this aspect, or think about this input range, to guide you along.
The test names depend on the setup; I think for this one, it's just advanced test 1, 2, 3, 4, 5.
So the error messages are trying to give you some insights
into what's going wrong.
And that worked quite well last year.
So we're kind of trying to stick with that.
And then we have a benchmark.
Again, that's also hidden.
In the task description, we say what we're running and how it's run, and we also provide some output there.
So, for this one, you need to implement a few methods, and you can see the runtime of each method and how we determine the final value.
But again, we're not really telling you what's happening
internally because, well,
otherwise you might be able to optimize
explicitly for this benchmark which we don't want.
Does that answer the question?
Any other questions?
Okay.
Then I'll move on to this task.
So the first task you're going to implement is a delta column scan on compressed data using SIMD.
So we started with the SIMD in the lecture yesterday.
And due to a bit of timing conflicts, unfortunately, the second SIMD lecture isn't until next Tuesday.
But I hope that shouldn't be too much of an issue.
I'll get to some details in a second.
Basically, you need to implement a decompress method.
We have compressed input, and the decompress method just takes all the values and decompresses them.
And then you need to implement a scan method, which decompresses the values and checks if they match a predicate, let's say, if they are smaller than 10.
And if they are smaller than 10, then you write them to an output.
Again, there's a way more detailed task description.
This is only a brief introduction,
because you can read the document yourself.
I don't need to read it for you.
The deadline for this is Monday the 29th, at midnight.
Please make sure to submit before this.
There is a tiny grace period, so if you submit something just after midnight, it's also fine.
But please don't do your first commit on the 30th at 1 in the morning
and be like, sorry, I'm late, which is why, again, I said,
if you have a longer history, we can kind of see you've been working on this for a while.
I know how deadlines are at the end.
They get a bit tight.
That's fine.
But please don't just start two hours before the deadline
and then complain that the deadline was too early,
because that's not fair to everyone else.
So that's the deadline.
OK.
The first task.
So we're going to do delta compression.
This is a widely used compression technique. What it basically does is: if I have a series of numbers, I always just store the difference to the number before.
I'll get to this picture in a second; it's a bit complex for what it is, and I couldn't find a nicer way to represent it.
So in this task specifically, we have 32-bit unsigned integer values.
That's our input column.
And then what we do is we basically, for each value at position i,
we just store what is the difference to the value at i minus 1.
So we can see this here.
The first value is 10.
The second value, okay, I'll get to that.
Let's start from the right, because I'll get to the start value in a second.
So this value is 24,
and we see the difference to the value before is 15,
so the output is 15.
Here we have 9.
The difference to the value before is minus 3,
so we store minus 3,
then 12 to 10 is 2.
And then because the first value obviously doesn't have a previous value, we also
need to store a start value.
And the start value is always just the first value
of the input.
That means also the first delta is always 0.
So that's basically it.
Now if we go from the other direction,
we can see 12 minus 10 is 2, 9 minus 12 is minus 3,
and 24 minus 9 is 15.
And that's how we get this.
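To make the idea concrete, here is a minimal scalar sketch of the compression step. The function name and signature are made up for illustration; the course provides the actual compress implementation.

```cpp
#include <cstddef>
#include <cstdint>

// Sketch of scalar delta compression: store input[0] as the start value,
// and for every position the difference to the previous value.
// The first delta is 0 by convention, since input[0] has no predecessor.
void compress_scalar(const uint32_t* input, size_t n,
                     uint32_t& start_value, int8_t* deltas) {
    start_value = input[0];
    deltas[0] = 0;
    for (size_t i = 1; i < n; ++i) {
        // The task guarantees consecutive values differ by at most
        // -128..+127, so the difference always fits into a signed byte.
        deltas[i] = static_cast<int8_t>(input[i] - input[i - 1]);
    }
}
```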
And the main reason we're doing this is because we expect the deltas to be small.
So in a database, for example, you would have 32-bit or 64-bit values.
But let's say it's just an increasing range of numbers.
In the best case, let's just say it's 1, 2, 3, 4, 5, 6, 7.
Then the delta is only ever 1.
And then we can store 1 very concisely.
In this case, we wouldn't even need a bit,
we could store this differently.
But let's say it's always only 2 bits,
then we only need to store 2 bits instead of
the entire 32-bit or 64-bit values.
In this programming task, the diff or delta is an 8-bit signed integer.
That means instead of storing 4 bytes, we only need to store 1 byte, which is a reduction of three quarters.
Having an 8-bit signed integer also means the value range is minus 128 to plus 127.
So the diff between two consecutive values in the array is never going to be smaller than minus 128 or larger than plus 127; the value range always fits in an 8-bit signed integer.
Okay, decompression basically works the same way, just in the other direction.
We take the start value, which is 10 here, then we take the compressed values. The first delta is 0, so 10 plus 0 equals 10. Then 10 plus 2 equals 12, 12 plus minus 3 is 9, and 9 plus 15 is 24.
So that's basically the concept of how this works: you're just going through, always taking the previous value plus the diff to the previous value to get the next value, and so on.
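As a sketch, the scalar decompression is just a running sum over the deltas. Again, the name and signature here are hypothetical; the real interface is defined by the task.

```cpp
#include <cstddef>
#include <cstdint>

// Sketch of scalar decompression: keep a running value and add each delta.
void decompress_scalar(uint32_t start_value, const int8_t* deltas,
                       size_t n, uint32_t* output) {
    uint32_t value = start_value;
    for (size_t i = 0; i < n; ++i) {
        value += deltas[i];  // deltas[0] is 0, so output[0] == start_value
        output[i] = value;
    }
}
```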
Interestingly, this is something that's actually not very well suited for SIMD processing, which also makes it a bit challenging, because there are a lot of dependencies within the data.
So if you had this in one SIMD register, there are four values here, and you can see there are obviously dependencies between the values.
But this also makes it a bit challenging to think about.
So the hint we're giving you on how to start with this is to think about how you can do horizontal additions in and around your vector registers, and how you can maybe combine and shift entire registers.
There's also a bit of a description of this in the task description.
We are aware that at the beginning this might be a bit overwhelming, and that kind of works as intended. Understanding SIMD code is quite complex. There's a lot of documentation around it, and from a teaching perspective, we think it's very important to be able to understand this, navigate the documentation around it, and understand what's actually happening there.
So there's a very, very good guide from Intel that's very detailed, has a lot of descriptions
around what's going on.
But it takes a while to understand the names,
to understand what they do, how to navigate this.
And we consider this part of what you're supposed
to achieve during the programming task,
is to understand and navigate this.
So here's a small example about combining and horizontal shifting.
If we look at our input values, we have a starting value of 10, so we can just use a register where all lanes are 10.
And now we can do some arithmetic, and how exactly is for you to figure out, so that the first lane is just the first delta (here 0), the second lane is the first two deltas added up, the third lane is the first three added up, and the fourth lane is all four deltas added up.
Then we can just do one addition of this register with the base register, and we get the same output that we expect.
So instead of doing this in a chained fashion, we've created a SIMD register where every lane holds a prefix sum of the deltas.
Again, that's the task.
That's kind of trying to figure out how this works.
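To illustrate the idea, here is a minimal SSE2 sketch of such an in-register prefix sum, assuming the four 8-bit deltas have already been widened to 32-bit lanes (doing that widening is part of the task). Two shift-and-add steps build the running sums, and one final addition applies the start value:

```cpp
#include <cstdint>
#include <cstdio>
#include <emmintrin.h>  // SSE2 intrinsics

// Inclusive prefix sum over the four 32-bit lanes of x:
// [a, b, c, d] -> [a, a+b, a+b+c, a+b+c+d].
// _mm_slli_si128 shifts the whole register left by bytes, shifting in zeros.
static __m128i prefix_sum_epi32(__m128i x) {
    x = _mm_add_epi32(x, _mm_slli_si128(x, 4));  // shift by one 32-bit lane
    x = _mm_add_epi32(x, _mm_slli_si128(x, 8));  // shift by two lanes
    return x;
}

int main() {
    // Deltas from the slide example, assumed already widened to 32 bits.
    __m128i deltas = _mm_setr_epi32(0, 2, -3, 15);
    __m128i sums   = prefix_sum_epi32(deltas);
    __m128i out    = _mm_add_epi32(_mm_set1_epi32(10), sums);  // add start 10

    int32_t result[4];
    _mm_storeu_si128(reinterpret_cast<__m128i*>(result), out);
    std::printf("%d %d %d %d\n", result[0], result[1], result[2], result[3]);
    // Prints: 10 12 9 24

    // The carry into the next block of four is the last lane (plain SSE2).
    int32_t next_start =
        _mm_cvtsi128_si32(_mm_shuffle_epi32(out, _MM_SHUFFLE(3, 3, 3, 3)));
    (void)next_start;
    return 0;
}
```

The same shift-and-add pattern generalizes to wider registers, where the cross-register limitations of AVX2 mentioned later become one of the things you'll have to work around.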
So yes, as I mentioned at the beginning,
you need to implement two methods, or eight methods
in total, but two different types of methods.
One is the decompress method.
The decompress method: you take the deltas and the start value, so the blue values here, and your task is to just output the red values.
And the second one is to implement the scan method, which does the same thing internally, but instead of returning all values, you also apply a filter.
A filter would be like: please just return all values that are larger than 15.
Then in the output you would only have 24, and the other values you would not store.
Two types of methods.
And you have to implement each of these four times.
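For the scan, a scalar sketch might look like this; the predicate, names, and signature are again illustrative, and the task description defines the real interface.

```cpp
#include <cstddef>
#include <cstdint>

// Sketch of a scalar scan: decompress on the fly and keep only values that
// match the predicate, writing them densely to the output.
size_t scan_scalar(uint32_t start_value, const int8_t* deltas, size_t n,
                   uint32_t threshold, uint32_t* output) {
    uint32_t value = start_value;
    size_t match_count = 0;
    for (size_t i = 0; i < n; ++i) {
        value += deltas[i];
        if (value > threshold) {            // e.g. "larger than 15"
            output[match_count++] = value;  // only matches are stored
        }
    }
    return match_count;
}
```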
The motivation behind this is, I wrote in the task description that your boss is evil
and makes you implement this multiple times.
And in this case, your boss is me.
But on a more serious note, so this is actually also some active research we've been working
on and kind of looking into these things.
Developing this for different platforms is actually quite tricky.
So let's say you're actually a database vendor, or you want to write code that runs efficiently on different platforms. Then you don't know which platform you're running on; with high probability you might not have AVX-512, but for x86 you're guaranteed to have SSE. The scalar one is just very basic, for you to get an understanding of what's going on; that should take only a few minutes. SSE is guaranteed on all x86 servers. But realistically, if you run something in the cloud, let's say on AWS, then depending on how your setup is, you don't know 100% whether you get AVX-512 or only AVX2. Most Intel laptops will only have AVX2 and not AVX-512, because Intel killed it for consumer hardware.
And this is only Intel.
Nowadays, a lot of people are starting to move their cloud stuff to ARM, to the Graviton instances at AWS, for example.
And that now means you would have to implement this also
in a completely different SIMD instruction set
for different CPU architecture.
So this kind of gives you an idea of, OK,
what are the limitations and the advantages of certain SIMD
instruction sets?
Most notably, you'll see that AVX-512 is a huge addition, does a lot of things, and also adds a lot of instructions for the smaller register widths, like 128 bits. But then again, we're kind of forcing you not to use those, because we use specific compiler flags to tell the compiler what it can assume. So after a while, you'll figure out that each of these has its own limitations. There are some limitations in AVX2, for example, regarding cross-register operations. AVX-512 can do all the fancy things, but has some other disadvantages.
I'm not sure if we covered this in the lecture, but most notably, AVX-512 on most CPUs will
cause a throttling of the CPU clock frequency
because the CPU otherwise gets too hot.
So, yeah, again, each of these has
some ups and downs, you need to figure this out.
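As a side note on those compile flags: GCC exposes the enabled instruction sets as predefined macros, so a small probe like the following (not part of the task code) shows what a translation unit may assume:

```cpp
#include <cstdio>

// GCC defines these macros depending on the -m flags it is invoked with
// (e.g. -msse4.2, -mavx2, -mavx512f).
int main() {
#if defined(__AVX512F__)
    std::puts("AVX-512F enabled");
#elif defined(__AVX2__)
    std::puts("AVX2 enabled");
#elif defined(__SSE2__)
    std::puts("SSE2 enabled");
#else
    std::puts("no x86 SIMD extensions enabled");
#endif
    return 0;
}
```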
The compress method is implemented by us; you don't need to deal with that.
Hint: it is probably a very good starting point for thinking about the scalar implementation, because decompression is just the reverse.
Again, the scalar version is just for you to do a sanity check that you've understood what this actually is.
This is covered in the tests: we test all four implementations, in the basic tests and in the advanced tests.
So, you need to have a correct implementation for all versions for the advanced test to
pass.
And when we hit the benchmark, we will also run all versions, but we'll only pick the
fastest.
In my code, AVX-512 is the fastest, so there's a good chance that it will also be the fastest for you, depending on how you implement it. But it's also towards the lower end.
It's going to be interesting for you to observe the performance differences: between scalar and SSE, for example, what AVX2 changes, and then what you can do on top with AVX-512.
Just a few assumptions, again all in the task description. You can assume a few basic things, because we don't really care that much about handling edge cases; it doesn't help a lot with the understanding.
Generally, for anything you do with SIMD, and I hope this was also covered in the lecture already: let's say we have a SIMD register with four values in it, but my input is not divisible by four. Then obviously I need a small tail that covers the last few values. In this case, we don't want you to implement that, so we're saying all input sizes are divisible by 16; you can just assume that.
You can assume that the input and output buffers you're reading from and writing to are always aligned to 64 bytes. This is very helpful if you're using something like AVX-512, whose registers are also 64 bytes.
And the output has 64 bytes of padding at the end, which allows you, if you want, to write behind the end of the output buffer; without the padding, that would technically be undefined behavior, and we don't want that.
This is something that you would commonly do in these setups.
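For local experiments, one way to reproduce those guarantees is a small helper like this (hypothetical, just for your own testing; the provided harness already sets the buffers up for you):

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Allocate a uint32_t buffer that is 64-byte aligned and has at least
// 64 bytes of padding after the n values, mirroring the task's guarantees.
// Release the buffer with std::free().
uint32_t* alloc_aligned_padded(size_t n) {
    size_t bytes = n * sizeof(uint32_t) + 64;  // room for the tail padding
    bytes = (bytes + 63) / 64 * 64;  // aligned_alloc needs a multiple of 64
    return static_cast<uint32_t*>(std::aligned_alloc(64, bytes));
}
```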
Again, we encourage you to really write more tests in the basic test CPP file.
We only cover very basic ones.
This is something that is very easy to test manually: you can come up with different edge cases for the different implementations.
You can more or less just copy the code that we have for some basic tests and extend it to cover some ranges yourself.
And whatever's in there, we really don't care. We throw it away; we don't even look at it.
That's just for you.
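A sketch of what such an extra test could look like, using the decompress example from the slides (the function name decompress_scalar and its signature are made up; use whatever the task's headers actually declare; like the provided tests, this would be run through Google Test):

```cpp
#include <gtest/gtest.h>
#include <cstdint>
#include <vector>

// Extra local test: the lecture example 10, 12, 9, 24 compresses to
// start value 10 and deltas {0, 2, -3, 15}; decompressing must round-trip.
TEST(DeltaScanLocal, DecompressLectureExample) {
    const std::vector<int8_t> deltas = {0, 2, -3, 15};
    std::vector<uint32_t> output(deltas.size());
    decompress_scalar(10, deltas.data(), deltas.size(), output.data());
    EXPECT_EQ(output, (std::vector<uint32_t>{10, 12, 9, 24}));
}
```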
On the server, we always have fresh files
that we use that just cover the basic tests.
So if the plain basic tests that you have run locally, they should also run on the server.
If they don't, then that's going to be an interesting problem
that's probably related to setup.
There are quite a few files here, because we have different files for each of the scalar, SSE, AVX2, and AVX-512 versions.
And you should only modify code in the files that start with simd_delta_, which is exactly four files.
The rest, please just don't touch, though you can obviously also modify the basic tests.
Please don't create any new files,
because that would require you to change the CMake lists file,
which we don't want you to do.
And there's also no need to do this.
So please just stay in the files that there are.
But again, there shouldn't be any need to do anything else.
Actually, are there any questions on this so far?
Probably a lot, and there's a lot more information in the task description.
A lot of what I said, or all of it, is also in there.
OK.
Now let me finish with the discussion that we'll have.
This will be the day after the deadline.
So generally, we'll always have the deadline at midnight
one day, and then the discussion will be the morning after in the lecture slot.
So, in this case, the Tuesday 11 AM lecture slot on
the 30th of May will be the discussion of task one,
and then we'll also introduce task two.
One major difference: today is being recorded, because we're also giving general information.
The other task discussions will not be recorded, purely for the fact that as soon as I discuss the solution with you, I can't use the task ever again, because then it's on tele-TASK forever, and the next year people would just come look at the solution, copy it, and we're done, which is not what we want.
I know there are some conflicts with other lectures in this time slot.
If you cannot attend, or you know someone who cannot attend, you can always just come by or write us an email, and we can explain and discuss whatever is unclear.
Or maybe you can discuss with your friends or fellow students who were here and got the solution.
Yeah, not recorded.
We'll introduce the next task.
But the task introduction is always going to be kind of the level I did today.
The ground truth for anything will always be the PDF in Moodle with the task description.
That's basically what we rely on.
That's what we give you.
If we update it, we'll let you know.
Yes, and this year, again, three of the four programming tasks will be new; last year it was four out of four, this year it's three out of four. The first one is new, the second one is old, and the other two are new.
So if anything goes wrong, if anything seems off, if something seems like it should not work like this, please let us know; there is a good chance that something is wrong on our end.
The number of students has nearly gone up by 3x, from 8 to now 21 students doing this course, so we expect some setup-related issues to pop up here and there; these random setup problems kind of scale with the number of students.
So if something feels wrong or seems wrong, just let us know.
Write us an email.
If something's generally unclear,
feel free to use the forum to discuss these things.
Maybe provide additional information in the forum.
Please don't post your code in the forum.
But again, we said, feel free to discuss something.
But maybe don't explicitly say, write this line of code
to do something.
That would kind of be too much.
If you're unclear about whether this is too much sharing,
just write us an email.
So you can write Marcel and me or Professor Rabl an email.
And we can figure out how to solve the issue.
And with that, I think we are good for today.
Any questions?
Okay, perfect.
Then thank you very much for your attention.
And I will see you, well, yeah, at the next lecture.
That's on Tuesday.
And I'll see you probably on the 30th when we discuss the solution or if you have any questions in between.
Thanks.