Hardware-Conscious Data Processing (ST 2023) - tele-TASK - Task 1: SIMD Delta Scan
Episode Date: May 10, 2023...
Transcript
Yeah, I guess we can get going.
I'm not sure if anyone else is going to show up.
So, for all of those who don't know me,
I'm Lawrence. Together with Marcel,
I'm leading the exercises here.
So, today I'm going to give you
a quick introduction to the general setup
that we have for the programming exercise or programming tasks.
At the end of the session,
I'll give you a quick intro to the current task that is now
online.
Right about now, there should also be an email coming around from Moodle saying that it's online, so you can click on it later and fiddle around with it.
Yeah, this session will also probably not last the full 90 minutes.
Okay, so the general task setup that we have, which we've now used for a few iterations and which works quite well, is that we use GitHub Classroom. For those of you who don't know this, it's basically a small wrapper that GitHub provides around its own infrastructure.
So what you'll need for this definitely
is you need a GitHub account.
If you don't have a GitHub account
or would like to create a new one or whatever,
then feel free to do this.
But we will be linking your name to a GitHub account.
So for whatever privacy reasons you would not like to do that,
then please create a new one.
We don't care what account you use.
In Moodle, there will be a link.
You can click on that.
That will take you to GitHub Classroom.
The first time you do this, it will
ask you to select the student name that you are, basically.
Because unfortunately, the Moodle integration with GitHub Classroom is broken right now.
We couldn't fix it.
So for now, you manually click and say, this is me.
Please make sure you're using the correct GitHub user
for this, because that's a lot easier than having to un-link and re-link everything afterwards.
So then once you do that once, then we
have the mapping from GitHub user to your real name.
And we need that for grading for obvious reasons.
Because otherwise, if your GitHub user is xx123,
then it's very hard for us to figure out who's
behind that.
Once you accept that, you will get a private repository in our teaching organization.
And then basically from there on the flow is a regular Git workflow that you would know
within the GitHub ecosystem.
You work on the repository.
You develop locally, you can test locally, and when you have something, you push it.
And then we have an automated test cycle that runs.
So we have a GitHub runner that will take your code.
It will test it.
We'll do some basic tests, some hidden tests,
and some benchmarks.
And then when that's done, we'll upload the results to a leaderboard that I'll show in a second.
And then you can kind of rank yourselves against that.
But I'll get to that in a second.
Okay, so the evaluation in general for this class.
So, as I've mentioned before, there's no final exam or anything.
So 100% of your grade is determined based on the programming exercises and stuff around it.
So we have four programming tasks.
Each of them will have 20 points.
And then 20 points in the end will be awarded for like a small oral discussion we have with
you together, maybe like 15, 20 minutes.
We've randomly assigned each of you to one task that you can present, or should present.
And we'll reach out to the individual people after the task is completed. And in this discussion, it's just more about you
explaining what you did in your code.
It's kind of for us to figure out
if you did this yourself or if ChatGPT did this for you.
And if you're kind of understanding
the theoretical concepts behind it.
So why maybe is something better or worse than something else?
And kind of just getting understanding.
It's just a discussion.
Within each task that you can submit,
we have the basic structure as follows.
We have basic tests.
We have advanced tests.
We have baseline performance and optimized baseline performance.
So for passing the basic tests, you get five points out of 20, which we consider the bare minimum. But as you can see from the points, technically this is not enough to pass the single task. Once you pass the advanced tests, that's a passing grade; it means you've implemented it in a way that is correct. But we're also looking at hardware and performance here, so we want to encourage you to also spend some time on performance optimization. And then there are the final five points. So if you pass all tests, you have 75% of the points. And the last 25% will then be given if you manage to beat some baselines. So we also want to tease you a bit to implement and tweak here.
And this worked out very well last year.
So, we're doing it again.
It's the same as last year, but this year I think it's going to be a bit more interesting.
We have a leaderboard bonus.
So, the three fastest solutions get bonus points, and fastest in this sense will mainly mean runtime.
For one or the other task, we might
include something like memory consumption.
We did that last year once.
But basically, there's one metric.
And the three students with the best results in this metric
will receive bonus points.
The first person will get three, second two,
and the third will get one point.
And now, as I think there are 21 active participants this year,
there's actually going to be a bit more of a challenge.
Last year, there were eight.
So then it was a lot easier to get bonus points.
So this year, we're hoping for a bit of a challenge here, and that students feel like digging into this.
Oh, yeah, OK, I kind of gave this away already. Each of you will present or discuss one task with us. This is just going to be like 15, 20 minutes; we need to figure out the exact schedule and these kinds of things. And that will be the remaining 20% of the grade. You can see the grouping in Moodle already. It's random, so I hope no one can complain.
If we need to shift something, that's also possible.
But this should not be a case of, I'd rather present this task than that task.
There should be valid reasons for changing.
Yes.
You can also see the list here
again chosen at random.
Yes, just a bit about expectations; I gave this away at the beginning a bit.
We work under the assumption, and hope, that all of you will do well in the oral discussion.
It's more of a sanity check that you understand what you did, so we expect you to get high grades there.
If someone fails on the oral discussion, that's probably a very bad sign.
So basically, passing the advanced tests will get you something in the 2.x grade range.
Having the baseline performance will get you somewhere in the lower part of the 1.x grade range, around 1.7-ish.
And then we're basically saying, if you pass all the optimized baselines and you have a good oral discussion, then that's how you get a 1.0.
Because then that means you've basically written code that does what we ask of you very efficiently. So, yeah, that's the target here for the grades.
We just want to be upfront about this so people can see what the expectations are, where you want to see yourself in here, and what the path to that might be.
Then obviously, there's bonus points so you can shift
a few things a bit if you're really into that.
Yes. Okay. So, as I said, there's no exam, so we are kind of treating the programming tasks like an exam. Because your grade is based on this, there are some basic rules that apply. Basically, every task is an individual solution. They're not group assignments, so you may not submit the same code twice.
It kind of says at the bottom, you can discuss the task,
and there's no way we're going to stop you
from doing that anyway.
But please just don't share your code.
We have some checks that we run on the code
to see if it's the same code.
And these are smarter than just checking whether you have the same variable names.
And we've run into issues with this
in previous iterations of other courses.
It's not a lot of fun if we can see people have obviously
copied or submitted the same code
and just changed a few names.
That's kind of awkward for everyone.
And we're going to be pretty strict on this,
purely for the fact that this is how your grade is determined.
So basically, anything that we consider cheating
will lead you to fail the course,
just as cheating would fail you an exam.
So, as I said, we have
a somewhat elaborate setup
for the tests and hidden tests and benchmarks.
And basically because Git
doesn't really allow us to do this any differently,
there's a CMakeLists.txt in your repository at all times.
You may not modify this.
Because in it, we're basically setting up the environment and there should be no need
for you to ever modify any of this.
We kind of have constraints in there.
For the first task,
for example, we're setting specific compile flags for different files. If you modify this, then we have to consider that cheating. The same also holds for the GitHub Actions YAML file in the repository. And even more so for that one, there should be exactly zero reasons for you to touch it.
And this has secrets in it and certain commands and whatnot.
We cannot hide them from you.
We don't need to hide them from you.
Our setup is kind of built in a way
that unless you really try to break something, you won't.
So please just don't break anything.
Because if you do, then we consider this cheating.
Because this setup only works if we can trust you.
So obviously, if it's an honest mistake,
then that's not an issue.
But there's a big distinction between an honest mistake
and trying to break something on purpose.
Please don't try to extract any hidden information, anything
like this.
Again, we have a basic setup that
prevents you from doing this.
If we figure out that you're doing this, then that's bad.
One very important thing is that you're only going to be working on the main branch.
For you, there's basically no reason to do anything else.
You don't need to create another branch.
We don't really care what you're doing in the setup.
For you, it probably even is better
if we can see a somewhat random history,
because then we know you've been working on this, there's progress, and it's not suddenly one commit with a perfect solution. That's always tricky. But very important: please do not
force push any changes to the main branch, because we track the commit hashes for the
benchmark runs. And if you force push something and the commit hashes change, then we can't see the history
that gave you the results.
And that's bad because then, again, this is what we're giving you a grade on.
If we can't see where this grade is coming from, then that's a problem.
Yeah.
So just there's no need to force push.
Just don't.
You can just resubmit.
It doesn't matter.
For the first programming task especially, the GitHub Actions setup is so fast that it doesn't really matter.
For some tasks, it was a bit more complex, and a run would take maybe a minute.
Now I think we're in the order of less than 10 seconds.
So basically, it doesn't matter.
So if you change something, push it.
But on the other hand, only push something if you know you're actually going to pass the basic tests.
Because if you fail the basic tests locally, you're going to fail them remotely, and then it's just going to build and fail.
Yes.
Remote development.
This part is a bit from last year: not all students will have access to the required hardware.
GPU and PMEM will probably not happen this year, for setup reasons that didn't work out too well last year.
But especially for the first task on SIMD, not all of you will have access to Intel's full SIMD suite, that is, SSE, AVX2, and AVX-512 instructions. Some of you might, in which case good for you, because then you can work locally, which is a bit nicer.
But we provide you access to one of our group servers on which you have all of this set
up.
So basically if we require special hardware, we'll make sure you have access to it.
This is exactly the same hardware as the evaluation server.
So the development server is a dual socket system,
and the evaluation server only has one socket,
but the CPU and memory and everything is kind of the same.
And for the first task, most importantly, the CPU is exactly the same.
Memory is not going to be an issue here.
Again, it's a shared resource, so please don't break it.
We changed the setup a bit from last year.
That means this year,
every student will have their own Docker container.
After this task is online,
I'll sit down and I'll send each of you
an initial combination of username,
random password, and the fixed SSH port.
You can then log into that server.
Then, for example, you can use CLion or VS Code to do remote development on there, and it's your own environment, which makes things a bit easier. We ran into some file permission issues with the shared setup from last year, so this year everyone will have their own container. Maybe something will break there, as it's the first time we're trying this, so please just let us know if something does break. We tried it manually a few times. It looked okay. But you never really know
until something scales to more than one person. Yes. So, again, after the session here, after
lunch, I'll send you the information. Then you can log into the server. And then this
is a standard SSH connection into an Ubuntu Docker container.
And then you can fiddle around.
You don't have sudo access in the Docker containers.
You probably also don't need it.
We have some basic stuff installed.
So vim, neovim, and emacs are installed.
Nano is installed.
You can do VS Code.
You can do CLion for development
or whatever you like.
If there's something that a few people are interested in setup-wise, we can also just
add that to the Docker container for everyone.
But if it gets too complicated or esoteric, then we probably just won't do it because
it's not worth it.
But again, that really depends on how you develop and how your setup is.
One very important thing.
So this is something new we're going to test.
Hopefully this is not going to happen.
It didn't happen last year.
The Docker containers, should they crash for whatever reason,
anything that is just in this Docker container will be lost.
So, for example, if you work in CLion or VS Code,
then you always have your code locally on your laptop
and it's just pushed to the remote server,
so then you're fine.
If you, for example, just develop with Vim
in the Docker container,
then make sure you do regular pushes to GitHub
because again, from the setup, it's
quite tricky to persist all of these images
across all volumes and whatnot.
So I've opted to not do this.
If we need to, we'll maybe have to figure out something.
But for now, hopefully this works.
Again, there shouldn't be any reason for these Docker images
to ever stop.
And again, most people will probably have their code locally somewhere anyway.
But again, we need to figure this out; it's the first time we're trying this setup.
But yeah, hopefully this works.
That should be fine.
Okay, yes, the evaluation server, just for your information. Basically, this is the CPU we have in there; it's the same as in the development server.
Important for the very first task: it's a Cascade Lake Intel CPU, which tells you about its AVX-512 capabilities, so its SIMD capabilities.
The server is running Ubuntu 20.
We wanted to update this, but because of how GitHub Actions runners work with Docker, it was a bit tricky; GitHub Actions only somewhat supports it.
We're running everything with GCC 11, because that's the standard install we can get in Ubuntu 20, so we don't have to do anything complex there.
So that's the version we have.
Feel free to use whatever you need in it.
But again, you're not going to need super fancy C++ constructs for anything.
Everything's run in Docker.
Exactly.
There's only a single instance on the evaluation server. So we're not going
to run into any issues of like, oh, my benchmark numbers are bad because someone else was also
benchmarking at the same time. This benchmark run or the GitHub run is the only thing running
on this server for the entire time. Again, for the first task, not super important. But
really only push changes when you feel like you need to.
So again, you can push a lot for the task.
It doesn't really matter because the queue is quick.
But some tasks, again, if they take a bit longer,
there's a bit more setup, then this
gets expensive very quickly.
And then you have an hour-long backlog, which in itself
is also fine, but it might be a bit annoying for you
to wait forever to get feedback on what's happening.
OK, fun part.
There is a leaderboard.
So this is kind of a gamification aspect.
I don't know if any of you have participated
in something like competitive programming
at the Algorithm Engineering Chair.
When I did this many years ago,
they also had something like this and we've also copied this
from a course at the TU Darmstadt and adapted it a bit.
So, last year this worked out quite well.
It's a nice little challenge
for you to beat the baseline and each other.
I'll quickly just go through the layout.
This is the URL.
You can only reach this from the HPI network.
VPN is also fine.
But this obviously will not work from outside.
OK.
So we have anonymous usernames.
There will always be a user called baseline, which is the baseline; I'll get to that in a second.
And then the other ones are, I don't know, hardworking lobster, willing bird, whatever.
It's just a random combination of adjectives and animals.
This will be assigned to you once, the very first time your GitHub user enters the leaderboard, and it will stay the same for the entire semester.
I've tried to think of a nicer way to do this, but unfortunately, I haven't really figured one out.
The first time you pass the basic tests, the GitHub Actions log will say: this is your user, this is your anonymous name.
All of this is also documented in the task description,
so you don't have to memorize this.
Then basically, once you're hardworking lobster,
you're hardworking lobster for the semester.
If there's a very unfortunate combination
that people are not comfortable with, we can change this.
I think we've removed a few names.
I think there was something with
weird adjectives and rats and stuff,
that's not an ideal case.
So we've removed that.
But other than that, they should be rather harmless.
OK, yes, then we have a few columns here.
Oh, let's start at the top.
This is the name, obviously.
Here are the start and end date, so you can always check what's going on. If there are no active tasks, the first page will be empty, but at the top, you don't see this here, there's a little archive box, and you can click through to the past tasks.
The first column, basic tests, is just an indication of whether you've passed the basic tests, and it will basically always be set to true, because you're not on the leaderboard until you've passed the basic tests.
The advanced tests column, again, will indicate whether you've passed the advanced tests.
Then the baseline indicates if you've passed the baseline performance.
And the optimized baseline indicates if you've passed the optimized baseline performance.
And then the next two columns kind of mean two things.
So for you, as a student submitting something, the last runtime column will tell you the runtime or performance value determined for your last run.
And then the best run time will be the best time you've achieved.
So here you can see this is the best time.
The last time was a bit slower, but it's still better than the baseline.
So yeah, all is good.
And once you've passed the baseline, you'll never go back, because then you have one solution that passes it, and this is the one we track internally.
Basically, we store the commit that achieved this.
And if we can't retrieve that afterwards because you've done something weird to your repository, that's bad.
For the baseline user, and this is why these columns have a double meaning, this is the time you need to beat.
So, for the baseline value here,
in this case, it would need to be 4,400,
and the optimized baseline would be 2,050.
Those are the values that you should beat.
In this case, we did, because I think hardworking lobster was me running my own code.
So you can also see a bit of how the optimized baseline is set: it's roughly the performance that I've achieved, minus a bit of leeway.
So we kind of discuss internally what we think is appropriate here.
And this in theory can also be changed.
If we realize this doesn't work out,
or we've miscalculated something,
then we can adapt the baseline,
and then we will need to see.
So, if we have the optimized baseline
and exactly zero people achieve that,
then we've probably done something wrong on our end.
Meaning that maybe it
was a bit unclear, the expectations were
a bit off.
So far, this has all worked out very well.
So far, we've always had a healthy number of people passing the optimized baseline.
But again, just from the expected grade range, we do not expect everyone to pass the optimized
baseline.
Okay.
Any questions to the general setup?
Yes?
How do we know what the advanced tests check?
Okay.
So good question.
So we have basically three test files in each setup.
We have a basic test file, which is what you get. This is a very small set of tests; you get it in the repository, and you can see it.
For the first task, it's a few tests with very small examples.
You can run them, and you can also modify them locally, because on the CI server we always use a fresh version.
So here, you can add your own tests.
And we actually encourage you to add your own tests
because there's a high chance that just passing
the simple tests will not mean you're
going to pass the advanced tests because, again, they're
basic and advanced.
Then we have hidden advanced tests.
You will not see those.
They are only available on the CI server.
When we run them, you'll see standard Google Test output; we use Google Test for this.
So you have the standard pass, pass, fail, whatever.
In the basic tests, you get the full output and what's expected.
In the hidden test, this is hidden because it's hidden.
But we try to provide you with
error messages that indicate what went wrong.
So in the very first iteration, we had something like 'test failed', and people said, I don't really know why.
So we're trying to give you some guidance: okay, maybe think about this aspect, or think about this input range, to guide you along.
The test names depend on the setup; I think for this one, it's just advanced test 1, 2, 3, 4, 5.
So the error messages are trying to give you some insights
into what's going wrong.
And that worked quite well last year.
So we're kind of trying to stick with that.
And then we have a benchmark.
Again, that's also hidden.
In the task description, we say what we're running and how it's run, and we also provide some output there.
So, for this one, you need to implement a few methods, and you can see the runtime of each method and how we determine the final value.
But again, we're not really telling you what's happening
internally because, well,
otherwise you might be able to optimize
explicitly for this benchmark which we don't want.
Does that answer the question?
Any other questions?
Okay.
Then I'll move on to this task.
So the first task you're going to implement is a delta column scan on compressed data using SIMD.
So we started with the SIMD in the lecture yesterday.
And due to a bit of timing conflicts, unfortunately, the second SIMD lecture isn't until next Tuesday.
But I hope that shouldn't be too much of an issue.
I'll get to some details in a second.
Basically, you need to implement a decompress method.
We have compressed input, and the decompress method just takes all the values and decompresses them.
And then you need to implement a scan method, which decompresses the values and checks if they match a predicate, let's say, if they are smaller than 10.
And if they are smaller than 10, then you write them to an output.
Again, there's a way more detailed task description.
This is only a brief introduction,
because you can read the document yourself.
I don't need to read it for you.
The deadline for this is Monday the 29th, at midnight.
Please make sure to submit before this.
There is a tiny grace period, so if you submit something just after midnight, it's also fine.
But please don't do your first commit on the 30th at 1 in the morning
and be like, sorry, I'm late, which is why, again, I said,
if you have a longer history, we can kind of see you've been working on this for a while.
I know how deadlines are at the end.
They get a bit tight.
That's fine.
But please don't just start two hours before the deadline
and then complain that the deadline was too early,
because that's not fair to everyone else.
So that's the deadline.
OK.
The first task.
So we're going to do delta compression.
This is a widely used compression technique. What it basically does is: if I have a series of numbers, I always just store the difference to the number before.
I'll get to this picture in a second; it's a bit complex for what it is, and I couldn't find a nicer way to represent it.
So in this task specifically, we have 32-bit unsigned integer values.
That's our input column.
And then what we do is we basically, for each value at position i,
we just store what is the difference to the value at i minus 1.
So we can see this here.
The first value is 10.
The second value, okay, I'll get to that.
Let's start from the right, because I'll get to the start value in a second.
So this value is 24,
and we see the difference to the value before is 15,
so the output is 15.
Here we have 9.
The difference to the value before is minus 3,
so we store minus 3,
then 12 to 10 is 2.
And then because the first value obviously doesn't have a previous value, we also
need to store a start value.
And the start value is always just the first value
of the input.
That means also the first delta is always 0.
So that's basically it.
Now if we go from the other direction,
we can see 12 minus 10 is 2, 9 minus 12 is minus 3,
and 24 minus 9 is 15.
And that's how we get this.
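To make the idea concrete, here is a minimal scalar sketch of the compression step. The function name and signature are made up for illustration; the course provides the actual compress implementation.

```cpp
#include <cstddef>
#include <cstdint>

// Sketch of scalar delta compression: store input[0] as the start value,
// and for every position the difference to the previous value.
// The first delta is 0 by convention, since input[0] has no predecessor.
void compress_scalar(const uint32_t* input, size_t n,
                     uint32_t& start_value, int8_t* deltas) {
    start_value = input[0];
    deltas[0] = 0;
    for (size_t i = 1; i < n; ++i) {
        // The task guarantees consecutive values differ by at most
        // -128..+127, so the difference always fits into a signed byte.
        deltas[i] = static_cast<int8_t>(input[i] - input[i - 1]);
    }
}
```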
And the main reason we're doing this is because we expect the deltas to be small.
So in a database, for example, you would have 32-bit or 64-bit values.
But let's say it's just an increasing range of numbers.
In the best case, let's just say it's 1, 2, 3, 4, 5, 6, 7.
Then the delta is only ever 1.
And then we can store 1 very concisely.
In this case, we wouldn't even need a bit,
we could store this differently.
But let's say it's always only 2 bits,
then we only need to store 2 bits instead of
the entire 32-bit or 64-bit values.
In this programming task, the diff or delta is an 8-bit signed integer.
That means instead of storing 4 bytes, we only need to store 1 byte, which is a reduction of three quarters.
Having an 8-bit signed integer also means the value range is minus 128 to plus 127.
So the diff between two consecutive values in the array is never going to be smaller than minus 128 or larger than plus 127; the value range always fits in an 8-bit signed integer.
Okay, decompression basically works the same way, just in the other direction.
We take the start value, which is 10 here, then we take the compressed values. The first delta is 0, so 10 plus 0 equals 10. Then 10 plus 2 equals 12, 12 plus minus 3 is 9, and 9 plus 15 is 24.
So that's basically the concept of how this works: you're just going through, always taking the previous value plus the diff to the previous value to get the next value, and so on.
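As a sketch, the scalar decompression is just a running sum over the deltas. Again, the name and signature here are hypothetical; the real interface is defined by the task.

```cpp
#include <cstddef>
#include <cstdint>

// Sketch of scalar decompression: keep a running value and add each delta.
void decompress_scalar(uint32_t start_value, const int8_t* deltas,
                       size_t n, uint32_t* output) {
    uint32_t value = start_value;
    for (size_t i = 0; i < n; ++i) {
        value += deltas[i];  // deltas[0] is 0, so output[0] == start_value
        output[i] = value;
    }
}
```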
Interestingly, this is something that's actually not very well suited for SIMD processing, which also makes it a bit challenging, because there are a lot of dependencies within the data.
So if you had this in one SIMD register, there are four values here, and you can see there are obviously dependencies between the values.
But this also makes it a bit challenging to think about.
So the hint we're giving you on how to start with this is to think about how you can do horizontal additions in and around your vector registers, and how you can maybe combine and shift entire registers.
There's also a bit of a description of this in the task description.
We are aware that at the beginning this might be a bit overwhelming, and that kind of works as intended. Understanding SIMD code is quite complex. There's a lot of documentation around it, and from a teaching perspective, we think it's very important to be able to understand this, navigate the documentation around it, and understand what's actually happening there.
So there's a very, very good guide from Intel that's very detailed, has a lot of descriptions
around what's going on.
But it takes a while to understand the names,
to understand what they do, how to navigate this.
And we consider this part of what you're supposed
to achieve during the programming task,
is to understand and navigate this.
So here's a small example about combining and horizontal shifting.
If we look at our input values, we have a starting value of 10, so we can just use a register where all lanes are 10.
And now we can do some arithmetic, and how exactly is for you to figure out, so that the first lane is just the first delta (here 0), the second lane is the first two deltas added up, the third lane is the first three added up, and the fourth lane is all four deltas added up.
Then we can just do one addition of this register with the base register, and we get the same output that we expect.
So instead of doing this in a chained fashion, we've created a SIMD register where every lane holds a prefix sum of the deltas.
Again, that's the task.
That's kind of trying to figure out how this works.
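To illustrate the idea, here is a minimal SSE2 sketch of such an in-register prefix sum, assuming the four 8-bit deltas have already been widened to 32-bit lanes (doing that widening is part of the task). Two shift-and-add steps build the running sums, and one final addition applies the start value:

```cpp
#include <cstdint>
#include <cstdio>
#include <emmintrin.h>  // SSE2 intrinsics

// Inclusive prefix sum over the four 32-bit lanes of x:
// [a, b, c, d] -> [a, a+b, a+b+c, a+b+c+d].
// _mm_slli_si128 shifts the whole register left by bytes, shifting in zeros.
static __m128i prefix_sum_epi32(__m128i x) {
    x = _mm_add_epi32(x, _mm_slli_si128(x, 4));  // shift by one 32-bit lane
    x = _mm_add_epi32(x, _mm_slli_si128(x, 8));  // shift by two lanes
    return x;
}

int main() {
    // Deltas from the slide example, assumed already widened to 32 bits.
    __m128i deltas = _mm_setr_epi32(0, 2, -3, 15);
    __m128i sums   = prefix_sum_epi32(deltas);
    __m128i out    = _mm_add_epi32(_mm_set1_epi32(10), sums);  // add start 10

    int32_t result[4];
    _mm_storeu_si128(reinterpret_cast<__m128i*>(result), out);
    std::printf("%d %d %d %d\n", result[0], result[1], result[2], result[3]);
    // Prints: 10 12 9 24

    // The carry into the next block of four is the last lane (plain SSE2).
    int32_t next_start =
        _mm_cvtsi128_si32(_mm_shuffle_epi32(out, _MM_SHUFFLE(3, 3, 3, 3)));
    (void)next_start;
    return 0;
}
```

The same shift-and-add pattern generalizes to wider registers, where the cross-register limitations of AVX2 mentioned later become one of the things you'll have to work around.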
So yes, as I mentioned at the beginning,
you need to implement two methods, or eight methods
in total, but two different types of methods.
One is the decompress method.
The decompress method: you take the deltas and the start value, so the blue values here, and your task is to just output the red values.
And the second one is to implement the scan method, which does the same thing internally, but instead of returning all values, you also apply a filter.
A filter would be like: please just return all values that are larger than 15.
Then in the output you would only have 24, and the other values you would not store.
Two types of methods.
And you have to implement each of these four times.
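For the scan, a scalar sketch might look like this; the predicate, names, and signature are again illustrative, and the task description defines the real interface.

```cpp
#include <cstddef>
#include <cstdint>

// Sketch of a scalar scan: decompress on the fly and keep only values that
// match the predicate, writing them densely to the output.
size_t scan_scalar(uint32_t start_value, const int8_t* deltas, size_t n,
                   uint32_t threshold, uint32_t* output) {
    uint32_t value = start_value;
    size_t match_count = 0;
    for (size_t i = 0; i < n; ++i) {
        value += deltas[i];
        if (value > threshold) {            // e.g. "larger than 15"
            output[match_count++] = value;  // only matches are stored
        }
    }
    return match_count;
}
```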
The motivation behind this is, I wrote in the task description that your boss is evil
and makes you implement this multiple times.
And in this case, your boss is me.
But on a more serious note, so this is actually also some active research we've been working
on and kind of looking into these things.
Developing this for different platforms is actually quite tricky.
So let's say you're actually a database vendor, or you want to write code that runs efficiently on different platforms. Then you don't know which platform you're running on; with high probability you might not have AVX-512, but for x86 you're guaranteed to have SSE. The scalar one is just very basic, for you to get an understanding of what's going on; that should take only a few minutes. SSE is guaranteed on all x86 servers. But realistically, if you run something in the cloud, let's say on AWS, then depending on how your setup is, you don't know 100% whether you get AVX-512 or only AVX2. Most Intel laptops will only have AVX2 and not AVX-512, because Intel killed it for consumer hardware.
And this is only Intel.
Nowadays, a lot of people are starting to move their cloud stuff to ARM, to the Graviton instances at AWS, for example.
And that now means you would have to implement this also
in a completely different SIMD instruction set
for different CPU architecture.
So this kind of gives you an idea of, OK,
what are the limitations and the advantages of certain SIMD
instruction sets?
Most notably, you'll see that AVX-512 is a huge addition, does a lot of things, and also adds a lot of instructions for the smaller register widths, like 128 bits. But then again, we're kind of forcing you not to use those, because we use specific compiler flags to tell the compiler what it can assume. So after a while, you'll figure out that each of these has its own limitations. There are some limitations in AVX2, for example, regarding cross-register operations. AVX-512 can do all the fancy things, but has some other disadvantages.
I'm not sure if we covered this in the lecture, but most notably, AVX-512 on most CPUs will
cause a throttling of the CPU clock frequency
because the CPU otherwise gets too hot.
So, yeah, again, each of these has
some ups and downs, you need to figure this out.
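As a side note on those compile flags: GCC exposes the enabled instruction sets as predefined macros, so a small probe like the following (not part of the task code) shows what a translation unit may assume:

```cpp
#include <cstdio>

// GCC defines these macros depending on the -m flags it is invoked with
// (e.g. -msse4.2, -mavx2, -mavx512f).
int main() {
#if defined(__AVX512F__)
    std::puts("AVX-512F enabled");
#elif defined(__AVX2__)
    std::puts("AVX2 enabled");
#elif defined(__SSE2__)
    std::puts("SSE2 enabled");
#else
    std::puts("no x86 SIMD extensions enabled");
#endif
    return 0;
}
```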
The compress method is implemented by us; you don't need to deal with that.
Hint: it is probably a very good starting point for thinking about the scalar implementation, because decompression is just the reverse.
Again, the scalar version is just for you to do a sanity check that you've understood what this actually is.
This is covered in the tests: we test all four implementations, in the basic tests and in the advanced tests.
So, you need to have a correct implementation for all versions for the advanced test to
pass.
And when we hit the benchmark, we will also run all versions, but we'll only pick the
fastest.
In my code, AVX-512 is the fastest, so there's a good chance that it will also be the fastest for you, depending on how you implement it. But it's also towards the lower end.
It's going to be interesting for you to observe the performance differences: between scalar and SSE, for example, what AVX2 changes, and then what you can do on top with AVX-512.
Just a few assumptions, again all in the task description. You can assume a few basic things, because we don't really care that much about handling edge cases; it doesn't help a lot with the understanding.
Generally, for anything you do with SIMD, and I hope this was also covered in the lecture already: let's say we have a SIMD register with four values in it, but my input is not divisible by four. Then obviously I need a small tail that covers the last few values. In this case, we don't want you to implement that, so we're saying all input sizes are divisible by 16; you can just assume that.
You can assume that the input and output buffers you're reading from and writing to are always aligned to 64 bytes. This is very helpful if you're using something like AVX-512, whose registers are also 64 bytes.
And the output has 64 bytes of padding at the end, which allows you, if you want, to write behind the end of the output buffer; without the padding, that would technically be undefined behavior, and we don't want that.
This is something that you would commonly do in these setups.
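For local experiments, one way to reproduce those guarantees is a small helper like this (hypothetical, just for your own testing; the provided harness already sets the buffers up for you):

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Allocate a uint32_t buffer that is 64-byte aligned and has at least
// 64 bytes of padding after the n values, mirroring the task's guarantees.
// Release the buffer with std::free().
uint32_t* alloc_aligned_padded(size_t n) {
    size_t bytes = n * sizeof(uint32_t) + 64;  // room for the tail padding
    bytes = (bytes + 63) / 64 * 64;  // aligned_alloc needs a multiple of 64
    return static_cast<uint32_t*>(std::aligned_alloc(64, bytes));
}
```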
Again, we encourage you to really write more tests in the basic test CPP file.
We only cover very basic ones.
This is something that is very easy to test manually: you can come up with different edge cases for the different implementations.
You can more or less just copy the code that we have for some basic tests and extend it to cover some ranges yourself.
And whatever's in there, we really don't care. We throw it away; we don't even look at it.
That's just for you.
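A sketch of what such an extra test could look like, using the decompress example from the slides (the function name decompress_scalar and its signature are made up; use whatever the task's headers actually declare; like the provided tests, this would be run through Google Test):

```cpp
#include <gtest/gtest.h>
#include <cstdint>
#include <vector>

// Extra local test: the lecture example 10, 12, 9, 24 compresses to
// start value 10 and deltas {0, 2, -3, 15}; decompressing must round-trip.
TEST(DeltaScanLocal, DecompressLectureExample) {
    const std::vector<int8_t> deltas = {0, 2, -3, 15};
    std::vector<uint32_t> output(deltas.size());
    decompress_scalar(10, deltas.data(), deltas.size(), output.data());
    EXPECT_EQ(output, (std::vector<uint32_t>{10, 12, 9, 24}));
}
```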
On the server, we always have fresh files
that we use that just cover the basic tests.
So if the plain basic tests that you have run locally, they should also run on the server.
If they don't, then that's going to be an interesting problem
that's probably related to setup.
There are quite a few files here, because we have different files for each of the scalar, SSE, AVX2, and AVX-512 versions.
And you should only modify code in the files that start with simd_delta_, which is exactly four files.
The rest, please just don't touch, though you can obviously also modify the basic tests.
Please don't create any new files,
because that would require you to change the CMake lists file,
which we don't want you to do.
And there's also no need to do this.
So please just stay in the files that there are.
But again, there shouldn't be any need to do anything else.
Actually, are there any questions on this so far?
Probably a lot, and there's a lot more information in the task description.
A lot of what I said, or all of it, is also in there.
OK.
Now let me finish with the discussion that we'll have.
This will be the day after the deadline.
So generally, we'll always have the deadline at midnight
one day, and then the discussion will be the morning after in the lecture slot.
So, in this case, the Tuesday 11 AM lecture slot on
the 30th of May will be the discussion of task one,
and then we'll also introduce task two.
One major difference: today is being recorded, because we're also giving general information.
The other task discussions will not be recorded, purely for the fact that as soon as I discuss the solution with you, I can't use the task ever again, because then it's on tele-TASK forever, and the next year people would just come look at the solution, copy it, and we're done, which is not what we want.
I know there are some conflicts with other lectures in this time slot.
If you cannot attend, or you know someone who cannot attend, you can always just come by or write us an email, and we can explain and discuss whatever is unclear.
Or maybe you can discuss with your friends or fellow students who were here and got the solution.
Yeah, not recorded.
We'll introduce the next task.
But the task introduction is always going to be kind of the level I did today.
The ground truth for anything will always be the PDF in Moodle with the task description.
That's basically what we rely on.
That's what we give you.
If we update it, we'll let you know.
Yes, and this year, again, three of the four programming tasks will be new; last year it was four out of four, this year it's three out of four. The first one is new, the second one is old, and the other two are new.
So if anything goes wrong, if anything seems off, if something seems like it should not work like this, please let us know; there is a good chance that something is wrong on our end.
The number of students has nearly gone up by 3x, from 8 to now 21 students doing this course, so we expect some setup-related issues to pop up here and there; these random setup problems kind of scale with the number of students.
So if something feels wrong or seems wrong, just let us know.
Write us an email.
If something's generally unclear,
feel free to use the forum to discuss these things.
Maybe provide additional information in the forum.
Please don't post your code in the forum.
But again, we said, feel free to discuss something.
But maybe don't explicitly say, write this line of code
to do something.
That would kind of be too much.
If you're unclear about whether this is too much sharing,
just write us an email.
So you can write Marcel and me or Professor Rabl an email.
And we can figure out how to solve the issue.
And with that, I think we are good for today.
Any questions?
Okay, perfect.
Then thank you very much for your attention.
And I will see you, well, yeah, at the next lecture.
That's on Tuesday.
And I'll see you probably on the 30th when we discuss the solution or if you have any questions in between.
Thanks.