Hardware-Conscious Data Processing (ST 2024) - tele-TASK - Task 1: SIMD Delta Scan

Starting point is 00:00:00 Okay, I think we can get started. For those of you who don't know me, I'm Marcel, and together with Florian, I'm leading the programming tasks. In today's session, I first want to talk in general about how the programming tasks work, what kind of setup we use, how the tasks are executed, how they are graded, what expectations we have, and what you can expect from our grading. In the second part, I will introduce you to our first programming task, which is the SIMD Delta Scan. So in general, we use GitHub Classroom. For those of you who are not familiar with it, GitHub Classroom is a kind of wrapper around the GitHub environment that allows creating programming assignments. And this allows us to provide you with a repository with a certain amount of code.

Starting point is 00:00:52 And you can basically fork this repository and provide your solution in there and submit your solutions. This also means that all of you need to have a GitHub account. We don't really care about what account this is. So if for some reasons, some privacy reasons, for example, you don't want to share your private GitHub account, feel free to create another one. But in the end, we need a GitHub account that we can link, that we can use in the GitHub classroom.

Starting point is 00:01:21 And it's also important that you use this GitHub account across all the tasks. So it would be bad if the GitHub account switches across the different tasks. Already online in Moodle right now, you can find the task description of task one. And there's also a GitHub Classroom link. And when you click this link, you

Starting point is 00:01:44 can join the first assignment. When you do it for the first time, you are asked to choose your name. We provide a list of students, and this list contains the names of all active participants. If you cannot find your name in there, then you're probably not in the group of active participants. So please, when you join this assignment, make sure that you choose your own name. Otherwise, there might be a conflict

Starting point is 00:02:14 since the other student will not find their name anymore. And then we need to unlink and link new accounts. This is just additional effort that we can avoid. So please make sure that you choose the correct name here. Once you have joined the assignment, GitHub Classroom will create a private repository fork of our provided one, and from then on you can interact with it as usual for GitHub. You can just use the normal GitHub workflows. You can develop locally.

Starting point is 00:02:46 You can run your tests locally. And you can also push your solutions. A task submission is performed by just pushing a commit. And you can also push multiple commits. So you're not limited to only one commit. So just feel free to push regularly. Every time when you want your solution to be evaluated, just push it to your repository.

Starting point is 00:03:12 And when we can also see a history of commits, we can see that you are actively working and making progress. It would always be a little bit suspicious if there is only a single commit with a perfect solution, let's say, an hour before the deadline. So feel free to push regularly here. And for the evaluation of the task, we have a GitHub runner.

Starting point is 00:03:36 This runs on one of our group servers. And this is an automated procedure here. It's based on GitHub Actions that you can also see in the GitHub repository that you will then be working on. And we basically have a base structure that you can see at the bottom on the slide here. We have different stages that build on top of each other. So we have first a basic test stage. If you pass these tests, then there's another test stage

Starting point is 00:04:09 in which advanced tests will be executed. After this, we will perform a certain benchmark workload in which we measure performance. This performance number is often a runtime of the certain workload that we executed. In some cases, this might also be a combined metric. For example, combining runtime and memory consumption. We did this in the past for a few tasks.

Starting point is 00:04:34 And we might also do it in this semester, but we will let you know about this. And yeah, with this benchmark, you will receive a certain performance metric. With this performance metric, you can then pass a baseline performance threshold or even an optimized baseline performance threshold. And for all these stages, you will receive points. In total, you can receive up to 25 points per task, plus some additional bonus points. I will come to this later. And with the four tasks in this course,

Starting point is 00:05:08 you can in total get 100 points, 25 per task. If you pass the basic test, you will get seven points. This is a very basic implementation then. Does not necessarily mean that it's completely correct, that all edge cases are covered. But if you pass the advanced test, then we consider this as a correct solution with correct functionality.

Starting point is 00:05:33 Since in this course we also want to make you aware of hardware-conscious data processing and of programming in an efficient way, we also encourage you to tune your performance so that you can also achieve the baseline and optimized performance. With these two additional stages, you can get five additional points.

Starting point is 00:06:00 So when you manage to beat the baseline performance, you get two additional points and the last three points you can get if you also manage to beat the optimized baseline. These performance thresholds are given in a leaderboard. You will also find a link to this, to the leaderboard in the task description. We hosted it on a server that is accessible from the HPI network. So you can also use the VPN to access it. If you're not somehow connected to the HPI network, then you cannot access the leaderboard.

Starting point is 00:06:41 And in addition to the 25 points for the different stages, you can also get bonus points based on your rank in the leaderboard. The first three ranks in the leaderboard get additional points: if you are number one on this leaderboard, you get three points, number two gets two points, and number three gets one additional point. But you can, of course, also achieve 100 points in the course without getting any bonus points here. In addition to our programming tasks and the points that you can get there, we will have a task presentation and discussion with us.

Starting point is 00:07:25 So each of you will present one of the tasks. We have a random assignment here, which is also already in Moodle. But I can also show it here on the right. You can see which task you're assigned to. This presentation will not be in this lecture hall. It will be in a small group with Professor Rabl, Florian, and me, in which you basically walk us through your solution and explain it to us, and we will ask you some

Starting point is 00:07:55 questions about your solution. We just want to check that you have done it yourself and not copied it from someone else, and that it's not a solution from ChatGPT. Besides you explaining the code to us, we will also ask some theoretical questions about the concept, just to check if you understood what you did there and what the reasoning behind it is. In the past two iterations, so this is the third iteration of the course, we also gave points for that. But it didn't work out so well, so therefore we switched it to

Starting point is 00:08:30 a binary assessment. It's either pass or fail, and you need to pass to successfully finish this course. And as I mentioned, we already randomly assigned you to the groups, as you can see. And in terms of grading, to just give you an impression about what you can expect with passing the different stages. So if you pass only the basic tests, remember you will get seven points per task. This will not be enough to pass this course,

Starting point is 00:09:04 so you need more than that. So at least a couple of advanced tests. Or you need to pass some advanced tests in multiple tasks to actually pass this course. If, for example, you pass the advanced tests in all of the tasks, then you will get 80 points, which will result in a grade of 2.0. And if you pass all the baseline performance

Starting point is 00:09:34 tests for all the tasks, then this will be a grade of 1.7. And if you pass all the optimized baselines, then you actually achieved what we're trying to teach you here. You have a correct solution. You also consider the hardware and try to write efficient code, which you showed with your implementation. And in that case, you will get a 1.0. But of course, you can also tweak your grade a little bit

Starting point is 00:09:57 if you get the bonus points. It's important to note that the grade of this course is determined to 100% by the programming tasks. So therefore, we also treat the programming tasks like an exam. And therefore, as it's usual for written exams or oral exams, we also have some rules that we expect you to follow for the programming tasks. First, the solutions are individual solutions. So please don't do pair programming. Don't submit the solutions together. I mean, you can see at the bottom, of course, you can discuss about the task.

Starting point is 00:10:45 And we also encourage you to discuss about it on a conceptual level. But we can also not prevent you from discussing this. But we also would encourage you. But what we don't want you to do is that you just copy code, give it to another one, or that you share your solution, for example, in the Moodle forum. And we also use a code checker in which we,

Starting point is 00:11:11 yeah, a plagiarism checker, which does not only check whether variable names were changed, so it's a little bit more advanced. And yeah, so there are some ways in which you would fail the course. So one is, as I mentioned, if you copy code from someone else, this will result in failing the course. Also, for our automated grading, for the automated code evaluation,

Starting point is 00:11:38 we have GitHub Actions implemented. Therefore, you will find a build and test YAML file in the repository, and we also have certain specifications in the CMakeLists, so in the project configuration. For example, for the SIMD task we provide some compile flags so that you can only use a certain set of SIMD instruction sets. I will come to this later. Please do not modify these files; there's no need to modify the CMakeLists or the GitHub Actions YAML file. And also to allow this entire automated evaluation, including the different test stages and also the

Starting point is 00:12:25 benchmarking and also pushing your performance numbers to a leaderboard, which I will show you later, there are also some secrets that we use in the GitHub Actions. We cannot hide them from you, but we also don't need to hide them from you. Just please don't try to extract any kind of hidden information, and please don't try to hack our setup here. If we realize that you tried to extract some kind of secrets or hidden information, or if you try to break the setup, then this will also be considered cheating.

Starting point is 00:12:59 And this means you will fail the course. In terms of implementation and pushing your solution, make sure that you push your solution to the main branch. There is actually no need to create any other branches. But please make sure that also the solution that should be evaluated by our setup is available on the main branch. And for the evaluation, as I mentioned,

Starting point is 00:13:29 we have an evaluation server in our group. And this server will execute the evaluation of one code at a time. So there will not be any evaluation in parallel. So therefore, there will not be a case in which, for example, your performance metric of your solution will be affected by the testing compilation or performance evaluation of another task.

Starting point is 00:13:58 So there's only a single evaluation at a time. This also means that there could be some waiting times, especially now in this year we now have 35 active participants. Last year we had 20, the year before we had about 8. So it's scaling up and therefore the more you push, the more waiting time could also be generated for others. So please don't overdo committing. Just make sure that you only commit what you really want to be evaluated by the GitHub runner. And also have this in mind when you approach the deadline. Because for example, if all the 35 students would start pushing

Starting point is 00:14:43 the solutions only five minutes before the deadline, then there might be waiting times. And then your code might not be evaluated in time anymore. So just keep in mind that we have a single task evaluation at a time. We will track the GitHub commits that you push, especially the performance numbers for your commits that pass all the test stages and on which the benchmark is executed.

Starting point is 00:15:18 And we only consider your best solution for your ranking. You can of course push multiple solutions by just pushing your commits, and it might be the case that a later commit has worse performance than a previous one. We only consider your best performance here, but for that we also need to see your history. So please do not force push any changes. Because if you have a certain performance number and we cannot see what code this performance number is based on, then that's bad. That's difficult for grading; it's bad for you, it's bad for us.

Starting point is 00:15:56 So please just don't force push. In terms of development, not all students might have the hardware that is required for the tests. This is especially relevant now for the first task, in which you should work with SIMD instructions. Therefore, we provide you with a development server, and yesterday all of you should have received an email with some information about how to connect to the development environment. So we provide access to a remote server here. The development server is very similar to the evaluation server. The development server is a dual-socket system, and the evaluation server is a single-socket system.

Starting point is 00:16:50 But the CPUs and also the memory configurations, so the CPUs are the same. Memory configuration is very similar. And also here, have in mind, similar to the evaluation server, that this is a shared resource. So therefore, if all the 35 of you, so maybe one more detail. As you can see in the bottom, each student has their own Docker container here.

Starting point is 00:17:15 And you can connect to your container with firstname.lastname as your username. Then we generated a random initial password for you, and you have a fixed SSH port. These three pieces of information were sent to you yesterday via mail. If you haven't received one, please let us know, and also if you have any issues with connecting to your container, please let us know. So there's one container running for each student

Starting point is 00:17:45 and back to the shared resource. Yeah, this is the first iteration in which we have so many students. Last year it was 20 students, and we did not have any issues in terms of shared resources. There were no big delays in terms of development, so nobody complained about this. Let's see if everything works out fine this year. Also, please let us know if you see any issues in terms of developing on this development machine.

Starting point is 00:18:18 But also keep in mind that you share the resources here. Therefore, for example, if you build the binary, please do not use all cores for that. So for example, there's this make -j parameter. Don't use all the cores here. Keep in mind that it's a shared resource and that your fellow students also use the server and also want to build and run and test

Starting point is 00:18:46 the code. In the Docker container, there are already some tools pre-installed. For example, there is Vim, Emacs, or Nano if you want to use that. But you can, of course, also use remote development setups, for example, with VS Code or CLion. Development workflows might vary significantly across people.

Starting point is 00:19:12 So if you see something that you really need for the development that is blocking you from making any progress that you think that you really need in your Docker container, let us know, and then we can discuss it. And if it's easy to implement, then we can also just adapt the containers and provide it for also all of your fellow students. But if we see that this might be rather complex

Starting point is 00:19:38 and that we consider it as not a reasonable effort, then we might not address your request. But if there's something where you think this would really improve the development environment, just let us know. And then, of course, we can discuss it. Regarding the Docker container, it's very important to know that if the container should crash,

Starting point is 00:20:02 anything that is just in the container will be lost. So therefore, make sure that you regularly commit and push your solutions. Also, depending on your development setup, this might not be a problem. For example, if you use the remote development setup with CLion, then you actually have the files locally on your machine. If then for some reason the Docker container should crash,

Starting point is 00:20:30 you still have the files on your local machine. But if you would work with Vim, Emacs, or I think also with VS Code, you don't have the files locally, then this might be an issue if it should crash. We also did this in the last iteration last year with 20 students. It worked out quite well. So nothing crashed. Now it scales a little bit to 35.

Starting point is 00:20:55 We don't expect it to crash, but we never know. So we hope the best. But just to let you know that you also have this in mind, not that you start implementing everything from scratch up to your solution that also beats the optimized baseline, but you never saved it, pushed it to the server, then everything might be lost. All right, just some details about the evaluation server. This is the CPU that we use.

Starting point is 00:21:26 As I mentioned, we have these CPUs on the evaluation and also on the development server. Important for the first task is that it's a Cascade Lake architecture. This also means it supports AVX-512 SIMD instructions. We use Ubuntu 22.04 here. And we use GCC 11 as the compiler,

Starting point is 00:21:51 which is the default compiler that you can install on Ubuntu 22.04. So the project also supports C++20 features. But as we have GCC 11, be aware of the features that are supported by that compiler version. But you also don't need the latest and greatest C++ features for your solutions, so there should not be any limitation because of the compiler version here. As I mentioned, there's a single instance

Starting point is 00:22:30 running on the evaluation server. And also, only push the changes when you need to. This is a little bit of a trade-off. On the one hand, please don't push too much so that there's not too much resource contention with fellow students. But on the other hand, as I mentioned, when a Docker container crashes, all of your files are lost. So yeah, one aspect says don't push so much and the other aspect says push regularly. So maybe you

Starting point is 00:23:05 can find a balance here. But we just want to avoid that for every tiny change that you make, that you push another commit and therefore might spam the runner and also block other students, fellow students, from getting their solution evaluated. Okay, any questions so far? Okay, then the fun part, the leaderboard.

Starting point is 00:23:36 For every task we have this leaderboard on which you can see your performance numbers. This is the link that you can use to access the leaderboard. As I mentioned, you need to be connected to the HPI network here. You can also use the VPN. And yeah, there is different information shown in this leaderboard here. First, we have in every task a baseline user.

Starting point is 00:24:01 A baseline user just represents the performance numbers that you need to beat to get additional points for beating the performance or for the baseline performance numbers. So this baseline user, you can see the columns on the right. There's the last runtime and the best runtime column. These two columns actually have different meanings here. For the baseline user, the last runtime, or as you can see below, the baseline. So this is the performance number that you need to beat if you want to get the baseline

Starting point is 00:24:38 performance points. And then in the best runtime or the optimized baseline column, the baseline user presents the performance number that you need to beat to get the additional three points for beating the optimized baseline. If you submit your first solution, you will also have the chance to be on the leaderboard, but you will only be shown if your solution passes

Starting point is 00:25:07 the basic tests. And that is the way you enter. So the first time you pass the basic tests, you will also get an anonymous username here. This is a combination of an adjective and an animal, for example, here, the calm camel. And we tried to remove unfortunate combinations. So for example, there was something with rats.

Starting point is 00:25:35 We just removed that. We thought this is quite unfortunate. But if any of you feels offended by the randomly generated name, just let us know. But the name should be rather harmless. And yeah, once you pass the basic tests, your name will also be, or your anonymous name will be here. The way you identify your anonymous name

Starting point is 00:26:00 is not super obvious. It will be printed in one of the GitHub Actions stages. So when the GitHub runner runs your solution, one of the test stages will print your anonymous username, but this is also described in more detail in the task description. And once you have an anonymous username, it will not change across the entire

Starting point is 00:26:26 task so you will always be in this case for example the calm camel and then you have the different check marks here depending on how your performance how your solution performs so if you pass the basic test this will always be true if you're on the leaderboard, as I mentioned, because you're only on there when you pass the basic tests. But then you will also see the different check marks if you pass the advanced tests, if you pass the baseline performance that is determined by the baseline user, or if you also pass the optimized baseline. And as you can see on the very right,

Starting point is 00:27:06 you see also these nice award symbols. So if you have one of these award symbols, you will get additional bonus points, as mentioned before. And your ranking is based on your best runtime. So I briefly talked about the two columns, lastRuntime and bestRuntime, with the meaning for the baseline user and for you. The lastRuntime column always shows you

Starting point is 00:27:37 the runtime of your last commit if it passed the advanced tests and the performance benchmark was executed. And best runtime column shows the overall best runtime that you achieved with any of your solutions that you push to your repository. All right, so far the details about the general setup, the general submission, how we run your code, how we evaluate it, how you can develop. Are there any questions to the general setup so far? Okay, that sounds good. Then we will continue with the first task. So in this task, you should implement a delta column scan on compressed data using SIMD instructions.

Starting point is 00:28:37 So there are basically two methods that you need to implement. One function decompresses a compressed column. So we can imagine here database systems with a columnar storage layout. And you have one column compressed in a certain way. And you want to decompress the data in one function. And the other function scans the data, which means that you internally also need first to decompress the data but you

Starting point is 00:29:07 only write values of that column to the output buffer that satisfy a certain predicate, for example, only values that are larger than ten. Everything that I'm explaining to you about the general setup and also about the task, all the details are also in the task description that is already online in Moodle. I will only briefly introduce the task here. You will find way more details in this task description, and we

Starting point is 00:29:39 also see this task description as the ground truth. So always refer to this document. And if there should be some changes because we see that something is not working out during the task execution, then we will also let you know. You will always see an announcement in Moodle if we change something. This did not happen last year, and I also don't expect it. But there is a chance that something might not work out as expected, and we will make sure that you will know about it. The deadline is Monday the 27th, so it's about three weeks, and

Starting point is 00:30:18 the task will also be discussed on the day after, the 28th of May. In that session, we will first discuss the solution of the first task and then also introduce you to the second task. Okay. But more details about this first task. Yeah. Imagine you are a developer in a company with a database management system as their product, and now your boss wants you to develop.

Starting point is 00:30:51 You should implement decompression, or you should implement delta compression. Delta compression basically means that instead of storing all the values, so assume the values that you see in red. This is your uncompressed column with certain integer values. Let's assume you have 32-bit unsigned integer values here. And now you want to compress them, because your boss saw some nice talk about SIMD instructions,

Starting point is 00:31:19 and he thinks we can significantly save space here if we use delta compression. What delta compression means is that instead of storing each individual value, you only store the difference, the delta, between two consecutive values. In other words, the compressed value at position i is the value at i minus the value at i minus 1. In this example here, there's an exception for the first value.

Starting point is 00:31:52 So basically, your compressed version stores the start value, which is 10 in this example. And then you have only the difference of the delta values. So for example, the first value here, 10, you don't have a previous value. So therefore, you need to calculate the difference of that value to the start value.

Starting point is 00:32:18 Since they are equal, 10 minus 10 is 0, meaning the first value of this delta-compressed representation is always 0. And then for the following values, you check what the difference between this value and the previous value is and just store it at the corresponding position of the output column here. So in the case of 12, we compare 12 with the previous value.

Starting point is 00:32:47 12 minus 10 is 2. Therefore, our delta is 2, which we then store in our delta-compressed column. Then for 9, we check it against the previous value again. 9 minus 12 is minus 3, which we store in our compressed column. And then 24 minus 9 is 15, which is then also the compressed value here. So in the end, we always have a start value and this array of deltas. The motivation here is that you can significantly reduce the memory consumption. Let's assume you have a column in which the values are only sequentially

Starting point is 00:33:29 increasing. Let's say the difference is between 0 and 3, then you only need 2 bits to represent the delta, because you can always store a delta of 0, 1, 2, or 3 within 2 bits. And in this task, we assume that the delta is an 8-bit signed integer. So therefore, you can also assume that the delta between consecutive values of the uncompressed column is always in the range of minus 128 to plus 127, which is the value range of an 8-bit signed integer. So this compression function will be provided.
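To make the encoding concrete, here is a minimal C++ sketch of delta compression as described above. It is not the compress function provided in the task repository; the name `delta_compress`, the signature, and the use of `std::vector` are assumptions for illustration only.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not the provided course function): delta-encodes a
// column of 32-bit unsigned values relative to a given start value.
std::vector<int8_t> delta_compress(const std::vector<uint32_t>& values, uint32_t start) {
    std::vector<int8_t> deltas;
    deltas.reserve(values.size());
    uint32_t previous = start;
    for (uint32_t value : values) {
        // Signed difference to the previous value (to the start value for the first one).
        const int64_t delta = static_cast<int64_t>(value) - static_cast<int64_t>(previous);
        // The task assumes that every delta fits into an 8-bit signed integer.
        assert(delta >= INT8_MIN && delta <= INT8_MAX);
        deltas.push_back(static_cast<int8_t>(delta));
        previous = value;
    }
    return deltas;
}
```

With the column 10, 12, 9, 24 and the start value 10 from the example, this yields the deltas 0, 2, -3, 15.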

Starting point is 00:34:13 So you do not need to implement it, but you should implement the decompression step here. And now we switch the order here. We get a compressed column with its start value. And the idea here is that you should calculate the unencoded input. In this example here, you take the start value. And for the first value, you just add the delta of the first value to your start value and get the unencoded value.

Starting point is 00:34:47 For the second value, you actually need to add the delta to the previously calculated value. So therefore, 10 plus 2 is 12. And then for the next one, you add minus 3. So 12 plus minus 3 is 9. And then for the last one, you add 15. 9 plus 15 is 24. So therefore, you have this chain of value calculation here.
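The chain of additions just described can be sketched as a scalar loop. This is only an illustration under assumptions: the function name `delta_decompress_scalar` and its signature are hypothetical and may differ from the interface in the task repository.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical scalar decompression: each output value is the previous
// output value (or the start value) plus the current delta.
void delta_decompress_scalar(uint32_t start, const std::vector<int8_t>& deltas,
                             std::vector<uint32_t>& out) {
    assert(out.size() >= deltas.size());  // caller provides a large enough buffer
    uint32_t current = start;
    for (std::size_t i = 0; i < deltas.size(); ++i) {
        // Sign-extend the delta to 32 bits; unsigned wraparound makes adding
        // a negative delta behave like a subtraction.
        current += static_cast<uint32_t>(static_cast<int32_t>(deltas[i]));
        out[i] = current;
    }
}
```

The loop-carried dependency on `current` is exactly the chain mentioned above: value i cannot be computed before value i minus 1 is known.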

Starting point is 00:35:16 But you can also do it different with SIMD instructions. And this is actually what we want you to do. This is not intuitive, and it's not very trivial for SIMD instructions and this is actually what we want you to do. This is not intuitive and it's not very trivial for SIMD instructions because you have these dependencies across the values. So if you look at the decompress box, the input values, the compressed values, you have these dependencies because, for example, if you want to calculate the latest value, 24, you first need to sum up the values of the entire blue column here and add in addition the start value to actually get the correct unencoded output value.

Starting point is 00:36:02 And depending on your experience with SIMD instructions, this might feel a little bit overwhelming at first, because we also want you to work with different SIMD instruction sets. But one of the learning goals here is that you learn to navigate the documentation. We will also provide you with several links about SIMD instructions; in particular, there's a very comprehensive guide by Intel where you can look up all the different instructions

Starting point is 00:36:35 of the different SIMD instruction sets. And yeah, you should learn to navigate it, to get familiar with the names, and to identify what the purpose of the different instructions is. And to basically calculate the unencoded value here, you always need to sum up the previous delta values and add the sum to the start value.

Starting point is 00:37:02 So you can imagine this example on the bottom left here, in which you have the second. Let's say these are registers. You have the second register filled with a start value. And in the top register, you somehow find a way in which you sum up the values, meaning in the first value, you only have the 0. In the second value, you need to add 0 in the first value you only have the 0. In the second value, you need to add 0 in the second value,

Starting point is 00:37:29 so 0 plus 2. And then when you follow the values in that register, you always need to sum up the previous values, so that you can then just add the top register and the bottom register to get your resulting register with the correct unencoded values. And one starting point that you might want to use here is to research horizontal additions. Maybe you can find some ways to horizontally sum up values so that you can actually

Starting point is 00:38:14 achieve the situation in which you then just have to add the two registers. We don't want to give too many hints here, because as I mentioned, this should be one of the learning experiences, that you learn how to navigate through the documentation. And there is some repetition here, but we want you to implement a decompress and a scan function.
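The two-register picture described above (one register holding the running sums of the deltas, one holding the start value in every lane) can be sketched for four 32-bit lanes as follows. This sketch uses byte shifts plus ordinary lane-wise additions, which is one well-known way to build an in-register running sum and not necessarily the approach the task intends. The function names, the 32-bit element width, and the sample numbers are assumptions. Only SSE2 intrinsics are used.

```cpp
#include <cassert>
#include <cstdint>
#include <emmintrin.h>  // SSE2 intrinsics

// Inclusive running sum over four 32-bit lanes:
// [a, b, c, d] -> [a, a+b, a+b+c, a+b+c+d]
inline __m128i running_sum_epi32(__m128i x) {
  // Shift lanes up by one (4 bytes), filling lane 0 with zero, then add:
  // [a, b, c, d] + [0, a, b, c] = [a, a+b, b+c, c+d]
  x = _mm_add_epi32(x, _mm_slli_si128(x, 4));
  // Shift lanes up by two (8 bytes) and add:
  // [a, a+b, b+c, c+d] + [0, 0, a, a+b] = [a, a+b, a+b+c, a+b+c+d]
  x = _mm_add_epi32(x, _mm_slli_si128(x, 8));
  return x;
}

// Hypothetical helper: decodes exactly four deltas relative to start_value.
inline void decompress4_sketch(std::uint32_t start_value, const std::uint32_t* deltas,
                               std::uint32_t* output) {
  const __m128i delta_register = _mm_loadu_si128(reinterpret_cast<const __m128i*>(deltas));
  const __m128i sums = running_sum_epi32(delta_register);                   // "top" register
  const __m128i start = _mm_set1_epi32(static_cast<int>(start_value));      // "bottom" register
  _mm_storeu_si128(reinterpret_cast<__m128i*>(output), _mm_add_epi32(sums, start));
}

// Usage with made-up numbers: start value 10, deltas 0, 2, 3, 1.
inline void decompress4_sketch_example() {
  const std::uint32_t deltas[4] = {0, 2, 3, 1};
  std::uint32_t output[4] = {};
  decompress4_sketch(10, deltas, output);
  assert(output[0] == 10 && output[1] == 12 && output[2] == 15 && output[3] == 16);
}
```

To process more than four values, the last lane of each result would have to be carried over into the next block of deltas; that step is left out of this sketch.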

Starting point is 00:38:38 So the decompress one scans the compressed data and stores the individual decompressed values in an array. And the scan function basically does the same internally. But, as mentioned before, we only write values to the output buffer that satisfy a certain predicate. So we have two values here that we compare against. Our qualifying values need to be less than or equal to and greater

Starting point is 00:39:09 than or equal to the two given range values. Each of these functions should be implemented with different SIMD instruction sets, so you should implement both of these functions with the SSE, AVX2 and AVX-512 Intel SIMD instruction sets. And also, just as a sanity check, we want you to provide a scalar implementation, which is just for you to understand the basic concept,

Starting point is 00:39:50 let's say, as a small warm-up to make sure that you understand the task conceptually. This scalar implementation should not take long, just a few minutes. And then you should work with the different SIMD instruction sets. And also in the CMakeLists of the code that we provide to you, we will reset some compiler flags

Starting point is 00:40:11 so that in the different solutions that you need to implement, you can only use the specified set of SIMD instructions. So in the end, you have to implement each of these functions four times. And you do not need to care about the compress function. This is already given. Hint: this might be a good starting point for the scalar decompress implementation, since you just need to do the reverse to decompress. And we also have a list of assumptions here.
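To make the scalar variant of the scan concrete, here is a minimal sketch. It assumes 32-bit unsigned values, an inclusive range check, and a hypothetical signature that returns the number of qualifying values; none of these details are confirmed by the task description, and the numbers are made up.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical signature for illustration; not the interface from the task repository.
// Decodes the deltas on the fly and writes only values with lower <= value <= upper.
std::size_t scan_sketch(std::uint32_t start_value, const std::uint32_t* deltas, std::size_t count,
                        std::uint32_t lower, std::uint32_t upper, std::uint32_t* output) {
  std::size_t matches = 0;
  std::uint32_t value = start_value;
  for (std::size_t i = 0; i < count; ++i) {
    value += deltas[i];  // same running sum as in a scalar decompress
    if (value >= lower && value <= upper) {
      output[matches++] = value;  // compact qualifying values at the front of the output
    }
  }
  return matches;
}

// Usage with made-up numbers: the deltas decode to 10, 12, 15, 16, 24.
inline void scan_sketch_example() {
  const std::uint32_t deltas[5] = {0, 2, 3, 1, 8};
  std::uint32_t output[5] = {};
  const std::size_t matches = scan_sketch(10, deltas, 5, 12, 16, output);
  assert(matches == 3);
  assert(output[0] == 12 && output[1] == 15 && output[2] == 16);
}
```

A scalar version like this can also serve as a reference when writing additional local test cases for the SIMD variants.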

Starting point is 00:40:51 You can also find all of these assumptions and even more in the task description. For example, if four values fit into a register and you now want to process six values, then you have an additional tail. So you somehow also need to find a way to process the remaining two values that do not

Starting point is 00:41:18 fill an entire register anymore. We do not care about this corner case, so we give you some assumptions here. You can assume that the input size, which is the size of your input buffer with the compressed values, is divisible by 16. And then, so you have an input buffer and an output buffer, you can always assume

Starting point is 00:41:41 that these buffers have an alignment of 64 bytes. This can be helpful for AVX-512 instructions, which use 512-bit registers, which are 64 bytes wide. And you can also assume that your output buffer has 64 bytes of padding. This is relevant if you, for example, want to perform SIMD instructions that go beyond the end of the output buffer. So there might be an instruction that could be helpful for the task here. And if we could not assume padding after the output buffer, then this would actually be undefined behavior, which is bad. We don't want this. So therefore, you can assume the 64-byte padding here.
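For local experiments, a buffer that satisfies these assumptions could look like the following minimal sketch. The type and helper names are hypothetical, the element type is assumed to be 32 bits wide, and the "divisible by 16" assumption is treated here as a value count, which is a guess; the task description defines the actual layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kAlignmentBytes = 64;  // one 512-bit register
constexpr std::size_t kPaddingBytes = 64;    // extra space after the last real value

// Hypothetical test helper: a 64-byte-aligned array with 64 bytes of trailing padding.
template <std::size_t ValueCount>
struct PaddedBufferSketch {
  static_assert(ValueCount % 16 == 0, "sketch assumes the value count is divisible by 16");
  static constexpr std::size_t kPaddingValues = kPaddingBytes / sizeof(std::uint32_t);
  alignas(64) std::uint32_t values[ValueCount + kPaddingValues] = {};
};

inline bool is_aligned_to_64(const void* pointer) {
  return reinterpret_cast<std::uintptr_t>(pointer) % kAlignmentBytes == 0;
}

// Usage: a full-width store that starts at the last real value stays inside the padding.
inline void padded_buffer_example() {
  PaddedBufferSketch<16> buffer;
  assert(is_aligned_to_64(buffer.values));
  buffer.values[16 + 15] = 42;  // last padded slot, still part of the array
  assert(buffer.values[31] == 42);
}
```

With such a layout, a 64-byte store that begins anywhere inside the real output region cannot run past the end of the array.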

Starting point is 00:42:33 Yeah, something about the tests. I explained the different evaluation steps to you: the basic tests, the advanced tests, and the baseline performance. In the repository, you will only find the basic tests. The advanced tests are hidden, so they will only be executed during the evaluation on the evaluation server with our GitHub runner. And feel free to modify the basic test file. We actually encourage you to do so because the test cases that you will find in there

Starting point is 00:43:11 are really only basic, and there is a high probability that you will not pass the advanced tests when you only pass the basic tests. So we encourage you to also modify the basic tests and add additional test cases. And you also don't need to worry that these somehow influence the evaluation on the evaluation server, because on the evaluation server we always use a fresh copy of the basic tests. So feel free to write as many test cases as you want in your local setup or on the development server; this will not influence the

Starting point is 00:43:49 evaluation on the evaluation server. And also, when you implement this task, please only modify the CPP files that start with SIMD delta. There's no need to move functions around. There's no need to create any new files. And it's also documented in comments where you need to provide your implementation in the different source files. And again, please also don't modify the CMakeLists and the GitHub action, the YAML file, as mentioned before. OK, are there any questions regarding the task,

Starting point is 00:44:34 regarding the general setup? This might be a lot of information. And there's a little bit more in the task description. But I think everything will get a little bit clearer if you read the task description in detail. And as usual, if there's anything unclear, please approach us. Send Florian, me or Professor Rabl an email. Or also, if it's related to the task,

Starting point is 00:45:02 there might also be other fellow students that have the same question. Feel free to use the forum, where we also try to answer as quickly as possible, usually within one or two days. And if there are no further questions, then thanks. The solution discussion will be on Tuesday, the 28th. This is one day after the submission deadline. In this session, we will, as I mentioned, discuss the solution and also present the new

Starting point is 00:45:33 task, task number two. And very important: that session will not be recorded. So this session is recorded, you can find it on tele-TASK, because it also contains some general information about the setup. But in the next programming task sessions, we will provide solutions. And so that we can also use such tasks in the future, we don't want any of that on tele-TASK. Otherwise, we cannot use the task anymore. That would be unfortunate for future iterations. And the next session is tomorrow, about SIMD2.

Starting point is 00:46:07 Thanks for your attention, and see you tomorrow.
