Hardware-Conscious Data Processing (ST 2024) - tele-TASK - Introduction

Episode Date: April 9, 2024

...

Transcript
Starting point is 00:00:00 Welcome, everybody. We're going to start with Hardware-Conscious Data Processing, summer term 2024. So, who's new at HPI this summer term? Nobody? Okay. I see a couple of familiar faces. I don't think I have seen everybody so far, but I also have a very bad memory for faces and names, so feel free to remind me, and I'm already sorry if I don't recognize you right away. It will get better over the course
Starting point is 00:00:36 of the semester. So, this is on hardware-conscious data processing — a course on how we can use hardware most efficiently for data processing, so most notably database management tasks: how can you do joins, store data, et cetera. And we're going to go down through the different levels of hardware, so it's also a bit of a hardware architecture course. First, a heads-up — you probably already know this: this will be recorded and will be available on tele-TASK. If we have online sessions, we don't record the videos of online sessions, so feel free to show your video, although I'm not really planning on doing any online sessions. So we're going to try to do everything here.
Starting point is 00:01:27 But if for some reason, some exception, random rare exception, you cannot attend the class in person, you can also always watch the video. And this means we have everything online. So the slides will be online, the videos will be online, we have Q&A forums etc. online. So there is no need to take any videos or screenshots or whatever and distribute them because everything is publicly available anyway. What I want to do today is give you a bit of an overview of our group, our curriculum, so let's say broaden the scope so you can see what else is there, what we're doing, then give you details on the course organization and do a bit of a motivation also today.
Starting point is 00:02:17 So, everything you need to figure out whether this course is for you and whether you want to stay with us, which I really hope. Okay, so who am I? I'm Tilman Rabl, professor for data engineering systems here at HPI since 2015. With that, I'm one of the older ones by now. Before that I was at TU Berlin, the University of Toronto, and the University of Passau. And besides being a professor for data engineering systems, I'm also ombudsperson here at HPI. So whenever there's a problem with scientific integrity, in whatever form, you can come to me, or at least ask me how this can be dealt with or whom to ask. And I'm director of the HPI data center, which is kind of nice because we have direct access to all of the hardware that we're going to present here. And for those of you who don't know, the data
Starting point is 00:03:20 center is basically — you cannot see it, but I can see it — just the second part of this building. And with that, if all goes well, we'll also do one slot where you can actually go there and see the hardware for yourself, although this is always a bit of a struggle, to be honest, because there are many regulations to keep you safe when you see the hardware. But we're going to really try to make that work. Okay, but of course I'm not alone, right? So, this is my group. Two of the people here — or three, actually — you will have more contact with. So, this is not really working well. This is Marcel. Marcel, stand up. Marcel will help with the lectures, and especially with the labs.
Starting point is 00:04:11 And Florian as well. Florian, please stand up. And thank you. And so these two guys, you will see whenever you have to program something, which you will have to do. And then we also have Martin, who is still in Irvine and will come back next week.
Starting point is 00:04:34 And you will have full exposure to Martin next week because I'm traveling next week. He will do parts of the lectures whenever I'm not around. All three of them have great knowledge of database systems and programming, so they will be able to help you a lot along the course in many ways, often even more concretely and better than I will be able to. Okay, so what do we do? What is data engineering systems? It's database systems, in essence. So we do look at database systems, meaning: how do we build database systems, how do we optimize database systems, how do we run database systems on modern hardware? These acronyms here are conferences that we typically publish at. So if you want to read some of our papers, you can go to VLDB, SIGMOD, or ICDE. I don't like ICDE that much, so go to VLDB and SIGMOD.
Starting point is 00:05:34 That's the best conferences for database research. We also do a lot of stream processing and real-time analytics. That's something that I brought from TU Berlin because they're super strong in that direction. Of course, machine learning is important, but I'm not a machine learning guy, so I'm really interested in how can we look at this from a system perspective. So there's many tasks that require data management, that require being efficient on hardware, and that's where I'm getting interested. So we do research in that.
Starting point is 00:06:06 And then I've always been involved with benchmarking. So this is something that we always also do. So we check how fast is the hardware, how fast are systems, we compare them and we build benchmarks. So in the end, our research approach often looks something like this. We start from a certain application scenario, build benchmarks, so build basically frameworks, tooling and guidelines how to benchmark applications and systems and then we do the actual benchmarking. So that's basically running
Starting point is 00:06:38 your things and checking the times. Based on the application, we see if systems are good enough or which system might be best. And then we might extend existing systems, we might exploit new hardware — if existing systems can really use the hardware — or we just build completely new systems. And if any of this is interesting to you, or during the course you figure out, well, I finally found what I was looking for, I want to work on this, then feel free to reach out to us and we'll help you do some work with us in the form of a project, a student assistant position, or a thesis. And this course is part of a greater curriculum that we run across winter and summer
Starting point is 00:07:30 terms, and not everything is done by us. So, in the bachelor's there's Database Systems I and II, which is mainly taught by Felix Naumann, but every now and then I jump in there as well. Then we do a lecture series — I'm going to give you a bit more detail on that in a bit. And then in the summer term we have Hardware-Conscious Data Processing as our big lecture; in the winter term we have Big Data Systems as our big lecture.
Starting point is 00:07:56 And then besides that, we have seminars on hardware, typically in the winter, machine learning systems seminars every semester basically. And new in the winter, machine learning systems, seminars every semester basically. And new in the winter term, we had the Big Data Lab, which is kind of a practical version of big data systems. And so, I mean, we're kind of trying to give you all the details on everything you need to know about database systems and data engineering systems.
Starting point is 00:08:25 This semester, we have hardware-conscious data processing, you already noticed. We have a lecture series on HBI research. So this is something that I'm trying this semester for the first time. So since we've been growing quite a bit, I thought it kind of might be interesting also for you to see what kind of research is there actually at HBI.
Starting point is 00:08:47 It's also interesting for me, but it might also be interesting for you. So there's a lecture series where every week we have another HBI professor presenting the work of the group in a lecture. So not like a pure research talk, but kind of an overview of what is the research about, like something specific and a bit of an overview of the group.
Starting point is 00:09:11 So that's going to be Tuesdays. Today will be the opening. So again, just course logistics, etc. And then from next week on, every week there will be a professor. And this is for bachelors and masters. So you are all masters for three credit points and we have other lecture series and other other lectures and courses where you can get another three credit points then machine learning systems will start tomorrow on at 1 30 and i don't know the exact. I think we're trying to do it in F downstairs,
Starting point is 00:09:47 but that's something you can find on the website. And besides that, we have bachelor projects, we have master projects. Every now and then this term we don't have one, but there's a bachelor project on the climate footprint of the data center. also something that's near and dear to my heart. So how can we make stuff more efficient also climate-wise, not just performance-wise. And we're always happy to guide you in some kind of projects. And one thing that might be interesting to you is the SIGMOD programming contest. This is tight already, so it already started. But there's still some time. So if you feel you don't have enough to do right now at the start of the semester, you might still sign up for this and check or build a hybrid index
Starting point is 00:10:36 for vector queries doing some KNN search, basically, on vector data sets. So stuff like this, every now and then something like this comes up. The SIGMOD programming contest is kind of nice, because it usually has a database management aspect or a database system aspect like this one. Every now and then, it's not. and then I'm not super promoting it but if you participate in this and you are part of the finalists then we will sponsor your trip to SIGMOD
Starting point is 00:11:13 and this year this would be in Santiago de Chile so that's also maybe a reason to participate in these things next year would would also be fun, right? It will be in Berlin, so it's cheap for us to send you there. But I don't know what's going to happen next year. Okay, so, but these things, if you're interested in this, reach out, check out the website. I actually have the website open somewhere to show you. So, if I find my mouse...... So this is basically... You can find all the details here and there's a price, money etc. So it should be fun. Okay, with that, if you want to do a research project, just a quick recipe for you.
Starting point is 00:12:05 Also, this is kind of like more of the meta level information. Whenever you do a research project, that might be a good idea to follow this kind of seven step recipe, because it helps to keep you kind of motivated and focused and not end up in a dead end somewhere along the path. So if you're doing a project which has something to do with performance in one way or a measurable outcome, then this is a good approach to start this. Also if you want to write a thesis. Even if you don't want to write it with us, I'm quite happy to give you some guidance in order not to fail. So this is always good. So of course, you need to do literature research.
Starting point is 00:12:50 So you start, depending on your previous knowledge. If you're already expert in a field, you should know the literature. Otherwise, go out there, check the most relevant conferences, check Google Scholar and Google, of course, in general, to find what's already there then you identify a research problem right so usually it shouldn't be just an idea so many people start with an idea and then later on
Starting point is 00:13:15 figure out it actually doesn't work or it's not neat idea but it doesn't help you in any way so try to find a problem that you can solve. Often we can find a problem based on idea just by framing it differently. So rather than saying, oh, I can use, I don't know, this hardware for something, I can frame the problem in the sense, what would be the performance if I use this hardware? So then I have a research problem.
Starting point is 00:13:47 I can actually identify the problem. If I think, oh, using this hardware will be great, and now I'm trying it, then later I find out, oh, it doesn't really work. It's not really faster because there was no problem. It was just a neat idea. Then my result is negative. Figuring out if it works
Starting point is 00:14:06 at all or what the performance is, you can always have a positive outcome. So the framing is different, your motivation will be different if you do this. If you identify a research problem, you can describe a novel solution, and then you perform back of the envelope calculations. So this is like just a very rough estimate what the performance will be. So will this be good or not? Typically, you can do this in a day or even less. You can figure out what is the performance of my setup. So even if you try to do a new startup here at HBI with eSchool or something,
Starting point is 00:14:44 and you come up with a needs microservice whatever application server something something architecture you can do a quick back of the envelope calculation if this will be fast enough or not so just by basically knowing some basic numbers of how fast each individual service can respond what are the latencies in between, you can figure out what should be the performance in the worst case, in the best case. And with this, you can figure out if your idea is good or if it's stupid. And this is also called a bullshit filter. And this is basically just to figure out. And if you don't do this, you build
Starting point is 00:15:24 everything, you do experiments, you figure out everything's way too figure out and if you don't do this you build everything you do experiments you figure out everything's way too slow and then you work your way backwards and you already have so much invested that you cannot really change everything from the get-go doing these simple calculations often helps a lot and then of course we have to implement we perform the real experiments and then you write up the report. If you want to publish this, you have to endure a couple of revision cycles, even in your thesis, right? Your thesis is not a single shot, write down the whole text,
Starting point is 00:15:56 but it's typically, even if yourself, you are revising the thesis, you should go through many cycles. This is the only way to improve the text. Nobody can write a perfect text from the get-go. Just write it down. You really have to do this incrementally and revise, revise, revise. Okay, questions so far? Perfect.
Starting point is 00:16:21 Then we're going to get right into the course logistics. So, course hours and structure. The lecture and labs are here, Tuesdays and Wednesdays. So, you found this, so you know this. This is probably mainly for people online or watching the video. The labs will typically have on Tuesdays, every now and then, there's not that many of them, but this will be in the same course. These will mostly not be recorded, just because we're going to present you the solutions and we want to reuse some
Starting point is 00:16:59 of the tasks, so that makes sense for us not to record it. But if we don't record, we typically provide a Zoom link if people cannot join. We might do this proactively or reactively. So I don't fully know yet. So meaning, and this is always the case, right? If there's something where you need some help, you cannot join, something is missing on the website or something, just reach out. So we're trying to always accommodate everybody.
Starting point is 00:17:29 If you're sick, you cannot perform the task in time, let us know in advance. So it doesn't make sense to send us an email after the deadline. Well, unfortunately, I don't know, I had to take a vacation during the last couple of weeks, couldn't write an email. This is typically the things where we cannot really help. But if you come and say, something happened, I cannot finish, I don't know, I have to deal with something else, can we move the deadline? We're always trying to accommodate that, if possible, if the reason is plausible. We're trying to always have this here. There might be a case where we're using an online version of videos, so if there is another pandemic
Starting point is 00:18:19 we'll go fully online. I hope it's not. Otherwise, I'm hoping we'll have everything here. We always have all of the details in Moodle. So, I hope all of you already found our Moodle page. So, let me show you this again. So, this is the Moodle page, right? Number 740, the course. And in here, we put all the slides. We have all the programming task details, the grading, prerequisites, policies, discussion forums. And you can see already there's lots of stuff that's still hidden. We're updating the schedule, I I hope did we update the schedule
Starting point is 00:19:08 perfectly updated the schedule we have an example coding task I'm also come to this we have the slides for the day for example you can see there's all everything else is already here as well okay I'm available by email always so you can always send me an email if you have questions. If things are hectic, right now I get lots and lots of emails because of the semester start and some other conferences that I'm involved in. So every now and then emails get stuck in my stack, somewhere lower down. That means feel free, if something is urgent,
Starting point is 00:19:49 feel free to remind me. If I see an email is urgent, I try to answer right away. If you don't get an answer within three days, probably it went down the stack a bit, then it might take a week or two until you get a reply. If everything, like stuff gets locked or something, then you can always also send me a second email. I'm almost never, almost never upset about this.
Starting point is 00:20:14 And I also have an open door policy. So whenever my door is open, which is usually the case, you can just come in and ask your question, right? This is something we can answer quickly. I'm in building F in the second floor. Just come in, ask your question, and we'll solve this right away. Unless I'm in a meeting which is super secret or I'm in a call or something,
Starting point is 00:20:37 then I might tell you, well, give me five minutes or give me 10 minutes or something or come back at that time. And again, we'll solve it. If this is something where we need more time, half an hour, well, let's say starting from 15 minutes onwards, then let's book an appointment, right? And this is always something that we can do.
Starting point is 00:20:56 Usually smaller appointments I can fairly flexibly make. If it's something long, like an hour hour it will need some time okay good so course contents we have quite a bit of stuff to cover in the course this is continuously evolving this is the third iteration of this course and every now and then I'm trying to bring in some new stuff. We're trying to change in some software kicking out some old stuff This is basically We're working our way through and you can also see this in the in the timeline later So what we're going to start after going through the introduction and some performance analysis
Starting point is 00:21:39 We'll start working our way through the CPU, right? So this is basically the CPU architecture with the individual cores, multiple cores, different levels of caches, and then the DRAM. So this is the stuff that will take us some time first going through this and thinking about the implications this architecture has for data processing. Then we'll walk out of core onto multiple cores, and then out of the multiple cores into the peripherals. So basically on disk, network, GPUs, and then later other kind of accelerators like FPGAs, right?
Starting point is 00:22:23 And in between there's some other things, some goodies that we sneak in, something like CXL, Compute Express Link, a new standard for PCI Express, a profiling session where we help you how to profile your code and really get an understanding of the performance of your code, so where does time go in there,
Starting point is 00:22:43 and things like that. So that's the rough outline. And learning goals is like a couple of different things all packed into one because I think this is basically like you learn a couple of different things by doing this. So the one thing, the major overarching motivation for this is efficient data processing on current cpu architectures current memory architectures in parallel distributed and
Starting point is 00:23:14 on accelerators so this is basically the aim of the course this is why we're doing this is so you know how to do this and this actually makes sense because if you can do that, you will find a nice job in industry or you can do great research in academia, whatever. So with this kind of knowledge, you have a deep understanding of processors, of data management and data processing, and a lot of people will be happy to work with you. You will have to experience this yourself. So this is something that's really important for me.
Starting point is 00:23:55 So you have to program yourself. Major part of your work in the course is programming. So it's actually the biggest part of the things that you have to do, except for listen to me, which might also be a heavy task every now and then. But the things where you actively have to do something is, or more actively is the programming. And I think that's really important. If you go through this, my experience with the people who went through the course is they're really good to work with, right? So
Starting point is 00:24:31 people who understand architectures, who understand data structures on a low level and make them performant, they have an easy time doing their thesis with us, doing their thesis with other people as well, and other groups. So, and as a side thing, what you also see is, or what you also learn is computer architecture, because we have to understand computer architecture in order to make our programs efficient. So mainly CPU and memory,
Starting point is 00:25:04 so this will take us a lot of time, but then also accelerators, which is also important if, God forbid, you go into AI and want to do something there, right? So then this is also good to know, just to know how the data has to flow, et cetera, and how the architecture and the hardware works. And then, of course, efficient data processing, etc. and how the architecture and the hardware works.
Starting point is 00:25:27 And then, of course, efficient data processing, so query processing and how to utilize the hardware, and efficient programming. So this is not so much what I will teach you, but this is what you will learn in the labs and where Marcel and Florian will help you to make your programs run fast. And you will self, I mean, there's also a lot of self teaching, so how to make your programs
Starting point is 00:25:51 hardware conscious and fast. And this is not super hard, it's just something that you have to practice, and we'll give you some space and time to do so. So with this kind of overview of the course, so we're in the introduction, we're gonna do performance management. Next week I'm not here, so Martin will do database basics,
Starting point is 00:26:16 kind of a recap for those of you who haven't heard database systems 2. And CPU basics then, so this is the first step on or the first overview of CPU architecture. This will take us probably two sessions, depending on how fast Martin is. I will continue where he left off. Then we'll talk about instruction execution on a CPU. So how do the instructions actually get encoded and decoded and then executed on a core on the multiple functional units. We'll then take a bit of a deeper dive on one particular part of unit on the CPU, which is the SIMD units. So the vector processing.
Starting point is 00:27:00 Then, unfortunately, on the 1st of May, there is no class because it's a holiday. We'll also enjoy that. Then you will have the first task, and that one will be on SIMD. Then another SIMD class. Then we'll talk about the execution model. So how do we execute queries efficiently? So there's different ways how we can do this. There's also different, let's say, religions on how we can do this. There's also different, let's say,
Starting point is 00:27:25 religions on how we can do this and people fighting against each other, what would be the best. So we'll talk about this. Then data structures, how to make them efficient, the profiling that I promised. Marcel will tell you how to do that. Then multi-core execution, the second task that I wrote to be discussed or to be determined but we're already pretty sure that this will be query execution. So query compilation actually. So one specific kind of execution model. And then it's, we're slowly walking out.
Starting point is 00:28:02 So you can see this is all basically on a single CPU, or at least on the CPU with multi-core with, yeah, we're still single CPU. Then we're working our way out of the single CPU, going to multiple CPUs, multiple sockets, then out of the sockets to the storage. While we're at storage, we're looking at the PCI Express bus. So how is the storage connected to the CPU? And there is a new standard for that, which is Compute Express Link, where everybody's super excited,
Starting point is 00:28:35 but there's no good hardware yet. So we have some prototypes, but nothing productive yet. Then you'll learn how to program a buffer manager, then talk about networking, and then finally we're completely going out of the single server into the core CPU architecture into GPUs, more networking, FPGAs, and we're going to try to have an industry speaker, so somebody who's working on hardware in industry, data processing on hardware in industry, and then have a summary and a data center tour. And as you can see, there's no exam, so everything will be determined
Starting point is 00:29:19 in the programming exercises. Okay, so with that, that's already kind of the grading in a nutshell. It's very easy, at least from a high-level point of view. We have four graded programming tasks. Each of them is 25% of the final grade, and you must pass all. So there's basic tests, there's advanced tests, and then there's basic performance and there's advanced performance. And you basically have to test all the basic tests which are known, right? And the advanced tests are hidden, right? So you know all the basic tests. We have some advanced tests.
Starting point is 00:30:02 They're not super hard or something, not super unexpected. But basically, we don't want you to hard code all of the tests. So that's why we have some advanced tests that are not public. And if you pass all the basic tests and some advanced tests, you can already pass the course. Then if your performance is good enough so it's fast like we have a baseline solution and we have an advanced solution if you pass the baseline solution and
Starting point is 00:30:31 some of the basic tests and you can pass the course if you pass the basic and the advanced tests and some of the baseline performance then you will go yeah basically you you will get into better and better ranks and we also have a leaderboard and if you are leading i don't remember we have some marcel will go through the details regarding this in the in the task description so there's also some bonus points for the fastest solutions meaning uh depending on the number of participants essentially in the end. We'll basically see if you are in the fastest solutions, so this is not required for a perfect grade, but it's basically making it easier. So if you have like one of the
Starting point is 00:31:19 solutions that's super fast, you can get some extra points. Otherwise, you just need good solutions, and you can also get perfect grade. In order to make sure that you've done the stuff yourself, we'll have an individual presentation, meaning one person presenting to us each of the tasks. So otherwise, around each person will present one of the tasks. So otherwise around each person will present one of the tasks. So everybody will basically get a personal slot where we'll just discuss the solution. We'll also check all of the solutions on plagiarism. So basically
Starting point is 00:31:59 just check with us not only syntax check, but also like basic program structure checker in order to make sure that you're not just copied somebody else's solution. And there's a couple of more details. I think I have some on my slides, but you also get some more details on how to behave during the programming. So this makes it kind of easier towards the end because you don't have to do like study for a program. You just have to continuously work on this. There's a question.
Starting point is 00:32:31 When the tests are in, will we then push to GitHub or something? Will we see the points that we get on the hidden tests or will we only see it in the end? So we have everything automated, so when you push a commit, then the CI will execute the base test, and when the base test is passed, then the advanced test will be executed. And if you pass this advanced test stage,
Starting point is 00:32:57 then you get the points for this stage of advanced tests. And if some tests fail, we also provide you with some additional information about what might be wrong in your code. In the past, this worked out quite well. But if you're struggling a lot, then also please reach out to us. Then we might reveal some more information. But we think with the current information that you get from failing tests, you can figure out what's wrong and how to improve the code. Yeah.
Starting point is 00:33:28 So we have everything automated. It's also using GitHub a lot. And I mean, you're not supposed to just try an error or to fail an error and just repeat, repeat, repeat until you find the right result or figure out what might be our test case. But if you're struggling, again, feel free to reach out. There was another question on the top.
Starting point is 00:33:57 Did you have the same question or a different question? No question? It was not? OK, then we have a question here. But this means you can, like, if you're not, I don't know, happy with our results, No question? It was not? Okay. Then we have a question here. This means you can, like, if you're not happy with our Results, we can reupload the notes. Yes.
Starting point is 00:34:13 So you can, so the question is, can you upload multiple Solutions? yes, you can upload many Solutions, and also during this discussion, we will pick the one That you want to discuss. So meaning if you, we will pick the one that you want to discuss. So meaning if you, in the end, do some nasty hacks for some performance, which you don't really want to, I don't know, which makes the code somehow cluttered or something, then we can also go back.
Starting point is 00:34:36 Or the last version was not the nicest version. Then we can go back. OK, so now there's questions up here, left and then right. Will we present in front of the class or just in front of you, two, three, four people? Yeah, the presentation that you do will be individual in my office. So it's not in front of the class because we need to basically have multiple people present the same task. And that then means we'll do this individually.
Starting point is 00:35:09 So do we get the task like a week before and then we can work on it at home and then the session is just for presenting? Or what do we do in the session? Or do you discuss the ways to solve the task? So what do we do in the session? So in the sessions we introduce the task. We'll also introduce the solutions. And we'll in the first session we'll introduce the overall
Starting point is 00:35:36 Setup, right? so how everything works. And then in the individual session it's really just Discuss your individual solution. So then we'll ask questions about your solution. Why did you program this this way? How else could you have done this? And we'll assign this randomly, meaning you will randomly be chosen for one of the tasks, not multiple. But if we feel that you couldn't really answer
Starting point is 00:36:09 the task properly, so if we feel, okay, this was somehow shaky, we're not really sure, if you did this yourself, we're gonna ask you for another task. And either, so if we're starting, say, with the SIMD binary tree, and then we notice this didn't work out, you'll get another appointment later on. If you're assigned to the log-free skip list, and we figured out, okay, this didn't work out really well, then we're going to open up another of your tasks and ask questions on that one as well. And of course, that's basically the only way besides not handing in a functioning program. That's the other way how you can fail the course.
Starting point is 00:36:55 That should be a low bar, but if you really cannot explain anything of your course, of your program, then we'll have to fail you. Is the presentation going to influence the grading? No. Okay. It's just a check. So there was another question here. So sometimes there are,
Starting point is 00:37:16 especially if you want to make your code fast, there are some true solutions that you find on Stack Overflow or something, and should we use some of those approaches that were not taught in the class, but we found them, I don't know, on Stack Overflow? Should we comment on the code? Yes.
Starting point is 00:37:36 So if you use something, some other help, write comments, right? So we can basically see this. You shouldn't use GitHub co-pilot. I think that was one of the things that we didn't want because that kind of produces false positives in our automatic checks. What was, I mean, Marcel, you will do the details there.
Starting point is 00:38:05 So, about the tasks, what is okay and what is not okay. I mean, you should program yourself. You should really try hard. Try the first. We have a task zero that gives you an overview. I think I have this on my next slide. It's a concurrent linked list. And this will help you set up the environment
Starting point is 00:38:30 and also see if the course is for you, right? I mean, it's not super hard. You don't have to be a C++ whiz to do the course. But you will have to improve your C++ to a certain degree unless you're already really good while at this course. But this doesn't hurt, right? So this is always, as I said, it's a very good skill to have. It will help you in your life later on if you do this,
Starting point is 00:38:58 if you do this yourself. Okay. Further questions on the tasks? Not yet? You can also come back to that, right? So the task one is also already completely prepared. So there, but it's not completely up to date. We still have the binary tree, right? We said, or is it the SIMD scan? It's the simd scan. So then this is correct. So here we basically Implement the simd scan. And you'll find how to use
Starting point is 00:39:35 In-memory column compression. So how can we compress data in Memory, how can we do things or execute queries in a vectorized execution model and navigate some SIMD code, which is not super easy. I mean, it's also not super hard, but it's just a lot of different ways in which you can program this. And you'll write some SIMD code. So this will start in early May. So it's still some time, right?
Starting point is 00:40:04 So you can still relax, sit back and enjoy. And once this has started, then the real fun actually starts. There's a lot of literature. You can check out this. There's also different things every now and then that I put in on the lectures, which might be interesting. If you're curious about computer organization, then I think one book that I really can recommend is the Structured Computer Organization and the Computer Architecture.
Starting point is 00:40:35 They basically cover the same things, whatever approach and writing style you prefer. But this is something very worthwhile to know. So how a computer internally works, and this is discussed in detail in both of these. We also have Hasso Plattner's book here, which is an interesting read. We have lots of performance analysis, and every now and then we add more books in the lecture, as I said. And most of these books are available in our library.
Starting point is 00:41:08 So we have a shelf where we have a couple of books. So if you want to grab a physical copy of the book, at least one we have for each of these books, and you can for sure have them for a while. And some of them are also available, or I guess most of them are available at the university library. OK, so I'm refining my slides until the day or hour before the lecture, typically. And then I load them up into Moodle.
Starting point is 00:41:40 Sometimes I forget. Then they are in Moodle right after the lecture or whenever I finish lunch, etc. Well, this is not a new course anymore, but you can still be patient, right? So if you find out something that didn't work out that well, then just let me know. If you find errors in the slides, if you think we can correct something, send me an email. I'm always happy to fix the slides. I might not update this in Moodle, but I'm generally very happy to have an improved slide
Starting point is 00:42:13 deck for the next iteration. If it's a major error, I'm also very happy to update the slides in Moodle. And of course, everything is also available from previous years. So there's recordings from previous years. There's the slides from previous years. I'm always trying to somehow refine it, but the changes are not that major. I already said this, right?
Starting point is 00:42:39 So Martin, he's not here this week. Next week, from next week on, he will jump around here and tell you about computer architecture or CPU architecture and database basics. And Marcel and Florian will help you with the programming. Code of conduct, very important. You can always ask questions. You can also always discuss with each other you can also discuss the programming exercises with each other and you should right so it's very good that you
Starting point is 00:43:13 have a class here that we have a setting where we can actually meet and talk to each other you can help each other you might even be able to look at each other's code, but please don't give it like share the code completely and just reuse the code, because if you don't submit your homework individually, we'll have to fail you for the course, right? So as soon as you share your solutions and and somebody copies your solution, or we find any other form of dishonesty, then we'll fail you for the course. And this is important.
Starting point is 00:43:55 Didn't happen in this course yet. But I had other courses where I had people who basically just reused other people's exercise. It's not true. We actually had people who just copied stuff. So people pushed their code on GitHub, somebody else found the code and reused it. And then of course we've,
Starting point is 00:44:14 I mean, there's lots of discussions. Oh, I didn't know and whatever. So don't do that, right? So just be smart and program yourself. Then everything is good. Also, we're working on the hardware here. We're giving you access to sometimes expensive hardware. So please don't break anything.
Starting point is 00:44:35 So it might be easy to break out of our environment. This is not a security course. It's a performance course. So if you try to break our stuff, again, we might have to fail you. If it doesn't happen, I mean, typically nothing should happen. But just please do not try to escape the Docker environment and do not try to break the hardware. This would not be good. And of course, in communication, so we have forums, we have emails, etc. Try to be nice. I'm trying always to be nice, right?
Starting point is 00:45:12 So this is, for me, this is kind of a happy place. I'm doing this because I like teaching, I like research, and I want this to continue. So this means I need happy and nice communication. Doesn't always have to be happy, right? Sometimes things are problematic. It's still good to be nice, right? And polite in emails. I'm trying to do that. Please also try to do that please also try to do that not only to me and to my colleagues but also with each other it just makes uh social life at hbi and everywhere much easier and much nicer and i think this is like a general rule i know many people think it's okay to just like write
Starting point is 00:46:02 very short emails or something but just being nice and polite always helps, makes everybody more happy. So this, and in general, you should always treat everybody with respect and consideration. This is even more important in an online setup because there it's often not easy to read what somebody else thought or what the feeling was,
Starting point is 00:46:25 the intention in which something was written. So in that sense, always try to be extra polite in a personal setup, the misunderstanding or the way we can misunderstand each other is not as big, but it's still given, especially in an intercultural setup that we're increasingly are so still try to be respectful and considerate to everybody and this course and HBI in general should be a safe space for everybody right so this is
Starting point is 00:46:58 basically it's not just a course for a certain crowd of people at HBI that are interested in something. But it should be open to everybody. Everybody should be happy to come to this place and come to this lecture. And if there is something that you don't feel is good for you, then feel free to let me know or let some of my colleagues know or somehow get some information to us
Starting point is 00:47:24 so we can fix this and change it so you feel good about the course. Of course there's official course registration which we'll also get so we'll get a list of everybody who signed up at a certain point but please also sign up in Moodle. So we are aware that you exist. And we can give you one of the GitHub accounts, et cetera. So you can, or we need your GitHub account. But then we can give you access to our setup. So you can do the programming tasks.
Starting point is 00:47:59 You find all slides, all resources, et cetera, on Moodle. Please use the forum. Most questions, especially for the setup and everything other people also have right um so if there's something that you don't feel sure about don't be shy just write in the course i'm always happy if i see questions in the forum and i'm also happily uh or i'm gonna be happy to tell tell Marcel or Florian to answer to you as quickly as possible and every now and then I like if it's part of the lecture then I'm also trying to answer quickly. Quickly means within a day or so so I'm trying to not do this at night too much but we'll try to give you answers as soon as possible. Of course, for the programming, I know it always happens, right?
Starting point is 00:48:47 Most of the questions come all the way at the end, but an easy way for you to make your life much easier is even if you do the work towards the end, just check out everything at the very beginning, right? As soon as we open the task, just check if you can download it, if you can basically get everything set up to a certain degree. Don't invest hours and hours if you
Starting point is 00:49:14 have some different kind of schedule. Let's just invest half an hour or so to check if you can access everything. Because if you do this last minute, probably it's going to be very stressful for you. It's going to be stressful for us. And so this is one of my major advices. Even if you like to work close towards the deadlines, which
Starting point is 00:49:35 I also do, I still check everything right from the beginning. So I have all the material. I can actually do the work to close towards the deadline and not then in the end basically fail just because I didn't have to set up. OK, so one other thing that I want to always make you aware. So if you've ever been in a course with me,
Starting point is 00:50:01 this is also something that's important. So you as programmers, engineers, and maybe data scientists, you have a lot of responsibility, right? So essentially, you can program many systems, or let's say, very central systems that other people's lives depend on, at least to a certain degree, or that influence other people's lives. And in that essence, make sure that what you're doing is safe and is at least to some extent morally good. So most stuff is neutral.
Starting point is 00:50:41 Most systems that we build are somewhat neutral, but they might also not be safe. So not everything, like most people are not malicious, they don't build anything with a bad intent, but people build things and don't think about the consequences. So whenever you build a system that deals with people, with people's data etc., make sure that we were thoughtful and careful and responsible responsible with the system and make people aware of the limitations right you can easily make mistakes or other people can easily miss make mistakes misuse or manipulate your system and then other people get hurt right so my my is, if you're building a kettle,
Starting point is 00:51:27 that's kind of safe to make some hot water. If you build it at home yourself using this, you might be okay making water like this. But if somebody else tries this, they probably electrocute themselves. So make sure that your systems are safe and that they're used in a safe way. And in general, I also recommend to think about the consequences later in life and whatever you're
Starting point is 00:51:54 working on. Okay. Before we go into the motivation, a few contact people. So I already said I'm one of the ombudspersons at HBI. Another one is Holger Kahl. And if there is any problem with good scientific conduct, so meaning plagiarism, misuse of data, et cetera, then feel free to reach out to us. We'll be happy to help you. Somebody misuses your data or misuses some other person's
Starting point is 00:52:33 text, or you're not sure, is this what I'm doing here? Is this plagiarism or not? Can I use ChatGPT to write my thesis? Things like that you can ask us. So not everything, I can give you a full-blown answer right away, but at least I know where to look. And then whenever we have equality issues, there's Gleichstellungsbeauftragte at HBI.
Starting point is 00:53:00 So you can write them an email if you feel something, there's some problem with equality. We also now have a diversity manager. I forgot to put that on the slide. And also very important, and I think very, very good that we have this. There's a psychological counseling hotline. So whenever you feel super stressed out, of course, if you're super stressed out about the course feel free to reach out to me right so we'll try to help you make sure that this is not yeah not getting too much for you in general I mean this is just a course right so if you don't do it nothing happens in the
Starting point is 00:53:42 end there's other courses that you can take or other repetitions so that's always like don't get too stressed out with the course we're also trying to make the like not too stressful for you right it should not be way too much work but it's called programming and the amount of work that you have to put into of course depends on how much you program before if you really didn't program a lot before, this will be harder than if you did this regularly in every course that you do. So anyway, if you feel stressed out, this is one thing that you can do,
Starting point is 00:54:18 or have any other kind of issues where psychological counseling might help. And I recommend to do this if you don't believe HBI can do something safe in that thing then there's also the nightline Potsdam so this is also something that you can call if you need to talk to somebody so this is out of like students out of University of Potsdam that provide this service. Okay, so under this, we're gonna do a five minute break, but there's a question first before we do the break.
Starting point is 00:54:58 Yes. can we step back from the course until the first assignment or until the last assignment? I think we said the first assignment or after the first assignment. We have the date on the website. But Marcel? 17th of June. 17th of June. 17th of June. Yeah.
Starting point is 00:55:20 So we'll stick to that. 17th of June, basically. I mean, we can also, this is also something we can stick to that 17th of june basically i mean we we can also this is also something we can negotiate to some degree if we i mean at a certain point we'll just say this is the date and that's it um again if there's something uh where you feel like there's a certain circumstance or something just reach out we'll always try to help you with that. Other questions? No other questions?
Starting point is 00:55:48 Then five minute break, four minute break, and then we'll talk a bit about motivation. Why do we do this? So some people already know this, right? So I like to do these short breaks in between. Usually I have one small break, often a bit too late, because I somehow need too much time in the beginning. But somewhere in the middle of the lecture,
Starting point is 00:56:11 I'm trying to have a short break just for some regeneration, let's say, for everybody. So with this, let me give you a quick motivation on why we do this. We're going to speed through this a bit, because you will also hear a lot of this again. But this gives you yet another overview of the topics that you will see in the course, and maybe, hopefully,
Starting point is 00:56:38 motivates you to stay in the course. OK, so with this, it the like one of the major questions why we need to do this this kind of course right so you've already heard database management systems one or database systems one and two hopefully or at least some of you if not you will get the recap next week and there's one major thing to know about database management systems or classic database management systems. They were all built around one certain performance gap.
Starting point is 00:57:17 So essentially, for a long time, basically, database architecture was just built around this access time gap in between main memory and hard disk. So for a long time there was nothing but spinning disks and then on top main memory and you had a 10 to the power of 5 times latency and performance difference in between the two. Meaning that accessing an individual data item in main memory is 100,000 times faster than accessing it on disk. And that again means if you have to,
Starting point is 00:57:58 like if your data is large enough and it needs to go to disk, then everything is slow, right? Everything basically, the CPU, everything just waits for these disk accesses. And that means the overall architecture of a database system, of a classical database system, is just designed to make this performant, to make these individual accesses as worthwhile as possible.
Starting point is 00:58:25 So that's why we have a buffer pool. That's why we have a row oriented layout. That's why we have a tuple in time processing in the upper layers, because it doesn't matter. Everything that happens up here, if this is so slow, right? Everything that happens up here is negligible. So we don't have to deal with any performance up here because this is where all the time goes.
Starting point is 00:58:52 And this is basically reflected in many database architectures. And people looked at this and saw, well, what if we have cheap RAM? So this is also what Hasso Platten at a certain point said, right? So what if we put all of our data, the complete data set in memory, in HANA? So SAP HANA is one of these designs. Well, all of a sudden, the architecture needs to change because before, everything was designed just for this disk access.
Starting point is 00:59:27 But now all of the infrastructure, all of the buffer management, et cetera, basically is unnecessary. Or it's basically additional overhead. So if you look at, there was a paper a while ago that basically analyzed the different kind of useful work and then the different kind of overheads. And they basically saw that less than 10% of the work that the database engine does
Starting point is 00:59:54 is really useful work. And the rest is just management around this basically mostly the disk bottleneck. So traditional database architectures cannot utilize in-memory setups. And by this, they also cannot utilize modern hardware. And modern hardware meaning just basic CPUs that we have today. So a classic database engine will always just
Starting point is 01:00:22 work on the disk, do a lot of disk access, et cetera. And everything up there in the CPU will be slow. The CPU is basically underutilized. And with terabytes of main memory, most databases can actually fit into main memory. So if you think about student databases, et cetera, well, how many? We have 1,000 students at HBI around about.
Starting point is 01:00:53 Each of them have a, well, if we just think about grades, et cetera, it's not that much data. So this will be in the megabyte range, et cetera, the active amount of data. This doesn't even have to be in the megabyte range, etc., the active amount of data. This doesn't even have to be in RAM, right? So this basically fits into caches. So that means we can be super fast when dealing with this kind of data. But if every access always goes to disk and maybe has like a couple of transactions, all
Starting point is 01:01:20 of a sudden, even that might be slow. Okay, so and today, we don't just have, like, much larger RAM. We also have multi-core CPUs. And this means we have high parallelism for task parallelism. So, multiple threads can perform different tasks at the same time, meaning we can have different kind of management. We can have management tasks. We can have different parts of a query be run in parallel, and we have data parallelism.
Starting point is 01:01:51 We can have the same instructions, the same type of work on multiple cores, but also within a single core on multiple data items in vector units. But in order to use this effectively, we actually have to reprogram everything. We have to design our program differently. In a classical database, we would have a single thread for a single query, even for a single connection. There might be many queries, it would be a single thread, and that will always underutilize a modern CPU. So on a modern CPU, you might have 100 cores.
Starting point is 01:02:28 If you have just one task, then one core will be busy. Everything else does nothing. And you have much higher memory bandwidth as well. So we can have not even 51 gigabytes per CPU is even low. We can be in the hundreds of GB per second per CPU. With DDR5 we have now with multiple channels. And then we might not just have a single CPU, but we might have multiple CPU, multiple processors, right? So each processor, multiple CPUs, multiple cores,
Starting point is 01:03:06 then two, four, or eight CPUs on a single motherboard, which then also all have memory connected and each processor can access all of the memory across the different CPUs, but with different latencies. So again, we have to think about where do we put data and how do we access the data in order to be efficient across. So this is also reflected in the processor trends. A lot of this stuff you will see again. And I firmly believe in repetition to just remember
Starting point is 01:03:38 stuff. So please bear with me if you've got it the first time and you can remember everything in the first time. But I think every now and then it makes sense to look at this again, also from slightly different angles. But as a first view on this, you can basically see how processors evolve right now. So until the 2000s, mid-2000s, everything was single-threaded. Essentially, we had a single core, single CPU, single core, but we continuously increased the frequency. And that basically means that makes everything easy.
Starting point is 01:04:17 So we have a single core that just gets faster and faster. This is a logarithmic scale here on the side. So this means you just wait a few years or two years or something, and you get double the speed or 10 times the speed. You don't have to do anything with your program. So you have your original program. You wait a year.
Starting point is 01:04:41 Your new processor will be faster; the same program will just run faster. But this stopped in the mid-2000s, because the frequency had already gotten so high that it didn't make sense to push it much higher, simply because of power and cooling issues. Higher frequency means more power, which means we have to cool more, and it also means more power leakage, so we cannot get the chip as efficient. With more density, the chips getting smaller
Starting point is 01:05:20 and at the same time heating up, you basically have something like a small stove there. This is basically what a CPU today is: your cooking plate on maximum heat, while at the same time you cool it down so that it stays at 30 or 60 degrees, something like this. That's kind of what you're doing with a processor today, so it's not super efficient.
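As a rough rule of thumb (my addition, not something derived in the lecture), the standard dynamic-power relation for CMOS chips makes this concrete and explains why cranking up the frequency stopped paying off:

```latex
% Dynamic (switching) power of a CMOS circuit, rule of thumb:
%   alpha = activity factor, C = switched capacitance,
%   V = supply voltage, f = clock frequency
P_{\text{dyn}} \approx \alpha \, C \, V^{2} f
% Higher f generally also requires a higher supply voltage V, so in
% practice power grows much faster than linearly with frequency,
% and all of that power has to be removed again as heat.
```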
Starting point is 01:05:53 And if we put more power into this, the problem gets even worse: we need to cool more, and everything gets less efficient just for a little bit more performance. So with that, CPU manufacturers have started to add more cores instead, and this is how we still get performance gains. But that means we cannot just use our programs as they are. We have to parallelize.
Starting point is 01:06:14 We have to basically split up our task into smaller tasks in order to spread it out across multiple cores. And then even multiple CPUs. Because we can see, basically, the single thread performance also doesn't go up that much anymore. It still goes up, and that's because of architectural changes. So we're basically changing the CPU in certain ways so that a single thread still can be faster,
Starting point is 01:06:39 even though the frequency is not faster anymore. But that also means, of course, we need to use these new techniques or the new parts on the CPU that make things faster, which, again, means reprogramming things. And we're still in the range, at least here, where the number of transistors goes up, right? But also this is a problem.
Starting point is 01:07:02 So we're reaching physical limits, and eventually this will also stop. Just remember that GPUs and CPUs are getting bigger and bigger right now, and at a certain point there is a physical limit. Right now, a modern GPU or CPU will be about this size, and eventually you hit what is basically the maximum die size.
Starting point is 01:07:28 So this is the maximum size that a processor physically can have, because that's the amount of silicon you can put on a single chip, right? And at this level, manufacturing is already super error-prone; you get many defects just in manufacturing. So these are physical limits. At a certain point, you just have to do things differently.
Starting point is 01:07:51 And that means rethinking the programming, rethinking also the hardware. And that needs to be co-designed. And what we have then, like additional trends, is fast networks. So in the past, network was the slowest part. Disk was already slow, but network was much slower. Now we have things like InfiniBand and remote direct memory access, and then CXL to some degree, which is slightly different, but goes into the same range.
Starting point is 01:08:22 And this is close to memory speed. So now it makes sense, to some degree or in certain setups, to write to a different node rather than writing to disk, because the network can be as fast as the memory, or at least have the same bandwidth. Of course not the same latency, but throughput-wise we can get the same kind of speed. We also have different kinds of processors, and this is the sort of
Starting point is 01:08:53 specialization that we'll also look into. So rather than just making the chips bigger and bigger with more and more cores, we can also redesign them. And one of the first things people started with is graphics processing units. They figured out it makes sense to build a specialized processor for graphics, because graphics is basically very repetitive computation, meaning you do lots of number crunching
Starting point is 01:09:20 on many pixels, essentially. So we do the same thing on many cores. We just split this up and all of a sudden we can use this parallelism with very simple programs in essence. So a CPU can do a wide range of programs and can easily switch contexts and do basically different tasks in parallel.
Starting point is 01:09:46 A GPU is good if you do the same task in parallel across many cores, and this is basically what modern GPUs offer. We can also use them for data processing or just number crunching besides graphics processing, so now they're basically general-purpose GPUs. And you can see there's a neat comparison: if you look at a CPU with, say, four cores, then you have a large cache, you have control infrastructure, and you have your individual arithmetic logic units. On the GPU, you will have many of these and very little control infrastructure and less cache.
Starting point is 01:10:27 So it's really just that you do everything in parallel. On a modern GPU you can split it up somehow, but for simplicity you can think of all of these units doing the same kind of computation, maybe on different data items, but completely in parallel, in lockstep. And with this, you get much higher throughput.
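To tie this back to code: the shape of work a GPU thrives on is "the same operation over millions of elements." As a hypothetical sketch (again my illustration, not from the lecture), even plain C++17 parallel algorithms express that pattern on a multi-core CPU; a CUDA or SYCL kernel would push the same element-wise idea onto a GPU. With GCC's libstdc++ you typically need to link TBB (-ltbb) for the parallel execution policy.

```cpp
#include <algorithm>
#include <cstdint>
#include <execution>
#include <iostream>
#include <vector>

int main() {
    // One column of a table: the same arithmetic is applied to every value,
    // which is exactly the kind of work GPUs and vector units are built for.
    std::vector<int64_t> prices(10'000'000, 100);

    // par_unseq lets the runtime use many threads and vector instructions;
    // a GPU kernel would express the same element-wise pattern in lockstep.
    std::transform(std::execution::par_unseq,
                   prices.begin(), prices.end(), prices.begin(),
                   [](int64_t p) { return p * 119 / 100; });  // add 19% tax

    std::cout << prices.front() << "\n";  // prints 119
}
```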
Starting point is 01:10:51 But you're not as flexible. You cannot switch contexts as easily. So that will basically take a long time. But we can use this. We can utilize this for data processing as well. If we want to be more specialized, more tightly catered to the exact program that we want, we can use reprogrammable hardware.
Starting point is 01:11:14 So FPGAs, or field-programmable gate arrays, are a specific kind of chip constructed of logical units or logic blocks that you can reprogram. You can basically program logic gates, AND and OR gates, etc., in the simplest form, and directly, from a programming point of view, wire up your exact program. This is more like electrical engineering, right? You basically create a path through the processor that represents your program.
Starting point is 01:12:00 And that then makes it, of course, very fast. It's a very different type of programming; of course there are some abstractions that make it easier. It's much faster, but it comes with certain restrictions: the programming is slower, the reprogramming is slower, and creating the layout for the chip is super slow. So if you have a very large program that you want to place on the chip, it might take days to compile it, just to figure out the circuitry,
Starting point is 01:12:33 how you fit it on the FPGA. But then it's highly efficient, it's highly parallel, but somewhat hard to program. Okay, and this is kind of an overview. So the idea is, you see, there's many different hardwares. There's trend in hardware. And if we want to be efficient in data processing, then we'll have to look at these trends.
Starting point is 01:12:57 Okay, and this is kind of an overview. So the idea is, you see, there are many different kinds of hardware, and there are trends in hardware. If we want to be efficient in data processing, then we'll have to look at these trends. We'll have to really look at the hardware, and this is what we're going to do in this lecture: really see how the hardware works, and if the hardware works like this, how do we use it for data processing? Okay, so as a summary for today, I gave you the introduction to the course, or let's say to the group,
Starting point is 01:13:20 and what else we do. And here, as a quick reminder, there are also these competitions. So if you feel you don't have enough to do, look at the SIGMOD programming contest. Then we talked about the course organization. The slides are online in Moodle. Also, we have lots of descriptions
Starting point is 01:13:44 about the course projects, etc. So read through that. If there are still questions after that, feel free to send us an email. Ideally, if it's about course organization, use the forum, because then other people will for sure have the same question and we can answer it for everybody. Then we talked about exercises, et cetera, and then I gave you a bit of motivation why it actually makes sense to think about hardware. And the reason is this: hardware is changing, and if we want to be efficient in data processing, then we also need to change our programming. Of course, this is also true for
Starting point is 01:14:33 everything else, right? The hardware is changing not only for data processing; it is changing for any kind of processing, and if you want to be efficient in any processing, you have to think about the hardware that you will be using. So that makes sense in any case. But in this course, besides looking at the hardware, we'll specifically look at the data processing parts. Okay, so thank you very much for your attention. Tomorrow, I'll talk about performance analysis. Do we have questions so far? No questions?
Starting point is 01:15:13 Very good. Well, there's a question: how are the dates for the individual sessions determined? Are these assigned randomly as well? So the question was how we assign the individual slots, or the individual interviews, for the programming exercises. And we select randomly.
Starting point is 01:15:36 So basically each of you, if you participate in the course, will get one of the programming exercises. And then we'll find a block, maybe Wednesday after the session or Tuesday after the session, so right after the course, and ask you if this fits. If it doesn't fit, we'll find another slot, so something like this. So closely after the programming task is done, we'll start with these interviews, let's say. And some important aspects regarding the Moodle.
Starting point is 01:16:15 So please also regularly check out the announcement section, in which we will regularly post news. And there is also the task zero that we mentioned before. It's already online, so feel free to check it out and try to program it, because based on that you can better assess whether this course is for you or not. If you have a lot of trouble solving task zero, then this course might be quite hard for you. There's also a section for active programming participation. You can of course always join the course, but if you want to get credits, you should do the tasks, and we also need to prepare some Docker setup for that. So it would
Starting point is 01:16:58 be good to know as soon as possible who wants to do the tasks. So if you have already decided, just click the corresponding button in this section and let us know that you will participate in the task implementation. But of course, we'll have more announcements regarding this. Yes? So how hard would you say task zero is
Starting point is 01:17:21 compared to tasks 1 to 4? Well, the tasks are not that much harder, they're just a bit more work, I would say. So if you can handle task 0, most likely you can handle task 1, et cetera. In the past, we kind of said, okay, you really need C++ programming experience, you really need to already know it.
Starting point is 01:17:55 And then some of the feedback that we got, at least from some people in EvaP, was, well, it wasn't that hard after all. So now I'm a bit more careful about this. So just try it: if you can deal with task 0, you most likely will also be able to deal with the other tasks. And if you need help, just reach out.
Starting point is 01:18:19 And that's, of course, for everybody. Okay. So, did you get the announcement for Zoom, whoever has already signed up? Okay, perfect, then this works. Every now and then in the past I didn't configure Moodle correctly and people didn't get the announcements, but if that worked,
Starting point is 01:18:37 you will get all future announcements, and that means you will be informed about all important things around the course, hopefully in time. Okay. So with that, I hope to see you tomorrow. Thank you very much.
