Disseminate: The Computer Science Research Podcast - Dimitris Koutsoukos | NVM: Is it Not Very Meaningful for Databases? | #41
Episode Date: October 9, 2023. Summary: In this episode, Dimitris Koutsoukos talks to us about Persistent or Non-Volatile Memory (PMEM) and we answer the question: is it Not Very Meaningful for databases? PMEM offers expanded memory capacity and faster access to persistent storage. However, (before Dimitris's work) there was no comprehensive empirical analysis of existing database engines under different PMEM modes, to understand how databases can benefit from the various hardware configurations. Dimitris and his colleagues analyzed multiple different engines under common benchmarks with PMEM in AppDirect mode and Memory mode - tune in to hear the findings! Links: VLDB'23 Paper, Dimitris's Homepage, Study's source code. Hosted on Acast. See acast.com/privacy for more information.
Transcript
Hello and welcome to Disseminate the Computer Science Research Podcast. I'm your host,
Jack Wardby. Again, a reminder that if you do enjoy the show, do please consider supporting
us through Buy Me a Coffee. It really helps us to keep making the show. Today, I'm going to be
joined by Dimitris Koutsoukos, who will tell us everything we need to know about NVM: is it not
very meaningful for databases? Dimitris is a PhD student in the systems group at ETH Zurich.
Dimitris, welcome to the show. Thank you. Thank you so much, Jack, for inviting me here. I'm
really happy to be part of this. Brilliant. So let's jump straight in then. So can you tell us
a little bit more about yourself and how you became interested in database research? Yes,
sure. So right now I'm a fourth-year PhD student in the systems group. I'm just finishing my fourth year and entering my fifth in a month, actually. I did this combined bachelor's plus master's that we have in Greece, called a Master of Engineering. Then I started working for a couple of years, and I realized that I wanted to do something a bit more than just being a plain software engineer.
And I saw this whole machine learning plus deep learning boom, and I said, okay, the easiest way to ride that train is just to do a data science master's. I saw various universities that offer such programs and applied to a bunch of them. One of the ones that accepted me was ETH, so I took that, and I did a couple of years of a lot of machine learning, big data and mathematics courses. Then I realized that I still enjoy the machine learning part, but I wanted to combine it a bit with my previous background; I like writing code more, working on the systems. And at the point that this happened, I was really interested in data processing systems that combine multiple disciplines, for example relational analytics and machine learning. That's what brought me to databases after all.
Amazing stuff, yeah. So today we're going to be talking a little bit about NVM, right? Non-volatile memory. Maybe you can tell the listener, we can go through some background here, what non-volatile memory is, also known sometimes as persistent memory. So yeah, let's start off with that: what are these technologies?
So non-volatile memory, in layman's terms, I would describe as a storage layer that sits in between traditional storage, so hard drives and SSDs, and DRAM. It's faster than SSDs, it's slower than DRAM of course, but it also comes at a lower price, and it can persist data. The advantage of it is that you can basically use it to support the increasing amounts of data that are produced every day and that companies and researchers want to analyze. Because the problem is that DRAM cannot keep growing all the time, right? First of all, it's kind of expensive, and secondly, of course, new technologies are coming, but we needed to find a solution for keeping most of the data in main memory most of the time.
Nice. So it sits between SSDs and DRAM on that sort of spectrum. In terms of the price point then, how much cheaper than DRAM and more expensive than SSD are we talking here, really?
I don't recall the exact numbers. This is probably a spoiler, but Intel Optane, the commercial implementation of PMEM, is now discontinued, so we cannot really buy it or see the price any more. I would say it was probably 60 or 70% less than DRAM in price, like 60% of the price of DRAM, to be more exact. Now, about SSDs, there are also NVMe SSDs, so it's difficult to pinpoint it exactly. It also depends on how many SSDs and on the technology; SSD technology is actually evolving all the time, right?
Right, sure. Yeah, it's quite a diverse set of things, so it's hard to draw an exact comparison. Cool. So there's been, and you mention this in your paper as well, a whole chunk of research looking at using NVM and persistent memory in databases. So where does your work fit into that body of work, and I guess, what's the general elevator pitch for your work in this
area?
So my elevator pitch would be that, okay, of course we knew that we cannot keep all data in main memory, and then we started finding other types of memory, let's say far memory, high-bandwidth memory, non-volatile memory. At the point this idea started, we didn't have the actual hardware, any commercial implementation of NVM. Researchers in databases, but also in general, did a lot of studies, right? But in the end the hardware wasn't there, so they couldn't see exactly what was happening. Then, when the hardware came out, some studies appeared, but they were a bit more general, studying the characteristics; people just dipped their hands in the water very, very slowly. And at the point we started writing this publication, there were some studies, not exactly for databases, but let's say for OLAP workloads, or some isolated studies done by companies, like Microsoft with SQL Server. But we were like, okay, let's say that you, Jack, as a researcher, want to come and use NVM in your database, and you don't want to start optimizing your code; you just spend the money, or find the hardware, and you want to see what it can and cannot do for databases, right? And that's how we started: explore, see different workloads, experiment with different knobs, and try to actually cover breadth and depth at the same time.
Nice.
So this paper is the one-stop shop, then. If you want to know anything about how NVM works in a database, it's the comprehensive study; the listener needs to go nowhere else, just go to this paper and you can find everything you need to know. That's awesome. Cool. So let's talk about the experiments you ran then, because the paper covers a hell of a lot of ground. Let's start off at the very beginning and tell us about the experimental setup. I think we can maybe talk here about what you mentioned earlier on, Intel's Optane, so maybe start off by telling us about that technology, because that was the actual commercial offering you got your hands on and got to play with, and then we can talk about the systems you measured and the workloads.
Yeah, sure.
So just to start, and to make things clear, we studied Intel Optane in the context of a database running on a single node. There are many, many more applications, and in a single paper it's almost impossible to cover everything. That's the question we tried to answer, right? We probably couldn't fit everything in the title, but in the context of a single server with a single database, does it make sense to use NVM or not? And then there is Intel Optane. Intel had these NVMe SSDs, which were faster than traditional SSDs, but of course they sat on the PCIe interconnect. Intel Optane, on the other hand, sits on the integrated memory controller. So if you imagine your computing system, you have your CPUs with their caches, then you have the memory bus, and on the memory bus you have the DIMMs, which are DRAM. Into those same slots you just plug this Intel Optane technology. That basically means it's not a peripheral device; the integrated memory controller manages both the DRAM and the Intel Optane. I would just call it PMEM or NVM, to make it a bit shorter for the podcast.
Nice. Yeah, there are a few different flavors it can be configured in, right? So I think there's Memory Mode, AppDirect, and Mixed Mode. Can you tell us how these differ?
Yes. So in Memory Mode, you basically substitute, substitute in quotes, because it's not 100% accurate, the volatile memory of the system, so your previous DRAM, with PMEM. But the caveat is that DRAM then becomes an L4 direct-mapped cache. So you basically still have the previous memory hierarchy, but the DRAM is a hidden cache for you now. In AppDirect mode, there are a number of ways to use it. Basically, PMEM just becomes another storage layer, let's say a faster SSD. You can configure it as a number of namespaces. Intel recommends fsdax, which actually bypasses the operating system page cache, and you can just use mmap directly to map files onto the storage medium. This is the mode that previous studies have mostly looked at, although not from the angle that we did; they were trying to see, how can I adjust my source code and my software to be able to take advantage of that?
And Mixed Mode is just that you configure a percentage in Memory Mode and a percentage in AppDirect mode. Intel recommends a ratio of one to four, I think, for the price and performance ratio. And again, DRAM becomes an L4 direct-mapped cache.
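To make the fsdax access path concrete, here is a minimal C sketch. It is not from the paper, and the path /mnt/pmem/example.dat is a hypothetical mount point for an AppDirect namespace; it simply maps a file on a DAX-mounted filesystem and touches it with ordinary loads and stores, with no page-cache copy in between.

```c
/* Minimal sketch, assuming an AppDirect namespace formatted with a
 * DAX-capable filesystem and mounted at the hypothetical path /mnt/pmem.
 * Not the authors' code; production software would typically add MAP_SYNC
 * (or use libpmem) for stricter persistence guarantees. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void) {
    const size_t len = 1 << 20;                       /* map 1 MiB */
    int fd = open("/mnt/pmem/example.dat", O_RDWR | O_CREAT, 0644);
    if (fd < 0) { perror("open"); return 1; }
    if (ftruncate(fd, (off_t)len) != 0) { perror("ftruncate"); return 1; }

    /* With fsdax the mapping points at the PMEM media itself, so reads and
     * writes bypass the OS page cache. */
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    memcpy(p, "hello, persistent memory", 25);        /* plain store instructions */
    msync(p, len, MS_SYNC);                           /* flush to the persistence domain */

    munmap(p, len);
    close(fd);
    return 0;
}
```

A database engine can also simply be pointed at such a DAX-mounted directory as its storage location without any source-code changes, which is the usage the study focuses on.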
And just because I mentioned it, to also make it clear: when we benchmarked NVM, we didn't want to make any changes in the source code, right? As I said in the beginning, we just wanted to say: you happened to find this hardware somehow. Can you use it? How can you use it? What happens?
Yeah, it just drops from the sky. Here you go. Okay, now what do I want to do with it?
Right, yeah, cool. So just to recap there real quick: the Memory Mode case allows us to basically swap it in for DRAM; DRAM becomes a hidden cache and we have the same memory hierarchy. Whereas AppDirect allows us to interact with it as storage, and we can mmap files directly. And Mixed Mode is a combination of the two, and they recommended a one-to-four ratio. Which was the one and which was the four in that ratio?
I think for one gigabyte of DRAM you needed four gigabytes of PMEM in Memory Mode, if I'm not mistaken.
Gotcha, cool.
That makes sense. Cool, great stuff. So that's the toy that you've been playing with for all these experiments. Now that we've covered that, can we talk a little bit more about the experimental setup? What were the systems you used, and maybe you can introduce some of the benchmarks as well. I know we're going to talk about those in a lot of depth later on, but give us a high-level overview of all the various components.
Yes, sure. So we're database researchers, right? We know two types of benchmarks: we know OLTP and we know OLAP. In the end we chose an OLAP workload, TPC-H. It's heavily used, especially as the first thing you want to do in a research paper, and companies also use it as a point to prove how efficient and how fast they are.
Yes, exactly.
And for OLTP we used TPC-C; it's also very, very heavily used. Then at some point one of the reviewers asked for key-value stores, to bridge all the different types of relational databases that we had. I know key-value stores are not exactly a database, but they are used as kind of such, right? So the most famous benchmark there is the Yahoo Cloud Serving Benchmark, YCSB, and we ended up using that as well.
What we tried to do in general is, we said, okay, we really need to stress the memory, right? We know that maybe some of the effects that we see are a bit exaggerated, because we stressed the memory in cases that are not common, but we really wanted to see the weak points and the strong points. So we isolated all our experiments on one NUMA node, and we made sure that our working set was actually higher than the amount of DRAM that we had there. Basically, we wanted to use as much PMEM as possible for the experiments that we ran. So for TPC-H, for example, we used scale factor 100. For TPC-C, we configured the number of warehouses to have around 100 gigabytes of data, and the same for the Yahoo Cloud Serving Benchmark.
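For readers who want to reproduce this kind of isolation, the sketch below shows one way to confine a process to a single NUMA node with libnuma. This is an illustrative assumption about tooling (the study could equally have used numactl on the command line), not the authors' actual scripts.

```c
/* Sketch: confine the current process (and whatever workload it then runs,
 * e.g. a database server or a benchmark client) to NUMA node 0 for both
 * CPU scheduling and memory allocation. Roughly equivalent to
 *   numactl --cpunodebind=0 --membind=0 <command>
 * Build with: gcc pin.c -o pin -lnuma */
#include <numa.h>
#include <stdio.h>

int main(void) {
    if (numa_available() < 0) {
        fprintf(stderr, "libnuma: NUMA is not available on this system\n");
        return 1;
    }

    struct bitmask *node0 = numa_parse_nodestring("0");
    if (!node0) {
        fprintf(stderr, "could not parse node string\n");
        return 1;
    }
    numa_run_on_node(0);        /* run threads only on node 0's CPUs       */
    numa_set_membind(node0);    /* allocate memory only from node 0's DRAM */
    numa_bitmask_free(node0);

    printf("pinned to NUMA node 0; start the workload here\n");
    return 0;
}
```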
Awesome, cool. So which systems did you use in these experiments?
So for the OLAP side we ended up using, I think, PostgreSQL, MySQL, SQL Server, and then DuckDB. For TPC-C we used, again, PostgreSQL, MySQL, SQL Server, and VoltDB. And for the Yahoo Cloud Serving Benchmark we just used RocksDB.
Nice. What drove the choices of those systems? Did you just try to get a breadth of some of the most popular systems? You've obviously got a nice cross-section there of transactional systems and analytical systems, but was there anything more to it than that?
So the first choice was that we wanted to focus on as many open-source databases as possible. Because, as you well know,
it's very easy to change configurations, basically, and to understand the behaviour, because you actually have access to the source code and can see how they manage things. So that was the first choice we made, and that's why we chose Postgres and MySQL and also DuckDB; I think those are three of the most popular systems being used now. Then we also used SQL Server, because we know, and of course I agree with that, that it has a very high place in the industry for optimizing plans and running queries very fast. And VoltDB, I think we chose it because we also wanted a specialized, let's say, OLTP system. Of course, MySQL is one of them, but we wanted a second one, because Postgres and SQL Server are not so popular for OLTP workloads. And for the Yahoo Cloud Serving Benchmark, I think we just picked RocksDB, again because it's open source and popular; we just wanted a key-value store that people actually use a lot. Also, other studies have used RocksDB a lot for comparisons.
Yeah, cool. Obviously you scoped here on relational systems, but have you thought about how this experiment would interact with some systems like MongoDB, for example, that are a little bit out of the usual relational world?
We thought at some point about doing experiments with NoSQL databases like MongoDB, for example. But the thing with these systems is that they don't have such standardized benchmarks, at least in my experience, as Postgres or MySQL do. And then it's hard, first of all, to find the insights. I mean, we discussed TPC-H before, right? You know the benchmark by heart, you know the weak points, you know what happens. We would need a benchmark like that for MongoDB. And also, in academia, you need the reviewers on the other side to be able to understand it and contribute with their comments. So it was just a compromise in the end, right?
Yeah, sure. I mean, at the end of the day you've got to be practical about it at some point, because there are so many systems you could be doing it forever, and at some point you've got to publish, right?
So, I mean, yeah, that's really cool.
So let's talk about some results then. Just to set the scene, you ran some baseline micro-benchmarks. Can you tell me what these were and what some of the preliminary findings of those experiments were?
So these micro-benchmarks we had as a point of reference. There has been a lot of work on this, mostly in systems conferences, let's say.
In particular, there is a famous lab led by Swanson that has open-sourced their code as well. So among the people that worked on the paper there was a postdoc, Michal Friedman, who took on more of the micro-benchmarks. She took the code from these previous papers and adapted it, in order to understand what the limits are: how fast, in practice, PMEM is compared to DRAM or to the SSD that we had. And just as a parenthesis here, to bring the audience onto the same page: we could only do that for some of the configurations. For Memory Mode we know the basics that I described, but it's kind of a black box, right? We don't know exactly how it works; we know that it's direct-mapped, but we cannot pinpoint exactly what is happening there. So when we ran these micro-benchmarks, we couldn't verify that the behaviour we were seeing was actually Memory Mode and not something else hidden behind the scenes. So the micro-benchmarks were not for Memory Mode, but just for AppDirect mode, our SSD and DRAM, where we were perfectly clear that, okay, this is what happens with these hardware components. And we can be sure that, for example, PMEM has about 10 times more read and write bandwidth than the SSD.
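As a rough illustration of what such a baseline looks like, here is a minimal C sketch of a sequential-read bandwidth measurement. It is an assumption in the spirit of those micro-benchmarks, not the adapted code from Swanson's lab, and the default path /mnt/pmem/bench.dat is hypothetical: it maps a large pre-created file on whichever medium you want to measure, streams it into a DRAM buffer, and reports GiB/s.

```c
/* Sketch of a sequential-read bandwidth micro-benchmark. Pass a path to a
 * large file on the medium under test (PMEM fsdax mount, SSD, etc.). */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

#define CHUNK (64UL * 1024 * 1024)   /* stream in 64 MiB chunks */

int main(int argc, char **argv) {
    const char *path = argc > 1 ? argv[1] : "/mnt/pmem/bench.dat";
    int fd = open(path, O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) != 0) { perror("fstat"); return 1; }
    size_t len = (size_t)st.st_size;

    char *src = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    if (src == MAP_FAILED) { perror("mmap"); return 1; }
    char *dst = malloc(CHUNK);
    if (!dst) { perror("malloc"); return 1; }

    unsigned char sink = 0;          /* keep the copies from being optimized away */
    size_t copied = 0;
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t off = 0; off + CHUNK <= len; off += CHUNK) {
        memcpy(dst, src + off, CHUNK);   /* sequential read from the medium */
        sink ^= (unsigned char)dst[0];
        copied += CHUNK;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("read %.1f GiB in %.2f s -> %.2f GiB/s (sink=%u)\n",
           copied / (1024.0 * 1024 * 1024), secs,
           copied / secs / (1024.0 * 1024 * 1024), (unsigned)sink);

    free(dst);
    munmap(src, len);
    close(fd);
    return 0;
}
```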
Cool. Let's get into some of the OLAP experiments then. Can you start by giving us some of the general insights, the general observations you found whilst doing these experiments with TPC-H?
Yes, sure. So I would say that the four different databases that we had gave some common insights and many different insights. One of the common insights is that when you have a workload like TPC-H, where you mainly read, the more cores you use, the higher the bandwidth you get from PMEM. You reach the maximum both on PMEM and on the SSD, and you can see a big difference between them.
But on the other hand, we didn't see exactly the difference that we had seen in the micro-benchmarks. We were wondering why this happens, where some of the bandwidth gets lost, and then we started observing how different databases, like Postgres, utilize the operating system page cache. We saw that, because PMEM sits on the integrated memory controller, there is no direct memory access, no prefetching of the kind that happens over PCIe, where you know the data is going to be there, so you prefetch it into the operating system's page cache and the database reads from there. Instead, the CPU has to be involved all the time, so you're kind of losing some bandwidth there.
Another insight we got, and it's also something you can find online if you use any search engine, is that MySQL doesn't use more than one core for analytical queries, except for a very specific type of query, and you cannot really change this behaviour. So in very constrained environments we saw that the advantage of PMEM was even smaller, because you had to spend some of the CPU resources to do the processing and some of the CPU resources to actually transfer the data through the integrated memory controller, so you were losing a bit of that as well. And that insight was also verified with DuckDB.
DuckDB, of all the systems that we ran, has a columnar format instead of a row format, and it's very, very optimized for specific types of queries, especially TPC-H. And we saw that for CPU-intensive queries, because it's a columnar format, storage of course still plays a role, but you don't read as much data as in a row format, PMEM in AppDirect mode is more or less the same as using a plain SSD, or sometimes even worse. And just to make things clear: what I discussed just now is mainly about AppDirect mode.
Okay, sure. So you mentioned there are various knobs you changed in some of these experiments to investigate different areas of this space. Can you talk us through which knobs you varied when you were doing your experiments?
So in general, we tried to make the buffer cache a smaller size, though of course we wouldn't make it one megabyte, because that wouldn't make sense. We wanted to use as little DRAM as possible, so we made it 16 gigabytes; we experimented with other sizes but settled on that in the end, in order to read from disk all the time, disk being either PMEM or an SSD. We didn't experiment with it directly, but we also saw how Postgres makes heavy use of a circular buffer in order to get data and process it. And I would say that for OLAP we mainly focused on trying to understand: okay, your database has a query plan, say an aggregate based on a hash table and then a double-for-loop join; how does this map onto the actual hardware characteristics, and what makes sense to change, like making the access size or the block size larger or smaller? So I would say it was more observation and less experimentation compared to TPC-C, where we had a lot more room to tune knobs.
Okay, cool. So I guess that's a nice segue into the OLTP experiments. Let's rinse and repeat here: tell us about the insights, and then we can talk about the various knobs.
So something that we observed across the board, but which was more obvious for the OLTP workloads, is that Memory Mode had only a tiny few cases where it was better. You needed workloads that not only had a working set larger than DRAM and the OS page cache combined, but that also had a low number of read and write accesses to that data. So let's say you have a table, it's very large, you read it once, you process it, and then you're done; you're not going to read it multiple times. That was the premise for Memory Mode. But for OLTP we really didn't find anything where it could be better compared to a system that uses a plain SSD and just DRAM, and that's it. We also verified insights that other people have found: that you need to be very careful when you have write and read workloads together. So we had to make sure that the interference, and how the queues are managed by the integrated memory controller, made sense, right?
Because the thing is that with PMEM, when you have just read workloads, the bandwidth increases almost indefinitely until you get to the maximum. But as you increase the write rate, there is a lot of contention and the bandwidth drops. If I can describe it, it's a proper U-curve: with mixed read-write workloads you have a sweet spot in the middle, and then things degrade to the left and to the right. So those were the two main insights, I would say: be careful when you have mixed workloads, and when you have write workloads also be careful not to write too much, because then you will also get bad results. At some point in TPC-C, for SQL Server, we had such bad results when we were using a high number of users that the SSD was better by a large margin in terms of transaction rate.
That's crazy. I mean, obviously the elephant in the room here, which you mentioned earlier on, is that Intel has now discontinued this. So is this maybe the reason why, that the performance of it just isn't that good? Obviously it would be used for a lot of other things than just databases, but this doesn't seem like a glowing endorsement or a look-how-amazing-it-is use case, right? So do you have any sort of information on why they discontinued it?
I know there are speculations; I don't really know why they did it. I guess one reason was that, and another reason was the cost, because it had a kind of hidden cost, let's say. Besides buying PMEM and putting it in your system, there is only a certain set of Intel processors that can actually support PMEM, right? You cannot take your processor from six or seven years ago and expect it to work with PMEM out of the box. So it was a combined hidden cost, I would say. Some of the companies that were producing these chips, because Intel outsources the production of the actual circuits, were also not so happy economically; that's what online sources say. Combined with the characteristics that don't make the software side so appealing, I guess this was the final push in this direction. And there was also CXL, the Compute Express Link, which is being advertised as the new solution. So it was lots of small and big pieces that added up.
Yeah, in the end it's the straw that breaks the camel's back, right? There are all these little cuts, and then, yeah, death by a thousand cuts. It's a shame, I guess. Cool, so we've touched on some of the general insights there from the OLTP experiments, but you said you tuned a lot of knobs, so let's talk about the things
you varied when you were doing these experiments, then.
Yeah, so first of all, we played with the number of threads that you can use when you want to write, for example, and we observed what other people have done. There are optimizations in Postgres, like WAL compression, which is used to compress the logs in order to take advantage of the fact that the CPU is actually faster than storage, but there we just couldn't see any advantage from it. So all these optimizations have to be reconfigured or rethought a bit in order to bring them onto PMEM. We also looked at the flushing methods in MySQL, because in MySQL you have different ways to flush your logs and your data to storage, right? You can do it by bypassing the page cache, or you can use intermediate buffers, so we also played a bit with that. And, in general, in AppDirect mode, on the bus of the integrated memory controller you have, let's say, several PMEM DIMMs. There is a way to combine them together, and this is called interleaved AppDirect mode, but there is also a way to see them as separate namespaces. We played around a bit with that as well.
And finally, we went around it a bit differently: let's store the logs on one storage medium, say the SSD, and the data on another storage medium, for example PMEM, or store everything on PMEM, or everything on the SSD, to understand whether you can gain some performance advantage. Because Intel Optane and PMEM in general are advertised a lot for persistency, so you can say, okay, I want my logs to be persistent, right? So what if I just store the logs there and not everything? But in general, besides some insights that other people had, the other insights just fall fairly naturally out of the general story of the paper and the weaknesses that we had identified before.
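To make the log/data placement idea concrete, here is a hedged sketch of how such a split could be set up in PostgreSQL via libpq. It is an illustration under assumed paths and connection settings, not the scripts used in the study: the WAL stays wherever the cluster was initialised, for example on the SSD, while new tables go into a tablespace on a PMEM-backed directory.

```c
/* Sketch: put table data on a PMEM-backed directory while the WAL stays on
 * the cluster's default medium. The connection string and the directory
 * /mnt/pmem/pgdata are hypothetical. Build with: gcc split.c -o split -lpq */
#include <libpq-fe.h>
#include <stdio.h>

static void run(PGconn *conn, const char *sql) {
    PGresult *res = PQexec(conn, sql);
    if (PQresultStatus(res) != PGRES_COMMAND_OK)
        fprintf(stderr, "failed: %s\n%s", sql, PQerrorMessage(conn));
    PQclear(res);
}

int main(void) {
    PGconn *conn = PQconnectdb("dbname=tpcc user=postgres");
    if (PQstatus(conn) != CONNECTION_OK) {
        fprintf(stderr, "%s", PQerrorMessage(conn));
        return 1;
    }

    /* Tablespace on the AppDirect (fsdax) mount: table data goes to PMEM. */
    run(conn, "CREATE TABLESPACE pmem_space LOCATION '/mnt/pmem/pgdata'");
    /* Tables created in this tablespace live on PMEM; pg_wal is untouched. */
    run(conn, "CREATE TABLE orders_pmem (id bigint, payload text) TABLESPACE pmem_space");

    PQfinish(conn);
    return 0;
}
```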
Cool, yeah. Given that, then, moving on to the key-value experiments and YCSB: was it the same story there, or was there a different set of insights you got out of those experiments?
It was mostly a confirmation of the story that we had seen before. We used three workloads. One was half and half, so mixed, and there we saw that with mixed workloads it's very, very tricky. Then we had a read-only workload, which showed how you can use PMEM and expect your bandwidth to grow almost indefinitely, up to a very high point. And then mainly write workloads. So I wouldn't say there was anything very surprising. The difference is that there the keys and the values were very wide, and we were expecting maybe to get some performance advantage, because the access granularity of PMEM is 256 bytes, which is four times larger than a cache line in DRAM, for example. So we thought that because of read and write amplification we might see some different behaviour, but we couldn't identify anything that wasn't already revealed in the previous parts of the paper.
Yeah, and of course I'm getting that the answer to the question in the title of the paper is: no, it's not very meaningful. Cool. So I guess before we move on to some more general questions, it might be a good opportunity, if you would like, to provide a summary of all the insights you found, to condense all your findings into one soundbite, if you can do that. I mean, there are a lot of findings, but yeah.
So if I wanted listeners to take away just two things from this paper, the first is that for Memory Mode we couldn't really find a use case where it shines; you should use it only if you have a shortage of DRAM. The second is that AppDirect mode is more capable as hardware than a traditional SSD, but there are small caveats along the way. You should be careful about read-write interference, and about the number of write threads if your resources are limited, for example. And if your system, like Postgres, makes heavy use of the OS page cache, then when you just use an SSD that might cover some part of the hardware difference. These are the things that a reader, a researcher, or someone working in a company should pay attention to if they use something like Intel Optane at some point in the future.
Cool, that's a nice summary there. So I guess, are there any limitations of your study? Obviously it's very comprehensive, but are there any areas that you think could have been improved, or some dark corners of the study that were pushed to one side? Are there any limitations, I guess, is the general question.
Yes, of course, every study has limitations, so my paper is not an exception to that. I have heard from people in companies that when they use PMEM in a separate machine, as a caching layer in the cloud, they get really good results compared to traditional SSDs. But our study was more about a single-server environment, and it was hard to push the scale factor and everything to the extreme, because we wanted to fit into the hardware that we had. We wanted to fit everything into one NUMA node, which means 256 gigabytes of PMEM, two terabytes of SSD, plus 64 gigabytes of DRAM. So in the end the scale factors were a bit restricted by that fact. Maybe if you go to a system that has a few terabytes of RAM and then, again, a few terabytes of PMEM, and a larger setup like that, you will of course confirm some of the findings that we made, but maybe you will also find some things that we missed.
Cool, yes.
I guess, kind of on that, are there any plans for next steps of this study, or with this technology in general, with PMEM, or is this kind of the end of the road for PMEM for you for now?
For me, yes, this is the end of the road after the discontinuation. With CXL, because right now it's simulated using FPGAs, field-programmable gate arrays, and my group has a long tradition of using this type of hardware, I think they are planning to do some CXL work; I'm not so involved in that. But for PMEM as it is, we really don't have anything to continue working on in that line of research.
Sure, cool.
I guess, and you mentioned this a bit earlier on, about what you would want a software engineer or DBA or someone building a database system to keep in mind from your study. But what I want to know is, what impact do you think your work can have? And maybe what impact has it had already, if you've had any feedback from people who've used this study to influence what they're doing in their day-to-day work?
When I was presenting at VLDB, it was kind of a nice surprise, let's say. There was a very well-known researcher from Microsoft, and he said, look, you're not presenting us anything new, right? At Microsoft we had the hardware from 2019, when it came out, and all the things that you have said, and that it's not so useful, we found out the same things; in the end, that's why it was discontinued. And then I told him, look, when we were writing this study it was more than a year before the technology got discontinued, and by the time we submitted it for the first time it was also more than eight months before it was discontinued, right? So then he understood the motivation of our work, and that we had actually, in a way, predicted the discontinuation without knowing it.
Yeah, that's funny.
But I mean, it's all well and good them knowing internally at Microsoft that this is the reason why, but not everyone has access to the resources Microsoft has to figure this out, right? So you need to have it in the public domain somewhere to allow other people to get the same insights, because these big companies are very good at keeping their findings to themselves. No, I think it's really good work, and hopefully you got your point across.
Thank you so much. It's a bit of a win, let's say, at the end, that history just confirms what you had predicted, right?
Exactly. Yeah, yeah.
So it's great.
That's cool.
Awesome.
So, yeah, another question I like to ask people
when we're talking about the work is,
was there any sort of, I don't know,
there's two sort of angles to this question.
The first is kind of what is the most interesting thing
that you learned while working on NVM
other than the fact that you can predict the future?
The most interesting thing, I think, is that there is this specification, like a book, that we see for every type of hardware, but the reality sometimes surprises you, right? That, for me, was the most interesting thing when I was working with this technology.
I guess on the other side of that question, then: what was the most unexpected thing, or maybe the thing you tried along the way that failed, some war stories, I guess, from this project? Because you said you were working on it for quite a long time.
First of all, it was kind of a war story to get this paper accepted, because, of course, the title is a bit controversial and the results were a bit controversial as well. It's hard to convince people so easily that something that has been hyped for decades is actually not really worth the fuss. I would say that was one of the struggles in this work. And also, when you are using so many systems and you have to conform to all the dependencies and things like that, it is a bit different from your typical database paper where you develop a solution and have a couple of baselines, et cetera. When you are using six, seven, eight different databases and versions change, the way they work changes, so it's kind of hard to synchronize and script everything in one place.
Yeah, every system has a different syntax for doing the same thing, right? So it's a bit of a nightmare; I can imagine that's a very difficult challenge.
Cool, yeah. So I've just got a few more questions now. The next one, and you've mentioned it at various points throughout the podcast, is some of the other research you work on. Can you maybe tell us more about some of the stuff you've done in the past, and maybe some of the stuff you wish to do in the future?
I'd just like to take a pause here, because I would really like to thank a lot of the people that worked on this paper.
Yeah, go for it.
I would like to really thank, a lot,
Raghav Bhartia, the second author.
He was a master's student at the point we started this work; I think he's now having a very good career as a software engineer. But he really worked hard and produced many of the results that we actually present in the paper. I remember the conversations and some of the things he was saying to me, and I was skeptical at that point, but in the end he did excellent work for his master's thesis. And just to share some future plans
and in general, the motivation of my PhD
and my dissertation.
The main research question that I was trying to answer is this: the data processing landscape evolves very fast. You have new types of hardware, new types of algorithms, you want real-time processing, and your data is increasing. Usually the systems that are built right now are a bit monolithic; they are very good at doing exactly one thing, super fast at that exact thing, but the moment you have to change something, you basically have to rewrite everything from scratch. So I pick different parts and see how we can adapt them to the modern trends in hardware or platforms, and my work is going to continue along the same road, let's say. Right now I'm working on something a bit like what Snowflake and Databricks do, organizing your data for the workload you have in order to get the best latency and also the best cost. So, since data processing is moving to the cloud, how do you want to organize your data to take advantage of that?
Another type of work that I'm doing is bridging; other people do this type of work as well, but I would say I'm doing it from a bit of a different angle. I have a system that I built as part of my PhD, called Modularis. At first it was just doing relational workloads, and we have now extended it to machine learning workloads and added support for FPGAs to be used as a smart storage engine. We want to see how you can build a system that uses different types of hardware to run one query. Because, as we said before, we don't know exactly what commercial systems do; maybe Amazon Redshift does the same thing. But when you develop your own system and open source it afterwards, you know exactly how these things work from one system to another and how you can put them all together.
Yeah, so it sounds like you've got loads of really cool things to be working on there. I'd love to know how you actually approach generating these ideas and things to work on, and then, once you've generated ideas, selecting the things to actually dedicate a significant portion of time to.
Basically, I think it's just an iterative process, mainly based on discussions. I've been extremely lucky to have some very talented colleagues in my group. Maybe somebody will tell you, okay, we have this problem, how would you solve it? And then you start thinking that this is actually a problem, right? And maybe some of these people go to industry afterwards, and if you're still exchanging thoughts, you see that this problem also actually exists in companies. So, for example, this evolving data layout, optimizing your data layout based on the workload, is a real problem,
right? So let's say you're the typical data scientist of today and you have your data somewhere, in whatever format, CSV, JSON, Parquet. If you want to do some analysis, you probably won't have a full-fledged database; you would just use a query-as-a-service system, run your queries, take some insights, take some rows, compute what you need to train your machine learning models, or whatever. Given that you know this, and that you don't want to pay so much money every time to Amazon or Google or whoever, what can you do? You want to take advantage of some transformations that you know make sense for these systems, for example compression, to be able to have workloads that run faster and are also cheaper.
And there is also the typical data scientist problem, right? You want, let's say, a model that makes a prediction. You will go to a database, you will have to run something, then you will export your results to CSV, then you will start using Python, you will import them into pandas, and you will start training your scikit-learn model. So what if you could do all of that with one system, and what if you could do it using different types of hardware as well?
Fascinating. I really liked what you were saying in the answer to that question about the people you surround yourself with. You really are a product of your environment, right? If you surround yourself with good people, it produces those good ideas, and you can iterate on things as well. That's a great answer. So with that, we're on to the last question. I know you gave a great summary of your work on NVM earlier on, but maybe now is a chance to give that message again. What's the one thing you want the listeners to take away from this podcast today?
The one takeaway I would give is, first of all, in the context of NVM: today there is no NVM, right? This technology is discontinued. I hope that my study will shed light for future types of hardware that will be similar to NVM, or even CXL, so that they do not have some of the weaknesses that this system had. For example, what I discussed about using the CPU to do all the reading, and not having DMA or a hardware prefetcher that brings data into your main memory. Just take these as lessons in order not to repeat them: see what NVM did wrong and don't repeat it. That, I would say, is the main takeaway.
Yeah, it's like that saying: those who don't know history are doomed to repeat the failures of the past, or something like that. So there's a warning message there. Great. Well, that's great, let's end things there.
Thank you so much, Dimitris, for coming on the show. It's been a fascinating chat. If the listeners are interested in knowing more about Dimitris's
work, we'll put links in the show notes
so you can go and find that. And again,
if you do enjoy the show, please do consider
supporting us. It really helps us to keep making the show.
And we'll see you all next time for
some more awesome computer science research. Thank you.