Tech Over Tea - Making Nouveau & OpenCL Usable | Karol Herbst
Episode Date: November 24, 2023. Today we have Karol Herbst on the show, who you may know from his work on projects like Nouveau, NVK and starting the RustiCL project, bringing much better OpenCL support than before. He's been in this space for a long time and has a lot to say about driver development. ==========Guest Links========== Mastodon: https://chaos.social/@karolherbst Github: https://github.com/karolherbst Gitlab: https://gitlab.freedesktop.org/karolherbst NVK: https://docs.mesa3d.org/drivers/nvk.html Nouveau: https://nouveau.freedesktop.org/ ==========Support The Show========== ► Patreon: https://www.patreon.com/brodierobertson ► Paypal: https://www.paypal.me/BrodieRobertsonVideo ► Amazon USA: https://amzn.to/3d5gykF ► Other Methods: https://cointr.ee/brodierobertson =========Video Platforms========== 🎥 YouTube: https://www.youtube.com/channel/UCBq5p-xOla8xhnrbhu8AIAg =========Audio Release========= 🎵 RSS: https://anchor.fm/s/149fd51c/podcast/rss 🎵 Apple Podcast: https://podcasts.apple.com/us/podcast/tech-over-tea/id1501727953 🎵 Spotify: https://open.spotify.com/show/3IfFpfzlLo7OPsEnl4gbdM 🎵 Google Podcast: https://www.google.com/podcasts?feed=aHR0cHM6Ly9hbmNob3IuZm0vcy8xNDlmZDUxYy9wb2RjYXN0L3Jzcw== 🎵 Anchor: https://anchor.fm/tech-over-tea ==========Social Media========== 🎤 Discord: https://discord.gg/PkMRVn9 🐦 Twitter: https://twitter.com/TechOverTeaShow 📷 Instagram: https://www.instagram.com/techovertea/ 🌐 Mastodon: https://mastodon.social/web/accounts/1093345 ==========Credits========== 🎨 Channel Art: All my art was created by Supercozman https://twitter.com/Supercozman https://www.instagram.com/supercozman_draws/ DISCLOSURE: Wherever possible I use referral links, which means if you click one of the links in this video or description and make a purchase we may receive a small commission or other compensation.
Transcript
Good morning, good day, and good evening.
I'm, as always, your host, Brodie Robertson,
and today, we have someone who you may not know the name of,
but if you've been following, you know, a lot of Linux,
graphics stuff, you've probably seen some of the work
that he's been involved with.
Welcome to the show, Karol Herbst. How are you doing?
Hi. Great. And you?
I'm doing pretty good.
So, I think, correct me if I'm wrong,
but the best way to describe the general work you do
is work on the Linux graphics stack.
Is that a fair way to describe it?
Or do you have another way to describe it?
Yeah, I think that's fair enough.
I'm leaning more to the compute side now,
but I'm still working on Mesa primarily.
Right, right.
I wasn't particularly like meaning specifically like gaming graphics.
I was sort of meaning all of that together.
But I guess compute, yeah, I guess compute you would consider separate from that.
That makes sense.
So the three main things that people probably recognize that you've been involved with, and I'm sure there are other things as well that you can definitely talk about, but the three main things that are definitely noteworthy of late are NVK, RustiCL and Nouveau. People are always going to be talking about those as they are being improved.
Yeah, I think that's fair.
So I guess we can, before we get into any of that stuff,
how about we talk about like how you really got yourself into this sort of
space?
Like how did you get yourself involved in not only doing the driver
stuff you're doing, but sort of getting yourself involved in Linux and FOSS in the first place?
Yeah, I think that's quite a fun story, because I started out as a Java backend developer.
Oh okay, Java backend stuff.
Like the full story with 500 lines of stack traces.
Some Java developers know what I'm talking about.
And at some point, I was just.
I mean, we never had a Windows installation in our home.
We were mostly, like, Apple users,
with all the Mac stuff going on.
And at some point, I just got into like HTML programming
and got some jobs with Java and stuff like this.
And at some point, I was like, yeah, Linux
might be something fun to try out.
I remember that at some point, my father
got like a Linux CD installer
in some Mac magazine with a penguin on it.
And that was kind of my reason to wanting to try it out,
just because of the penguin.
So at this point, he was against that idea.
And I didn't have enough hardware of my own,
so I couldn't.
But yeah, at some point, I was just
having this kind of laptop with an NVIDIA GPU
and installed Linux on it because open source,
and it kind of made sense to me.
I think the first thing I was mostly involved
is because it was a gaming laptop, playing games
on Linux and everything.
And it was like, I think that was like 12 years ago.
It was not in a great state.
So the only way to reasonably game on this laptop
was still with the NVIDIA driver.
Right.
And even the hybrid GPU setup was kind of a mess.
And I had one of those laptops where
you can't flip the main GPU.
So yeah.
I think there was this project called... I don't know if the project itself was called optirun, but it was this command which starts a second X server and copies stuff around.
Bumblebee was the project.
Yes, yes. So it was part of Bumblebee.
And yeah, so that's kind of the reason how I
got into Linux gaming stuff.
And at this point, there was also, I don't know if you heard of it, but there was Desura, and Desurium, which is like an open source client like Steam.
Okay, okay.
I don't know if people are familiar with it, because, you know, the company was kind of financially in a bad state, and they had this open source client.
I guarantee there's going to be, like, one random person like, I remember that thing. Hopefully.
Yeah, so I started to contribute to this project.
Oh yeah, I see the last change is like nine years ago, so it's an old project.
But yeah, I don't know if I'm still the biggest contributor, but the build system was like a bunch of shell scripts. And because I knew CMake at this time, I was like, yeah, let me just port this entire mess to CMake.
And yeah, so I got involved in helping out with this open source gaming client. It's basically like Steam, just open source,
so fixing bugs and all this kind of stuff.
At that time, did you have any existing programming experience, or were you learning to program as you were doing that?
Not really. I mean, I've talked about the Java and the HTML stuff. I was familiar with C, and I also programmed in Objective-C because, you know, Mac system and everything.
Right, right.
But it wasn't really like in a...
I never had like a FOSS project before and I kind of just got by accident into this position
of maintaining this project.
So yeah.
What a fun... You know, that's a fun way to start. That's certainly a fun way to start.
With those Apple systems you were talking about, was that, like, pre or post Unix? Was it, like, before they were Unix systems?
Well, the first computer I kind of had access to was not even a Mac, but, like, an Atari. But I only used it for playing games.
Right.
But the first Apple I had contact with was also, like, pre-OS X. It was 9.2 or something, but it was, you know, very close to the migration, and the hardware just couldn't handle the new operating system.
But yeah, the first actual Mac I had of my own was, I think, 10.3 or 10.4, somewhere around that.
When you were making that transition into Linux, when you found this Linux CD, were you aware of what Unix was at that point, or was it just completely unknown to you?
I wouldn't say... I think the concept of Unix itself was unknown to me, but I was familiar with, like, command line stuff.
Right, right.
So the command line felt familiar between the two of them?
Yes, yes, yes, yes.
Okay, okay.
So you eventually got from there into somehow working on open source Nvidia GPU drivers?
Yes, so the reason I actually started to look into Nouveau was because, you know, there
was like the open source Nvidia driver and I was like, yeah, maybe I try that.
And then it's like, oh yeah, it's pretty slow. But it had fixed the hybrid GPU setup, with suspending and powering down.
My laptop even was one of those who
had an LED showing if the GPU is actually turned off or on.
Oh, OK.
So yes, I also had motivation to make it work so I can power down the GPU and everything.
But yes, so the reason was the performance was kind of slow and the re-clocking support
was in a bad state at this time.
I think it worked with DDR3 GPUs, but it was broken for GDDR5.
So yes, I figured out why and where it was broken, and fixed it, and that was kind of my first big patch to the project.
Do you remember roughly when that was?
2013.
Oh, so it was like just after Nouveau was getting started then.
I am... I think the project is older.
It's like 2011 I want to say. I don't think it's that much older than that.
What does Wikipedia tell me?
Uh... let's see. I want to say it's like 20... Yeah, initial release was 2012.
At least according to this, it might... The numbers might be a bit wrong here,
which is very possible. Yeah, let me find the patch.
Okay, I wasn't expecting you to find the exact patch it was.
But regardless, it was fairly early on in the project,
within the first couple of years.
Yeah, it might be.
I didn't really follow up on the history of Nouveau at this point,
or later, so...
Okay, okay.
But yeah, let's see. I have the patch. Yeah, it was 2015.
Okay, so it's like three years into the project being around then. Oh, I was going to say, I'm fairly new to Linux myself. How aware were people back then of Nouveau? Because nowadays, you know, when people are using video cards, Nouveau is mentioned as a thing that exists, but most people tend to just run the proprietary drivers, because as great as the work that's been done on Nouveau is, a lot of the time that experience is just going to be better right now, and hopefully some of the other stuff that you're involved with is going to help improve that. But how aware were people of Nouveau at that point?
At least how popular did it seem like it was?
Yeah, I think it was more popular than before
NVK showed up.
Like, we had, like,
more contributors on the
OpenGL driver and everything, and
the contributors just
moved away to other projects,
like working on the AMD driver or something else.
So before NVK emerged, the popularity, I think,
was decreasing.
I think around when Kepler was still pretty commonly used, it was actually used by quite a few people, because, you know, with reclocking working it was actually useful for some kind of gaming.
What generation was Kepler?
The 600 series.
Oh, okay. So yeah, that's a while ago now.
Yeah. So my first GPU was also Kepler, and I fixed it for that.
But, yeah, Kepler was kind of the generation with the best reclocking support, and it was decently fast, at least for some games.
And I think the best benchmark I had was, like, at 80% of the NVIDIA driver.
Oh, wow, okay.
Yes, so it wasn't that terrible, but like other games were like at 10 or 20%, so it was...
Right, so it was very hit and miss with what actually worked well.
Yeah, but there was a bigger community of people being interested and seeing where the project is going. And it was just declining, because, you know, also the firmware situation. Bringing up new hardware was difficult. And this happened around Maxwell, which is the 900 series, and all the signed firmware mess.
So what's the deal with the firmware stuff?
Because I occasionally will see articles about, you know,
firmware situation improving with Nova, all of this stuff.
What's actually the problem here, and why is it such a big deal?
Yeah, so the big thing is that for whatever reason, Nvidia came up with their signed firmware stuff
to essentially lock down certain access to the hardware from the CPU side.
Mm-hmm.
Um...
For example, in Maxwell, we aren't able to control the fan speed of GPUs.
Okay.
Um...
So, even if we could re-clock, if your fan is, like, still slow, it's kind of a problem.
Yeah, yeah.
Um... There's only so much wiggle room you can really have there until it starts to overheat.
Yeah, I mean the GPU has overheat protection at some point.
Sure, but you don't want to hit the overheat protection. That's the point I'm getting at.
Well, that's fine to some degree because you know the first level of overheat protection is just
throttling the clock without cutting the voltage.
Right, but if you're reclocking, the point is to raise the clock.
That's the problem.
Yeah, sure.
But if you run at, like, an eighth of the frequency, you're still cutting the power consumption by half.
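(For context, a note of mine, not from the show: dynamic power in a chip scales roughly as

$$P_{\text{dyn}} \approx C \cdot V^2 \cdot f$$

so throttling the frequency $f$ alone cuts power linearly, while dropping the voltage $V$ along with it helps quadratically. That's why throttling without being able to touch the voltage still leaves a lot of power on the table.)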
Right, okay, that's fair.
Yeah, and at some point, if it's really getting hot, then the GPU just shuts off. But yeah, it's not a great place, and you don't want to run the GPU constantly like this.
And then in later generations, they also prevented us from even changing the voltage at all.
And then it's like, yeah, no point in bothering anyway.
So yeah.
And sadly, the firmware we were getting from Nvidia was also custom-made for the Nouveau project.
Okay.
And they never gave us the firmware for power management, only for context switching.
Okay.
Yeah, so, yeah.
So, you had firmware, but it was the most minimal of minimal firmware, basically.
Yes, it was enough to, you know, drive displays and have multiple rendering contexts on the GPU.
Right, right.
Yeah.
So what's all this stuff that I'm hearing about the firmware situation getting better?
Yes.
So the GSP firmware is basically a large chunk of their original driver moved to the GPU side into firmware.
Okay.
And it includes power management and display management and a lot of other stuff.
Basically, a lot of the driver just
moves into the firmware, into the GPU.
And the main advantage we have here
is that we can use the exact same firmware NVIDIA is using in their driver.
Right.
Which helps us because we can basically do the same thing.
And now that they also open sourced their kernel driver,
because if you move everything or a lot into the firmware,
there are not really any secrets anymore.
We can also check out how the driver interacts
with the firmware and what we can do with the firmware and everything.
Because I remember when that kernel module came out, there was a lot of confusion about what it was, what it would actually do.
I'm certainly someone who doesn't have a deep understanding of how NVIDIA's graphics stack actually functions. So what was that kernel module? And what it wasn't is also very important.
Yeah, so the kernel module is more or less a layer for the user space driver to talk with the hardware. The NVIDIA driver does a lot of stuff inside user space, and the kernel driver isn't really doing that much anymore.
They also have something called user space command submission,
which essentially means instead of telling the kernel,
you have commands for the GPU, they all do it inside user space. OK.
And they just need a kernel driver
to wire up all the stuff and make sure there's
no security implications of doing so and stuff like this.
So basically just making GPU resources
available to user space.
OK.
It's a little bit different than the style
we usually have with all the other DRM drivers inside Linux.
So but yeah, there is a huge benefit
for using user space command submission
because it helps with compute because you can submit
more stuff with lower CPU overhead.
But yeah, the basic idea of the driver
was just to have something open source.
And in the past, there was also sometimes the GPL symbol
situations coming up regularly.
What's this? I don't actually know this.
Sometimes, NVIDIA wanted to use interfaces in the Linux kernel, and then at some point,
people were like, oh, yeah, but that's GPL only.
You can't use this.
And things like this happening.
And so, you know, from their perspective it makes sense to have an open source GPL driver, so they just don't have this problem anymore.
Okay, that makes sense.
So from the perspective of Nouveau, what would you want NVIDIA to actually do that would really help
out the project? Because obviously
AMD's got this great
open source driver that
every gamer uses
that no one really worries about,
and it's just fine.
What would you want NVIDIA
to do differently that would make
this work out a lot
smoother? At least hopefully.
Yeah, I think it would certainly help if they would dedicate more developers to the project, obviously. But I think their documentation approach could also be improved.
I don't know if you saw it, but we have this repository on GitHub
with some documentation.
Well, documentation.
It's mostly just header files with names for registers.
But yeah, they publish some stuff there. They document how to program the 3D engine or compute engine.
And for example, in NVK, we only make use of that.
So we don't really have to reverse engineer
certain commands anymore.
So that's certainly helping.
But there are still gaps.
And from time to time, we are still
asking NVIDIA for more documentation.
And sometimes they are responsive,
and sometimes they are not.
And sometimes they forget to handle some requests
for months or years.
Right, okay.
So why do you think there is this discrepancy with NVIDIA?
Like why, obviously Linux is a smaller platform,
so that might just be a simple enough answer,
but why do you think there is this discrepancy
with how they handle their drivers on Linux, as opposed to, you know, what's happening over on the AMD side and over on the Intel side as well?
Yeah, I think it's probably just a perspective of, you know, does it actually make sense to them from a business perspective?
Like would they even get, you know, profit out of it?
I don't think Nvidia is the company to do it like out of goodwill.
Right.
Yeah.
So.
I think that's probably the...
That's definitely understandable. It'd be hard to really get into their heads and really understand exactly why they're doing something, but I think it's probably a fair guess to just say money. It's simple.
So what is actually the difference between how the AMD driver is set up and NVIDIA's? I don't know how much you've done on that side as well.
Yeah, not really.
Okay, okay, that's fair.
I mean, I didn't really get into, you know,
architectural details of all the GPU drivers.
No, that's, okay, that's fair.
Not yet, at least.
So, in the state that Nouveau is currently in,
what does it do well?
Like, obviously, newer cards, it struggles.
I saw that laugh there.
Okay.
What does it not do terribly?
Okay.
I think we try to make sure that at least you can boot to a desktop.
Okay.
That's, I think, the baseline the project currently tries to fulfill. If some update breaks GNOME or KDE or something,
then yeah, that's a regression.
And we should probably fix it as soon as possible.
But if a user comes to us and says, oh yeah,
I have this 10% performance regression,
and it's like, I wish we had time to look into all of this.
But sadly, there are more pressing issues to deal with.
And yeah, everything display related,
like if DisplayPort doesn't work or HDMI on certain GPUs
or regressions in general. I think we try pretty hard to at least not regress the driver, but deal with the issues, and, you know, at least make sure that display is working.
Yeah. But you're really fighting this really uphill battle trying to just get something... it's a massive undertaking to get something like this to actually work in the state that it's currently in, and the fact that it does as much as it does is already a testament to how much work has been done by all of the people involved.
Yeah, I think the biggest problem was just that before we started NVK, I think it would have been a fair estimation to say there were like two full-time developers on the entire project.
Okay, yeah, that makes sense.
So there's just so much we can actually do there. Yeah, and it got a lot better with NVK.
Well, I guess that leads us perfectly into NVK then. We can talk about that.
Yes. So, I guess you were asking about the Vulkan driver for Nouveau.
What I was going to say is, what is NVK? I think you're about to explain it anyway.
Yes, so NVK is the open source Vulkan driver for NVIDIA GPUs.
It's still built upon the same kernel driver. And I don't know if the new compiler is merged yet, but it also shared the compiler with the OpenGL
driver for quite some time.
Yeah, I think the new one isn't shared yet.
So it doesn't matter.
But yeah, it's basically just an open source implementation.
And the main idea was to start the project in a way where we are doing things right, and more sustainable and everything.
Yeah, like a lot of issues with the OpenGL driver,
we didn't want to repeat the same mistakes there.
So yeah, it actively makes use of the documentation NVIDIA is giving us. And it lives inside Mesa and makes
use of the same infrastructure we
have for all the other Vulkan drivers there as well.
You did mention documentation there, but I've seen some of the collaboration write-ups where it specifically mentions, multiple times, reverse engineering, reverse engineering, reverse engineering. How much documentation do you actually have from NVIDIA, and how much of it is just trying to work out how this can fit together?
Yeah, maybe I'll link files like these, which, especially, you see that they don't really explain anything. They're just, you know...
Ah, I see.
But at least it allows us to search for terms and figure out
where to look, basically.
And sometimes it's straightforward.
We just search a term and then, yeah, play around with it
a little bit and figure out how it works.
Sometimes it's a little bit more complex.
And there has been work to write a new tool to be able to reverse engineer their Vulkan driver as well, so we can also compare against whatever NVIDIA is doing, and do reverse engineering against their driver in case we need more information.
So when you talk about reverse engineering a Vulkan driver, what does reverse engineering actually mean in this context? What sort of work would you be doing to work that out?
So most of the time, because we also
have this Vulkan CTS repository with like hundreds,
thousands of tests, there are a lot of small tests
which test very specific cases or parts
of the Vulkan specification.
And what we would do for reverse engineering is to just check what commands the NVIDIA driver sends to the GPU, and then just try to figure out what to do on our end as well.
Okay. I'm sure that's a very long and tedious process.
It kind of depends.
I mean, the tests are pretty small, so it's not that bad.
But it would be, like, more complicated if you, you know, try to figure out certain things they're doing in games or something.
Because of the overload of information you would be getting there.
So it's approaching one little command at a time, and then over time you build those up, and then you get something that passes the test, and hopefully also works properly as well.
Yeah, something like that. In the best case it's just one unknown command, but sometimes it can also be, you know, multiple commands.
And what they also have is a macro engine on the command processor. What they're doing is they can execute a macro, and it would generate more commands for the GPU. And that's a little bit more tricky to figure out.
But yeah, it always depends on the test. Sometimes there are more commands involved and sometimes not.
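(As an illustration of the idea, not the actual Nouveau tooling: a captured command stream boils down to a sequence of method/data pairs pushed to the GPU's command processor, and reverse engineering a test amounts to diffing our stream against the proprietary driver's. All names below are hypothetical.)

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical shape of one captured command: NVIDIA pushbuffers are
 * essentially sequences of method (register-like offset) / data pairs. */
struct cmd {
    uint32_t method;
    uint32_t data;
};

/* Report the first place two captured streams disagree; the interesting
 * case is an unknown method the proprietary driver emits that ours doesn't. */
void diff_streams(const struct cmd *ours, size_t n_ours,
                  const struct cmd *theirs, size_t n_theirs)
{
    size_t n = n_ours < n_theirs ? n_ours : n_theirs;
    for (size_t i = 0; i < n; i++) {
        if (ours[i].method != theirs[i].method ||
            ours[i].data != theirs[i].data) {
            printf("diverge at %zu: ours %04" PRIx32 "=%08" PRIx32
                   ", theirs %04" PRIx32 "=%08" PRIx32 "\n",
                   i, ours[i].method, ours[i].data,
                   theirs[i].method, theirs[i].data);
            return;
        }
    }
    if (n_ours != n_theirs)
        printf("streams agree for %zu commands, but lengths differ\n", n);
}
```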
So when did NVK get started? Because I know it's a fairly new project.
Yeah, I think it's already like two years old.
Oh, wow.
Wow, it has been around for a bit.
Maybe not quite. One and a half years, I think.
Yeah.
It's really impressive how far it's come along.
I remember seeing articles from the start of this year
about how it can actually play a video game now,
which is impressive.
Very impressive, actually.
Yeah.
Yeah, we actually have, like,
on our Nouveau channel,
we have users actually trying out
NVK on DXVK
and Proton and everything.
And, yeah,
some games are already running.
It's very slow, but...
Right.
It's running. I'm seeing this article from GamingOnLinux that was talking about your post from the start of this year, with The Talos Principle running. The screenshot has it running at five FPS, which is more than one. It's a lot more than one.
Yes.
So it's functioning. I wouldn't call that playable, but it's functioning.
Yeah, I think this was without the GSP firmware, so it was with the most basic clocks, which are usually quite slow on modern GPUs.
Yeah, I think some users are already trying it out with GSP. I think it's already got merged upstream, the GSP support as well. So I think it will come in with Linux 6.7.
I remember reading a Phoronix article about that, yes. I think just a couple of days ago it was merged.
6.7 GSP.
Let me see.
Best week Linux 6.7, blah, blah, blah.
Yeah, GSP should be in 6.7.
So it's going to take a while to make its way out to the other distros,
but on things like Arch, that'll be out there fairly soon.
Yeah.
That's really cool.
Let me go over to it.
Yeah, we don't really enable it by default on GPUs we already support.
So the first one to actually use it is Ada, and that's the 40 series.
Oh, really new, okay.
Yeah.
It can be used on Turing and Ampere, which is like the 20 and 30 series, but users would have to opt in.
Is there any reason that it's...
And add a kernel command line flag.
Is there any reason why it's going with 40 first, and not also doing those older ones?
Oh, yes.
For the 40 series, we haven't gotten any firmware from Nvidia so far, so GSP is the only way you can use that GPU anyway.
Okay, right.
That makes sense.
Yeah.
And for the previous generations, we already have firmware. And in order to not regress users, we don't use it there yet.
But we do plan to flip the switch at some point once we are more comfortable with flipping it.
So that article I saw was from the start of this year. I doubt you just have benchmarks sitting in front of you right now, but how much better of a state is it in than, basically, 10 months ago? How much better is it at this stage?
I don't know, but I think Michael said he's planning to do some benchmarks, in the article, I think.
Yeah, in the last section of the article.
Ah.
Of the GSP binary firmware blob article.
Okay, I can't... You might be having a look at a slightly different article. There's a lot of things that go up on this website.
No, I just quoted the part to you.
Oh, here we go. Oh no. Yeah, you are reading a different one than I am. Okay, that's why I can't find it.
Now on to benchmarking this new Nouveau support with Linux 6.7 to see how much it improves the open-source situation for RTX 20 and 30 series hardware, as well as the initial RTX 40 GPU support on Nouveau. For those on GPUs prior to RTX 20,
this firmware isn't relevant,
with the GSP only being introduced with RTX 20 Turing GPUs.
Okay, that makes sense.
So this is like...
Yeah, so...
So it's for fairly recent GPUs.
Yes.
I thought you were going to add something to that.
Yeah, yeah.
I assumed what you were going to ask. But I think, not that I think, but the generations between Kepler and Turing probably won't get any reclocking support, ever.
That makes sense.
So that's very
unfortunate.
But if NVIDIA is not giving us any firmware for that, even when we ask, then it's not going to happen.
I mean there are some people who
try
to figure something out but it's really not sustainable.
Yeah.
It's understandable.
Like, I... Okay. I have similar discussions with people, because I talk a lot about Wayland. And I will have people message me, like, I'm on a Kepler GPU or whatever, and Wayland doesn't work great on my GPU. It's like, yes, but you also have a 10-year-old GPU. Like,
at a certain point,
like, Nvidia's gonna
stop supporting it. Whether
they should stop supporting it at the
point they stop supporting it, you know, that's a
whole other discussion. But there is gonna be a cut-off
point where they have to just
say, this is
just too old, it's not our problem anymore.
And
look, it
is what it is, basically.
It would be nice if they went back further,
but, you know.
I mean, they can always try to use Nouveau.
No, I mean, I think on Kepler it would be fine, because we also support the Kepler GPUs in NVK. Well, kind of. I don't think it's at the same level as Turing, but it kind of works.
It's better than nothing.
But yeah, for all the GPUs in between, there's kind of not much we can actually do about it.
You can also frame it slightly differently, with the 20 series being the early cutoff point. Everything going forward will not be a problem. So in a couple of years, when you buy an old 20 series card, that will support it.
Oh yeah, that's true.
And those GPUs are already like five years old.
Is it that long already?
Yeah.
The first Turing was like September 2018.
What the hell? Turing architecture, 2018. What? Okay!
Oh! Yeah, okay, that was a while ago, wasn't it?
Jesus!
Yeah.
Wow, that feels like it just happened.
Maybe it's because GPUs started getting really expensive starting around then,
so a lot of people just didn't upgrade.
It's been like...
There's less excitement I hear about GPUs nowadays than...
Because I first started getting into PCs back during Kepler,
and people got super excited every time a new generation
came out, and
now it's like, okay,
but it costs as much
as a car, so like,
do I want to buy it?
Probably not. I'll just stick...
There's a reason why the... I think it's the 1650
is the most popular GPU on
Steam Hardware Survey.
Yeah. Yeah.
Yeah, and that's
also supported, by the way.
It's just weird because the 16
series is newer than 20.
Just in case somebody
doesn't make the connection.
It's also Turing. It's just
Turing without ray tracing.
Right, I forgot about
that.
What? I want to know who at Nvidia thought that that naming scheme was a good idea. I really do.
Yeah, it's always been confusing.
Yeah, it's probably not going to be that long until they do another reset of the numbering. The numbers are getting a little bit too high.
We're going to get back to the 100 series at some point.
Give it probably a couple more generations.
They
went from three digits...
They started with four digits and went to
three and now two, so I guess they will
go with one digit.
Yeah, just NVIDIA 1.
Honestly, I wouldn't...
At the end of the day,
it's still better naming than monitor naming.
And monitor naming, I just...
I don't know what goes on there.
Outside of the size of the screen,
it's like just throw random letters at it.
That's fine.
Some will work it out.
Yeah.
So, okay, with NVK, why was something like this needed? What was deficient about the current method that was being done, that something like NVK needed to be around?
I don't think the need was really, you know, because the OpenGL driver wasn't that great; it's just we wanted to have a Vulkan driver. But we are also thinking about the idea of dropping the OpenGL driver in favor of Zink.
Mm-hmm.
Because it might just be faster at this point in time, because more work has been done on Zink to make it actually run fast than was ever put into the OpenGL driver.
And nobody really cares about the OpenGL driver.
So the performance will probably not improve there anyway.
So yeah, and I think one of the big problems of the GL driver
is that nobody really actually figured out
why the performance is bad.
There are some assumptions and there are ideas on why it's bad but you
know, nobody actually put in the time to actually performance-optimize the entire driver.
I've heard a little bit about Zink here and there, but I imagine a lot of people listening probably have no idea what that is, so briefly explain what Zink is.
Zink is an OpenGL driver inside Mesa, and instead of talking to hardware like the Nouveau driver, it's talking to a Vulkan driver.
So
we have this
so-called Gallium
framework inside MISA which is just
a driver abstraction layer.
The OpenGL driver is using this abstraction layer
to implement functionality.
And Zynq is translating this abstraction layer onto Vulkan.
And it's able to provide not just OpenGL,
but also in theory and in practice other APIs on top of Vulkan as well.
Okay.
So theoretically, if the OpenGL driver didn't suck, let's just assume that, is there some sort of overhead of doing it through this method? Or is it not noticeable? Obviously it makes it convenient to write it, but would it be a better idea, if there were more resources, to do the OpenGL driver directly?
Um...
I wouldn't say so.
Um...
At least not anymore.
I mean, Marek from AMD spent a lot of time reducing the CPU overhead of Gallium and the RadeonSI driver.
So I think if you really want to reduce the overhead,
that's entirely possible.
I think some abstractions might be not perfect,
but we are also free to change them as we go.
So it's not like a fixed API we can't change.
So if there are performance problems,
somebody can look into them and fix all the drivers
or something.
It's also not like Gallium is actually a library. It's really just an API that the driver exposes directly to the OpenGL implementation. And the OpenGL implementation mostly is just responsible for tracking the state of the OpenGL context. So if you bind a texture or something, something has to track this stuff. And then at some point, you also have to call into the driver to draw stuff, or allocate memory, and this kind of stuff.
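(A hypothetical sketch of the split he's describing, with made-up names, not the real Gallium API: the GL frontend tracks context state and calls into a table of driver hooks, which a driver like Zink would implement by emitting Vulkan calls.)

```c
#include <stddef.h>

/* Hypothetical sketch, not the real Mesa API: the GL frontend owns
 * state tracking; the driver only provides a table of hooks. */
struct hw_context; /* opaque, owned by the driver (Zink would wrap a VkDevice) */

struct driver_hooks {
    void *(*create_buffer)(struct hw_context *hw, size_t size);
    void  (*draw)(struct hw_context *hw, unsigned first, unsigned count);
};

struct gl_context {
    struct hw_context *hw;
    const struct driver_hooks *hooks; /* zink, radeonsi, ... plug in here */
    unsigned bound_texture;           /* GL state lives in the frontend */
};

/* glDrawArrays-ish entry point: validate tracked state, then call down
 * into whichever driver is plugged in. */
void frontend_draw(struct gl_context *ctx, unsigned first, unsigned count)
{
    /* ...state validation elided... */
    ctx->hooks->draw(ctx->hw, first, count);
}
```

(If I remember right, Mesa also lets you force the Zink path with the MESA_LOADER_DRIVER_OVERRIDE=zink environment variable, which is a handy way to see this layering in action.)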
So, you mentioned... is that tea in there? What do you got in the pot?
Yeah, it's tea.
I just noticed you keep filling it up.
How many of those have you had?
It's not much, it's like 700 millilitres the entire pot.
I just keep seeing you pour a bit more in there.
I only pour like a little.
Okay. I thought you had like three or four cups.
No, it's not that bad. I was sick last week, and my throat is still a little bit sore.
That's fair, that's fair.
It helps with, um, speaking.
Yeah, totally understandable.
So, you mentioned Zink would later support other APIs. What else, you can probably see where we're trying to direct this now, what else would this let you support?
OpenCL, for example. And it's already merged and everything. It works.
It's great.
I actually have run the OpenCL conformance test on RADV, and it's passing.
So it's even feature complete and everything.
It's great.
So...
Oh, go on.
Yeah, there's still like, at some point,
I plan to make an official conformance submission.
But for that, I need a second driver it runs on.
And it has to be an independent one.
And I'm not quite sure what exactly that means,
because we have various Vulkan drivers of various vendors inside Mesa, but I don't know if they are independent enough.
Yeah.
So I'm also looking into making it work on Nvidia and yeah.
That's cool.
So what is OpenCL?
What is the purpose of OpenCL?
Yes, OpenCL. It's actually a compute API that Apple came up with.
Okay.
And this was like...
I have to do math, but I think it's like 14 years ago.
It's like when the specification was released.
And I think Apple looked into how to accelerate their UI in the operating system, and came up with this API, I think.
I mean, I didn't really look at the history,
but I think the trademark was originally Apple
and might still be.
I don't know, but they started it.
And the rough idea was that we have those GPUs,
and it would be nice to use them for everything.
And instead of running your code on the CPU,
you can also run it on the GPU.
And the programming language you use for OpenCL is OpenCL C.
And it's actually a C-derived standard.
So it's C with a bunch of stuff on top.
And yeah, the main goals were just
we want to have higher power efficiency, run stuff on a GPU,
and do crazy stuff.
I don't know precisely what they were using it for.
But yeah, that was kind of a general idea.
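(To make "C with a bunch of stuff on top" concrete, a minimal OpenCL C example, my own, not from the show:)

```c
/* OpenCL C kernel: __kernel, __global and get_global_id() are the
 * "stuff on top"; the body is plain C. Each work item handles one
 * element of the vectors. */
__kernel void vec_add(__global const float *a,
                      __global const float *b,
                      __global float *out)
{
    size_t i = get_global_id(0); /* index of this work item */
    out[i] = a[i] + b[i];
}
```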
I think it was in the time where the GPGPU term was still a thing. I don't know if you've heard of it.
I don't think so.
I just had a look.
Apple still does hold the trademark to OpenCL.
It doesn't matter at all.
Just had to check.
Yeah. yeah yeah there was like i think the gp gpu gp gpu term was like a fancy word
a few years ago gp gpu that's a yeah that is kind of annoying to say isn't it gp gpu
A general-purpose graphics processing unit is a graphics processing unit that is programmed for purposes beyond graphics processing, such as performing compu... oh, okay, so it just means compute on a GPU, right.
Yes, so
the history for that is that
at some point, even
OpenGL got, like, shaders.
It's kind of at this time when
you didn't have this fixed function
stuff where you say, OK, this triangle goes there
and has this color.
But the industry transitioned to shaders.
And this also had implications for the GPUs
because to run arbitrary code, you also
need something like a CPU on the GPU instead of your fixed
function graphics pipeline.
And Nvidia was also kind of big with this,
where they also came up with CUDA around the time,
I think even a little bit before.
And what they were doing is they were looking at this and said,
you know what?
All the shaders we are doing, it's all the same with compute.
It doesn't really matter what it is.
We just have one thing for the entire thing.
So since forever, how you are programming graphics shaders
or compute shaders is on the hardware, it's basically the same.
So that's kind of like the rough idea
of where all this
can come from.
As you were saying that, I checked. CUDA has been around since 2007. I didn't realize...
Yeah, it's a little bit older, I think.
Yeah, it's been around for a little bit.
Yeah, I think
Tesla was the first GPU to support it.
Jeez, okay.
That's just like 8000 series.
Right, okay, before the numbering reset.
Yeah, uh...
Yeah, GeForce 8800.
Yeah.
Yeah.
That was... that's a while ago.
Yeah.
So everything is quite old.
Yeah.
They came up with CUDA and Apple had their own thing.
And it transitioned at some point to Khronos to make it cross-vendor.
I don't know if that was the reason, but that's kind of what happened.
So what is this RustiCL thing that you've been working on?
Yeah, so I started it mostly as trying to learn Rust. And because, you know, at Red Hat, we have those days of learning, where we can dedicate some days to just learn
OK, OK, that's good.
And I always wanted to go into Rust.
And I was thinking, because I was also
involved with Clover, which is the old OpenCL implementation
inside MISA, written in C++.
And I'm sure there are a lot of people loving C++
and everything, but it never really
was that much by most MISA developers.
So a lot of people didn't really like to work with it.
And I was also not a big fan of OpenCL.
Not OpenCL, I meant C++.
Yeah.
And so because I was involved in Clover and thought,
oh yeah, we can do maybe a new OpenCL implementation,
and we could also figure out what would it actually
mean to support Rust inside MISA, or how would,
if Rust becomes the cool new language everybody
wants to use, what would be a migration path for MISA?
Because at some point, you can say, OK, we will always
stick with C. But that also comes
with the risk of maybe all the new developers
don't want to program in C anymore.
And that's kind of bad for a project
if C is all what you have.
And so I was thinking, I can just spend some time figuring out
how to use Rust to implement APIs inside Mesa,
and how all the integration would work out,
all the cargo stuff, and compiling Rust
code with the current build system,
and just how it would fit in
Most people, when they say they're going to learn a language, they don't go and implement OpenCL. That's not a normal way that most people learn how to write a language.
Might be.
No, it's awesome that you did. Someone's going to be the one who wants to do that. But, so it started as, like... so you didn't start with the intention of taking it anywhere, if I understood that correctly? Like, initially?
Yeah, initially it was more like a prototype. I just wanted to figure out if it would work at all. And at some point, it was just, you know, fun to work on it, so I kept working on it, and that was basically the reason.
At what point did you realize that maybe it's actually a good idea?
Good question. I think when I was creating the merge request for it, probably.
I think you worked that out a bit late.
But yeah, I don't know how long I worked on it before I started to merge it. Maybe a year, maybe half a year.
I think it was a year, because people were kind of aware of it. I remember talking about it back when it was first getting a bit of attention.
Yeah, I think I was talking about this at XDC 2022.
Okay.
The first time, maybe. I think I had a lightning talk in '21 already, just saying, okay, this is what I was doing, what do you think? Yes, I had a lightning talk on the last day of XDC 2021.
So it's not been that long since it's been a serious thing. It's still relatively new.
Yes. Yeah, it's between two and three years, something like that.
Jeez. Okay, so it seems like a lot of the stuff you're involved in is sort of new. Obviously Nouveau's been around for a long time, but both NVK and RustiCL are these fairly new additions to the way that both compute and graphics are done on Linux.
Yes. Maybe I didn't quite get what you're trying to say.
Oh, okay, no, I was just saying... how would I rephrase it? I don't know how to rephrase it, actually. So, anyway.
We'll move on from that.
So, what sort of state was Clover in, like, beforehand?
Because I know there is this open merge request on the Mesa project.
I just love the title.
Just called Delete Clover.
Which is a great title, straight to the point. How usable was Clover, if at all? Was there anyone who was working on it at the time? Is there anyone working on it now?
Yes, so... nobody's working on it. And so, I kind of started... I don't really know the history of why people started it, precisely, just that people started it to have an OpenCL implementation.
And when it was made, it was in the time when LLVM came around. And we had this LLVM compiler for AMD GPUs.
So what Clover was doing is it was very LLVM-centric at the start.
And it was not using any of the other compiler infrastructure
we had inside MISA.
There was even a time when some Nouveau developers
were thinking about moving to LLVM
and to have all the GPU compilers inside LLVM.
And as everybody maybe knows, that didn't happen.
So we just write our own compiler inside Mesa
for every hardware.
And even the RadeonSI driver is probably moving to ACO, which is like the AMD backend compiler for RADV, soonish.
And that was kind of a limitation of Clover, that it only worked for GPUs with an LLVM
backend compiler. And that included, you know... it was basically just AMD.
When I was getting involved in the project, can you still hear me?
Yeah, yeah, I can hear you. Can you not hear me?
Yes, because I'm running Discord in a Firefox tab, and now Firefox is complaining that it's not responding, so I was a little bit, what is going on?
Oh, this is great.
I can't do anything, but, but anyway, as long as you still hear me, I can just ignore it,
I guess.
I guess, hopefully.
As long as the tab doesn't crash.
Yeah.
I have no idea.
Anyway.
Yes, so when I was getting involved...
Oh, there we go.
Now the tab's gone.
I think we lost him.
I wonder if he comes back.
I don't want to cut this, because this is amusing as it is.
Oh, I think I might actually need to...
Alright, we'll see what he says.
It still says he's online.
Uhh...
I don't know...
There he goes!
There he goes! There he goes!
Okay, we'll cut back to when he's back.
Okay, I'm back.
How long was I gone?
Uh, no, two or three minutes.
I-
Oh no.
Not long after you said,
uh,
your tab was
fine, it hadn't crashed yet.
It crashed.
Ah, yeah.
No, yeah, okay.
Anyway, it's working now.
Yeah, the main deal with Clover was just that it required this LLVM backend compiler.
And we have this NIR thing inside Mesa going on,
which is kind of like our own compiler infrastructure.
And all the other drivers are using it for Vulkan specifically.
So yeah, what I was doing was getting Nouveau to support NIR, which was also then used for Vulkan, and making Clover able to use it as well.
Mm-hmm.
I think I would be lying if I say I know exactly what's
the problem with Clover.
Mm-hmm.
I think for me, it was just annoying and frustrating to work with it or develop on it.
So yeah.
And it just...
It was just more fun.
How has the experience been learning Rust?
Oh, it's pretty great, actually. Yeah?
And I know a lot of people have strong opinions on being told what to do. For me, it's great.
And for me, it's great.
I know that developers are making mistakes all the time.
That's totally normal and totally fair.
And when a language can help me with not making those mistakes,
then it's really great.
With C, you usually have always this risk
of having some out-of-bound memory access
or use after free and stuff like this.
And at least with the things I was doing for a year,
it's mostly something I never have to bother with.
So even when I'm running into heap corruption or something like this, then it's just me using the C code incorrectly inside Rust, and having bugs in the wrappers around the C functions.
Yeah, so it prevents a lot of bugs,
and it's really helpful with this kind of stuff.
I was occasionally seeing...
It also has a strong and big standard library, which in C is kind of a pain, because even basic things like linked lists or something, it's like everybody has to write their own implementation. And having the standard library, and everybody kind of agreeing on what's there, really helps with having other developers just, you know, look at the code and say, oh yeah, those things you should probably do differently,
and stuff like this.
I often see people complaining about... it's not just a Rust thing, but I'll see this with typed languages as well, people complaining about the compiler telling them they're doing something wrong. The compiler's telling you you're doing something wrong because you're probably doing something wrong. My favorite example of this is any time you take a JavaScript developer and stick them in TypeScript, where you have a type system that's actually there, they get very confused, because the compiler is telling them they're not allowed to do certain things with types. Like, yes, because what you're trying to do is a bad idea, so stop doing it. And I'm sure Rust is much the same in that way, where it's like, stop doing that. The compiler is complaining because that's a bad idea, so stop it.
Yeah, I would say that Rust goes further,
because it also manages ownership of values.
Yeah.
And that's what's tripping up a lot of programmers.
And yeah, because it tells you that if multiple things own
something and they have mutable access to it,
you might have a different state than you expect this value to be
and stuff like this.
And it's quite heavily enforced.
Yeah.
Some people just... I mean, it's sometimes a little bit annoying, because you have to write code in a way where you don't run into those issues. But I would also say that it generally leads to cleaner code anyway. It's easier from the thinking-about-the-code perspective, because a lot of the errors you can just page out of your brain, and you don't have to think about certain issues anymore. Like, in C it's totally fine to return the address of stack memory, and maybe the compiler complains, maybe it doesn't. Stuff like this.
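(A minimal sketch of the C bug he mentions, my own example: returning the address of stack memory compiles in C, with at most a warning, while the equivalent Rust is rejected by the borrow checker.)

```c
#include <stdio.h>

/* `local` lives in this function's stack frame, which is gone the
 * moment we return, so the caller receives a dangling pointer. */
static int *broken(void)
{
    int local = 42;
    return &local; /* GCC/Clang typically warn, but it still builds */
}

int main(void)
{
    int *p = broken();
    printf("%d\n", *p); /* undefined behavior: reads dead stack memory */
    return 0;
}
```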
So as you were learning Rust, how did you go about doing so while writing this driver? Did you just look at the documentation for Rust and wing it, basically? What was your approach?
Yeah, I think it's basically that. They have this introduction thing on the rust-lang website, I think, which I looked into.
It was just basic stuff on how to write Rust code,
and so nothing really complicated.
But yeah, but then at some point, I just say, OK,
I want to write code.
I'm not really a good documentation reader, so documentation is kind of a thing I avoid. Documentation in the sense of a big block of text on how to do something, that is. I look into the reference, the reference manual, I don't know, but, like, the actual documentation of the standard library.
Right.
That's something I usually look a lot into.
They have examples there on how to use code and how to use the functions and types and
everything.
So how long do you reckon it took you to start feeling comfortable writing Rust code?
Not quite sure.
I don't know if I'm comfortable yet at the level where I can say, okay, I know what I'm
doing.
Sure.
But I'm comfortable enough to just write code and the compiler is not getting
too much into my way.
I think there's still a lot I need to learn and more, you know, write code, more idiomatic
and stuff like this, but I think it's not too bad.
See, most people would have answered, I feel comfortable now. You have a Rust implementation of OpenCL in the Mesa project, and you're like, I don't feel comfortable writing Rust yet.
No, I mean, it's more like, is this code optimal enough, or is there a way to do it better, and stuff like this. Sure, I can write code, and that's totally fine. But I wouldn't feel comfortable enough to express my opinion on how Rust code should look.
Mm-hmm, okay, now that's fair.
If that makes more sense.
So you, you know enough to get what you need to get done, but you're not like a,
you're not a Rust specialist.
Yeah, I think that's a fair summary
So what is the state of RustiCL at this point?
It's a conformant OpenCL implementation on some Intel hardware.
So I filed for official conformance
at the beginning of this year, I think.
Yeah.
So it's passing the OpenCL conformance test suite, which has a lot of tests, and I'm generally testing against that. So that basically works.
And a lot of applications are already running, I think.
Oh, OK.
It's not enabled by default yet, just for stupid reasons. Oh, it was actually last year, end of last year, that I filed, so it's kind of been a year since it's conformant.
Some OpenCL code is really, really heavy. And the way we are compiling code inside Mesa is that we basically inline everything into one huge function doing everything. And this can lead to some benchmarks using, like, 30 gigabytes of memory or more, which is a problem.
So I'm not really comfortable enough to enable any devices by default yet because of this. Because, you know, if you start something and your system goes out of memory because of that, it's kind of bad.
But besides that, a lot of stuff is working.
Recently, we also merged the support for OpenGL sharing. So there's an OpenCL extension to import OpenGL objects into your OpenCL application.
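(A hedged sketch of what GL sharing looks like from the application side, my own example: it assumes a cl_context that was created against the current GL context, and an already-allocated GL texture; the wrapper name is mine, the CL entry points are the standard cl_khr_gl_sharing ones.)

```c
#include <CL/cl.h>
#include <CL/cl_gl.h>
#include <GL/gl.h>

/* Wrap an existing GL_TEXTURE_2D as a cl_mem so an OpenCL kernel can
 * write into it. Before using it in a kernel you must also hand it to
 * clEnqueueAcquireGLObjects(), and release it again afterwards. */
cl_mem wrap_gl_texture(cl_context ctx, GLuint gl_tex)
{
    cl_int err;
    cl_mem image = clCreateFromGLTexture(ctx, CL_MEM_WRITE_ONLY,
                                         GL_TEXTURE_2D, /* target */
                                         0,             /* mip level */
                                         gl_tex, &err);
    return err == CL_SUCCESS ? image : NULL;
}
```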
And we had a student intern working on this, and they've done a lot of great work. I mentored him on this project, and I'm very happy that we finally managed to merge it. The internship ended, like, nine or ten months ago or something, and there were still a lot of details to cover. And the student, Antonio is his name, stuck around and helped out with random bugs and putting it into proper shape.
So with that, you can actually run applications like DaVinci Resolve on Mesa out of the box.
Because I know that was one of the issues with DaVinci. I think you needed to use the proprietary AMD GPU drivers to actually get anywhere with it previously.
Yeah, that might be right.
Sorry, go on.
Yeah, I think there were issues with the way they were doing it. The big problem with this GL sharing implementation is that it requires a private extension on the OpenGL side as well. And the way AMD implemented it against Mesa was kind of weird and probably not working well with Mesa. But I also haven't really tried it.
I know that debugging DaVinci Resolve is a mess, because
it starts like hundreds of threads. And every time there's a bug, then I look at GDB, and it's like, oh yeah, thread number 500. It's like, oh yeah, okay, fine.
So it's heavily multi-threaded,
and a lot of bugs could trigger just by doing stuff.
And I know, for example, if there's a bug, then the video preview wouldn't show, and stuff like this. Yeah, it's a mess.
So I guess it's sort of just as extreme as trying to test out NVK, like actually trying to get NVK to work with a game. Instead of doing the individual tests and going through that, actually trying to reverse engineer from a full game trying to run. I probably worded that really badly.
I think it makes sense.
Okay, good. I hope it makes sense.
Yeah, there's just a lot of stuff going on with it.
Yeah, definitely. It's a complex application.
It's doing a lot of stuff.
Yeah, and I also have other projects running.
It's really hard to find fancy applications using OpenCL,
because it's not graphical.
It's not visual like a game, where you can impress people really easily.
But yeah, DaVinci Resolve
is definitely something people
are happy to see
working on more hardware
and out of the box and everything.
Yeah, I've seen some people talk about using RustiCL with Darktable, but just looking at Darktable as someone who doesn't understand what Darktable is, it's just like, okay, what am I looking at here? What's so special about this?
Yeah, I mean, Darktable is just, like, a Photoshop equivalent, I would say. I don't know if it's... or maybe more like... darkroom is, I think the... or Lightroom or something. One of those professional photo editing tools.
Yeah, Lightroom is the one you're thinking of, yeah.
Yeah, I think Lightroom is the proper equivalent.
Yeah, and it's basically like, if you have your fancy camera and you're importing all the raw pictures there, you want to make them look better, essentially.
And you have hundreds of filters to achieve that.
And as far as I know, all those filters
can also run through OpenCL.
So instead of using the CPU, you can actually
run it on the GPU as well.
And it usually means that the power consumption
is going down.
It's not necessarily faster on Intel, but instead of having your CPU going to 100%, it doesn't do that, and the GPU is maybe a little bit busy.
But yeah, your laptop is heating up less. And on AMD it's even better; it's faster there than using the CPU.
Yeah, because AMD's made a big deal about their whole APU thing, and they've tried to make their GPU core really powerful.
That was a big push they started doing. I think when they started doing Ryzen, that was where they made that shift.
Yeah, it might be.
It's really cool that something like DaVinci actually works. That's just cool.
Yeah.
I would like to share a video, but I don't think it's a good idea, because I used raw video material that's copyrighted and everything, and I would have to ask permission.
Yeah, no. Don't show anything that could be a problem for you.
Yeah. But yeah, it works. I had to fix a few bugs, but I think with Mesa 24 it should be working.
What's the current version now?
23.3. Okay. At some point next year.
Oh, okay. So not that long away then.
Yeah.
That's... yeah, that's just really cool.
Let's see, where can we go from here? I actually don't know where we can go from here. I did ask people in my channel about anything they have to ask you, so let's see what they have. Let's see if there are any questions that actually make sense in here, anything we didn't already cover. Of course, we've got questions about GSP. I'm just going to read the questions; if a question makes absolutely no sense, we'll just move past it.
That's fair.
"Please ask him if there are any plans for S-Y-C-L." Is that supposed to be... oh, the camera's back.
Yeah, I didn't notice I hadn't enabled it.
No, no, it's good. Oh, now the camera is in the wrong order. Okay, that's fine. Oh no, I messed it up.
It's fine.
Please ask if there are any plans for SYCL on Rusticl, and when does he see Nouveau plus Rusticl becoming a viable alternative to CUDA?
Jeez, that's a...
Can you quote? Can you post it?
Yeah, I will just post the comment. I don't know if maybe there was a misspelling in there or something.
Oh, SYCL. I think it's pronounced "sickle".
Okay, I've never actually heard of that.
Yes, so SYCL is a fun story. I can definitely talk about it for quite some minutes.
It will also go into compute ecosystem madness.
That's fine. I don't know much about compute.
Yeah, so the big advantage you have with CUDA is... maybe I'll start with OpenCL.
So OpenCL has the host code, the application code you're writing in C. And then it has the OpenCL C kernel language, which you write in a separate file, and it's a different syntax and everything.
So at runtime, you kind of have to load the file into a memory buffer and then compile your code from there, and stuff like this.
So it's not that great.
CUDA, on the other hand, put the GPU code and the CPU code inside the same file. You compile it with one tool and then you're good to go; you call the GPU functions in a special way, but the compiler is making sure that it all runs on the GPU. So people kind of like this model more.
And people came up with SYCL, which is kind of the same thing, just open. Well, SYCL is just an API, but we have open source implementations. And it's kind of there to be C++ focused, which CUDA, I think, also is. And you basically write the same code for the GPU and the CPU in the same file. They use templates for some of the magic. They compile the... well, now we're into implementation details.
So the problem with SYCL is that it's an API: the application API is defined, but not the runtime.
So you don't have an actual runtime with SYCL. You have it with OpenCL, you have it with OpenGL or Vulkan. So if you compile against OpenCL or OpenGL, you can run it on every driver. That's not the case with SYCL.
There are SYCL implementations, like the Intel one, which is able to layer it on top of OpenCL. And in order to do so, it compiles the GPU code into SPIR-V. SPIR-V is just an abstract intermediate representation for GPU code; it's like assembly, but not quite. And it's also used in Vulkan. That's for everybody who doesn't know what SPIR-V is.
And the problem is that there are bugs inside the Intel implementation which actually violate the OpenCL and SPIR-V specifications.
So the question is always, from a Mesa perspective, do we think it's important enough to work around those bugs or not? Or do we file the bugs and hope they fix them?
I have a merge request, which I hope I can merge soonish, which kind of makes sure the basics work. The biggest problem is that there is this optional API in OpenCL to accept SPIR-V instead of OpenCL C. And I actually validate the SPIR-V. And the validator is just screaming at whatever I'm getting from SYCL and says no.
So I had to add a debug option to say, okay, ignore what the validator is saying. And this can lead to crashes. It might not; some stuff works, as far as I know.
But yeah, having to deal with invalid SPIR-V is always difficult, because you also don't want to make your code way more complex just to accept buggy code for something which isn't really heavily used that much yet.
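For context, the optional ingestion path described here is clCreateProgramWithIL (core in OpenCL 2.1+, also exposed as the cl_khr_il_program extension); a hedged sketch, where the SPIR-V byte buffer is a stand-in for whatever a SYCL toolchain emitted:

```c
/* Hedged sketch of the SPIR-V path: instead of OpenCL C source, the
 * application hands the driver a precompiled SPIR-V module. This is
 * the entry point where a runtime can validate (or be told to skip
 * validating) the incoming IL. The byte buffer is a placeholder. */
#define CL_TARGET_OPENCL_VERSION 300
#include <CL/cl.h>

cl_program program_from_spirv(cl_context ctx, cl_device_id dev,
                              const void *spirv, size_t len)
{
    cl_int err;
    cl_program prog = clCreateProgramWithIL(ctx, spirv, len, &err);
    if (err != CL_SUCCESS)
        return NULL;
    if (clBuildProgram(prog, 1, &dev, "", NULL, NULL) != CL_SUCCESS) {
        clReleaseProgram(prog);
        return NULL;
    }
    return prog;
}
```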
Yeah.
But yeah, it's like I work on this,
and I kind of want to make it work,
but there are challenges and everything.
So yeah, I think that answers the question.
Oh yeah, a viable alternative to CUDA.
Good question.
I don't know, honestly.
I know that NVIDIA has a very good compiler.
And I know that in Mesa, we would have to add a lot more optimizations to match the performance of NVIDIA.
I think NVK is probably a better approach
to get closer, because Faith was also starting a new backend
compiler for Nouveau.
And that will be critical to match NVIDIA's performance
there.
And once the compiler is in proper shape and is able to compile compute code properly, I was also thinking about using it for OpenCL.
Whether OpenCL would be able to match the performance of CUDA, I don't know, honestly. So we have to see.
But I think it's good if we have other alternatives besides CUDA in the compute ecosystem, because everything else is kind of weird.
The biggest advantage you have with CUDA is that you install it on your system, and it doesn't matter what NVIDIA GPU you have, it runs.
Right.
You don't have that with ROCm, which is like an AMD clone of CUDA. They call it HIP, which is kind of the same API, just renamed and a little bit changed.
But the difference is that, for example, in CUDA you have PTX, which is like SPIR-V: it's not targeting a specific GPU. So if you compile CUDA code, you can run it on every GPU.
And with ROCm, that's not the case, because they compile to GPU code directly. So if you have newer GPUs with a new ISA on the shader processors, it doesn't run. So you have to recompile the code.
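To make that portability point concrete, here is a hedged C sketch using the CUDA driver API: PTX text embedded in the application is JIT-compiled by the driver for whatever GPU is actually installed, which is exactly what a compile-to-ISA model can't offer. The PTX string and kernel name are placeholders.

```c
/* Hedged sketch: loading PTX (a GPU-agnostic IR) via the CUDA driver
 * API. The driver JIT-compiles it for the installed GPU, so the same
 * binary keeps running on newer hardware. ROCm, by contrast, ships
 * ISA-specific code objects. The PTX string is a placeholder. */
#include <cuda.h>

static const char *ptx = "/* ...PTX text from a compiler... */";

int load_ptx_kernel(CUfunction *out)
{
    CUdevice dev;
    CUcontext ctx;
    CUmodule mod;

    if (cuInit(0) != CUDA_SUCCESS)
        return -1;
    cuDeviceGet(&dev, 0);
    cuCtxCreate(&ctx, 0, dev);

    /* JIT compilation to the device's native ISA happens here. */
    if (cuModuleLoadData(&mod, ptx) != CUDA_SUCCESS)
        return -1;
    return cuModuleGetFunction(out, mod, "my_kernel") == CUDA_SUCCESS
               ? 0 : -1;
}
```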
This is totally fine if you think in the HPC mindset, where you know what hardware you're targeting and you want to get the most performance out of that hardware as possible.
But from a developer's "I only have a desktop or laptop" kind of situation, it kind of sucks. And they have to improve on this.
And the same is also true with SYCL. They are also more driven by the HPC mindset, but they are improving. They are currently working on adding infrastructure code to LLVM, so it's less of a problem there.
And I think there are also people working on the runtime API, but I don't know; there might be rumors. I never talked to anybody actually knowing anything there.
But yeah, once this problem is solved, at least SYCL might be a good alternative.
But so far, I think, at least for the things I care about, and that's mostly non-HPC stuff, more like desktop stuff, where you have DaVinci Resolve and this kind of thing, then besides CUDA, you only really have OpenCL.
And even if it's not an alternative in terms of performance, it's the only alternative you have to actually get cross-vendor GPU compute support.
Right.
When I was saying a viable alternative, I don't think it necessarily means one-to-one sort of performance, but having something that actually works.
You're probably not going to see giant GPU farms switching from NVIDIA or whatever to OpenCL, but actually having something that's usable, I think, is definitely a good goal to have.
Yes, and I would say it is at this level. I think obviously OpenCL had kind of a sad story a few years ago, where nobody really cared about it.
But I, for example, participate in the OpenCL working group at Khronos. And there's active engagement there on improving the language and everything.
So I think there is more interest in OpenCL than there was a few years ago. It's definitely going up rather than down. So, yeah.
Intel is investing into it, and some other vendors.
This is a giant paragraph, but I'm going to assume the person makes sense. Okay.
"This might be a very harsh question, but I'd like to ask about visions of OpenCL. Currently, CUDA is very dominant in the GPGPU market space, which is NVIDIA specific. And those trying to support AMD GPUs usually fall towards HIP from ROCm instead of OpenCL. Blender dropped the OpenCL backend for Cycles in favor of HIP. TensorFlow only supports CUDA, and the only way to run it with an AMD GPU is an unofficial ROCm fork which runs with HIP. Where would OpenCL stand in these situations, especially when Khronos released Vulkan to kill both OpenGL and OpenCL at the same time?"
I think it makes sense for some projects to drop OpenCL support if they are not willing to spend time on it. I don't actually know what the reason was for Blender, but I wouldn't be surprised if they looked at it and said, okay, nobody actually cares about the code. And that happens, and that's totally fine.
TensorFlow doesn't only support CUDA. I mean, it's not easy to run it with something else, but yeah, there is the ROCm stuff going on, as the question stated. But there are also other projects to make it run on other hardware.
There is the OpenVINO project Intel is working on, which is kind of making a common API for AI/ML use cases. And it claims to support TensorFlow. I haven't tried it out yet, but I'm kind of interested in looking into OpenVINO, because the GPU backend actually uses OpenCL. And that's something I'm planning to look into.
It's very Intel specific; it uses a bunch of Intel OpenCL extensions. I don't know yet how difficult it would be to support, but there are a few projects, at least from Intel, where they are working on making all the AI/ML stuff usable besides CUDA.
They also have their own replacement for OpenCL called Level Zero, which is just more low-level than OpenCL. And they have their own OpenCL-on-top-of-Level-Zero implementation. But most people can ignore this for now.
But yeah, they use OpenCL for quite a lot of stuff.
I wouldn't say that Khronos released Vulkan to kill both OpenGL and OpenCL. The premise of Vulkan was always to be as low level as possible.
And if you're an application developer who just wants to get stuff done, then using Vulkan is usually not quite right. Like, I wouldn't be surprised if most indie game developers, for example, are still targeting OpenGL just because it's easier.
I also haven't asked what the general opinion is. But generally, if you look at Vulkan examples, if you want to draw a triangle, then there's this amount of code and everything. So it's not easy to use. And generally, what people use are engines on top of Vulkan.
OpenCL is also able to be layered on top of Vulkan. I mean, I've talked about it earlier with Zink: you can run OpenCL applications on top of any Vulkan implementation, in theory.
And Vulkan is adding extensions to make it easier to do so.
So I'm making use of a bunch of very OpenCL-specific
Vulkan extensions, because otherwise it
would be like low performance.
You would have to work around certain things inside Vulkan,
which is just killing your performance.
So yeah.
I think OpenCL is pretty straightforward to use. It's really easy to offload stuff onto the GPU with it. And I think it will stay around with OpenGL for quite a while, exactly for this purpose. Actually, OpenCL will probably stay around for longer. But I also wouldn't be surprised if we end up with a new API at some point.
But it also doesn't really matter, because most of the work in OpenCL is really done on the compiler side, and that can also be used for any other new compute API emerging.
I was curious how much code it actually would take to make a triangle in Vulkan, so I had to look it up, and there is a lot of code there. I see what you mean. There's this page that does a big walkthrough of the code you would need, and I just keep scrolling; it's just a blue triangle. It's very explicit. It's very explicit with everything.
It's explicit because it's such a low-level way to interact with the GPU.
Yes, so the big problem with OpenGL was that all the memory allocation was hidden. So you could allocate high-level API objects, but you couldn't say, oh, give me 500 megabytes of GPU memory.
Right.
And Vulkan is more like this, where you say, okay, give me this memory, and I want to do this and this and that with it, and I want to explicitly synchronize all this stuff.
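To make that contrast concrete, here is a hedged C sketch of Vulkan's explicit allocation; it assumes a VkDevice already exists, and the memory type index would normally be picked via vkGetPhysicalDeviceMemoryProperties:

```c
/* Hedged sketch: explicitly allocating GPU memory in Vulkan, versus
 * OpenGL, where allocations hide behind objects like glBufferData.
 * Assumes `device` exists; memory_type_index would normally come
 * from vkGetPhysicalDeviceMemoryProperties. */
#include <vulkan/vulkan.h>

VkDeviceMemory alloc_gpu_memory(VkDevice device, uint32_t memory_type_index)
{
    VkMemoryAllocateInfo info = {
        .sType = VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO,
        .allocationSize = 500ull * 1024 * 1024, /* "give me 500 MB" */
        .memoryTypeIndex = memory_type_index,
    };
    VkDeviceMemory mem = VK_NULL_HANDLE;
    if (vkAllocateMemory(device, &info, NULL, &mem) != VK_SUCCESS)
        return VK_NULL_HANDLE;
    /* Binding this memory to buffers/images, and synchronizing every
     * access to it, is then entirely the application's job. */
    return mem;
}
```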
And there's a lot of low-level stuff you have to do.
And I guess that explains why you can build something like Zink that then does these other APIs on top of Vulkan. Because it's so low level, it gives you the control that allows you to do these more, I guess, specialized APIs?
Yes. I mean, I've also done some benchmarks with Zink. For example, there's this LuxMark ray tracing benchmark a few people might know. And when I was testing on my NVIDIA GPU, it was NVIDIA's OpenCL versus Zink plus NVIDIA's Vulkan.
It ran at like 92% of the performance of the native OpenCL driver, so we have like 8% of performance loss. It isn't really runtime heavy; it's really just executing a lot of GPU code. But it's a great benchmark on how well the GPU compiler works. And I think 8% isn't that much of a deal.
Nope. At the extremes it's probably a massive deal, but for just having it actually do the thing, that's not that much.
Yeah.
Yeah. What else? Okay, let's see what else we have in here. I think we sort of touched on this before, but it's worth mentioning again: is there any hope for Pascal and Maxwell, or are they doomed forever without re-clocking support?
Maxwell is in this weird state that because we
can't control fans, like only fans,
that's not a problem on all GPUs.
You might have a passive GPU, like passively cooled,
or you might have a GPU in a laptop where usually the system
firmware is responsible of driving the GPU fans.
Most laptops just have one cooling system,
but some laptops have two fans for the CPU and the GPU.
And in this case, we could actually re-clock the GPU,
but the firmware situation is a little bit weird,
because we are using the firmware of Nvidia, some stuff,
and making use of our own firmware for, like,
we need it for memory re-clocking,
because that's kind of a pain.
firmware for like we need it for memory we're clocking because that's kind of a pain um somebody would have to make that stuff work again i had a really really dirty hack
to make it work but apparently it doesn't work anymore so
and considering the age of the hardware, it's like not a focus.
Yeah, if somebody really wants to...
Yeah, yeah. There's hope, but also not for every user of that hardware.
Right, right. I wouldn't hold my breath for it. But if someone wants to do it, there's a spot open for you.
Yeah, it would probably also be a pain to maintain, so it's kind of... Probably with an opt-in, it might be fine. But, yeah.
Simple question. I don't know what they're trying to ask. What's with the 600 GPU series?
I don't know. That's probably Kepler. I'm not sure what they're trying to ask. I mean, we have re-clocking; it's not perfect, but it runs. And it's supported with NVK. So...
That was the entire question.
Yeah, that was the entire question.
Here we go, copy this one. As far as I know, Nouveau targets the 535 GSP firmware, but 545 is now certified for the G4 series.
Yeah, painful question.
Less painful for us. But yeah, there's not strictly a need to update to every version.
There have been situations in the past where we needed updated firmware for existing generations of GPUs. Like, for some GPUs we had this problem that the firmware we initially got from NVIDIA didn't work on newer GPUs, so they had to give us an update. It took a while and landed at some point.
But yeah, this can arise. And because there is no stable API, it might break.
I think the API at least is stable in part, because they have an RPC mechanism to call into firmware code and stuff. So I think this part is relatively stable. But the entire API you would use can change at any moment. It even changes within a release branch; the new 535 update also broke the API.
The problem with this is, and I think some articles already covered it, that the firmware is huge. Just the two files we have now are like 60 megabytes. And if you consider that some distributions are putting those files into their initramfs, and you have, I don't know, three kernels, you have the files three times there; it's like 200 megabytes.
And if we would update to every firmware version, we'd also have to keep the old firmware files, because of the kernel's don't-break-userspace kind of rules. So people would run out of space on their boot partition, because some distributions are still using this one-gigabyte boot partition, and it's like, pain.
The big problem is, if you have full disk encryption, you can't really access the system yet. So it kind of makes sense to have the firmware inside your boot partition.
And the usual discussion on this is, yeah, why aren't you just loading the GPU driver later? The problem with that is that sometimes you need the GPU driver to drive all your displays.
Usually on a laptop, it's not a problem.
Usually on a desktop with a single display, it's also not a problem, because the firmware will set up your display just fine.
But for example, imagine you have an office at home,
and then you have your laptop always closed.
It's buried under a lot of stuff,
and you don't want to get it out ever
and have external displays connected.
And sometimes the firmware is not
able to bring up those displays.
So you need a native GPU driver to bring up
all the displays on your system.
This usually happens if you have multiple GPUs in your laptop: the internal one works, but the discrete one doesn't. Or, I think, it also doesn't work through Thunderbolt. I have this problem with my laptop; the display connected over Thunderbolt DisplayPort MST doesn't come up through the firmware. So it needs a GPU driver to actually bring up the display.
So it's a really hard issue to solve.
And if we keep adding more firmware files,
then the boot partition is just out of space.
So yeah, this is kind of the problem
we have with this firmware.
And distributions also need to figure out what to do with it.
So you mentioned the... if you want to update... go on, go on.
Yeah, I was going to bring up the boot partition.
Yeah. I don't know what other distros do, but if you go to the Arch Linux wiki, it recommends
at least 300 megabytes for your boot partition.
Yeah, it's enough for one kernel.
I mean, Arch Linux also only has one kernel installed, so they don't really have that
issue.
Yeah.
Distributions like Fedora, they keep three kernels.
I think Ubuntu is kind of doing the same. Or at least Ubuntu, I think, always keeps the current one and then one or two older ones or something like this. So it kind of depends.
And yeah, it's not so bad yet with 60 megabytes.
But if we would update to every single firmware version,
then it's not really sustainable.
Right, right.
There are ideas on improving the situation.
There are ideas of having a separate initramfs shared across all kernels where you could put all the firmware files, so you wouldn't have to duplicate them.
There's also an idea Dave Airlie is working on, to declare at the kernel module level which sets of files you need, like versioned sets of files. You could say, I support those ten sets, but I only need one of those. And then, in the initramfs generator, you could just pick the newest or something.
This would probably also solve a big chunk of this problem. Well, then you wouldn't need a one-gigabyte boot partition.
Yeah. I mean, btrfs subvolumes would also solve this problem, so...
Yeah, but then you gotta get people to stop using ext4.
Sure. I'm just saying that it's one solution to the problem.
It's certainly a solution, yeah.
Let's see, was there another one that we haven't really addressed in here? "I want to know how the progress with Rusticl profiles is going."
Profiles?
Profiles?
I suspect they mean the OpenCL profiles,
because you have embedded profile and full profile,
which is to differentiate between embedded devices
and four-file GPUs.
And on Intel, the full profile is supported.
On AMD, the embedded profile is supported.
The reason is very stupid.
You need a certain amount of samplers supported,
or images, or whatever.
And the Radeon SI driver limits to 32.
And the full profile requires 128.
The difference between the embedded and the full profile isn 128. That's like the difference between the embedded
and the full profile isn't even that huge.
It's just mostly some precision stuff.
And yeah, it highly depends on the actual MISA driver,
what gets advertised.
But the profiles are supported, and the code
detects on what's valid to advertise.
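For the curious, the profile a device ends up advertising is visible through an ordinary device query; a hedged C sketch:

```c
/* Hedged sketch: querying the advertised OpenCL profile, which is the
 * string "FULL_PROFILE" or "EMBEDDED_PROFILE", alongside one of the
 * image limits that gates which profile can be claimed. */
#define CL_TARGET_OPENCL_VERSION 300
#include <stdio.h>
#include <CL/cl.h>

void print_profile(cl_device_id dev)
{
    char profile[64];
    cl_uint max_read_images;

    clGetDeviceInfo(dev, CL_DEVICE_PROFILE,
                    sizeof(profile), profile, NULL);
    clGetDeviceInfo(dev, CL_DEVICE_MAX_READ_IMAGE_ARGS,
                    sizeof(max_read_images), &max_read_images, NULL);
    printf("%s (max read image args: %u)\n", profile, max_read_images);
}
```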
I think we've addressed pretty much everything, then. Yeah, let me just check: did anyone mention anything over on my Mastodon? I think I might have seen one here. Where are the replies? Why is that? Okay.
I would like to know what their reaction was when the NVIDIA drivers leaked. I know they can't legally look at them, but did they?
What a question. No, I seriously haven't looked at it.
Yeah, no, definitely not. Definitely don't. Also: what is their major hurdle, having to reverse engineer stuff, DRM? What a mess of a question.
I think we talked about what the major hurdles were with driver support earlier. It's documentation, and then having to reverse engineer stuff. I don't think there was anything... is there anything else you would like to add to that?
No, I think that one problem is that how the driver works can change, and then all the tools you have stop working. That's why there's work on a new tool to dump the GPU commands. But yeah, it's something you have to do from time to time. And then, yeah, just figuring this stuff out is the biggest problem.
Well, I think we're pretty much done here, then. Thank you for doing this. Appreciate it. I still don't understand much about the GPU stuff and compute stuff, but I feel like I'm more informed.
Yeah, I mean, in the end it's just code on a GPU.
Yeah, fair enough.
Is there anywhere that you would like to direct people to? Any talks you've done or anything like that?
Oh, I usually always have a talk at XDC, so they can probably look at those. But yeah, I don't have a blog; I'm pretty bad with this. I have an account on Mastodon, so people can follow me there if they want to. But yeah, I'm...
Fair enough.
...posting regularly enough, I think.
Sounds like you're busy enough doing actual important stuff that you don't need to spend your time writing a blog.
I don't think that's the reason. I'm just... I'm not a very...
Yeah, yeah, you're super busy, that must be it. I'm sure if you sat down and wrote a blog, there's definitely a lot of stuff you could talk about; it's just a matter of, do you feel like actually writing it down?
Yep. So... I usually talk about this stuff on my Mastodon account, so...
Yeah, well, that does the job anyway. And then, you know, you just post something on Mastodon, and then you get a bunch of articles written about it, and they can do the work for you.
Yeah, basically. That kind of works pretty reliably.
Well, awesome. Thank you, as I said, thanks for doing this. I guess I'll do my outro and then we'll just sign off. So go check out the main channel.
I do Linux videos there six days a week.
Not a clue what'll be out when this comes out.
This will be out in a couple of weeks at this point.
I've got the gaming channel that is Brody on Games.
Right now, I'm probably still playing through Armored Core 6
and probably still Kingdom Hearts Dream Drop Distance.
That I do twice a week on Thursday and Friday.
Check out the Discord and you'll see when that all goes up.
Or just check when the previous streams went up and there'll be a notification.
If you're listening to the audio version of this,
you can find the video version on YouTube at Tech Over Tea.
And if you're watching the video and you want to hear the audio version,
there is an RSS feed.
It's on pretty much every audio
podcast platform. You'll find it pretty
easily.
So give me a final word. What do you want to say? How do you want to sign off the show?
Try out everything I'm doing, and please report bugs.
Absolutely.
See you guys later.