The Changelog: Software Development, Open Source - Making the ZFS file system (Interview)
Episode Date: January 18, 2022

This week Matt Ahrens joins Adam to talk about ZFS. Matt co-founded the ZFS project at Sun Microsystems in 2001. And 20 years later Adam picked up ZFS for use in his home lab and loved it. So, he reached out to Matt and invited him on the show. They cover the origins of the file system, its journey from proprietary to open source, architecture choices like copy-on-write, the ins and outs of creating and managing ZFS, RAID-Z and RAID-Z expansion, and Matt even shares plans for ZFS in the cloud with ZFS object store.
Transcript
What's up, welcome back friends.
This is the Changelog.
If I sound a little different,
it's because I'm coming off of a cold
and I appreciate your patience
with my somewhat nasally voice today,
but welcome to the Changelog.
We do appreciate you listening.
And today I went solo talking to Matt Ahrens.
Matt co-founded the ZFS project
at Sun Microsystems back in 2001.
And of course, 20 years later, I picked up ZFS for my use in my home lab and I loved it.
So I reached out to Matt and invited him on the show.
Today, we cover the origins of the file system, its journey from proprietary to open source,
the architecture choices like copy on write, the ins and the outs of managing ZFS,
RAIDZ and RAIDZ expansion, a highly sought after feature coming soon.
And Matt even shares plans for ZFS in the cloud with ZFS Object Store.
Big thanks to our friends at Fastly for making our podcast super fast for you to download anywhere in the world.
Check them out at Fastly.com.
This episode is brought to you by our friends at Square.
Square is the platform that sellers trust.
There is a massive opportunity for developers to support Square sellers by building apps for today's business needs.
And I'm here with Shannon Skipper, Head of Developer Relations at Square.
Shannon, can you share some details about the opportunity for developers on the Square platform?
Absolutely.
So we have millions of sellers who have unique needs.
And Square has apps like our point of sale app, like our restaurants app.
But there are so many different sellers, tuxedo shops, florists, who need specific solutions for their domain.
And so we have a node SDK written in TypeScript
that allows you to access all of the backend APIs and SDKs
that we use to power the billions of transactions
that we do annually.
And so there's this massive market of sellers
who need help from developers.
They either need a bespoke solution built for themselves on their
own node stack, where they are working with Square Dashboard, working with Square Hardware,
or with the e-com, what you see is what you get builder. And they need one more thing. They need
an additional build. And then finally, we have the app marketplace where you can make a node app
and then distribute it so it can get in front of millions of sellers and be an option for them to
adopt. Very cool. All right.
If you want to learn more, head to developer.squareup.com to dive into the docs, APIs, SDKs, and to create your Square Developer account.
Start developing on the platform sellers trust.
Again, that's developer.squareup.com. Matt, I'm a big fan of your work on ZFS, and I'm so glad to have you here at The Changelog
because I'm a newcomer to ZFS. So as
you know, because you're a co-creator of it, it's been around for a very long time. It was created
in 2001. And my first use of it was in a Homelab production scenario, powering my Plex server,
basically, in the year of 2021. So I'm 20 years behind the adoption curve of ZFS. But when I found out about
it, I loved the file system. I was like, you know what, we got to get Matt on the show. So
welcome to the show. Thanks. Happy to be here. Do you do any podcasts? Is this a thing you do? I
know you give a lot of talks and you're in front of people a lot around ZFS and the community and
whatnot. But what do you do around podcasts? Not really.
I'm not really plugged into the tech podcast scene.
I did one or two many years ago.
There's 2.5 Admins, I believe, out there.
I think even one of the writers around ZFS has been on there and maybe even a contributor to ZFS, I'm not sure.
I'm new to the ZFS scene.
Yeah, I know the hosts of that podcast,
and they'll probably hit me up at some point.
Yeah, yeah.
You should go on that show.
I like it.
I listened to a few of them.
But yeah, I wanted to get you on to talk about ZFS
because it's such a cool file system.
It's got some interesting roots in open source,
and it's obviously listed under an OSI-approved license,
but it's got some drama behind the scenes.
And I figured who better to go through the backstory of its origination and the problem
set and its history and then to current than you, as you're a co-creator of it.
Back in 2001, it was a file system designed by you and by Jeff Bonwick for the OpenSolaris
operating system at Sun Microsystems.
They were eventually acquired by Oracle.
And I just want to go into that history, whatever you want to share around that process, like
the ZFS origination, you know, what was the problem set?
What was OpenSolaris trying to solve at the time?
What brought you to Sun at the time?
Wherever you want to begin.
So open it up.
Sure.
I joined Sun and the team just after college. So it was my first job out of college.
And I was lucky enough to be recruited by
Jeff Bonwick to join him
and work on a new file system. So at the time that I joined,
it had been pitched to me as like, come join and we're going to work on a new
file system. And I just thought that was like the coolest thing I'd ever heard of.
That was the motivation for me.
I showed up and it really was like nothing had been written yet.
So zero.
That's cool.
Yeah, zero.
So Jeff and I started from what should this be doing?
And, you know, I was obviously very junior software engineer at that point in time. So a lot of the ideas of like what it should be able to do and where it should fit in the industry came from Jeff.
But really, we wanted to make a replacement.
Originally, we were just thinking of it as, hey, UFS is kind of hard to use.
UFS was like Sun's file system before ZFS.
UFS is hard to use. How can we make this easier to use?
And we looked around at how people were using it, mostly in the enterprise context.
So most of them were using it with volume managers, with either Sun's volume manager or Veritas volume manager were very popular at the time.
And the volume managers were hard to administer, hard to set up.
And then they had all these weird failure modes
that some of the in-house sysadmins at Sun had experienced.
Sun had a server that was called Jurassic.
It was a, for the time, giant server that the kernel engineers ran themselves.
So people took turns being primarily responsible for that.
And it used UFS and it used the Solaris volume manager.
And I think that there are some horror stories that predated my arrival about disks dying
and being re-silvered incorrectly by the volume manager and maybe mistakes being made due to the difficulty of understanding
what was really going on there,
even from people who were very experienced
with software and computers.
So, you know, one of the taglines
that we created after the fact
was that the goal of ZFS was to end the suffering
of administering storage hardware.
And I think that that's pretty accurate.
Yeah.
It's a very painful process.
It can be very painful.
And I think that ZFS has succeeded in large degree
at addressing the problems that we saw 20 years ago.
So that's kind of like the very high level
of what we were trying to do.
I'll point out that we were not setting out, the high level was not to create a product.
It wasn't to make the fastest software.
It really was to address the pain points of the difficulty of administering.
And I think that those goals or lack of goals kind of have a long shadow, right?
Yeah.
I think that you look, I'll get into some more of the specifics of what that meant,
what those goals meant back in the day.
But, you know, you look at even now, 20 years later, ZFS, it does perform well.
But, you know, when people do benchmarks against other file systems, and a lot of times ZFS
performs better than them.
Sometimes it doesn't perform as well.
Like I think that the people behind ZFS,
like we don't really sweat that much.
I don't see people being like, oh, like we got it.
We got to beat them in this other,
in this maker benchmark, like what's going on?
I think that the thought is more like on the whole,
ZFS is very useful and performance is part of that utility, but snapshots are part of that utility.
Replication is part of that utility.
Checksums, compression, all of the different things in ZFS that work well together and are easy to use together and hopefully easy to understand what's going on.
That's what brings a lot of the value compared to other technologies and product, right?
So today, like back when we created it and today, ZFS is not a product.
There are products based on OpenZFS doing all kinds of different things.
But OpenZFS is just an open source project.
And we're working on creating that fundamental technology
and making
that easy to use for system administrators and also easy to integrate into systems and products.
I think it's interesting, this history of it, because I can imagine as a junior developer
coming out of college with a blank screen, essentially, with Jeff, one, you probably grew up a lot in terms of
being a software developer and even a human being, right? I mean, your whole career has been spent
essentially on what is ZFS and now OpenZFS, the project. And I think that's just interesting how
you can sort of attack a problem set way back in the day in that exact scenario. Come out of
college, junior engineer, junior developer, you know know first real job right like it was your first real job and as a programmer yeah and now
you're still doing it like it shed some light to like i guess the interesting bits around starting
like sometimes you never know we're gonna end up you know like where you might end up is is sort of
like this question mark and it's like well it could be a dead project or it could be something that people really get value over
20 years or more.
Yeah, I think that I was very fortunate and lucky
both to work on something that turned out
to be so successful.
Like it's definitely more successful
than our wildest dreams of 20 years ago.
And also very fortunate to have the opportunity
to work on something that's, that's brand new, even if it wasn't successful, you know,
creating cool technology is always fun and it's a great experience. And then to be able to do that,
you know, with, with a great mentor, somebody that, um, had more experience and, uh, was willing to do
a lot of the hard work that was probably invisible to me at the time
of making the project exist
and making the space for me and other developers
to write the code
while there's a lot of other things going on within the company.
People want different things out of it,
organizational things,
which thankfully I didn't have to get too involved in at the time. But I know Jeff
put a lot of work into that as well as the work that he did on designing and implementing it back
in the day. So when somebody asks you, what is ZFS? How do you how do you describe it? What do
you say it is? It depends on who's asking, I think, or what I, you know, what level I think
that they're going to be able to understand it
because everybody understands things in the context that they have.
So let's say an everyday software developer that doesn't know much about file systems.
They know they exist.
Sure, they're on their computer.
They use them, but they're just an everyday developer.
They're not touching file systems too often.
So for developers, first of all, you know what a file system is.
Hopefully, you kind of understand what a file system is
and what its purpose is in the most basic sense of storing files and data on hard disks.
ZFS, our tagline from back in the day is that it combines the functionality
of a file system and a volume manager into one integrated solution. And it makes using, it brings enterprise level storage
technology to the masses. So those are technologies like snapshots and compression and replication
that, you know, those don't really exist or they're very primordial in more traditional file systems. And so ZFS lets you get
a lot more out of your storage system and it lets you build really powerful storage systems with
just a bunch of disks or SSDs, combinations of those without expensive technology,
without expensive enterprise products. Right. Who's using ZFS? I mean,
I mentioned I'm a home labber. Pretty much, I'm a home labber user at least.
And I would call my scenario enterprise home lab because I don't want it to go
down necessarily. If the data died, it's my Plex server.
So like it's movies, right?
I don't want to rip all those movies again. And it's a lot.
I think I might have like 10 or 14 terabytes of movies, maybe more than that even.
I'm not even sure.
4K, 1080p, but that's my use case of it.
But I imagine there's a lot of Homelab users out there.
There's a lot of enterprise users out there.
You mentioned it's for the masses, so I'm the masses.
I'm the user of that.
Where is the – you're employed by Delphix.
You get paid to do this daily. We talked about Sun from back in the day, acquired by Oracle. This has been a career for you, so you've
obviously done some cool stuff with it. But where is it being used at the highest level
and at the lowest level, like, say, a home lab like me?
Yeah, it runs the whole spectrum,
and one of the great but also challenging things about open source projects is that we don't necessarily know where it's being used, right?
People can pick up the code and do whatever they want with it.
We don't have like a product and a list of customers and numbers or things like that.
But I can tell you about some examples. Obviously, there's a lot of folks like you who are using it at home or in very small businesses.
And that's probably the majority of people using, touching, running ZFS commands is probably those kinds of users because there's so many of them.
The amount of data or the demands on the performance of those systems
might not be the highest.
And I think that those types of users are the ones that tend to be
underserved historically by enterprise-focused open source projects.
Because most of the work is done by people who are paid to do it,
and they're paid to do it to make it work in some higher-end type of deployment.
Right, some sort of paid scenario, some sort of...
Yeah.
Even though it's open source, some sort of product
that uses the open source to create a cloud product
or some sort of serverless or service, essentially.
Yeah, so if we go from there to kind of the very highest end,
there's folks like Lawrence Livermore National Labs,
which is a
US government research agency. And they run some of the biggest supercomputers in the world.
And they actually originally ported ZFS from Solaris to Linux to be able to use on these
enormous supercomputers. So Brian Behlendorf is the one who started that, quite a few years ago now, 10-plus
years ago, I think. And they've been doing a lot of the work to maintain ZFS on Linux, and
they're using it in their huge supercomputers. I don't have the numbers handy, they probably
are available, but it's, you know, petabytes and petabytes and petabytes, huge things that
take up warehouses full, right? And I think that, kind of because of their leadership, a lot of other supercomputer type
applications have picked this up as well.
So, you know, Cray, HPE, Intel have all done work on putting ZFS into use in supercomputer
type applications.
One of the interesting things about those applications is that I didn't know this.
When I learned about supercomputers back in the day, it was like, oh, it's all just like
you're running a bunch of numbers and writing the numbers out. And so presumably it's just
giant files, right? You read some big files, you write some big files. Maybe you probably care
about throughput. And that's kind of the traditional space of things like Lustre. Lustre is a distributed file system that can run on top of ZFS. So it has pretty tight integration with ZFS.
And they advertise things like, you know, basically like if you have enough servers and
enough clients, then you can get like your full network switch throughput of however many like
terabits per second or whatever, because it's fully distributed. But a lot of those workloads,
HPC is not just these big file streaming stuff.
They do a lot of small file creations
because a lot of these workloads are written
by folks who are not file system engineers.
They're just trying to solve their problem,
and it doesn't necessarily map onto all giant files all the time.
And so you see a lot of various workloads.
So I did some consulting for Intel several years ago now
where they were trying to improve small file creation performance,
which is not what you'd expect coming from an HPC-type workload.
So even these largest, large-type use cases,
they have a lot in common with home users' use cases, right?
Where it's like creating lots of files.
Like that sounds like maybe, you know,
downloading your photos or reading your mail spool or writing out all your
little text files for your code base.
You know,
if you're a software developer reading lots of small files when you're doing
a compilation.
So a lot of these use cases really transcend like the large to the small.
In the middle, you have, I would put in the middle a lot of use cases of ZFS where companies
have taken ZFS and embedded it into another product. So one example is folks like IX Systems and Nexenta
who have made kind of general purpose storage appliances
based on ZFS.
So inside is ZFS,
and then they have a nice management interface
that makes it easy.
I think Plex is probably also in that category.
And there's a lot of people
who might be using those products.
FreeNAS comes from IX Systems.
So a lot of people use FreeNAS in their home systems.
A lot of people might be using those types of systems
and not even know that there's ZFS under the hood.
They're just like, I have a Plex.
I don't know what's going on.
Like, yeah, it has compression.
That's great.
It has RAID.
That's great.
That's interesting because, you know, I'm in this home
labber scenario, and I think, obviously, I'm a developer and I'm a tinkering kind of person, but
you know, there's this world where in the future where people are going to have, I would probably
guess like their own clouds in their house. You know, as the technology
gets more and more accessible, you're going to have the need for potentially, you know, privacy
and storage and stuff like that. For example, I've got a UniFi network. I've got UniFi cameras.
They're local to my network. There's obviously some external accessibility via
UniFi and stuff like that, but I'm trying to keep things local. Plex, I can access from outside
my network. That's the extent I'm using it so far. I plan to eventually move our podcast archive, which I think is around the same.
It's around 8 or 10 terabytes of data we've collected over the 12 plus years we've been doing podcasts.
Which, you know, those archives are very precious to us.
Like we lost, you know, the early days of this show, for example.
You know, it probably wouldn't make or break the business, but we could never go back and alter that
or remix them or remaster them
or do something new in the future if ever we wanted to.
So our archives are pretty precious to us,
but I think it's really interesting how you can serve,
you can do this project and serve such a wide degree
of, in quotes, customer type, user type, enterprise,
small businesses to home labbers.
That's just so wild how this file system could potentially power this future where I think more and more people will have a NAS on their home network.
That's interesting to me.
You mentioned private clouds, home cloud.
There are a couple of companies, you know, Joyent tried to do this of like taking the interface that you can get from a public cloud and sell a bunch of gear to a company that can deploy that on-prem and then have that same kind of cloud usability, you know, in a data center.
You could imagine like pushing that to smaller and smaller and
smaller deployments, right, into home and small businesses. And I think some of the folks from
Joyent, like Bryan Cantrill, they are at Oxide now, which is kind of doing something similar,
taking on even more of the stack. But both Joyent and Oxide are using ZFS as part of their storage
subsystem in those, like, cloud-in-a-box or private cloud type deployment scenarios.
This episode is brought to you by InfluxData, the makers of InfluxDB,
a time series platform for building and operating time series applications.
In this segment, Marian Bija from NodeSource shares how InfluxDB plays a critical role
in delivering the core value of the APM tool they have called N|Solid.
It's built specifically for Node.js apps to collect data from the application and stack in real time. At NodeSource, we want to lean into a time series
database and InfluxDB quickly rose to the top of the list. One of the unique value propositions
of N|Solid is real-time data. And there is a lot of APM tools out there, but there is a variance
in terms of how available the data is. It's not
really real-time. There is actually a staging period to data, and InfluxDB is magical and
allows us to deliver on our unique value proposition of real-time data with N|Solid.
To get started, head to influxdata.com slash changelog. Again, that's influxdata.com.
So what is it then, you think, that makes people choose ZFS?
Like of all the choices they have in the Oxide scenario or in my scenario,
I've got a 45 drives Stornator, for example, here.
It's got a 12, I think it's got 15 drive bay.
For example, I can fill it up with massive amounts of storage. You know,
it's a Linux box. It's, it's running Ubuntu, you know, 2004 or whatever, you know, what,
what makes someone like me or someone like them choose ZFS for this storage engine? Why is ZFS
the choice? Like what particular features, what makes them choose it? So I think that it's,
it's kind of out of necessity. There's really two kinds of data.
There's disposable data where you can put it on whatever you want.
You can put it on your thumb drive with FAT32.
Sure.
And it doesn't matter.
There might be some performance requirements, but the requirements are not that great.
And in those scenarios, you might use ZFS for interoperability if you're used to it,
but there's not a ton of use cases. But if you care about your data,
then I care about my data, Matt. Yeah. And I think most people do. If you care about this data, then
you need to be able to have some redundancy with it. So you need kind of the functionality of a
volume manager where you can have multiple, you know, drives and some of them can die and you don't lose all your data.
And you need to know that the data that the drives give you is correct.
So you need checksums.
And even just those two basic requirements, there's not, I mean, there are other technologies that do that.
They're much harder to use typically. And so I think the choices are like, if you're
deploying it yourself, ZFS is just so much easier. If you're making a product, then, you know,
your customers might not know or care, but, you know, the building your product on something
that's as capable as ZFS is going to reap long-term rewards. And the fact
that ZFS is under continuing development and improvement means that your product has a solid
foundation that's going to keep up with what the future holds for software and hardware storage.
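A minimal sketch of that basic redundancy-plus-checksums setup. The pool name `tank` and the device paths are hypothetical placeholders; a real system would use stable `/dev/disk/by-id/` paths:

```shell
# Mirrored pair: either disk can fail, and every block is checksummed,
# so bad data returned by a drive is detected and repaired from the good copy.
zpool create tank mirror /dev/disk/by-id/diskA /dev/disk/by-id/diskB

# Show pool health, including read/write/checksum error counters
zpool status tank
```

These commands require a live ZFS system and root privileges, so treat them as a sketch rather than a copy-paste recipe.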
That's just the very basics, right? I think a lot of people even forget about that because
there's all these other cool, great features with ZFS that are very exciting, like snapshots, right? Being able to protect your data with snapshots. A lot of people nowadays are thinking about ransomware, right? And what if somebody has some virus or whatever they call it that encrypts my data or alters my data or deletes it, how do I recover that? Well, ZFS has built-in snapshots, takes snapshots every day or every hour.
The storage cost of them is very low.
You're only paying for the data that's different each snapshot.
And the performance impact is basically non-existent.
So you get a lot of protection from accidental or malicious changes to your data very easily and at very
low cost in terms of the hardware that you have to pay for it.
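A sketch of what that looks like at the command line, assuming a hypothetical dataset named `tank/media`:

```shell
# Near-instant, point-in-time snapshot; it only consumes space
# as the live data diverges from it
zfs snapshot tank/media@2022-01-18

# List snapshots and the space each one uniquely holds
zfs list -t snapshot -r tank/media

# Recover from accidental or malicious changes by rolling back
zfs rollback tank/media@2022-01-18
```

Scheduling the snapshot command from cron (or a tool built on it) gives the hourly/daily cadence described above.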
Things like compression built in.
I think a lot of people nowadays take this for granted.
At least ZFS users probably take it for granted.
But it's not present in all the computing technologies.
Being able to just turn on compression.
We're using LZ4 compression by default,
which is very, very fast.
It doesn't give you the highest compression ratios,
but it means that you can just turn it on
and kind of not worry about it.
Like you turn it on and typically performance improves
because you don't need to read
and write as much data to your disk.
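Turning it on is a one-liner; the dataset name here is hypothetical. (On current OpenZFS, `compression=on` already selects LZ4 by default.)

```shell
# Enable LZ4 compression; only blocks written from now on are compressed
zfs set compression=lz4 tank/media

# Check the setting and the achieved compression ratio
zfs get compression,compressratio tank/media
```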
And then, you know, in more kind of complicated deployments,
people look for things like replication.
Like I have data that's on this machine.
I want to get to this other machine.
I could use rsync, and that would probably work just fine if I just need to take one copy of the data one time.
But if I need to, you know, continually move the changes over, rsync is very, very slow because it needs to check every file and every block of every file to see if it needs to
be sent. And then if you're using ZFS to begin with, obviously you want to preserve things like
the snapshots that you have on the source system, the compression that you have on the source system.
So ZFS has this built-in send and receive commands that let you serialize the contents of a snapshot,
send it over to another machine, and it preserves all the complicated stuff like ACLs,
access control lists that might be on files,
and other esoteric things that might not be used that often.
But when they are used, you don't want to have to worry about
is rsync preserving them correctly or whatever.
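A sketch of that send/receive workflow, with hypothetical dataset, snapshot, and host names:

```shell
# Full send of a snapshot's serialized contents to another machine
zfs snapshot tank/media@monday
zfs send tank/media@monday | ssh backuphost zfs receive backuppool/media

# Later, an incremental send: only blocks changed since @monday cross
# the wire -- no per-file scanning the way rsync does it
zfs snapshot tank/media@tuesday
zfs send -i @monday tank/media@tuesday | ssh backuphost zfs receive backuppool/media
```

Because the stream is block-level, properties like ACLs and holes come along without the tool having to understand each file format.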
This is one area of ZFS I'm not taking advantage of right now.
I do some replication, basically backup,
because RAID is not your backup.
It's just being able to store more and do more with a volume,
not so much necessarily an actual backup.
And this is one area where I'm not taking advantage of a feature, really.
And it's just because I'm still getting into the ZFS world
where I find information.
I find that it's actually kind of hard to find all the information you can
of what you can do with the power of ZFS. For example, there's a lot out there, and I know it's probably
getting better, but 20 years later, I'm still kind of like, wow, there seems to be a lack,
like, or at least a vacuum, maybe not like somebody's doing a bad job, but more like there's
an opportunity, you know, somebody's out there doing more. And there's a couple of books out
that I've, I've picked up that I've liked a lot as well that really helped me school myself on what ZFS is and what it can do.
But replication is one particular area where I'm still using rsync.
I'm still using rsync and moving stuff over to a different store.
Now, granted, currently that separate store is not a ZFS store, so I couldn't do replication there. But when I do fully move over all my stores, I have a couple of different RAID scenarios where I'm not using ZFS everywhere.
I'm only using in this one pool currently.
And mostly because I want to prove that it works well.
I can actually manage it.
And so to your credit and everyone else's credit involved in it, yeah, it's pretty user-friendly.
I can use ZFS pretty easily.
It's pretty easy to create a pool, pretty easy to manage a pool,
pretty easy to do scrubs and stuff like that
to like verify my data.
But that's the extent I'm doing.
And it's pretty set it and forget it.
Like I've pretty much gotten bored
because it doesn't require a lot of maintenance.
So good job on that part at least.
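The set-it-and-forget-it routine described here boils down to a few commands. This is a sketch of one common layout; the pool name and device paths are hypothetical:

```shell
# A six-disk RAID-Z2 pool: any two disks can fail without data loss
zpool create tank raidz2 \
  /dev/disk/by-id/disk1 /dev/disk/by-id/disk2 /dev/disk/by-id/disk3 \
  /dev/disk/by-id/disk4 /dev/disk/by-id/disk5 /dev/disk/by-id/disk6

# Periodically re-verify every allocated block against its checksum
zpool scrub tank
zpool status tank   # shows scrub progress and any repaired errors
```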
That's good.
Yeah, I would say don't feel bad
about not using all the features of ZFS.
What's there?
I want to use it, Matt.
You know, I want to, if I have another ZFS store,
I'm not going to be rsyncing.
I'm going to learn replication at that point.
I think that's great to learn that stuff. I would say that like, you know,
the ZFS enthusiasts who become evangelists
kind of talk up like all of these capabilities of ZFS.
And I think that they're all for sure,
like they're all useful in different scenarios,
but ZFS has a lot of capabilities.
It can do a lot of things.
That doesn't mean that you should do all of them in all deployments,
right?
Sure.
It has all those things so that it's flexible and can be used in a lot of
different scenarios.
And being interoperable, like being able to run rsync to send it
to another machine, is
just fine, you know.
That said, I love ZFS send and receive, and it's really cool, basically. So you're asking
where you can learn more about how to use this stuff?
Sure, yeah. What resources are out there?
Yeah, the books that have been written about it are pretty good. The FreeBSD Mastery: ZFS one is a good one.
I think that there's a version two, like an advanced mastery. I forget what they
called it. We'll look it up on Amazon.
Well, the one I have is FreeBSD Mastery: ZFS. Okay. And
that's Allan Jude and Michael W. Lucas.
Yeah, and I think there's a second counterpart to that which
is even deeper, so it's like Advanced ZFS or something like that. So I think those are both great resources.
Those are the ones that are coming to mind right now. If you want to get an education,
I think that those books are the way to go. But online forums and stuff are also
very useful. It's just more immediate and more personal.
But the quality of information you're getting is more variable.
I've gotten a lot of mine from YouTube, various blog posts, obviously Stack Overflow here and there.
The book I'd mentioned, I haven't gotten the advanced version of it yet because it's just not quite there yet, but it's been very helpful for me.
I've been taking my own notes on different commands I've run for establishing
a new zpool and stuff like that. We've gotten this far actually talking about
features, but not the specifics of breaking
down what ZFS does. So it is the open
source project, OpenZFS.
We haven't talked at all about its, you know, I guess...
The features and capabilities and like what do I type to do this?
Yeah, exactly. So like how do you create a Zpool, for example? How do you create,
what's the step? Like if I wanted to create, you know, a six drive scenario where maybe I'm,
you know, a home lab or I'm doing plaques, like what would I do to create a, a six drive? Would I do, you know,
RAID Z1, RAID Z2? And like, how do you choose which,
which RAID level to choose all that good stuff?
How do you even choose the number of drives? Obviously there's a cost factor, but, you know, I think there's this idea that if you wanted to do RAID-Z2, you should do it in multiples of some number, like six, eight, or twelve, where you don't want to do seven, for example, because it doesn't map out well. Help me understand that.
Yeah, so actually I wrote a blog post about this, basically saying: don't worry about that number. In my opinion and experience, the specific width, like how many drives you have, there aren't magic numbers there. Basically, the more drives you have, the more performance you'll get and the more space efficiency you'll get. And there aren't really magic points on there that are more optimal than others.
Aside from some very, very specific scenarios
that don't apply to common cases, right?
Basically, like, you know, if you're using a database, it has a fixed record size, and you're for some reason not using compression, then maybe there are some more optimal configurations there. I wrote a blog post; I think if you search for RAID-Z, you'll find it. The title of it is something like "How I Learned to Stop Worrying and Love RAID-Z."
And it goes into excruciating detail about why this is true, about why people think that you need this power-of-two-plus-n exactly, and why it's not really applicable. A lot of the reason is that you want to use compression. Probably you're either using compression, or you're using very large files and very large block sizes because you have videos or something like that that's not compressible.
And if you have compressible data, then you should be using compression.
And then you end up with variable block sizes.
ZFS takes 128 kilobytes of data and compresses it down to a multiple of whatever the sector size is, like four kilobytes. So you might have a big file where the first block compresses to 70 kilobytes, the next block compresses to seven kilobytes, the next one compresses to 104 kilobytes, right? That means that any kind of fancy math that you're trying to do to arrange for things to be laid out just perfectly just isn't going to fly.
And then on the other extreme, if you have large files, then they have large blocks, and then everything is easy. There's no need to worry about getting things perfect, because when you have those large blocks, they can spread evenly over all the disks. So that's one less thing to worry about, which is good in terms of your deployment.
I'd love just to lean on ZFS to be smart enough, you know, to not have to worry about the number of disks. And I mean, that's kind of what you want to do, right? You want to take as much mental overhead as possible out of planning a new storage setup, right?
Exactly. You want to put as much as possible into the software to manage it, rather than on the person choosing the number of disks.
You want to be able to choose things like, you know, reliable hardware, reliable operating systems,
reliable things that you can plan for.
Not so much, should I use seven or six disks?
Yeah.
And you see that reflected in like the user interface of ZFS where ZFS has,
there are a lot of properties you can change and things you can do with it.
But hopefully those things are all there for a reason.
They have a real impact on how, you know, on what you're trying to do.
And then there's a lot of the internals that, you know, you can do with like module parameters, you know, kernel module parameters that are like semi-documented, not supported.
It changes the internal workings.
You just shouldn't have to deal with that.
Like, you know, for the vast majority of people,
you should never need to think about that.
Of course, sometimes we fall short there. There are things where it's like, oh yeah, you really do want to change that knob in this scenario to get really good performance for this particular workload.
But the goal is that you don't need to do that.
The system gives you very good performance and semantics out of the box.
And then, you know, you can express your intent with the commands to set properties and stuff.
So getting back to your question of like, I have a home lab, I have some disks, what do I do?
Yeah, typically with the number of disks that you're talking about for that kind of
scenario, it's like, you know, four to 12 kind of disks. Probably you're going to create one
RAIDZ group and you're going to put all your disks in it and it's going to be either RAIDZ1 or RAIDZ2.
So there's not a lot of real decisions there to make. I think you're going to be running a command that's like
zpool create, give the pool some name, RAIDZ2, and then just list each of the six disks that you have.
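For reference, the command Matt describes might look something like this; the pool name "tank" and the device paths are placeholders for your own setup:

```shell
# Create a pool with one RAID-Z2 group out of six disks.
zpool create tank raidz2 \
  /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf

# Check the resulting layout and health.
zpool status tank
```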
Whether it's RAIDZ1 or RAIDZ2, it comes down to how much redundancy do you want. And with RAIDZ1, you know, it can tolerate one disk failing and not lose any data.
If you lose that second disk before you've replaced the first one, then you lose all the data.
With RAIDZ2, you can lose two disks without doing any replacements and you'll still have your data. So, you know, in like industrial deployments,
the consideration is really like, how long does it take to replace a drive before you get back
to full redundancy? And, you know, people are configuring spares and timing, like how long
does it take to do the resilver and all that kind of stuff.
For small home deployments, people probably aren't doing that.
They're probably not configuring spares.
The time to do the replacement is however long it takes me to order the disk online and get it shipped to me or whatever.
That's the long pull.
So I say as a rule of thumb, if you have a bunch of drives,
RAIDZ2 is going to give you more redundancy
than you're probably ever going to need.
If you want to live a little bit dangerously and risk like,
hey, if two drives fail in the same week,
then I'm going to have to go back to my backups,
then RAID Z1 will save you a little money.
It saves you the cost of that one drive.
Which can get very expensive, honestly. Especially for, say, a Plex server, where you want a lot of storage. And I'd say performance too: not necessarily that you need all the size, but you want decent throughput from the disks. A 10-terabyte drive is a common size for an individual disk in a NAS for, say, a Plex server, or anywhere you want something with decent performance. You know, maybe six, maybe eight, but the eight-to-ten range and anything above that tends to be higher-performance drives in terms of spin speed and throughput on the actual disk itself. But that's a pretty large disk. So if you've got four of those, that's what, 40 terabytes? It's a lot of storage, right? If you've got six of those, let's just do some math, Matt, that's 60 terabytes. It's a lot. But in a RAID-Z2 scenario, you've got reservations, and then you've got, I don't know how you say the other word, refreservation? What is that stuff, when you get into the semantics of, you know, how you plan for overhead in these scenarios?
Yeah. So now we're talking about, you've created your pool, you have a bunch of different kinds of data on there, right? Maybe you have your video files, and you have your home directory, and you have your movies, and you have a cache of other stuff. How do I manage that? So typically, people would create different file systems for each kind of use case. And so in ZFS, when we're talking about a ZFS file system, the storage pool is all of the disks that ZFS is managing. And then you can create file systems on the fly.
They aren't assigned any particular space on the drives.
They just consume and release storage as needed.
So creating these ZFS file systems inside the storage pool is very cheap and easy.
And we use those file systems primarily
as like administrative control points.
So you could say like, here's all my video files.
Don't bother trying to compress them because they're already compressed.
On the other hand, here's all my source code files for my development project.
Let's compress those.
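The per-use-case file systems and property settings he's describing might look like this; the dataset names are illustrative:

```shell
# Create a file system per use case; names are hypothetical.
zfs create tank/videos
zfs create tank/src

# Already-compressed video: skip compression.
zfs set compression=off tank/videos

# Source code compresses well; lz4 is a common choice.
zfs set compression=lz4 tank/src
```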
So all the different ZFS settings and properties are per file system,
so you can set them differently for
different types of data that you have. Now you're asking about reservations and we call them
refreservations, because it stands for referenced reservation. So you might want to think about it like: I have some space that's for one use, and I want that use to never exceed some amount. And I want this other use to have some reserved space; I want to always have some space available for, you know, my software development project. But maybe my kids are dropping their movies into some NFS share, probably not really an NFS share, probably CIFS or some fancy Dropbox thing on top of it. But, you know, my kids are dropping their movies into some other section of this. I'm going to put a quota on that so they can't use more than, like, five terabytes or whatever.
So quotas and reservations let you do that.
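The quota he mentions could be set with something like the following; the dataset names and sizes are made up for illustration:

```shell
# Cap the kids' movie share at 5 TB.
zfs set quota=5T tank/movies/kids

# Guarantee some space for the development project.
zfs set reservation=500G tank/src

# Inspect the settings.
zfs get quota,reservation tank/movies/kids tank/src
```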
And that's at the zfs create layer, not the zpool layer.
Yeah, so that's at the ZFS layer, the file system.
Yeah. So you have the pool, it has, you know, your 60 terabytes, but each file system uses a variable amount, just depending on what's in there at the moment. And the reservations and quotas let you control that. So specifically, the reservation applies to this file system and all the stuff associated with it.
So all the snapshots and all the descendant file systems.
So the file systems can be arranged hierarchically where the children like
inherit property settings from the parents.
So you could have, like, maybe you want to limit the kids' movies, but each kid has their own directory, and so you make each directory a file system. So there's one kid's file system, the other kid's file system, and then there's the parent file system that's all the movies. You can set quotas at any of those levels. You can set a quota at the all-movies file system, and that limits the space used by all the file systems beneath it put together. So the refreservation is talking about:
I want to set a reservation for the space that I, as a user, can see there, ignoring compression and snapshots and other stuff. Like, I want to reserve space that's just for that. And the idea here is that the system administrator configured some snapshots, and those snapshots take up some space, but I want to reserve space that ignores those snapshots.
So the refreservation is actually more expensive, in terms of it reserves more space. Because, ignoring snapshots, I'm like, well, I already have a terabyte of data in here and I have a two-terabyte quota, which means I can write a terabyte of new stuff, and maybe I could delete that original terabyte of stuff and replace it with other things. So I can actually write two terabytes of data here if I delete what's already there. That means the refreservation has to make sure there are actually two terabytes available, even if there's a snapshot of that original one terabyte. So it needs more space, but it's kind of taking a different view.
Like if you're thinking as the system administrator and thinking about the cost of the snapshots, then you would use the reservation. If you're thinking as
the end user and you want to ignore what snapshots are there or aren't
there, then you would want to use the ref or referenced reservation.
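The distinction he's drawing could be sketched with the two properties; the dataset name and sizes here are illustrative, not from the interview:

```shell
# Administrator's view: reserve space counting snapshots
# and descendant file systems.
zfs set reservation=2T tank/projects

# End user's view: guarantee 2 TB writable in this file system
# itself, regardless of how much its snapshots consume.
zfs set refreservation=2T tank/projects

# Compare the two property values side by side.
zfs get reservation,refreservation tank/projects
```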
So that you can know how much storage you have available, right? To get an accurate
depiction of what you have. And if you're the administrator, you want to have a different view of the world. If you're a user,
you want to have an obviously micro view of the world.
Because you don't really care about the snapshots and stuff like that.
Fluff to you. It doesn't matter. That's the administrator's job. And you might be that same person.
In my case, I'm the same person. I'm the user and the administrator.
Yeah, if it's the same person, then it's a little easier to understand
what's going on. But when there's multiple people involved, then these concepts are, you know, the detail and richness of these properties are useful.
One thing I haven't heard you talk about yet, which I feel, as somebody who hasn't designed the system, hasn't been involved for 20 years, and is just a user, is the killer feature, in my opinion: copy-on-write. It's this ability to be a secure file system, because you want data integrity, you want to verify what you have, you want it to automatically repair. And it's copy-on-write that really sets ZFS apart from every other file system I can think of. Can you talk about that a bit?
Yeah. I mean, it's so fundamental to ZFS that we probably forget about it.
And you kind of have for 40 minutes.
Yeah.
And it's not a feature, right?
It's not a user-visible thing.
But it's a kind of enabling data structure that allows us to have zero-cost snapshots
and to be able to have always, like always the data is always self-consistent.
If you crash or pull the plug at any time, you don't have to run fsck.
What you have on disk is always a consistent view of the file system.
Yeah. So that was one of the decisions we made, you know, very, very early on.
Before we wrote a line of code, I think we decided that ZFS would be copy on write. What made that come to light? Like,
since it's so fundamental, it goes that far back. Why did no one else copy this feature? Or how
often is it used elsewhere? Why? Why is it so fundamental? How did you get there? I think that
like a lot of the features, first I'll answer why other people aren't doing it, and then I'll talk about who is doing it. So why aren't other people doing it? Well, like a lot of features in ZFS, there's a cost to it, in terms of a runtime cost, right? It can make things faster in a lot of scenarios, but take checksums, which are, you know, on by default in ZFS. We viewed it as: this is a fundamental enabling thing that everybody should be using, and if you really have some hyper-specific use case where you can't pay the CPU cost of it, okay, I guess we'll let you turn it off.
Or maybe this just isn't the file system for you.
I don't know.
I mean, we kind of took that view with a lot of things in ZFS, including copy-on-write. But that's why it is not, or hasn't been, used more widely: it's complicated, and copy-on-write makes performance different. Not necessarily worse, but different.
Now, as to how did we decide to use it and who else is using it?
Well, at the time, I think the other major use of it was in WAFL,
which is NetApp's proprietary file system.
And I think they had used it to great success,
especially with like enabling snapshots.
A bunch of the kind of details
of how they implemented the snapshots are different than how we keep track of them. But the fundamental
idea of like, we're always going to be writing new data to new places on disk, we're not going to be
overwriting existing data in place. We saw that as like, snapshots are going to be just a base
requirement in the future and doing it any
other way than copy-on-write just isn't scalable. And you see that even today: there are snapshots in things like UFS, but you get one snapshot, or you
And we wanted it to be easy and cheap.
We wanted to give people like no excuse for not protecting their data with snapshots.
Makes sense.
The other thing that we saw, which is more from direct pain experience, is that with earlier file systems, you know, if you crashed, you had to run fsck. And the bigger your disks got, the bigger your file system got, the longer it took to run fsck.
And a bunch of file systems added things like to kind of reduce that time on big file systems
so you didn't have to scan every data structure.
You only had to like, you know,
you knew that it was only these certain ones
that needed to be scanned,
but still we saw the trend of like hard disks getting bigger and bigger and
bigger.
And even 20 years ago, it was taking an unreasonably long time to run fsck on that server we were administering in the Sun kernel group, Jurassic.
So even on Jurassic, which, I mean, had a whole bunch of disks, though I'm sure it was a tiny amount of storage compared to today's storage systems, it took like an hour to run fsck every time the system rebooted, or at least every time it crashed. So, you know, we saw that as totally unacceptable, and rather than make incremental improvements on fsck, we wanted to design the problem out of existence, so that we didn't have to worry about it as disk sizes increased.
And I think that in retrospect, that was definitely the right way to go
because storage is just so huge nowadays.
The idea with copy-on-write is that, and I suppose because we do have much larger disks now, there's more opportunity to always write new data versus writing over old, right? You usually have a lot of disk space available, at least in large pools. Sure, "always" is in quotes; it's not always true. But the idea is that you can always write new. And that makes snapshotting easier, because you can just point to the newly written block versus the overwritten one. And if you ever need to revert, you can point back to the old data that was not overwritten, which is what makes snapshots so much faster and so much easier.
Yeah.
You can promote a snapshot to primary and kill the old thing altogether.
Like there's a lot of interesting things you can do.
Very much similar to the way Git operates even, right? Like it's very similar to the way Git operates with master and branch or, you know,
your main branch and different branches and stuff.
It's very similar to that,
at least in terms of how you fork the data.
Yeah, definitely.
Like forks are just like ZFS clones, in terms of they start the same as some base, but then you can diverge them to put your changes into each one. And then you can go back to how it was before, and you can create lots of ZFS clones easily, or lots of branches easily, I should say, in Git.
Yeah.
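The Git-like workflow they're comparing to maps roughly onto these commands; the dataset and snapshot names are hypothetical:

```shell
# Take a cheap, copy-on-write snapshot.
zfs snapshot tank/data@base

# "Branch" from it with a writable clone.
zfs clone tank/data@base tank/experiment

# If the experiment wins, promote the clone so it becomes
# the primary and no longer depends on the origin snapshot.
zfs promote tank/experiment
```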
What about the real feature, I think, that people have been waiting for? Especially home labbers. I'm not sure you can speak to the enterprise customer needing this, but I think it's more apparent for home labbers, because you want to start small, since you tend to have less money invested in disks, and you want to eventually expand.
But once you establish a pool and you create some file systems,
I'm not even sure it's landed yet. I know there's a pull request out there, which is RAID-Z expansion, being able to expand.
I could just imagine, I was watching your talk on this,
I could just imagine the amount of mental overhead,
the spaghetti in your head, thinking about how to explain.
So I'm not going to ask you to necessarily explain RAID-Z expansion, except for as you might need to.
You don't need to point out the details,
because you needed a screen for that, you needed to demonstrate it.
This is a visual thing for sure, but this is a feature I know that's been long awaited. You know, being able to establish a RAID-Z array and then expand it from six drives to eight drives, with no real penalty. Basically, yeah, less pain. Now it's possible, but it's a PR, from what I understand. What are the details on that?
This kind of brings us back to what we were talking about earlier, this being a volunteer-driven project, and how do we serve the home users? So this is a feature that has been
requested for years and years and years and years. Yeah, for sure. But enterprise users don't have
this problem because it's like we buy the disks by the shelf full.
You just add a new shelf and create a new RAID-Z group from the stuff in that shelf. Right.
Or you just buy a new rack. You know, it's like a whole new rack and a new system.
And then that's that's what you're going to use for the next 10 years.
Hopefully not that long. But, you know, the life cycles in enterprise are very long.
But, you know, for home and small users, it makes a lot
of sense to say, look, I started out with four disks. I mean, these disks are not cheap when
it's coming out of my own pocketbook. You know, you're talking about like, you know, laying out
$1,000 or something, and then I don't want to have to lay out, you know, $2,000 or like $1,500.
I care about the extra, every extra $100. So sizing it for just what you need initially makes sense.
And then, you know, a couple of years down the road, your storage needs grow.
Add one more disk or two more disks without having to, without having to like buy a whole new batch of disks.
Like move it to, you know, get your friend to bring their system over so you can copy it over there and then reformat it and then copy it back.
What a pain.
People don't want to do that.
It's no fun.
No.
So basically this project is about doing all that complexity for you under the hood.
You add a new drive.
We have to move all the data around to spread it out over all the drives, including that new one.
But it all happens automatically.
You just type zpool attach, blah, blah, blah, blah, blah,
hit return.
It says, great, the expansion is in progress.
It'll be done in 20 hours or whatever
when we've copied all the data around.
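At the time of the interview this was still an open PR, so details could change, but the invocation he sketches would look something like this; the pool, vdev, and device names are placeholders:

```shell
# Attach a new disk to an existing RAID-Z2 vdev to widen it.
zpool attach tank raidz2-0 /dev/sdg

# Watch the expansion progress.
zpool status tank
```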
But the interesting thing about this project is how it came to be. So, a long-requested feature: how did it come to be? How did it get funded?
So actually, it's funded by the FreeBSD Foundation.
So the FreeBSD Foundation is like a nonprofit.
They help to run the FreeBSD project, but they don't run it.
It's run by volunteers, but the foundation helps with administrative stuff, and one of the things they do is fund software development. So they contacted me a long time ago, I think three or four years ago. Actually, it's got to be more than that, because I remember working on this when my second child was a baby. So, you know,
it was at least four years ago now. And they said, look, we have this idea. We want to do something for ZFS users, something that's going to help the small users, that isn't getting done by the contributors today. Well, I should say by the people who are developing new features; there are lots of contributors that are, you know, maybe not even C developers, right, that are contributing lots of new tests, man page changes, all kinds of cool stuff.
But the new features are primarily coming from these enterprise use cases who
are funding developers.
How do we get something developed that's going to help every user?
And so they came to me with this idea of doing RAIDZ expansion.
And I kind of came up with a design of how it could be done,
proposed it to them, and they said, yeah, let's do it.
I gave them the timeline.
I said it would be done in a year. And four years
later, it's almost done. That's awesome.
This is primarily because of constraints on my time and just
not being able to spend as much time as I thought I would on the project.
Do they come to you personally or do they come to you through your employer?
Because your time on ZFS and OpenZFS is probably pretty divided in terms of how you personally spend it, right?
They came to me personally because they know me from speaking at the FreeBSD conferences and stuff like that.
Fortunately, I was able to arrange it so that the consulting work that I do is actually through my employer, Delphix.
So it makes it easy for me to, you know, work on more than just software that's for Delphix, and to be a leader in the community.
So I'm fortunate that Delphix values open source and they've seen the value of being a leader in this open source community in terms of our brand within engineers and being able to do recruiting.
You know, our team has recruited a fair number of employees from the OpenZFS development
community.
So, you know, the time that I spend reviewing pull requests on OpenZFS is on-the-clock time, right? I mean, my company is paying me to do that, which is pretty great. I don't have to do it only on my nights and weekends.
You're living the dream.
Yeah. So it's worked out very well for me and I'm definitely very fortunate to be in that situation. What's up, friends?
I want to tell you about one of our new partners for 2022, MongoDB, the makers of MongoDB Atlas,
the multi-cloud application data platform.
MongoDB Atlas provides an integrated suite of data services centered around a cloud database
designed to accelerate and simplify how you build with data.
Ditch the columns, the rows once and for all, and switch to the database loved by millions
of developers for its intuitive document data model and query API that maps to how you think
and code.
When you're ready to launch, Atlas automatically layers on production-grade resilience, performance,
and security features so you can confidently scale your app from sandbox to customer-facing
application.
As a truly multi-cloud database, Atlas enables you to deploy your data across multiple regions
on AWS, Azure, and Google Cloud simultaneously.
You heard that right.
You can distribute your data across multiple cloud providers at the same time with a click
of a button.
All you got to do is try Atlas today for free.
They have a free forever tier, so you can prove yourself and your team.
The platform has everything you need.
Head to mongodb.com slash Atlas. Again, mongodb.com slash atlas.
What's up, friends?
This episode is brought to you
by our friends at Retool,
the low-code platform
for developers
to build internal tools.
Some of the best teams
out there trust Retool.
Brex,
Coinbase,
Plaid,
DoorDash,
Legal Genius,
Amazon,
Allbirds,
Peloton,
and so many more.
The developers at these teams
trust Retool
as a platform to build their internal tools,
and that means you can too.
It's free to try, so head to retool.com slash changelog.
Again, retool.com slash changelog.
So, four years later though, RAID-Z expansion. Four years later, the PR has been opened.
Yep.
It's not landed, in terms of, it hasn't landed yet. Is it planned for 3.0?
It is hoped for 3.0.
Okay.
I would be cautious about using the word plan, because
OpenZFS doesn't have developers on retainer, right? Like, we're not paying anybody to develop anything; it's all volunteer. So we can't speak too strongly about plans. We can do a release, but we can't make anything get in, right? It takes a lot of people to get it in. I've done, you know, most of what I need to do as the developer to get the PR ready, but it takes a lot of contributions from different people.
So we need people to do code review is the big one.
And there's a lot of code there, a lot of very tricky code.
So we need other experienced developers to do code review.
We also need other people that might just be users to do testing, help give confidence
to other folks in the community
that this is going to work right and not break their pools.
And I guess that's just as easy as, right, if I was an end user wanting to test that, I could use zfs send to send all my data to a new pool. Obviously, I have to invest in the hardware and the drives and stuff and replicate essentially my scenario. But I can use a copy of my existing production data, essentially, in a new ZFS pool, the exact same scenario, and, you know, do an expansion on that pool.
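The send/receive approach Adam describes might look like this; the pool and snapshot names are made up:

```shell
# Snapshot the production data, then replicate it recursively
# to a scratch pool built on spare drives.
zfs snapshot -r tank@migrate
zfs send -R tank@migrate | zfs receive -F testpool

# Then experiment (e.g. try the expansion) on testpool instead.
```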
Yeah.
So that could be a way that an end user could help,
but it's a matter of getting access to that feature branch, and being able to compile it and put it on their machine,
which probably takes a lot of effort.
Yeah.
It takes a little bit of doing to know how to compile and install, because it's a kernel module. It's a little more complicated than just downloading your normal thing that should just work. There's automake and autoconf; all that stuff is there. So, you know, the steps are like: type configure, and then type make install.
However, you know, depending on the particulars of the system, getting it installed correctly in a way that it gets picked up.
You know, for example, like in preference to the kernel modules that may already be there.
If you have like an Ubuntu system that comes with ZFS kernel modules already, there can be some tricks there.
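The build steps he outlines, roughly; paths and options vary by distro, so this is only a sketch:

```shell
# Build OpenZFS from a source checkout of the repository.
sh autogen.sh
./configure
make -s -j"$(nproc)"

# Install the userland tools and kernel modules (needs root;
# beware of distro-shipped ZFS modules taking precedence).
sudo make install
```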
But this is a well-trod road. I mean, there's hundreds and hundreds of contributors
who have gone through these steps
on all the different operating systems.
So if you would like to help,
it might not be a one-liner,
but there are a lot of people that can help you.
What's the best place to go to get that help then?
Would you say like the repository
or would you say like an issue or the mailing list?
What's a good place to like say,
hey, willing participant,
I'll help test this at least as an end user.
If you're looking to volunteer on something specific,
like I want to help test RAID-Z expansion,
then probably comment on the PR would be the right place.
If you're looking for like,
I'm trying to compile this so that I can help somebody else,
how do I get it installed?
Then the mailing list would be a great place to ask.
Okay, cool.
Before we move on to another topic,
is there anything else in, say, the feature set of ZFS that makes people love it? We talked about copy-on-write being a killer feature. You didn't really mention it because it's such a baked-in, 20-year feature. It's more like it's just the system. That's just how it is. It's not even a feature these days.
Anything else? I mean, I think I like the ideas around the ZFS Intent Log, I believe is what it's called, and the L2ARC, which I believe is a cache.
Yeah, the L2ARC cache.
The level 2 cache, I believe. Those are a couple of things that can sort of speed up
systems. That's one thing I'm actually taking advantage of.
I have an SSD
as my cache, which was a one-liner. Install the hardware, then a one-liner to add it.
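The one-liner Adam mentions for adding an SSD cache device might look like this; the pool name and device paths are placeholders:

```shell
# Add an SSD as an L2ARC cache vdev to an existing pool.
zpool add tank cache /dev/nvme0n1

# A separate log device for the ZFS intent log is similar:
zpool add tank log /dev/nvme1n1
```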
It's like you make it so boring to manage the system.
Come on, Matt, make it harder, right?
It's super easy to manage the ZFS system.
I think that's great that that's your experience.
I mean, that's absolutely our goal
is to make it easy to manage.
So I would say the killer features
are the ones that we've talked about. RAIDZ, compression, checksums, snapshots, and clones, and replication. And those
are things that have been in ZFS for a long time. Obviously, we've refined over the years
and added other new stuff, but those fundamentals are what sell it for 99% of the users.
I want to dovetail a little bit to back to the past to some degree.
There's a ZDNet article that quotes Linus Torvalds as saying, don't use ZFS.
I'm sure you've seen this and I'm sure you've read it.
And I think it's really around licensing.
He says, don't use ZFS.
It's that simple.
It was always more of a buzzword than anything else, I feel.
And the licensing issues just make it a non-starter for me. So I kind of want to go back into the past to some degree, back to the Sun days before Oracle acquired it. Were you involved in licensing it? You know,
share some of the drama, I suppose, behind the scenes that kind of made OpenZFS possible,
because it was really close to not being possible with the acquisition. Like thankfully Sun and potentially even you and Jeff were
contributors to the idea to use the common development and distribution license 1.0,
which is an OSI approved license, which definitely makes it open source by being OSI approved, in that sort of thin layer of, yes, it's open source, or no, it's not open source. But this ability to sort of keep it going after
this acquisition, that's kind of what I want to talk about. So I mentioned Linus saying don't
use ZFS. What's the backstory there? Yeah. So the interesting thing is that,
you know, when we started working on ZFS back in 2001, you know, it was part of Solaris.
Solaris was proprietary software. It was, I think at the time only available on,
you know, with Sun's SPARC-based hardware. Solaris x86 kind of came and went a couple of times.
So maybe it was available on x86 hardware at some point. But, you know, as far as we knew,
when we started out, we were developing proprietary software. But a couple of years into it, I think, I'm going to say maybe 2003, they started working
on OpenSolaris and meaning like working on open sourcing Solaris and creating OpenSolaris.
And I wasn't involved with those decisions or the licensing decisions.
You know, I was like a junior engineer two years out of college.
Nobody asked me.
You didn't need to know about this.
Yeah.
So when we, well, when I found out about it, I was thrilled.
You know, I thought, oh, this is great.
Like we're going to do an open source Solaris.
Like ZFS is going to be part of it.
We're going to open source it.
This is wonderful.
We definitely didn't imagine how successful it would be outside of Sun at the time or how enabling that would be
for our technology and in my career to continue for so long. So we kind of lucked into that.
We lucked into it being open sourced. We released it as open source first. So when we integrated it into the Solaris code base in October of 2005, a long time ago, it went out as open source the next week, before we'd ever shipped it in a product.
That was really cool. People started using it, picking it up from the OpenSolaris bi-weekly builds.
So then from 2005 to 2008 or 2009, we were developing it in the open.
It was picked up by, I think maybe FreeBSD was the first other operating system to take
the code and port it to FreeBSD.
It became very successful
there. And then towards the end of that, picked up by the folks at Livermore,
Lawrence Livermore National Labs to port it to Linux as well. In terms of the, maybe I should
talk about the licensing a little bit now. So the CDDL, as you mentioned, is an open source license. It was created by Sun to open source Solaris
and create OpenSolaris, which ZFS is part of.
I can't really speak to the motivations
of why that license,
why didn't they use an existing one?
Why did they come up with any of the particular terms in the CDDL?
But my understanding of the intent is that it's what's called a weak copyleft type of license, which means that the changes that you make to ZFS or other CDDL-licensed software, like if you make changes to our software and you ship those changes, then you need to make your modifications available.
So that's kind of similar to the GPL
as opposed to more permissive licenses
like the BSD or Apache licenses,
which are basically like,
here's some software.
You can do whatever you want with it.
You can contribute changes back if you want.
You don't have to contribute back the changes.
So it's kind of more similar to the GPL in philosophy
in terms of like,
you need to contribute back the changes that you make.
The main difference that I see with the GPL versus CDDL
is that the CDDL explicitly applies on a per file basis.
So like if you wanted to do something with ZFS and add some new functionality and not release
that new functionality as open source, you could put it all into a new file, compile it with the
rest of the ZFS code. You know, maybe you have some changes into the existing ZFS files that you do have to
open source, but you could do that and not open source your new file, your new source
file and keep your new feature private in that way if you wanted to.
Versus the GPL, it's not as explicit about what constitutes a change that needs to be open sourced, and people generally interpret it much more broadly, like anything kind of in the vicinity, you've got to open source it. If your code is near our code, then your code has to also be GPL, is kind of how people interpret it. And I'm deliberately being vague about what near means, because there's dissenting opinions about that.
So Linus's comments, as you kind of heard in the quote, I think that Linus has no love for Oracle. And I think that, you know, he's concerned, or at least at the time that he wrote that, he was concerned about...
Litigious Larry, as he calls him.
Litigious Larry is Oracle.
So I think that the reason that he was saying don't use ZFS was to avoid Larry suing you,
sort of, is how I interpreted it. And, you know, I'm not a lawyer.
I'm not giving anybody legal advice,
but nobody has been sued for using ZFS since the NetApp lawsuit,
which was a NetApp Sun lawsuit more than 10 years ago.
And nothing came of that lawsuit.
So, you know, nobody won or lost in that lawsuit.
Everybody just dropped it.
The reason why I bring this up is less to be provocative, like, oh, Linus says don't use ZFS, but more around this unavoidable tension between the developers, you and everyone else involved in the creation of ZFS and then eventually it being open sourced through this license, and the world-changing opportunity of the software, and the license that sort of stands between that opportunity. Because I quoted Linus saying that, but I didn't quote him, which I'll do now, as saying, I can't integrate this. I'm reading tea leaves here, like, between the lines, but it seemed as though he wanted to integrate ZFS into the Linux kernel
but was unable to do so because of essentially the license,
the GNU license that Linux stands upon, and the difference with the CDDL license that ZFS was
licensed under as part of OpenSolaris. And then I'm sure there's some details in there that
made OpenZFS possible, which is super awesome because despite this acquisition, this accidental
to some degree, open sourcing of ZFS, it gets to live on and you get to have a career beyond
this proprietary software you were originally hired to build, which I think is super wild in
terms of a journey for a software developer like you. And then a community to appreciate and enjoy and use your work.
Like if you wrote your best software and no one can use it,
did you write the software?
You know what I mean?
Kind of like the tree, did it fall, did it make a sound kind of thing.
It's almost like that.
Yeah, I agree.
If no one can adopt your software and enjoy it, did you write the software?
Kind of no, really, right?
Yeah, I mean, that's one of the reasons
that I really love open source is that it makes the software available. It makes it available
without the constraints of like any one company living or dying or deciding to do whatever.
If it's good software and it's useful, then people can continue using it and extending it and
making it continue to be
relevant. Like the fact that we could take like the ZFS that Sun was doing in 2009 or 10 and take
that and run with it as part of the Illumos project and part of the OpenZFS project. I mean,
that's open source. That's what it's supposed to be. There wasn't really anything special that
let us do that. Like the fact that it was an open source license let us do that.
I think that it wasn't a given that people would actually pick it up and like continue the software development.
So that's one of the reasons that the OpenZFS project was created, to unify and provide some kind of leadership around ZFS development that was happening on illumos and FreeBSD and Linux altogether.
And that happened in 2013, right?
Like OpenZFS began in 2013.
Yeah.
The original project, proprietary way back,
before it was even open source licensed, was 2001.
So you got, what, 12 years between inception of the project,
several years before the Common Development and Distribution License
was instituted, right, when you did the OpenSolaris part of that.
I mean, if that didn't happen, like, I don't know who did that inside of Sun,
but like, if that didn't happen, then ZFS, as we know, it would have died.
Yeah.
You know, and it would still be, it would be in Oracle now
because Oracle still is developing Solaris, right?
It would be the closed source ZFS,
which is continuing.
You get a fork in the road.
This is history that I'm sort of sharing
with the listeners.
Like there's a fork in this road of ZFS,
which is one that ended
and sort of bifurcated, right?
You got the open ZFS version
that began in 2013 or whatever timeframe.
That was maybe the 2009 snapshot of the project.
And then there's the closed sourced Oracle version still yet that is ZFS inside of Oracle, which I guess is just called Oracle ZFS.
Yeah, so Oracle continued developing ZFS internally and just not sharing that source code with anyone.
And that's fine.
And the open source community picked up the open source code and we've continued developing that. And people maybe
have asked, like, you know, which one is better? That was my next question. Which is better, Matt?
That's really an academic question because nobody's really baking off like open source ZFS
on Linux versus Oracle ZFS.
The target audiences of these are just very different.
The target audience of the Oracle ZFS is probably people that have been locked in by Oracle.
It's not about which one is better, it's just like, can I escape the clutches or not?
Well, the good thing is that you are continuing development.
We've just speculated about what will be in 3.0.
Some other things that I think are interesting in the maybe category of OpenZFS 3.0,
one, RAID-Z expansion, which we talked about.
A couple that hit my radar, which is ZFS on Object Store,
which I saw a talk on that from a recent conference, which I thought was pretty cool,
which is like ZFS in the cloud essentially,
which I think is just really interesting
to think about like different clouds
being different, you know,
in the V devs and whatnot.
I run Mac OS primarily as my primary machine.
So I'm excited about the opportunities
of Mac OS support in the future.
But that's, I mean, I'm sure there's other cool stuff in there, but that's what hit my radar in terms of like, can't wait, looking forward to 3.0.
So that's the good thing, though, is that it was open sourced.
You and others are continuing to develop it.
And there's a community behind this.
You got the conference that happens each year, books being written, blog posts.
There's still a lot of momentum behind this project, obviously.
Yeah, we have a ton of people contributing every year.
We have our annual conference.
We have monthly video calls where we're talking about new features
and kind of getting design reviews and making sure bugs are being addressed.
So the community is very active.
If folks would like to participate, you can find info on OpenZFS.org.
We have links to all the videos from past conferences and how to join our Zoom meetings monthly.
Cool. Well, Matt, is there anything else that I left unchecked in terms of talking about
your career trajectory? Maybe the only open question mark I really have, that you can touch on if you'd like, is how you negotiated with Delphix to be able to contract on top of the open source. The reason why I ask that question is less like, are you an amazing negotiator? Probably. But more so, if there's other devs out there who are thinking, I want to keep contributing to open source, how do I negotiate with my employer? Obviously Delphix appreciates and embraces open source, so maybe developers are already at a place like that. But if they're at a place where they embrace open source, what are some things they can do, things you've done, to be able to buffer in the give back and the impact beyond just simply their daily nine to five at their job?
Yeah, I think that the consulting per se, like getting paid
for work is kind of a special
case. I would probably focus on like, how can you contribute to open source, like as part of your
job. And I think that there, it's mainly about like, making sure that your employer understands
what they're getting out of it, right? Everybody wants to know what's in it for them, developers
as well as employers. In my experience,
Delphix has been involved with ZFS and OpenZFS for a long time,
10 years or so, and it's a fundamental technology for our product.
So the benefits are like... Super clear. Yeah.
First of all, like we're using
this and we want to make it better and we want to make it better in the best way. We want to get the
contributions from the community and we want to be able to have other people from the community,
like testing and validating the changes that we're making. So that's just on a very like
low level. Like we want our code to work. We want our code to be the best it can be.
And to do that, like we want to get these contributions from other people.
In order to get the contributions from other people easily,
we need to upstream our changes so that we don't have merge conflicts all the time.
And we want our changes to be validated and checked and tested by the community.
So that's a very low level,, like quantifiable benefit to the company.
The next level of benefit is like the corporate branding almost.
Like it makes the company look good when people in the community see, oh, Delphix is contributing to OpenZFS.
Oh, Delphix is leading OpenZFS.
Delphix is helping to organize this conference about OpenZFS.
It creates mindshare around Delphix is a cool place.
Delphix is a cool company.
Even if I don't know anything about their actual product of database virtualization and masking and whatnot,
I know that they're doing this cool open source work and that makes them seem cool.
So for our case, our customers, the people that
we're trying to sell to are generally not like software developers. So it doesn't go directly
to our bottom line, but you know, there's a lot of other things that companies do besides just like
exchange goods and services for dollars, right? Like recruiting. So
almost, I would say, more than half of the team that I work on, of about 10 people, is people that have joined us from the people that I knew from the open source community. And a lot of it was like serendipitous encounters, where I was asking one person, hey, we're looking to hire, do you know of anybody? And then somebody else happened to overhear that and be like, hey, are you looking to hire? Because I'm interested.
So it's not, in terms of the
branding and reputation and whatnot, it's a lot
harder to pin directly on it. I think you're going to have
to find somebody within the company that kind of believes in that because it's less
quantifiable. But at least in my experience, the benefits have turned out to be very real in terms of the reputation and, you know, within the software engineering community.
What about you? How are you feeling about where you're at with your career and what you're working on?
Any closing thoughts on like, are you winded with ZFS?
I mean, 20 plus years so far with, I mean, I'm just going to imagine like you eat, sleep
and breathe, you know, work-wise ZFS to some degree.
Like, are you burnt on it?
Are you done with it?
Are you more motivated than ever?
To be honest, ZFS is getting to be older, right?
I mean, 20 years is a long time, even within enterprise software.
And I think that it can be a challenge to remain relevant as things change within the industry.
With things like, you know, first we had the challenges of SSDs with very different performance characteristics.
Then with virtualization, changing kind of where the storage hardware fits into the stack. And now with the cloud, even more so the separation between the storage
hardware and the actual use of it. So I think it could be a little discouraging. But to me,
you know, the project that we're working on now with ZFS on object storage has just been incredibly fun. And I feel like we're taking ZFS to the next level, like we're giving it some more legs that'll keep it relevant for another decade.
And it's not necessarily like, it isn't something that's going to be used by every ZFS user today,
but it's going to enable a lot more ZFS users in the future by making ZFS integrate even better into the cloud and bring those capabilities of snapshots, compression, all that stuff to object storage and good performance object storage.
And I've really I've been having a blast the past year with the team developing that and designing it.
A lot of the code is actually in userland and written in Rust. So we all learned Rust, which is really exciting. It makes me never want to touch C again, even though it is my job to do so. So we're going to do it. But, you know, Rust just feels so comforting now that I've learned it. The safety of it feels very comforting, and it makes dealing with raw pointers in C everywhere feel scary, as it should be. I would say it should feel scary. It is hard. You know, you've got to get everything just right with C in order to not have bad bugs. It's more work, but that's fun work too. I see ZFS continuing to be relevant because we're adding these new use cases to it.
And I find that really exciting.
On the Rust note, what made the team choose Rust?
Was it because it's on the network?
Yeah, so first we chose userland before Rust.
So we need ZFS to talk to the object store. So it needs to talk HTTP and HTML and JSON and all this stuff. And we did not want to do all that in the kernel. So we decided, okay, we'll have some userland process that the kernel is going to talk to, to say, get this block, read this block, write that block. And then this userland process is going to deal with turning it into S3 requests.
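To illustrate the split Matt describes, here is a minimal, hypothetical sketch in Rust: a userland agent receives block-level requests from the kernel and translates them into object-store operations. All the names here (`BlockRequest`, `object_key`, the `zfs/block-` key prefix) are illustrative assumptions, not the actual ZFS object-store agent's interface:

```rust
// Hypothetical sketch of the kernel <-> userland split: the kernel hands
// block-level requests to a userland agent, which maps them onto
// object-store (e.g. S3) operations. Names are illustrative only.

#[derive(Debug)]
enum BlockRequest {
    Read { block: u64 },
    Write { block: u64, data: Vec<u8> },
}

// Map a logical block number to an object key (assumed naming scheme).
fn object_key(block: u64) -> String {
    format!("zfs/block-{:016x}", block)
}

// Translate a kernel request into the object-store operation it implies.
fn handle(req: &BlockRequest) -> String {
    match req {
        // A block read becomes an object GET...
        BlockRequest::Read { block } => format!("GET {}", object_key(*block)),
        // ...and a block write becomes an object PUT.
        BlockRequest::Write { block, data } => {
            format!("PUT {} ({} bytes)", object_key(*block), data.len())
        }
    }
}

fn main() {
    let requests = [
        BlockRequest::Read { block: 7 },
        BlockRequest::Write { block: 7, data: vec![0u8; 4096] },
    ];
    for r in &requests {
        println!("{}", handle(r));
    }
}
```

The real agent, of course, does far more (batching, caching, actual HTTP to S3 via Rust crates); this toy only shows the shape of the translation layer that the kernel/userland split makes possible.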
And once we had done that, then we thought, well, you know, there are languages that are higher level than C that could make our job easier. And so we looked around at what the options were there.
And Rust seemed like a good fit. I didn't do a comprehensive survey of every possible language. But Rust seems so similar to C in terms of, it's a low-level language, you know, there aren't scary things like garbage collection.
Java may have been another choice, especially given that like the rest of Delphix's software is written in Java. So like in-house, we have a bunch of Java developers, but the performance aspects of it,
we felt more confident
that we would be able to get
all of the performance
out of the hardware
with a low level language like Rust.
And then having the, you know,
the ecosystem of all of the Rust crates
would let us develop it faster.
And then, you know,
the safety of, you know,
not having like memory corruption
would also let
us develop it faster because we wouldn't have as many crazy bugs to debug.
Interesting. Well, cool. I'm sure we can probably do a whole entire separate segment that goes deeper than that answer there on Rust, because that's always interesting, to be like, you developed most of this in C, or all of it in C, so why would you choose Rust in the userland part of that?
I'm always curious about those questions.
Yeah, I mean, C would have been the kind of natural choice.
I'm sure that there's libraries that we could have found for C
to do all of the network communication, JSON stuff.
But I feel really happy about the choice to use Rust.
Good.
Anything else?
Anything left unsaid? This is the closing.
So any advice for
those who are going to pursue a land
that has ZFS
all over it for them? Maybe they
got some spare drives they want to play with. They got a
home lab. They got that
Plex server that's still clunking on an old
Mac mini or something like that. They want to move it to a Linux
box with ZFS. You know, whatever.
What kind of advice you got?
Closing thoughts.
I would say just go for it.
I mean, if you like tinkering,
then I would just get some install of Ubuntu
or another OS that has ZFS in it,
maybe FreeBSD, and start running,
you know, zpool create and whatnot.
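As a rough sketch of those first steps, assuming a pool named `tank`, two spare disks, and a mirror layout (all of which are placeholders for your own hardware; see `man zpool-create` for the details):

```shell
# Create a mirrored pool from two whole disks (run as root; this erases them).
zpool create tank mirror /dev/sda /dev/sdb

# Turn on compression and create a dataset for your data.
zfs set compression=lz4 tank
zfs create tank/media

# Take a snapshot, then list pools, datasets, and snapshots.
zfs snapshot tank/media@first
zfs list -t all
```

From there you can experiment with the features covered in the episode, like clones (`zfs clone`) and replication (`zfs send`/`zfs receive`), before trusting the pool with real data.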
If you want to use ZFS,
but don't necessarily like tinkering in the internals of everything,
then a more packaged solution like FreeNAS would be another good option.
Yeah.
Got a good interface for just doing most of the work.
You don't have to do any of the command line stuff at all, really.
It's all just a UI for it, which can be nice.
I prefer the terminal when I manage ZFS personally.
I feel like I can actually feel the heartbeat of the software
rather than some UI trying to tell me what to do.
I just couldn't understand it more.
Once I moved to the terminal to mess with ZFS, I felt a lot better.
That's my take on it at least.
I love that as well.
I know some people don't necessarily take the same joy we do
from feeling the heartbeat of their software running.
So I'm glad that there are more kind of packaged, guided solutions as well.
Well, Matt, it's been a pleasure talking to you through your software career,
OpenZFS, the future and the past of ZFS itself.
I really appreciate it.
Thank you so much for your time and really appreciate you.
Thank you.
Thanks for having me.
That's it for this episode. Thank you for tuning in. If you enjoy the show,
do me a favor, share it with a friend. And of course, thank you to Fastly for all that awesome bandwidth and also Breakmaster Cylinder for making all of our awesome beats. Here's a
pro tip for you. Check out changelog.com slash master.
That is our master feed.
Get all our shows in one single feed.
And for those super loyal listeners,
check out changelog.com slash plus plus.
That's our membership.
Get all our shows with no ads, plus some other perks.
Again, changelog.com slash plus plus.
That's it for this episode.
Thanks for tuning in.
We'll see you next time. Thank you. Game on.