The Changelog: Software Development, Open Source - Making the ZFS file system (Interview)
Episode Date: January 18, 2022

This week Matt Ahrens joins Adam to talk about ZFS. Matt co-founded the ZFS project at Sun Microsystems in 2001. And 20 years later Adam picked up ZFS for use in his home lab and loved it. So, he reached out to Matt and invited him on the show. They cover the origins of the file system, its journey from proprietary to open source, architecture choices like copy-on-write, the ins and outs of creating and managing ZFS, RAID-Z and RAID-Z expansion, and Matt even shares plans for ZFS in the cloud with ZFS object store.
Transcript
What's up, welcome back friends.
This is the Changelog.
If I sound a little different,
it's because I'm coming off of a cold
and I appreciate your patience
with my somewhat nasally voice today,
but welcome to the Changelog.
We do appreciate you listening.
And today I went solo talking to Matt Ahrens.
Matt co-founded the ZFS project
at Sun Microsystems back in 2001.
And of course, 20 years later, I picked up ZFS for my use in my home lab and I loved it.
So I reached out to Matt and invited him on the show.
Today, we cover the origins of the file system, its journey from proprietary to open source,
the architecture choices like copy on write, the ins and the outs of managing ZFS,
RAIDZ and RAIDZ expansion, a highly sought after feature coming soon.
And Matt even shares plans for ZFS in the cloud with ZFS Object Store.
Big thanks to our friends at Fastly for making our podcast super fast for you to download anywhere in the world.
Check them out at Fastly.com.
This episode is brought to you by our friends at Square.
Square is the platform that sellers trust.
There is a massive opportunity for developers to support Square sellers by building apps for today's business needs.
And I'm here with Shannon Skipper, Head of Developer Relations at Square.
Shannon, can you share some details about the opportunity for developers on the Square platform?
Absolutely.
So we have millions of sellers who have unique needs.
And Square has apps like our point of sale app, like our restaurants app.
But there are so many different sellers, tuxedo shops, florists, who need specific solutions for their domain.
And so we have a node SDK written in TypeScript
that allows you to access all of the backend APIs and SDKs
that we use to power the billions of transactions
that we do annually.
And so there's this massive market of sellers
who need help from developers.
They either need a bespoke solution built for themselves on their
own node stack, where they are working with Square Dashboard, working with Square Hardware,
or with the e-com, what you see is what you get builder. And they need one more thing. They need
an additional build. And then finally, we have the app marketplace where you can make a node app
and then distribute it so it can get in front of millions of sellers and be an option for them to
adopt. Very cool. All right.
If you want to learn more, head to developer.squareup.com to dive into the docs, APIs, SDKs, and to create your Square Developer account.
Start developing on the platform sellers trust.
Again, that's developer.squareup.com. Matt, I'm a big fan of your work on ZFS, and I'm so glad to have you here at The Changelog
because I'm a newcomer to ZFS. So as
you know, because you're a co-creator of it, it's been around for a very long time. It was created
in 2001. And my first use of it was in a Homelab production scenario, powering my Plex server,
basically, in the year of 2021. So I'm 20 years behind the adoption curve of ZFS. But when I found out about
it, I loved the file system. I was like, you know what, we got to get Matt on the show. So
welcome to the show. Thanks. Happy to be here. Do you do any podcasts? Is this a thing you do? I
know you give a lot of talks and you're in front of people a lot around ZFS and the community and
whatnot. But what do you do around podcasts? Not really.
I'm not really plugged into the tech podcast scene.
I did one or two many years ago.
There's 2.5 Admins, I believe, out there.
I think even one of the writers around ZFS has been on there and maybe even a contributor to ZFS, I'm not sure.
I'm new to the ZFS scene.
Yeah, I know the hosts of that podcast,
and they'll probably hit me up at some point.
Yeah, yeah.
You should go on that show.
I like it.
I listened to a few of them.
But yeah, I wanted to get you on to talk about ZFS
because it's such a cool file system.
It's got some interesting roots in open source,
and it's obviously listed under an OSI-approved license,
but it's got some drama behind the scenes.
And I figured who better to go through the backstory of its origination and the problem
set and its history and then to current than you, as you're a co-creator of it.
Back in 2001, it was a file system designed by you and by Jeff Bonwick for the OpenSolaris
operating system at Sun Microsystems.
They were eventually acquired by Oracle.
And I just want to go into that history, whatever you want to share around that process, like
the ZFS origination, you know, what was the problem set?
What was OpenSolaris trying to solve at the time?
What brought you to Sun at the time?
Wherever you want to begin.
So open it up.
Sure.
I joined Sun and the team just after college. So it was my first job out of college.
And I was lucky enough to be recruited by
Jeff Bonwick to join him
and work on a new file system. So at the time that I joined,
it had been pitched to me as like, come join and we're going to work on a new
file system. And I just thought that was like the coolest thing I'd ever heard of.
That was the motivation for me.
I showed up and it really was like nothing had been written yet.
So zero.
That's cool.
Yeah, zero.
So Jeff and I started from what should this be doing?
And, you know, I was obviously very junior software engineer at that point in time. So a lot of the ideas of like what it should be able to do and where it should fit in the industry came from Jeff.
But really, we wanted to make a replacement.
Originally, we were just thinking of it as, hey, UFS is kind of hard to use.
UFS was like Sun's file system before ZFS.
UFS is hard to use. How can we make this easier to use?
And we looked around at how people were using it, mostly in the enterprise context.
So most of them were using it with volume managers, with either Sun's volume manager or Veritas volume manager were very popular at the time.
And the volume managers were hard to administer, hard to set up.
And then they had all these weird failure modes
that some of the in-house sysadmins at Sun had experienced.
Sun had a server that was called Jurassic.
It was a, for the time, giant server that the kernel engineers ran themselves.
So people took turns being primarily responsible for that.
And it used UFS and it used the Solaris volume manager.
And I think that there are some horror stories that predated my arrival about disks dying
and being re-silvered incorrectly by the volume manager and maybe mistakes being made due to the difficulty of understanding
what was really going on there,
even from people who were very experienced
with software and computers.
So, you know, one of the taglines
that we created after the fact
was that the goal of ZFS was to end the suffering
of administering storage hardware.
And I think that that's pretty accurate.
Yeah.
It's a very painful process.
It can be very painful.
And I think that ZFS has succeeded in large degree
at addressing the problems that we saw 20 years ago.
So that's kind of like the very high level
of what we were trying to do.
I'll point out that we were not setting out, the high level was not to create a product.
It wasn't to make the fastest software.
It really was to address the pain points of the difficulty of administering.
And I think that those goals or lack of goals kind of have a long shadow, right?
Yeah.
I think that you look, I'll get into some more of the specifics of what that meant,
what those goals meant back in the day.
But, you know, you look at even now, 20 years later, ZFS, it does perform well.
But, you know, when people do benchmarks against other file systems, and a lot of times ZFS
performs better than them.
Sometimes it doesn't perform as well.
Like I think that the people behind ZFS,
like we don't really sweat that much.
I don't see people being like, oh, like we got it.
We got to beat them in this other,
in this maker benchmark, like what's going on?
I think that the thought is more like on the whole,
ZFS is very useful and performance is part of that utility, but snapshots are part of that utility.
Replication is part of that utility.
Checksums, compression, all of the different things in ZFS that work well together and are easy to use together and hopefully easy to understand what's going on.
That's what brings a lot of the value compared to other technologies and product, right?
So today, like back when we created it and today, ZFS is not a product.
There are products based on OpenZFS doing all kinds of different things.
But OpenZFS is just an open source project.
And we're working on creating that fundamental technology
and making
that easy to use for system administrators and also easy to integrate into systems and products.
I think it's interesting, this history of it, because I can imagine as a junior developer
coming out of college with a blank screen, essentially, with Jeff, one, you probably grew up a lot in terms of
being a software developer and even a human being, right? I mean, your whole career has been spent
essentially on what is ZFS and now OpenZFS, the project. And I think that's just interesting how
you can sort of attack a problem set way back in the day in that exact scenario. Come out of
college, junior engineer, junior developer, you know know first real job right like it was your first real job and as a programmer yeah and now
you're still doing it like it shed some light to like i guess the interesting bits around starting
like sometimes you never know we're gonna end up you know like where you might end up is is sort of
like this question mark and it's like well it could be a dead project or it could be something that people really get value over
20 years or more.
Yeah, I think that I was very fortunate and lucky
both to work on something that turned out
to be so successful.
Like it's definitely more successful
than our wildest dreams of 20 years ago.
And also very fortunate to have the opportunity
to work on something that's, that's brand new, even if it wasn't successful, you know,
creating cool technology is always fun and it's a great experience. And then to be able to do that,
you know, with, with a great mentor, somebody that, um, had more experience and, uh, was willing to do
a lot of the hard work that was probably invisible to me at the time
of making the project exist
and making the space for me and other developers
to write the code
while there's a lot of other things going on within the company.
People want different things out of it,
organizational things,
which thankfully I didn't have to get too involved in at the time. But I know Jeff
put a lot of work into that as well as the work that he did on designing and implementing it back
in the day. So when somebody asks you, what is ZFS? How do you how do you describe it? What do
you say it is? It depends on who's asking, I think, or what I, you know, what level I think
that they're going to be able to understand it
because everybody understands things in the context that they have.
So let's say an everyday software developer that doesn't know much about file systems.
They know they exist.
Sure, they're on their computer.
They use them, but they're just an everyday developer.
They're not touching file systems too often.
So for developers, first of all, you know what a file system is.
Hopefully, you kind of understand what a file system is
and what its purpose is in the most basic sense of storing files and data on hard disks.
ZFS, our tagline from back in the day is that it combines the functionality
of a file system and a volume manager into one integrated solution. And it makes using, it brings enterprise level storage
technology to the masses. So those are technologies like snapshots and compression and replication
that, you know, those don't really exist or they're very primordial in more traditional file systems. And so ZFS lets you get
a lot more out of your storage system and it lets you build really powerful storage systems with
just a bunch of disks or SSDs, combinations of those without expensive technology,
without expensive enterprise products. Right. Who's using ZFS? I mean,
I mentioned I'm a home labber. Pretty much, I'm a home labber user at least.
And I would call my scenario enterprise home lab because I don't want it to go
down necessarily. If the data died, it's my Plex server.
So like it's movies, right?
I don't want to rip all those movies again. And it's a lot.
I think I might have like 10 or 14 terabytes of movies, maybe more than that even.
I'm not even sure.
4K, 1080p, but that's my use case of it.
But I imagine there's a lot of Homelab users out there.
There's a lot of enterprise users out there.
You mentioned it's for the masses, so I'm the masses.
I'm the user of that.
Where is the – you're employed by Delphix.
You get paid to do this daily. We talked about Sun from back in the day, acquired by Oracle. This has been a career for you, so you've
obviously done some cool stuff with it. But where is it being used at the highest level
and at the lowest level, like, say, a home lab like me?
Yeah, it runs the whole spectrum,
and one of the great but also challenging things about open source projects is that we don't necessarily know where it's being used, right?
People can pick up the code and do whatever they want with it.
We don't have like a product and a list of customers and numbers or things like that.
But I can tell you about some examples. Obviously, there's a lot of folks like you who are using it at home or in very small businesses.
And that's probably the majority of people using, touching, running ZFS commands is probably those kinds of users because there's so many of them.
The amount of data or the demands on the performance of those systems
might not be the highest.
And I think that those types of users are the ones that tend to be
underserved historically by enterprise-focused open source projects.
Because most of the work is done by people who are paid to do it,
and they're paid to do it to make it work in some higher-end type of deployment.
Right, some sort of paid scenario, some sort of...
Yeah.
Even though it's open source, some sort of product
that uses the open source to create a cloud product
or some sort of serverless or service, essentially.
Yeah, so if we go from there to kind of the very highest end,
there's folks like Lawrence Livermore National Labs,
which is a
US government research agency. And they run some of the biggest supercomputers in the world.
And they actually originally ported ZFS from Solaris to Linux to be able to use on these
enormous supercomputers. So Brian Behlendorf is the one who started that, quite a few years ago now, 10-plus
years ago, I think. And they've been doing a lot of the work to maintain ZFS on Linux, and
they're using it in their huge supercomputers. I don't have the numbers handy, they probably
are available, but it's, you know, petabytes and petabytes and petabytes, huge things that
take up warehouses full, right? And I think that, kind of because of their leadership, a lot of other supercomputer type
applications have picked this up as well.
So, you know, Cray, HPE, Intel have all done work on putting ZFS into use in supercomputer
type applications.
One of the interesting things about those applications is that I didn't know this.
When I learned about supercomputers back in the day, it was like, oh, it's all just like
you're running a bunch of numbers and writing the numbers out. And so presumably it's just
giant files, right? You read some big files, you write some big files. Maybe you probably care
about throughput. And that's kind of the traditional space of things like Lustre. Lustre is a distributed file system that can run on top of ZFS. So it has pretty tight integration with ZFS.
And they advertise things like, you know, basically like if you have enough servers and
enough clients, then you can get like your full network switch throughput of however many like
terabits per second or whatever, because it's fully distributed. But a lot of those workloads,
HPC is not just these big file streaming stuff.
They do a lot of small file creations
because a lot of these workloads are written
by folks who are not file system engineers.
They're just trying to solve their problem,
and it doesn't necessarily map onto all giant files all the time.
And so you see a lot of various workloads.
So I did some consulting for Intel several years ago now
where they were trying to improve small file creation performance,
which is not what you'd expect coming from an HPC-type workload.
So even these largest, large-type use cases,
they have a lot in common with home users' use cases, right?
Where it's like creating lots of files.
Like that sounds like maybe, you know,
downloading your photos or reading your mail spool or writing out all your
little text files for your code base.
You know,
if you're a software developer reading lots of small files when you're doing
a compilation.
So a lot of these use cases really transcend like the large to the small.
In the middle, you have, I would put in the middle a lot of use cases of ZFS where companies
have taken ZFS and embedded it into another product. So one example is folks like IX Systems and Nexenta
who have made kind of general purpose storage appliances
based on ZFS.
So inside is ZFS,
and then they have a nice management interface
that makes it easy.
I think Plex is probably also in that category.
And there's a lot of people
who might be using those products.
FreeNAS comes from IX Systems.
So a lot of people use FreeNAS in their home systems.
A lot of people might be using those types of systems
and not even know that there's ZFS under the hood.
They're just like, I have a Plex.
I don't know what's going on.
Like, yeah, it has compression.
That's great.
It has RAID.
That's great.
That's interesting because, you know, I'm in this home
labber scenario, and I think, obviously, I'm a developer and I'm a tinkering kind of person, but
you know, there's this world where in the future where people are going to have, I would probably
guess like their own clouds in their house. You know, as the technology
gets more and more accessible, you're going to have the need for potentially, you know, privacy
and storage and stuff like that. For example, I've got a UniFi network. I've got UniFi cameras.
They're local to my network. There's obviously some external accessibility via
UniFi and stuff like that, but I'm trying to keep things local. Plex, I can access from outside
my network. That's the extent I'm using it so far. I plan to eventually move our podcast archive, which I think is around the same.
It's around 8 or 10 terabytes of data we've collected over the 12 plus years we've been doing podcasts.
Which, you know, those archives are very precious to us.
Like we lost, you know, the early days of this show, for example.
You know, it probably wouldn't make or break the business, but we could never go back and alter that
or remix them or remaster them
or do something new in the future if ever we wanted to.
So our archives are pretty precious to us,
but I think it's really interesting how you can serve,
you can do this project and serve such a wide degree
of, in quotes, customer type, user type, enterprise,
small businesses to home labbers.
That's just so wild how this file system could potentially power this future where I think more and more people will have a NAS on their home network.
That's interesting to me.
You mentioned private clouds, home cloud.
There are a couple of companies, you know, Joyent tried to do this of like taking the interface that you can get from a public cloud and sell a bunch of gear to a company that can deploy that on-prem and then have that same kind of cloud usability, you know, in a data center.
You could imagine like pushing that to smaller and smaller and
smaller deployments, right, into home and small businesses. And I think some of the folks from
Joyent, like Bryan Cantrill, they are at Oxide now, which is kind of doing something similar,
taking on even more of the stack. But both Joyent and Oxide are using ZFS as part of their storage
subsystem in those, like, cloud-in-a-box or private cloud type deployment scenarios.
This episode is brought to you by InfluxData, the makers of InfluxDB,
a time series platform for building and operating time series applications.
In this segment, Marian Bija from NodeSource shares how InfluxDB plays a critical role
in delivering the core value of the APM tool they have called N|Solid.
It's built specifically for Node.js apps to collect data from the application and stack in real time. At NodeSource, we want to lean into a time series
database and InfluxDB quickly rose to the top of the list. One of the unique value propositions
of N|Solid is real-time data. And there is a lot of APM tools out there, but there is a variance
in terms of how available the data is. It's not
really real-time. There is actually a staging period to data, and InfluxDB is magical and
allows us to deliver on our unique value proposition of real-time data with N|Solid.
To get started, head to influxdata.com slash changelog. Again, that's influxdata.com.
So what is it then, you think, that makes people choose ZFS?
Like of all the choices they have in the Oxide scenario or in my scenario,
I've got a 45 drives Stornator, for example, here.
It's got a 12, I think it's got 15 drive bay.
For example, I can fill it up with massive amounts of storage. You know,
it's a Linux box. It's, it's running Ubuntu, you know, 2004 or whatever, you know, what,
what makes someone like me or someone like them choose ZFS for this storage engine? Why is ZFS
the choice? Like what particular features, what makes them choose it? So I think that it's,
it's kind of out of necessity. There's really two kinds of data.
There's disposable data where you can put it on whatever you want.
You can put it on your thumb drive with FAT32.
Sure.
And it doesn't matter.
There might be some performance requirements, but the requirements are not that great.
And in those scenarios, you might use ZFS for interoperability if you're used to it,
but there's not a ton of use cases. But if you care about your data,
then I care about my data, Matt. Yeah. And I think most people do. If you care about this data, then
you need to be able to have some redundancy with it. So you need kind of the functionality of a
volume manager where you can have multiple, you know, drives and some of them can die and you don't lose all your data.
And you need to know that the data that the drives give you is correct.
So you need checksums.
And even just those two basic requirements, there's not, I mean, there are other technologies that do that.
They're much harder to use typically. And so I think the choices are like, if you're
deploying it yourself, ZFS is just so much easier. If you're making a product, then, you know,
your customers might not know or care, but, you know, the building your product on something
that's as capable as ZFS is going to reap long-term rewards. And the fact
that ZFS is under continuing development and improvement means that your product has a solid
foundation that's going to keep up with what the future holds for software and hardware storage.
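A minimal sketch of that basic redundancy-plus-checksums setup. The pool name `tank` and the device paths are hypothetical placeholders; a real system would use stable `/dev/disk/by-id/` paths:

```shell
# Mirrored pair: either disk can fail, and every block is checksummed,
# so bad data returned by a drive is detected and repaired from the good copy.
zpool create tank mirror /dev/disk/by-id/diskA /dev/disk/by-id/diskB

# Show pool health, including read/write/checksum error counters
zpool status tank
```

These commands require a live ZFS system and root privileges, so treat them as a sketch rather than a copy-paste recipe.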
That's just the very basics, right? I think a lot of people even forget about that because
there's all these other cool, great features with ZFS that are very exciting, like snapshots, right? Being able to protect your data with snapshots. A lot of people nowadays are thinking about ransomware, right? And what if somebody has some virus or whatever they call it that encrypts my data or alters my data or deletes it, how do I recover that? Well, ZFS has built-in snapshots, takes snapshots every day or every hour.
The storage cost of them is very low.
You're only paying for the data that's different each snapshot.
And the performance impact is basically non-existent.
So you get a lot of protection from accidental or malicious changes to your data very easily and at very
low cost in terms of the hardware that you have to pay for it.
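A sketch of what that looks like at the command line, assuming a hypothetical dataset named `tank/media`:

```shell
# Near-instant, point-in-time snapshot; it only consumes space
# as the live data diverges from it
zfs snapshot tank/media@2022-01-18

# List snapshots and the space each one uniquely holds
zfs list -t snapshot -r tank/media

# Recover from accidental or malicious changes by rolling back
zfs rollback tank/media@2022-01-18
```

Scheduling the snapshot command from cron (or a tool built on it) gives the hourly/daily cadence described above.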
Things like compression built in.
I think a lot of people nowadays take this for granted.
At least ZFS users probably take it for granted.
But it's not present in all the computing technologies.
Being able to just turn on compression.
We're using LZ4 compression by default,
which is very, very fast.
It doesn't give you the highest compression ratios,
but it means that you can just turn it on
and kind of not worry about it.
Like you turn it on and typically performance improves
because you don't need to read
and write as much data to your disk.
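Turning it on is a one-liner; the dataset name here is hypothetical. (On current OpenZFS, `compression=on` already selects LZ4 by default.)

```shell
# Enable LZ4 compression; only blocks written from now on are compressed
zfs set compression=lz4 tank/media

# Check the setting and the achieved compression ratio
zfs get compression,compressratio tank/media
```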
And then, you know, in more kind of complicated deployments,
people look for things like replication.
Like I have data that's on this machine.
I want to get to this other machine.
I could use rsync, and that would probably work just fine if I just need to take one copy of the data one time.
But if I need to, you know, continually move the changes over, rsync is very, very slow because it needs to check every file and every block of every file to see if it needs to
be sent. And then if you're using ZFS to begin with, obviously you want to preserve things like
the snapshots that you have on the source system, the compression that you have on the source system.
So ZFS has this built-in send and receive commands that let you serialize the contents of a snapshot,
send it over to another machine, and it preserves all the complicated stuff like ACLs,
access control lists that might be on files,
and other esoteric things that might not be used that often.
But when they are used, you don't want to have to worry about
is rsync preserving them correctly or whatever.
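A sketch of that send/receive workflow, with hypothetical dataset, snapshot, and host names:

```shell
# Full send of a snapshot's serialized contents to another machine
zfs snapshot tank/media@monday
zfs send tank/media@monday | ssh backuphost zfs receive backuppool/media

# Later, an incremental send: only blocks changed since @monday cross
# the wire -- no per-file scanning the way rsync does it
zfs snapshot tank/media@tuesday
zfs send -i @monday tank/media@tuesday | ssh backuphost zfs receive backuppool/media
```

Because the stream is block-level, properties like ACLs and holes come along without the tool having to understand each file format.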
This is one area of ZFS I'm not taking advantage of right now.
I do some replication, basically backup,
because RAID is not your backup.
It's just being able to store more and do more with a volume,
not so much necessarily an actual backup.
And this is one area where I'm not taking advantage of a feature, really.
And it's just because I'm still getting into the ZFS world
where I find information.
I find that it's actually kind of hard to find all the information you can
of what you can do with the power of ZFS. For example, there's a lot out there, and I know it's probably
getting better, but 20 years later, I'm still kind of like, wow, there seems to be a lack,
like, or at least a vacuum, maybe not like somebody's doing a bad job, but more like there's
an opportunity, you know, somebody's out there doing more. And there's a couple of books out
that I've, I've picked up that I've liked a lot as well that really helped me school myself on what ZFS is and what it can do.
But replication is one particular area where I'm still using rsync.
I'm still using rsync and moving stuff over to a different store.
Now, granted, currently that separate store is not a ZFS store, so I couldn't do replication there. But when I do fully move over all my stores, I have a couple of different RAID scenarios where I'm not using ZFS everywhere.
I'm only using in this one pool currently.
And mostly because I want to prove that it works well.
I can actually manage it.
And so to your credit and everyone else's credit involved in it, yeah, it's pretty user-friendly.
I can use ZFS pretty easily.
It's pretty easy to create a pool, pretty easy to manage a pool,
pretty easy to do scrubs and stuff like that
to like verify my data.
But that's the extent I'm doing.
And it's pretty set it and forget it.
Like I've pretty much gotten bored
because it doesn't require a lot of maintenance.
So good job on that part at least.
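The set-it-and-forget-it routine described here boils down to a few commands. This is a sketch of one common layout; the pool name and device paths are hypothetical:

```shell
# A six-disk RAID-Z2 pool: any two disks can fail without data loss
zpool create tank raidz2 \
  /dev/disk/by-id/disk1 /dev/disk/by-id/disk2 /dev/disk/by-id/disk3 \
  /dev/disk/by-id/disk4 /dev/disk/by-id/disk5 /dev/disk/by-id/disk6

# Periodically re-verify every allocated block against its checksum
zpool scrub tank
zpool status tank   # shows scrub progress and any repaired errors
```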
That's good.
Yeah, I would say don't feel bad
about not using all the features of ZFS.
What's there?
I want to use it, Matt.
You know, I want to, if I have another ZFS store,
I'm not going to be rsyncing.
I'm going to learn replication at that point.
I think that's great to learn that stuff. I would say that like, you know,
the ZFS enthusiasts who become evangelists
kind of talk up like all of these capabilities of ZFS.
And I think that they're all for sure,
like they're all useful in different scenarios,
but ZFS has a lot of capabilities.
It can do a lot of things.
That doesn't mean that you should do all of them in all deployments,
right?
Sure.
It has all those things so that it's flexible and can be used in a lot of
different scenarios.
And being interoperable, like being able to run rsync to send it
to another machine, is
just fine, you know.
That said, I love ZFS send and receive, and it's really cool, basically. So you're asking
where you can learn more about how to use this stuff?
Sure, yeah. What resources are out there?
Yeah, the books that have been written about it are pretty good. The FreeBSD Mastery: ZFS one is a good one.
I think that there's a version two, like an advanced mastery. I forget what they
called it. We'll look it up on Amazon.
Well, the one I have is FreeBSD Mastery: ZFS. Okay. And
that's Allan Jude and Michael W. Lucas.
Yeah, and I think there's a second counterpart to that which
is even deeper, so it's like Advanced ZFS or something like that. So I think those are both great resources.
Those are the ones that are coming to mind right now. If you want to get an education,
I think that those books are the way to go. But online forums and stuff are also
very useful. It's just more immediate and more personal.
But the quality of information you're getting is more variable.
I've gotten a lot of mine from YouTube, various blog posts, obviously Stack Overflow here and there.
The book I'd mentioned, I haven't gotten the advanced version of it yet because it's just not quite there yet, but it's been very helpful for me.
I've been taking my own notes on different commands I've run for establishing
a new zpool and stuff like that. We've gotten this far actually talking about
features, but not the specifics of breaking
down what ZFS does. So it is the open
source project, OpenZFS.
We haven't talked at all about its, you know, I guess...
The features and capabilities and like what do I type to do this?
Yeah, exactly. So like how do you create a Zpool, for example? How do you create,
what's the step? Like if I wanted to create, you know, a six drive scenario where maybe I'm,
you know, a home lab or I'm doing plaques, like what would I do to create a, a six drive? Would I do, you know,
RAID Z1, RAID Z2? And like, how do you choose which,
which RAID level to choose all that good stuff?
How do you even choose the number of drives? Obviously there's a cost factor, but, you know, I think there's this idea that if you wanted to do RAID-Z2, you should do it in multiples of some number, like six, eight, or twelve, where you don't want to do seven, for example, because it doesn't map out well. Help me understand that.
Yeah, so actually I wrote a blog post about this, basically saying: don't worry about that number. In my opinion and experience, the specific width, like how many drives you have, there aren't magic numbers there. Basically, the more drives you have, the more performance you'll get and the more space efficiency you'll get. And there aren't really magic points on there that are more optimal than others.
Aside from some very, very specific scenarios
that don't apply to common cases, right?
Basically, like, you know, if you're using a database, it has a fixed record size, and you're for some reason not using compression, then maybe there are some more optimal configurations there. I wrote a blog post; I think if you search for RAID-Z, you'll find it. The title of it is something like "How I Learned to Stop Worrying and Love RAID-Z."
And it goes into excruciating detail about why this is true, about why people think that you need this power-of-two-plus-n exactly, and why it's not really applicable. A lot of the reason is that you want to use compression. Probably you're either using compression, or you're using very large files and very large block sizes because you have videos or something like that that's not compressible.
And if you have compressible data, then you should be using compression.
And then you end up with variable block sizes.
ZFS takes 128 kilobytes of data and compresses it down to a multiple of whatever the sector size is, like four kilobytes. So you might have a big file where the first block compresses to 70 kilobytes, the next block compresses to seven kilobytes, the next one compresses to 104 kilobytes, right? That means that any kind of fancy math that you're trying to do to arrange for things to be laid out just perfectly just isn't going to fly.
And then on the other extreme, if you have large files, then they have large blocks, and then everything is easy. There's no need to worry about getting things perfect, because when you have those large blocks, they can spread evenly over all the disks. So that's one less thing to worry about, which is good in terms of your deployment.
I'd love just to lean on ZFS to be smart enough, you know, to not have to worry about the number of disks. And I mean, that's kind of what you want to do, right? You want to take as much mental overhead as possible out of planning a new storage setup, right?
Exactly. You want to put as much as possible into the software to manage it, rather than on the person choosing the number of disks.
You want to be able to choose things like, you know, reliable hardware, reliable operating systems,
reliable things that you can plan for.
Not so much, should I use seven or six disks?
Yeah.
And you see that reflected in like the user interface of ZFS where ZFS has,
there are a lot of properties you can change and things you can do with it.
But hopefully those things are all there for a reason.
They have a real impact on how, you know, on what you're trying to do.
And then there's a lot of the internals that, you know, you can do with like module parameters, you know, kernel module parameters that are like semi-documented, not supported.
It changes the internal workings.
You just shouldn't have to deal with that.
Like, you know, for the vast majority of people,
you should never need to think about that.
Of course, sometimes we fall short there. There are things where it's like, oh yeah, you really do want to change that knob in this scenario to get really good performance for this particular workload.
But the goal is that you don't need to do that.
The system gives you very good performance and semantics out of the box.
And then, you know, you can express your intent with the commands to set properties and stuff.
So getting back to your question of like, I have a home lab, I have some disks, what do I do?
Yeah, typically with the number of disks that you're talking about for that kind of
scenario, it's like, you know, four to 12 kind of disks. Probably you're going to create one
RAIDZ group and you're going to put all your disks in it and it's going to be either RAIDZ1 or RAIDZ2.
So there's not a lot of real decisions there to make. I think you're going to be running a command that's like
zpool create, give the pool some name, RAIDZ2, and then just list each of the six disks that you have.
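For reference, the command Matt describes might look something like this; the pool name "tank" and the device paths are placeholders for your own setup:

```shell
# Create a pool with one RAID-Z2 group out of six disks.
zpool create tank raidz2 \
  /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf

# Check the resulting layout and health.
zpool status tank
```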
Whether it's RAIDZ1 or RAIDZ2, it comes down to how much redundancy do you want. And with RAIDZ1, you know, it can tolerate one disk failing and not lose any data.
If you lose that second disk before you've replaced the first one, then you lose all the data.
With RAIDZ2, you can lose two disks without doing any replacements and you'll still have your data. So, you know, in like industrial deployments,
the consideration is really like, how long does it take to replace a drive before you get back
to full redundancy? And, you know, people are configuring spares and timing, like how long
does it take to do the resilver and all that kind of stuff.
For small home deployments, people probably aren't doing that.
They're probably not configuring spares.
The time to do the replacement is however long it takes me to order the disk online and get it shipped to me or whatever.
That's the long pull.
So I say as a rule of thumb, if you have a bunch of drives,
RAIDZ2 is going to give you more redundancy
than you're probably ever going to need.
If you want to live a little bit dangerously and risk like,
hey, if two drives fail in the same week,
then I'm going to have to go back to my backups,
then RAID Z1 will save you a little money.
It saves you the cost of that one drive.
Which can get very expensive, honestly. Especially for, say, a Plex server, where you want a lot of storage. And I'd say performance too: not necessarily that you need all the size, but you want decent throughput from the disks. A 10-terabyte drive is a common size for an individual disk in a NAS for, say, a Plex server, or anywhere you want something with decent performance. You know, maybe six, maybe eight, but the eight-to-ten range and anything above that tends to be higher-performance drives in terms of spin speed and throughput on the actual disk itself. But that's a pretty large disk. So if you've got four of those, that's what, 40 terabytes? It's a lot of storage, right? If you've got six of those, let's just do some math, Matt, that's 60 terabytes. It's a lot. But in a RAID-Z2 scenario, you've got reservations, and then you've got, I don't know how you say the other word, refreservation? What is that stuff, when you get into the semantics of, you know, how you plan for overhead in these scenarios?
Yeah. So now we're talking about, you've created your pool, you have a bunch of different kinds of data on there, right? Maybe you have your video files, and you have your home directory, and you have your movies, and you have a cache of other stuff. How do I manage that? So typically, people would create different file systems for each kind of use case. And so in ZFS, when we're talking about a ZFS file system, the storage pool is all of the disks that ZFS is managing. And then you can create file systems on the fly.
They aren't assigned any particular space on the drives.
They just consume and release storage as needed.
So creating these ZFS file systems inside the storage pool is very cheap and easy.
And we use those file systems primarily
as like administrative control points.
So you could say like, here's all my video files.
Don't bother trying to compress them because they're already compressed.
On the other hand, here's all my source code files for my development project.
Let's compress those.
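The per-use-case file systems and property settings he's describing might look like this; the dataset names are illustrative:

```shell
# Create a file system per use case; names are hypothetical.
zfs create tank/videos
zfs create tank/src

# Already-compressed video: skip compression.
zfs set compression=off tank/videos

# Source code compresses well; lz4 is a common choice.
zfs set compression=lz4 tank/src
```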
So all the different ZFS settings and properties are per file system,
so you can set them differently for
different types of data that you have. Now you're asking about reservations and we call them
refreservations, because it stands for referenced reservation. So you might want to think about it like: I have some space that's for one use, and I want that use to never exceed some amount. And I want this other use to have some reserved space; I want to always have some space available for, you know, my software development project. But maybe my kids are dropping their movies into some NFS share, probably not really an NFS share, probably CIFS or some fancy Dropbox thing on top of it. But, you know, my kids are dropping their movies into some other section of this. I'm going to put a quota on that so they can't use more than, like, five terabytes or whatever.
So quotas and reservations let you do that.
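The quota he mentions could be set with something like the following; the dataset names and sizes are made up for illustration:

```shell
# Cap the kids' movie share at 5 TB.
zfs set quota=5T tank/movies/kids

# Guarantee some space for the development project.
zfs set reservation=500G tank/src

# Inspect the settings.
zfs get quota,reservation tank/movies/kids tank/src
```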
And that's at the zfs create layer, not the zpool layer.
Yeah, so that's at the ZFS layer, the file system.
Yeah. So you have the pool, it has, you know, your 60 terabytes, but each file system uses a variable amount, just depending on what's in there at the moment. And the reservations and quotas let you control that. So specifically, the reservation applies to this file system and all the stuff associated with it.
So all the snapshots and all the descendant file systems.
So the file systems can be arranged hierarchically where the children like
inherit property settings from the parents.
So you could have, like, maybe you want to limit the kids' movies, but each kid has their own directory, and so you make each directory a file system. So there's one kid's file system, the other kid's file system, and then there's the parent file system that's all the movies. You can set quotas at any of those levels. You can set a quota at the all-movies file system, and that limits the space used by all the file systems beneath it put together. So the refreservation is talking about:
I want to set a reservation for the space that I, as a user, can see there, ignoring compression and snapshots and other stuff. Like, I want to reserve space that's just for that. And the idea here is that the system administrator configured some snapshots, and those snapshots take up some space, but I want to reserve space that ignores those snapshots.
So the refreservation is actually more expensive, in terms of it reserves more space. Because, ignoring snapshots, I'm like, well, I already have a terabyte of data in here and I have a two-terabyte quota, which means I can write a terabyte of new stuff, and maybe I could delete that original terabyte of stuff and replace it with other things. So I can actually write two terabytes of data here if I delete what's already there. That means the refreservation has to make sure there are actually two terabytes available, even if there's a snapshot of that original one terabyte. So it needs more space, but it's kind of taking a different view.
Like if you're thinking as the system administrator and thinking about the cost of the snapshots, then you would use the reservation. If you're thinking as
the end user and you want to ignore what snapshots are there or aren't
there, then you would want to use the ref or referenced reservation.
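The distinction he's drawing could be sketched with the two properties; the dataset name and sizes here are illustrative, not from the interview:

```shell
# Administrator's view: reserve space counting snapshots
# and descendant file systems.
zfs set reservation=2T tank/projects

# End user's view: guarantee 2 TB writable in this file system
# itself, regardless of how much its snapshots consume.
zfs set refreservation=2T tank/projects

# Compare the two property values side by side.
zfs get reservation,refreservation tank/projects
```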
So that you can know how much storage you have available, right? To get an accurate
depiction of what you have. And if you're the administrator, you want to have a different view of the world. If you're a user,
you want to have an obviously micro view of the world.
Because you don't really care about the snapshots and stuff like that.
Fluff to you. It doesn't matter. That's the administrator's job. And you might be that same person.
In my case, I'm the same person. I'm the user and the administrator.
Yeah, if it's the same person, then it's a little easier to understand
what's going on. But when there's multiple people involved, then these concepts are, you know, the detail and richness of these properties are useful.
One thing I haven't heard you talk about yet, which I feel, as somebody who hasn't designed the system, hasn't been involved for 20 years, and is just a user, is the killer feature, in my opinion: copy-on-write. It's this ability to be a secure file system, because you want data integrity, you want to verify what you have, you want it to automatically repair. And it's copy-on-write that really sets ZFS apart from every other file system I can think of. Can you talk about that a bit?
Yeah. I mean, it's so fundamental to ZFS that we probably forget about it.
And you kind of have for 40 minutes.
Yeah.
And it's not a feature, right?
It's not a user-visible thing.
But it's a kind of enabling data structure that allows us to have zero-cost snapshots
and to be able to have always, like always the data is always self-consistent.
If you crash or pull the plug at any time, you don't have to run fsck.
What you have on disk is always a consistent view of the file system.
Yeah. So that was one of the decisions we made, you know, very, very early on.
Before we wrote a line of code, I think we decided that ZFS would be copy on write. What made that come to light? Like,
since it's so fundamental, it goes that far back. Why did no one else copy this feature? Or how
often is it used elsewhere? Why? Why is it so fundamental? How did you get there? I think that
like a lot of the features, first I'll answer why other people aren't doing it, and then I'll talk about who is doing it. So why aren't other people doing it? Well, like a lot of features in ZFS, there's a cost to it, in terms of a runtime cost, right? It can make things faster in a lot of scenarios, but take checksums, which are, you know, on by default in ZFS. We viewed it as: this is a fundamental enabling thing that everybody should be using, and if you really have some hyper-specific use case where you can't pay the CPU cost of it, okay, I guess we'll let you turn it off.
Or maybe this just isn't the file system for you.
I don't know.
I mean, we kind of took that view with a lot of things in ZFS, including copy-on-write. But that's why it is not, or hasn't been, used more widely: it's complicated, and copy-on-write makes performance different. Not necessarily worse, but different.
Now, as to how did we decide to use it and who else is using it?
Well, at the time, I think the other major use of it was in WAFL,
which is NetApp's proprietary file system.
And I think they had used it to great success,
especially with like enabling snapshots.
A bunch of the kind of details
of how they implemented the snapshots are different than how we keep track of them. But the fundamental
idea of like, we're always going to be writing new data to new places on disk, we're not going to be
overwriting existing data in place. We saw that as like, snapshots are going to be just a base
requirement in the future and doing it any
other way than copy-on-write just isn't scalable. And you see that even today: there are snapshots in things like UFS, but you get one snapshot, or you
And we wanted it to be easy and cheap.
We wanted to give people like no excuse for not protecting their data with snapshots.
Makes sense.
The other thing that we saw, which is more from direct pain experience, is that with earlier file systems, you know, if you crashed, you had to run fsck. And the bigger your disks got, the bigger your file system got, the longer it took to run fsck.
And a bunch of file systems added things like to kind of reduce that time on big file systems
so you didn't have to scan every data structure.
You only had to like, you know,
you knew that it was only these certain ones
that needed to be scanned,
but still we saw the trend of like hard disks getting bigger and bigger and
bigger.
And even 20 years ago, it was taking an unreasonably long time to run fsck on that server we were administering in the Sun kernel group, Jurassic.
So even on Jurassic, which, I mean, had a whole bunch of disks, though I'm sure it was a tiny amount of storage compared to today's storage systems, it took like an hour to run fsck every time the system rebooted, or at least every time it crashed. So, you know, we saw that as totally unacceptable, and rather than make incremental improvements on fsck, we wanted to design the problem out of existence, so that we didn't have to worry about it as disk sizes increased.
And I think that in retrospect, that was definitely the right way to go
because storage is just so huge nowadays.
The idea with copy-on-write is that, and I suppose because we do have much larger disks now, there's more opportunity to always write new data versus writing over old, right? You usually have a lot of disk space available, at least in large pools. Sure, "always" is in quotes; it's not always true. But the idea is that you can always write new. And that makes snapshotting easier, because you can just point to the newly written block versus the overwritten one. And if you ever need to revert, you can point back to the old data that was not overwritten, which is what makes snapshots so much faster and so much easier.
Yeah.
You can promote a snapshot to primary and kill the old thing altogether.
Like there's a lot of interesting things you can do.
Very much similar to the way Git operates even, right? Like it's very similar to the way Git operates with master and branch or, you know,
your main branch and different branches and stuff.
It's very similar to that,
at least in terms of how you fork the data.
Yeah, definitely.
Like forks are just like ZFS clones, in terms of they start the same as some base, but then you can diverge them to put your changes into each one. And then you can go back to how it was before, and you can create lots of ZFS clones easily, or lots of branches easily, I should say, in Git.
Yeah.
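The Git-like workflow they're comparing to maps roughly onto these commands; the dataset and snapshot names are hypothetical:

```shell
# Take a cheap, copy-on-write snapshot.
zfs snapshot tank/data@base

# "Branch" from it with a writable clone.
zfs clone tank/data@base tank/experiment

# If the experiment wins, promote the clone so it becomes
# the primary and no longer depends on the origin snapshot.
zfs promote tank/experiment
```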
What about the real feature, I think, that people have been waiting for? Especially home labbers. I'm not sure you can speak to the enterprise customer needing this, but I think it's more apparent for home labbers, because you want to start small, since you tend to have less money invested in disks, and you want to eventually expand.
But once you establish a pool and you create some file systems,
I'm not even sure it's landed yet. I know there's a pull request out there, which is RAID-Z expansion, being able to expand.
I could just imagine, I was watching your talk on this,
I could just imagine the amount of mental overhead,
the spaghetti in your head, thinking about how to explain.
So I'm not going to ask you to necessarily explain RAID-Z expansion, except for as you might need to.
You don't need to point out the details,
because you needed a screen for that, you needed to demonstrate it.
This is a visual thing for sure, but this is a feature I know that's been long awaited. You know, being able to establish a RAID-Z array and then expand it from six drives to eight drives, with no real penalty. Basically, yeah, less pain. Now it's possible, but it's a PR, from what I understand. What are the details on that?
This kind of brings us back to what we were talking about earlier, this being a volunteer-driven project, and how do we serve the home users? So this is a feature that has been
requested for years and years and years and years. Yeah, for sure. But enterprise users don't have
this problem because it's like we buy the disks by the shelf full.
You just add a new shelf and create a new RAID-Z group from the stuff in that shelf. Right.
Or you just buy a new rack. You know, it's like a whole new rack and a new system.
And then that's that's what you're going to use for the next 10 years.
Hopefully not that long. But, you know, the life cycles in enterprise are very long.
But, you know, for home and small users, it makes a lot
of sense to say, look, I started out with four disks. I mean, these disks are not cheap when
it's coming out of my own pocketbook. You know, you're talking about like, you know, laying out
$1,000 or something, and then I don't want to have to lay out, you know, $2,000 or like $1,500.
I care about the extra, every extra $100. So sizing it for just what you need initially makes sense.
And then, you know, a couple of years down the road, your storage needs grow.
Add one more disk or two more disks without having to, without having to like buy a whole new batch of disks.
Like move it to, you know, get your friend to bring their system over so you can copy it over there and then reformat it and then copy it back.
What a pain.
People don't want to do that.
It's no fun.
No.
So basically this project is about doing all that complexity for you under the hood.
You add a new drive.
We have to move all the data around to spread it out over all the drives, including that new one.
But it all happens automatically.
You just type zpool attach, blah, blah, blah, blah, blah,
hit return.
It says, great, the expansion is in progress.
It'll be done in 20 hours or whatever
when we've copied all the data around.
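At the time of the interview this was still an open PR, so details could change, but the invocation he sketches would look something like this; the pool, vdev, and device names are placeholders:

```shell
# Attach a new disk to an existing RAID-Z2 vdev to widen it.
zpool attach tank raidz2-0 /dev/sdg

# Watch the expansion progress.
zpool status tank
```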
But the interesting thing about this project is how it came to be. So, a long-requested feature: how did it come to be? How did it get funded?
So actually, it's funded by the FreeBSD Foundation.
So the FreeBSD Foundation is like a nonprofit.
They help to run the FreeBSD project, but they don't run it.
It's run by volunteers, but the foundation helps with administrative stuff, and one of the things they do is fund software development. So they contacted me a long time ago, I think three or four years ago. Actually, it's got to be more than that, because I remember working on this when my second child was a baby. So, you know,
it was at least four years ago now. And they said, look, we have this idea. We want to do something for ZFS users, something that's going to help the small users, that isn't getting done by the contributors today. Well, I should say by the people who are developing new features; there are lots of contributors that are, you know, maybe not even C developers, right, that are contributing lots of new tests, man page changes, all kinds of cool stuff.
But the new features are primarily coming from these enterprise use cases who
are funding developers.
How do we get something developed that's going to help every user?
And so they came to me with this idea of doing RAIDZ expansion.
And I kind of came up with a design of how it could be done,
proposed it to them, and they said, yeah, let's do it.
I gave them the timeline.
I said it would be done in a year. And four years
later, it's almost done. That's awesome.
This is primarily because of constraints on my time and just
not being able to spend as much time as I thought I would on the project.
Do they come to you personally or do they come to you through your employer?
Because your time on ZFS and OpenZFS is probably pretty divided in terms of how you personally spend it, right?
They came to me personally because they know me from speaking at the FreeBSD conferences and stuff like that.
Fortunately, I was able to arrange it so that the consulting work that I do is actually through my employer, Delphix.
So it makes it easy for me to, you know, work on more than just software that's for Delphix, and to be a leader in the community.
So I'm fortunate that Delphix values open source and they've seen the value of being a leader in this open source community in terms of our brand within engineers and being able to do recruiting.
You know, our team has recruited a fair number of employees from the OpenZFS development
community.
So, you know, the time that I spend reviewing pull requests on OpenZFS is on-the-clock time, right? I mean, my company is paying me to do that, which is pretty great. I don't have to do it only on my nights and weekends.
You're living the dream.
Yeah. So it's worked out very well for me and I'm definitely very fortunate to be in that situation. What's up, friends?
I want to tell you about one of our new partners for 2022, MongoDB, the makers of MongoDB Atlas,
the multi-cloud application data platform.
MongoDB Atlas provides an integrated suite of data services centered around a cloud database
designed to accelerate and simplify how you build with data.
Ditch the columns, the rows once and for all, and switch to the database loved by millions
of developers for its intuitive document data model and query API that maps to how you think
and code.
When you're ready to launch, Atlas automatically layers on production-grade resilience, performance,
and security features so you can confidently scale your app from sandbox to customer-facing
application.
As a truly multi-cloud database, Atlas enables you to deploy your data across multiple regions
on AWS, Azure, and Google Cloud simultaneously.
You heard that right.
You can distribute your data across multiple cloud providers at the same time with a click
of a button.
All you got to do is try Atlas today for free.
They have a free forever tier, so you can prove yourself and your team.
The platform has everything you need.
Head to mongodb.com slash Atlas. Again, mongodb.com slash atlas.
What's up, friends?
This episode is brought to you
by our friends at Retool,
the low-code platform
for developers
to build internal tools.
Some of the best teams
out there trust Retool.
Brex,
Coinbase,
Plaid,
DoorDash,
Legal Genius,
Amazon,
Allbirds,
Peloton,
and so many more.
The developers at these teams
trust Retool
as a platform to build their internal tools,
and that means you can too.
It's free to try, so head to retool.com slash changelog.
Again, retool.com slash changelog.
So, four years later though, RAID-Z expansion. Four years later, the PR has been opened.
Yep.
It's not landed, in terms of, it hasn't landed yet. Is it planned for 3.0?
It is hoped for 3.0.
Okay.
I would be cautious about using the word plan, because
OpenZFS doesn't have developers on retainer, right? Like, we're not paying anybody to develop anything; it's all volunteer. So we can't speak too strongly about plans. We can do a release, but we can't make anything get in, right? It takes a lot of people to get it in. I've done, you know, most of what I need to do as the developer to get the PR ready, but it takes a lot of contributions from different people.
So we need people to do code review is the big one.
And there's a lot of code there, a lot of very tricky code.
So we need other experienced developers to do code review.
We also need other people that might just be users to do testing, help give confidence
to other folks in the community
that this is going to work right and not break their pools.
And I guess that's just as easy as, right, if I was an end user wanting to test that, I could use zfs send to send all my data to a new pool. Obviously, I have to invest in the hardware and the drives and stuff and replicate essentially my scenario. But I can use a copy of my existing production data, essentially, in a new ZFS pool, the exact same scenario, and, you know, do an expansion on that pool.
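The send/receive approach Adam describes might look like this; the pool and snapshot names are made up:

```shell
# Snapshot the production data, then replicate it recursively
# to a scratch pool built on spare drives.
zfs snapshot -r tank@migrate
zfs send -R tank@migrate | zfs receive -F testpool

# Then experiment (e.g. try the expansion) on testpool instead.
```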
Yeah.
So that could be a way that an end user could help,
but it's a matter of getting access to that feature branch, and being able to compile it and put it on their machine,
which probably takes a lot of effort.
Yeah.
It takes a little bit of doing to know how to compile and install, because it's a kernel module. It's a little more complicated than just downloading your normal thing that should just work. There's automake and autoconf; all that stuff is there. So, you know, the steps are like: type configure, and then type make install.
However, you know, depending on the particulars of the system, getting it installed correctly in a way that it gets picked up.
You know, for example, like in preference to the kernel modules that may already be there.
If you have like an Ubuntu system that comes with ZFS kernel modules already, there can be some tricks there.
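The build steps he outlines, roughly; paths and options vary by distro, so this is only a sketch:

```shell
# Build OpenZFS from a source checkout of the repository.
sh autogen.sh
./configure
make -s -j"$(nproc)"

# Install the userland tools and kernel modules (needs root;
# beware of distro-shipped ZFS modules taking precedence).
sudo make install
```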
But this is a well-trod road. I mean, there's hundreds and hundreds of contributors
who have gone through these steps
on all the different operating systems.
So if you would like to help,
it might not be a one-liner,
but there are a lot of people that can help you.
What's the best place to go to get that help then?
Would you say like the repository
or would you say like an issue or the mailing list?
What's a good place to like say,
hey, willing participant,
I'll help test this at least as an end user.
If you're looking to volunteer on something specific,
like I want to help test RAID-Z expansion,
then probably comment on the PR would be the right place.
If you're looking for like,
I'm trying to compile this so that I can help somebody else,
how do I get it installed?
Then the mailing list would be a great place to ask.
Okay, cool.
Before we move on to another topic,
is there anything else in, say, the feature set of ZFS that makes people love it? We talked about copy-on-write being a killer feature. You didn't really mention it because it's such a baked-in, 20-year feature. It's more like it's just the system. That's just how it is. It's not even a feature these days.
Anything else? I mean, I think I like the ideas around the ZFS Intent Log, I believe is what it's called, and the L2ARC, which I believe is a cache.
Yeah, the L2ARC cache.
The level 2 cache, I believe. Those are a couple of things that can sort of speed up
systems. That's one thing I'm actually taking advantage of.
I have an SSD
as my cache, which was a one-liner. Install the hardware, then a one-liner to add it.
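The one-liner Adam mentions for adding an SSD cache device might look like this; the pool name and device paths are placeholders:

```shell
# Add an SSD as an L2ARC cache vdev to an existing pool.
zpool add tank cache /dev/nvme0n1

# A separate log device for the ZFS intent log is similar:
zpool add tank log /dev/nvme1n1
```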
It's like you make it so boring to manage the system.
Come on, Matt, make it harder, right?
It's super easy to manage the ZFS system.
I think that's great that that's your experience.
I mean, that's absolutely our goal
is to make it easy to manage.
So I would say the killer features
are the ones that we've talked about. RAIDZ, compression, checksums, snapshots, and clones, and replication. And those
are things that have been in ZFS for a long time. Obviously, we've refined over the years
and added other new stuff, but those fundamentals are what sell it for 99% of the users.
I want to dovetail a little bit to back to the past to some degree.
There's a ZDNet article that quotes Linus Torvalds as saying, don't use ZFS.
I'm sure you've seen this and I'm sure you've read it.
And I think it's really around licensing.
He says, don't use ZFS.
It's that simple.
It was always more of a buzzword than anything else, I feel.
And the licensing issues just make it a non-starter for me. So I kind of want to go back into the past to some degree, back to the Sun days before Oracle acquired it. Were you involved in licensing it? You know,
share some of the drama, I suppose, behind the scenes that kind of made OpenZFS possible,
because it was really close to not being possible with the acquisition. Like thankfully Sun and potentially even you and Jeff were
contributors to the idea to use the common development and distribution license 1.0,
which is an OSI approved license, which definitely makes it open source by being OSI approved, in that sort of thin layer of, yes, it's open source, or no, it's not open source. But this ability to sort of keep it going after
this acquisition, that's kind of what I want to talk about. So I mentioned Linus saying don't
use ZFS. What's the backstory there? Yeah. So the interesting thing is that,
you know, when we started working on ZFS back in 2001, you know, it was part of Solaris.
Solaris was proprietary software. It was, I think at the time only available on,
you know, with Sun's SPARC-based hardware. Solaris x86 kind of came and went a couple of times.
So maybe it was available on x86 hardware at some point. But, you know, as far as we knew,
when we started out, we were developing proprietary software. But a couple of years into it, I think, I'm going to say maybe 2003, they started working
on OpenSolaris and meaning like working on open sourcing Solaris and creating OpenSolaris.
And I wasn't involved with those decisions or the licensing decisions.
You know, I was like a junior engineer two years out of college.
Nobody asked me.
You didn't need to know about this.
Yeah.
So when we, well, when I found out about it, I was thrilled.
You know, I thought, oh, this is great.
Like we're going to do an open source Solaris.
Like ZFS is going to be part of it.
We're going to open source it.
This is wonderful.
We definitely didn't imagine how successful it would be outside of Sun at the time or how enabling that would be
for our technology and in my career to continue for so long. So we kind of lucked into that.
We lucked into it being open sourced. We released it as open source first. So when we integrated it into the Solaris code base in October of 2005, a long time ago, it went out as open source the next week, before we'd ever shipped it in a product.
That was really cool. People started using it, picking it up from the OpenSolaris bi-weekly builds.
So then from 2005 to 2008 or 2009, we were developing it in the open.
It was picked up by, I think maybe FreeBSD was the first other operating system to take
the code and port it to FreeBSD.
It became very successful
there. And then towards the end of that, picked up by the folks at Livermore,
Lawrence Livermore National Labs to port it to Linux as well. In terms of the, maybe I should
talk about the licensing a little bit now. So the CDDL, as you mentioned, is an open source license. It was created by Sun to open source Solaris
and create OpenSolaris, which ZFS is part of.
I can't really speak to the motivations
of why that license,
why didn't they use an existing one?
Why did they come up with any of the particular terms in the CDDL?
But my understanding of the intent is that it's what's called a weak copyleft type of license, which means that the changes that you make to ZFS or other CDDL-licensed software, like if you make changes to our software and you ship those changes, then you need to make your modifications available.
So that's kind of similar to the GPL
as opposed to more permissive licenses
like the BSD or Apache licenses,
which are basically like,
here's some software.
You can do whatever you want with it.
You can contribute changes back if you want.
You don't have to contribute back the changes.
So it's kind of more similar to the GPL in philosophy
in terms of like,
you need to contribute back the changes that you make.
The main difference that I see with the GPL versus CDDL
is that the CDDL explicitly applies on a per file basis.
So like if you wanted to do something with ZFS and add some new functionality and not release
that new functionality as open source, you could put it all into a new file, compile it with the
rest of the ZFS code. You know, maybe you have some changes into the existing ZFS files that you do have to
open source, but you could do that and not open source your new file, your new source
file and keep your new feature private in that way if you wanted to.
Versus the GPL, it's not as explicit about what constitutes a change that needs to be open sourced, and people generally interpret it much more broadly, like anything kind of in the vicinity, you've got to open source it. If your code is near our code, then your code has to also be GPL, is kind of how people interpret it. And I'm deliberately being vague about what near means, because there's dissenting opinions about that.
So Linus's comments, as you kind of heard in the quote, I think that Linus has no love for Oracle. And I think that, you know, he's concerned, or at least at the time that he wrote that, he was concerned about...
Litigious Larry, as he calls him.
Litigious Larry is Oracle.
So I think that the reason that he was saying don't use ZFS was to avoid Larry suing you,
sort of, is how I interpreted it. And, you know, I'm not a lawyer.
I'm not giving anybody legal advice,
but nobody has been sued for using ZFS since the NetApp lawsuit,
which was a NetApp Sun lawsuit more than 10 years ago.
And nothing came of that lawsuit.
So, you know, nobody won or lost in that lawsuit.
Everybody just dropped it.
The reason why I bring this up is less to be provocative, like, oh, Linus says don't use ZFS, but more around this unavoidable tension between the developers, you and everyone else involved in the creation of ZFS and then eventually it being open sourced through this license, and the world-changing opportunity of the software, and the license that sort of stands between that opportunity. Because I quoted Linus saying that, but I didn't quote him, which I'll do now, as saying, I can't integrate this. I'm reading tea leaves here, like, between the lines, but it seemed as though he wanted to integrate ZFS into the Linux kernel
but was unable to do so because of essentially the license,
the GNU license that Linux stands upon, and the difference with the CDDL license that ZFS was
licensed under as part of OpenSolaris. And then I'm sure there's some details in there that
made OpenZFS possible, which is super awesome because despite this acquisition, this accidental
to some degree, open sourcing of ZFS, it gets to live on and you get to have a career beyond
this proprietary software you were originally hired to build, which I think is super wild in
terms of a journey for a software developer like you. And then a community to appreciate and enjoy and use your work.
Like if you wrote your best software and no one can use it,
did you write the software?
You know what I mean?
Kind of like the tree, did it fall, did it make a sound kind of thing.
It's almost like that.
Yeah, I agree.
If no one can adopt your software and enjoy it, did you write the software?
Kind of no, really, right?
Yeah, I mean, that's one of the reasons
that I really love open source is that it makes the software available. It makes it available
without the constraints of like any one company living or dying or deciding to do whatever.
If it's good software and it's useful, then people can continue using it and extending it and
making it continue to be
relevant. Like the fact that we could take like the ZFS that Sun was doing in 2009 or 10 and take
that and run with it as part of the Illumos project and part of the OpenZFS project. I mean,
that's open source. That's what it's supposed to be. There wasn't really anything special that
let us do that. Like the fact that it was an open source license let us do that.
I think that it wasn't a given that people would actually pick it up and like continue the software development.
So that's one of the reasons that the OpenZFS project was created, to unify and provide some kind of leadership around ZFS development that was happening on illumos and FreeBSD and Linux altogether.
And that happened in 2013, right?
Like OpenZFS began in 2013.
Yeah.
The original project, proprietary way back,
before it was even open source licensed, was 2001.
So you got, what, 12 years between inception of the project,
several years before the Common Development and Distribution License
was instituted, right, when you did the OpenSolaris part of that.
I mean, if that didn't happen, like, I don't know who did that inside of Sun,
but like, if that didn't happen, then ZFS, as we know, it would have died.
Yeah.
You know, and it would still be, it would be in Oracle now
because Oracle still is developing Solaris, right?
It would be the closed source ZFS,
which is continuing.
You get a fork in the road.
This is history that I'm sort of sharing
with the listeners.
Like there's a fork in this road of ZFS,
which is one that ended
and sort of bifurcated, right?
You got the open ZFS version
that began in 2013 or whatever timeframe.
That was maybe the 2009 snapshot of the project.
And then there's the closed sourced Oracle version still yet that is ZFS inside of Oracle, which I guess is just called Oracle ZFS.
Yeah, so Oracle continued developing ZFS internally and just not sharing that source code with anyone.
And that's fine.
And the open source community picked up the open source code and we've continued developing that. And people maybe
have asked, like, you know, which one is better? That was my next question. Which is better, Matt?
That's really an academic question because nobody's really baking off like open source ZFS
on Linux versus Oracle ZFS.
The target audiences of these are just very different.
The target audience of the Oracle ZFS is probably people that have been locked in by Oracle.
It's not about which one is better, it's just like, can I escape the clutches or not?
Well, the good thing is that you are continuing development.
We've just speculated about what will be in 3.0.
Some other things that I think are interesting in the maybe category of OpenZFS 3.0,
one, RAID-Z expansion, which we talked about.
A couple that hit my radar, which is ZFS on Object Store,
which I saw a talk on that from a recent conference, which I thought was pretty cool,
which is like ZFS in the cloud essentially,
which I think is just really interesting
to think about like different clouds
being different, you know,
in the V devs and whatnot.
I run Mac OS primarily as my primary machine.
So I'm excited about the opportunities
of Mac OS support in the future.
But that's, I mean, I'm sure there's other cool stuff in there, but that's what hit my radar in terms of like, can't wait, looking forward to 3.0.
So that's the good thing, though, is that it was open sourced.
You and others are continuing to develop it.
And there's a community behind this.
You got the conference that happens each year, books being written, blog posts.
There's still a lot of momentum behind this project, obviously.
Yeah, we have a ton of people contributing every year.
We have our annual conference.
We have monthly video calls where we're talking about new features
and kind of getting design reviews and making sure bugs are being addressed.
So the community is very active.
If folks would like to participate, you can find info on OpenZFS.org.
We have links to all the videos from past conferences and how to join our Zoom meetings monthly.
Cool. Well, Matt, is there anything else that I left unchecked in terms of talking about
your career trajectory? Maybe the only open question mark I really have, that you can touch on if you'd like, is how you negotiated with Delphix to be able to contract on top of the open source. The reason why I ask that question is less like, are you an amazing negotiator? Probably. But more so, if there's other devs out there who are thinking, I want to keep contributing to open source, how do I negotiate with my employer? Obviously Delphix appreciates and embraces open source, so maybe developers are already at a place like that. But if they're at a place where they embrace open source, what are some things they can do, things you've done, to be able to buffer in the give back and the impact beyond just simply their daily nine to five at their job?
Yeah, I think that the consulting per se, like getting paid
for work is kind of a special
case. I would probably focus on like, how can you contribute to open source, like as part of your
job. And I think that there, it's mainly about like, making sure that your employer understands
what they're getting out of it, right? Everybody wants to know what's in it for them, developers
as well as employers. In my experience,
Delphix has been involved with ZFS and OpenZFS for a long time,
10 years or so, and it's a fundamental technology for our product.
So the benefits are like... Super clear. Yeah.
First of all, like we're using
this and we want to make it better and we want to make it better in the best way. We want to get the
contributions from the community and we want to be able to have other people from the community,
like testing and validating the changes that we're making. So that's just on a very like
low level. Like we want our code to work. We want our code to be the best it can be.
And to do that, like we want to get these contributions from other people.
In order to get the contributions from other people easily,
we need to upstream our changes so that we don't have merge conflicts all the time.
And we want our changes to be validated and checked and tested by the community.
So that's a very low level,, like quantifiable benefit to the company.
The next level of benefit is like the corporate branding almost.
Like it makes the company look good when people in the community see, oh, Delphix is contributing to OpenZFS.
Oh, Delphix is leading OpenZFS.
Delphix is helping to organize this conference about OpenZFS.
It creates mindshare around Delphix is a cool place.
Delphix is a cool company.
Even if I don't know anything about their actual product of database virtualization and masking and whatnot,
I know that they're doing this cool open source work and that makes them seem cool.
So for our case, our customers, the people that
we're trying to sell to are generally not like software developers. So it doesn't go directly
to our bottom line, but you know, there's a lot of other things that companies do besides just like
exchange goods and services for dollars, right? Like recruiting. So
almost, I would say, more than half of the team that I work on, of about 10 people, is people that have joined us from the people that I knew from the open source community. And a lot of it was like serendipitous encounters, where I was asking one person, hey, we're looking to hire, do you know of anybody? And then somebody else happened to overhear that and be like, hey, are you looking to hire? Because I'm interested.
So it's not, in terms of the
branding and reputation and whatnot, it's a lot
harder to pin directly on it. I think you're going to have
to find somebody within the company that kind of believes in that because it's less
quantifiable. But at least in my experience, the benefits have turned out to be very real in terms of the reputation and, you know, within the software engineering community.
What about you? How are you feeling about where you're at with your career and what you're working on?
Any closing thoughts on like, are you winded with ZFS?
I mean, 20 plus years so far with, I mean, I'm just going to imagine like you eat, sleep
and breathe, you know, work-wise ZFS to some degree.
Like, are you burnt on it?
Are you done with it?
Are you more motivated than ever?
To be honest, ZFS is getting to be older, right?
I mean, 20 years is a long time, even within enterprise software.
And I think that it can be a challenge to remain relevant as things change within the industry.
With things like, you know, first we had the challenges of SSDs with very different performance characteristics.
Then with virtualization, changing kind of where the storage hardware fits into the stack. And now with the cloud, even more so the separation between the storage
hardware and the actual use of it. So I think it could be a little discouraging. But to me,
you know, the project that we're working on now with ZFS on object storage has just been incredibly fun. And I feel like we're taking ZFS to the next level, like we're giving it some more legs that'll keep it relevant for another decade.
And it's not necessarily like, it isn't something that's going to be used by every ZFS user today,
but it's going to enable a lot more ZFS users in the future by making ZFS integrate even better into the cloud and bring those capabilities of snapshots, compression, all that stuff to object storage and good performance object storage.
And I've really I've been having a blast the past year with the team developing that and designing it.
A lot of the code is actually in userland and written in Rust. So we all learned Rust, which is really exciting. It makes me never want to touch C again, even though it is my job to do so. So we're going to do it. But, you know, Rust just feels so comforting now that I've learned it. The safety of it feels very comforting, and it makes dealing with raw pointers in C everywhere feel scary, as it should be. I would say it should feel scary. It is hard. You know, you've got to get everything just right with C in order to not have bad bugs. It's more work, but that's fun work too. I see ZFS continuing to be relevant because we're adding these new use cases to it.
And I find that really exciting.
On the Rust note, what made the team choose Rust?
Was it because it's on the network?
Yeah, so first we chose userland before Rust.
So we need ZFS to talk to the object store. So it needs to talk HTTP and HTML and JSON and all this stuff. And we did not want to do all that in the kernel. So we decided, okay, we'll have some userland process that the kernel is going to talk to, to say, get this block, read this block, write that block. And then this userland process is going to deal with turning it into S3 requests.
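To illustrate the split Matt describes, here is a minimal, hypothetical sketch in Rust: a userland agent receives block-level requests from the kernel and translates them into object-store operations. All the names here (`BlockRequest`, `object_key`, the `zfs/block-` key prefix) are illustrative assumptions, not the actual ZFS object-store agent's interface:

```rust
// Hypothetical sketch of the kernel <-> userland split: the kernel hands
// block-level requests to a userland agent, which maps them onto
// object-store (e.g. S3) operations. Names are illustrative only.

#[derive(Debug)]
enum BlockRequest {
    Read { block: u64 },
    Write { block: u64, data: Vec<u8> },
}

// Map a logical block number to an object key (assumed naming scheme).
fn object_key(block: u64) -> String {
    format!("zfs/block-{:016x}", block)
}

// Translate a kernel request into the object-store operation it implies.
fn handle(req: &BlockRequest) -> String {
    match req {
        // A block read becomes an object GET...
        BlockRequest::Read { block } => format!("GET {}", object_key(*block)),
        // ...and a block write becomes an object PUT.
        BlockRequest::Write { block, data } => {
            format!("PUT {} ({} bytes)", object_key(*block), data.len())
        }
    }
}

fn main() {
    let requests = [
        BlockRequest::Read { block: 7 },
        BlockRequest::Write { block: 7, data: vec![0u8; 4096] },
    ];
    for r in &requests {
        println!("{}", handle(r));
    }
}
```

The real agent, of course, does far more (batching, caching, actual HTTP to S3 via Rust crates); this toy only shows the shape of the translation layer that the kernel/userland split makes possible.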
And once we had done that, then we thought, well, you know, there are languages that are higher level than C that could make our job easier. And so we looked around at what the options were there.
And Rust seemed like a good fit. I didn't do a comprehensive survey of every possible language. But Rust seems so similar to C in terms of, it's a low-level language, you know, there aren't scary things like garbage collection.
Java may have been another choice, especially given that like the rest of Delphix's software is written in Java. So like in-house, we have a bunch of Java developers, but the performance aspects of it,
we felt more confident
that we would be able to get
all of the performance
out of the hardware
with a low level language like Rust.
And then having the, you know,
the ecosystem of all of the Rust crates
would let us develop it faster.
And then, you know,
the safety of, you know,
not having like memory corruption
would also let
us develop it faster because we wouldn't have as many crazy bugs to debug.
Interesting. Well, cool. I'm sure we can probably do a whole entire separate segment that goes deeper than that answer there on Rust, because that's always interesting, to be like, you developed most of this in C, or all of it in C, so why would you choose Rust in the userland part of that?
I'm always curious about those questions.
Yeah, I mean, C would have been the kind of natural choice.
I'm sure that there's libraries that we could have found for C
to do all of the network communication, JSON stuff.
But I feel really happy about the choice to use Rust.
Good.
Anything else?
Anything left unsaid? This is the closing.
So any advice for
those who are going to pursue a land
that has ZFS
all over it for them? Maybe they
got some spare drives they want to play with. They got a
home lab. They got that
Plex server that's still clunking on an old
Mac mini or something like that. They want to move it to a Linux
box with ZFS. You know, whatever.
What kind of advice you got?
Closing thoughts.
I would say just go for it.
I mean, if you like tinkering,
then I would just get some install of Ubuntu
or another OS that has ZFS in it,
maybe FreeBSD, and start running,
you know, zpool create and whatnot.
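As a rough sketch of those first steps, assuming a pool named `tank`, two spare disks, and a mirror layout (all of which are placeholders for your own hardware; see `man zpool-create` for the details):

```shell
# Create a mirrored pool from two whole disks (run as root; this erases them).
zpool create tank mirror /dev/sda /dev/sdb

# Turn on compression and create a dataset for your data.
zfs set compression=lz4 tank
zfs create tank/media

# Take a snapshot, then list pools, datasets, and snapshots.
zfs snapshot tank/media@first
zfs list -t all
```

From there you can experiment with the features covered in the episode, like clones (`zfs clone`) and replication (`zfs send`/`zfs receive`), before trusting the pool with real data.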
If you want to use ZFS,
but don't necessarily like tinkering in the internals of everything,
then a more packaged solution like FreeNAS would be another good option.
Yeah.
Got a good interface for just doing most of the work.
You don't have to do any of the command line stuff at all, really.
It's all just a UI for it, which can be nice.
I prefer the terminal when I manage ZFS personally.
I feel like I can actually feel the heartbeat of the software
rather than some UI trying to tell me what to do.
I just couldn't understand it more.
Once I moved to the terminal to mess with ZFS, I felt a lot better.
That's my take on it at least.
I love that as well.
I know some people don't necessarily take the same joy we do
from feeling the heartbeat of their software running.
So I'm glad that there are more kind of packaged, guided solutions as well.
Well, Matt, it's been a pleasure talking to you through your software career,
OpenZFS, the future and the past of ZFS itself.
I really appreciate it.
Thank you so much for your time and really appreciate you.
Thank you.
Thanks for having me.
That's it for this episode. Thank you for tuning in. If you enjoy the show,
do me a favor, share it with a friend. And of course, thank you to Fastly for all that awesome bandwidth and also Breakmaster Cylinder for making all of our awesome beats. Here's a
pro tip for you. Check out changelog.com slash master.
That is our master feed.
Get all our shows in one single feed.
And for those super loyal listeners,
check out changelog.com slash plus plus.
That's our membership.
Get all our shows with no ads, plus some other perks.
Again, changelog.com slash plus plus.
That's it for this episode.
Thanks for tuning in.
We'll see you next time. Thank you. Game on.