The Offset Podcast - The Offset Podcast EP011: Backup & Archiving
Episode Date: June 3, 2024
As a postproduction professional, one of your primary jobs is to ensure the data integrity of the projects you are working on - essentially don't lose stuff! Over the years we've heard hundreds of horrifying stories of data loss and unfortunately, most of them were avoidable. In this episode of The Offset Podcast, we dive into backup and archiving strategies. We'll start out exploring the differences between a backup and an archive, why it's important to NEVER work off client-supplied drive(s), understanding online, nearline, and offline data lifecycle states, redundancy at each state, and understanding the gear you'll need. We'll also dive into an overview of LTO and why it is the best option for long-term archiving. We'll discuss LTO generations, connectivity, using LTFS as a file system, tape redundancy, and why a stack of drives is NOT a suitable replacement for LTO. Finally, we'll discuss some business/billing implications of archiving. If you like the show please give us a like and subscribe to stay up to date on future episodes!
Transcript
Hey there, welcome back to another installment of the Offset Podcast.
And today we're talking about a not so exciting topic, but one that's really, really important.
Backing up and archiving. Stay tuned.
This podcast is sponsored by Flanders Scientific, leaders in color accurate display solutions for professional video.
Whether you're a colorist, an editor, a DIT, or a broadcast engineer,
Flanders Scientific has a professional display solution to meet your needs.
Learn more at flandersscientific.com.
All right.
Welcome back, everybody.
I am Robbie Carman, and that is Joey D'Anna.
And Joey, we are here today to talk about, as I said in the tease, a subject that's not all that sexy, not all that exciting.
It doesn't really get people...
Maybe not to you.
...people really going.
And that is the idea of backing up and archiving, right?
I mean, how many times have we, over the years, heard, I mean, sob stories of all sorts
of things going bad, wrong, you know, very fast for people?
And our first question is, oh, well, just restore your backup or go to your archive or
whatever. And they look at us with this like white, you know, stare going, uh, well,
about that. Um, I didn't have a backup or an archive, right? Um, I think it should be in
every post-production Bible. It should be preached from, you know, mountaintops that
backing up and archiving is something that you have to inject into your DNA if you want to work in
production and video post-production, right? There is nothing, I mean, nothing worse than that
horrible, horrible, horrible, horrible feeling when you realize that something is gone and you're
not getting it back. And in fact, Joey, we just had this happen five minutes before we got on.
Yeah, we had to dig into an archive and undelete a file that had been inadvertently deleted.
And this is, it's, it's something that I think you need to run under the assumption that you're going to need it.
Right.
It's like, you know, people that ride motorcycles say there's two types of motorcyclists, ones that have crashed and ones that will crash.
Right.
There's two types of computers in operation, ones that have failed and ones that will fail.
So if your data structure, however you're storing the important assets that you need, is not fault tolerant enough
to handle a full failure of all of your data going away,
you're sitting on a ticking time bomb because there is no 100% reliable data storage method anywhere.
Yeah.
And I think the important thing that I'll just riff on that to say is that, one,
no matter how good the marketing is from somebody selling hard drives,
computers, etc.,
it's all just a matter of time before something goes wrong with that eventually.
And then two.
They measure it in what's called
mean time before failure.
Oh, yeah, yeah.
Average time it takes for the device to fail.
This means the failure rate of these devices is 100%.
It's just a question of time.
Right.
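To put a rough number on "it's just a question of time", here is a minimal Python sketch. It assumes a constant failure rate (an exponential model), which is a simplification and not something stated in the conversation. The function name `failure_probability` and the example figures are hypothetical. MTBF is more commonly expanded as "mean time between failures".

```python
import math


def failure_probability(hours_in_service: float, mtbf_hours: float) -> float:
    """Probability that a single device has failed by the given age.

    Assumes a constant failure rate (exponential model). That is a
    simplification: real drives tend to fail more often when they are
    brand new and when they are old.
    """
    if hours_in_service < 0 or mtbf_hours <= 0:
        raise ValueError("hours must be non-negative and MTBF must be positive")
    return 1.0 - math.exp(-hours_in_service / mtbf_hours)


if __name__ == "__main__":
    # Hypothetical MTBF figure, for illustration only.
    mtbf = 1_000_000  # hours
    for years in (1, 5, 10):
        chance = failure_probability(years * 8760, mtbf)
        print(f"after {years} yr: {chance:.1%} chance this one device has failed")
```

Under this model the probability only ever climbs toward 100% as the hours add up, and it applies to every drive in an array separately, which is the sense in which failure is a question of when rather than if.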
And then the second part about that, to riff on that, and I think this is going to be
a theme for this episode, is that it is not a one-step or a one-size-fits-all approach.
We are going to make the case in this episode for a multi-tiered approach
to archiving and backing things up because honestly there are different levels of this to what you need depending on the project,
but also cost plays a factor, capacity and speed play a factor.
So we're going to dive into all three of those things.
But Joey, the first place I want to start is this concept, because I think these phrases get mixed up,
and we even do it back and forth wrong sometimes too:
the difference between backing something up and a true archive.
In your own words, kind of give us what the difference is between these.
Well, simplified to the very core, an archive is saving things for a long time for the ability to get them back.
A backup is for fault tolerance and recovering from failures.
So your backup is there to get you back to work when your storage explodes.
Your archive is there so that when a client asks for a project
to come back, you can get it back for them. And both of those things have very different technical
requirements, logistical and operational requirements, and cost implications. And in fact,
I think we'll talk about this in more detail later, you know, where your responsibility as a vendor
lies, changes between those two categories, right? I feel that most vendors would agree that
they are responsible for the backup side, no matter what, right? The client is entrusting you
with their data and their project. They expect you not to lose it mid-project.
Correct.
However, archiving, nobody can be expected to manage someone's data forever for free.
So there are costs and schedules and fees that can be associated with that that can and should
be managed.
So you really got to look at both of these as two kind of completely separate things with
separate solutions that are both very important.
Yeah, no, I 100% agree with that.
So let's start with also some vocabulary that I think is going to be germane to archiving and to back up.
And again, we'll dive into both of these scenarios in just a second.
And I want to bring these vocab words up just because I think that a lot of people, some people use them, some people have no idea what we're talking about.
And then there's just, I just feel like I'm on a mission to kind of standardize this language a little bit because it does get a little confusing,
and I think it pertains to both kind of avenues of things,
both backing up and archiving.
And there's three words that I want everybody to kind of know, right?
And they are online, nearline, and offline, right?
And we can use those to kind of accurately describe
where in the life cycle data is.
And I think that's an important thing to really kind of cover
because it's going to very quickly allow you to go,
oh, that's the type of device or the medium that I need to be using,
these are the things I need to be thinking about
when we're talking about that part of the life cycle for data.
And that's a very important thing to think
when we're talking about this entire discussion,
the phrase you used data life cycle,
that's what you need to be thinking about.
It's not I'm working on a project, I'm done with the project.
There is a complicated life cycle for your data
moving through the entire production process
and post-production process and managing it is what we're going to get into.
Yeah.
Okay, so let's talk about the first.
The first scenario is the one that we in post-production face every single day.
That is client hands off a big pile of media, whether that's on a hard drive, it's a download link, wherever.
We need to initially transfer and store that stuff somewhere, right?
Think about this as our active storage, but in our discussion, I'm going to use that word again, online storage,
meaning this is the storage that is generally going to be the fastest, the most accessible,
your best storage, if you will.
And so in our case, you know, that's various NAS arrays.
It could be, you know, your biggest and best SSD attached to your system.
It's going to be the thing that you're going to access the most, most frequently,
and is going to keep the current active projects that you're working on, right?
Yeah.
So the first step here is to think about what your needs are for online storage.
How fast does it need to be?
Do multiple people need to access it?
Does it need to be a NAS or can it be direct attached?
And most importantly, how redundant does it need to be?
Because that determines the rest of the backup strategy.
Here's our first line of defense.
Okay, I want to stop for a second and pick up on something that you just said there,
which I think is really, really important, right?
Because I see this happening to people all the time.
They go, well, I got my stuff from clients on my biggest and fastest online
storage, it's, you know, whether that's a NAS or SSD or whatever, and they stop there.
But you just said something that's really important, and that is the word redundancy, right?
And redundant does not mean, one, as far as I'm concerned, and I think you'll agree with this,
it does not mean that just because it's on one unit or one drive we now have redundancy at all.
And that includes if you're using a RAID or ZFS or some other technology. Even though, you know,
redundant is the R in RAID,
right, it's not a backup.
Okay, and that's an important thing.
Think about it more like fault tolerance, right?
Like, yeah, you could have a hard drive die within this unit and we still have data, right?
But that is not the same thing as having a backup.
And so let's begin there.
Okay, so your RAID is, you know, and I recommend having some kind of RAID solution for your online storage
because what that does is it lets you survive one, two, or more hard
drive failures. Right. Right. That's great. When the hard drive fails, you can keep working. But
guess what? It doesn't protect you if you fat-fingered and deleted a file. Well, that's gone forever now.
Your entire array's power supply dies and it gets corrupted somehow? Doesn't protect you from that.
Somehow, two drives fail before you can replace them in time? Doesn't protect you from that.
So the first stage of the online storage is once everything is copied over, I feel like it should also be
cloned to another completely separate medium.
So that's stage one, right?
And that's what we do with our projects, right?
We have the NAS systems.
Everything goes on the NAS.
And there's an active projects folder.
And that gets a nightly clone to a completely separate RAID 5.
So our NASes are RAID 6 that can tolerate two drive failures.
Then we have a complete clone of the working storage set that can tolerate one or two
drive failures, depending on if it's configured for
RAID 6 or RAID 5.
But what makes that a backup, not just a redundancy, is that, let's say the entire storage goes up in flames.
I can take my RAID 5 or RAID 6 array that I've been cloning to.
I can plug that into a brand new computer that I just bought at the Mac store, and I can continue to work. Right?
And then the third layer of this backup solution for us is that we have multiple sites.
So our NASes are cloned to multiple sites.
So to start with, our online storage consists of the NASes at my house, Robbie's house,
and at our office, then the backup, the local clones to another array at those offices.
And then finally, the second backup, and this is a very, very important thing to have,
is off-site, as in we have the three NASes synced together.
All the active projects are synced.
So those are in multiple physical locations.
So literally if my house burns down, the project doesn't go away.
Yeah.
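As a rough illustration of the nightly clone of the active projects folder, here is a minimal Python sketch. It is not the hosts' actual tooling: the function name `clone_folder` and the folder layout are hypothetical, and a scheduler such as cron would be assumed to run it each night. The one deliberate design choice is that the clone never deletes anything, so a file that was fat-fingered off the working storage is still there in the copy.

```python
import shutil
from pathlib import Path


def clone_folder(source: Path, destination: Path) -> list[Path]:
    """Copy new or changed files from source into destination.

    Nothing is ever deleted from the destination, so a file removed from
    the source by mistake still exists in the clone.
    Returns the relative paths that were copied on this run.
    """
    copied = []
    for src_file in sorted(source.rglob("*")):
        if not src_file.is_file():
            continue
        relative = src_file.relative_to(source)
        dst_file = destination / relative
        src_stat = src_file.stat()
        if dst_file.exists():
            dst_stat = dst_file.stat()
            # Whole-second comparison: coarse, but tolerant of filesystems
            # that store timestamps with less precision.
            unchanged = (
                dst_stat.st_size == src_stat.st_size
                and int(dst_stat.st_mtime) >= int(src_stat.st_mtime)
            )
            if unchanged:
                continue
        dst_file.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src_file, dst_file)  # copy2 also copies modification times
        copied.append(relative)
    return copied
```

A clone like this covers a dead array or a deleted file, but it sits in the same building as the original, which is why the off-site sync described above is a separate layer.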
And so when you're thinking about that clone device,
because I think people are probably going,
what do you mean?
I just bought my NAS and it cost me 10 grand.
I'm not going to buy another NAS.
Right.
Well, so here's the thing, right,
is that I think that
there is an argument to be made for that, that literal redundancy and that mirroring of a similar
caliber device, right? So, for example, here at home, I have my main NAS and I actually have a
second NAS that operates and it clones every morning, every night or morning at 3 a.m.
It's cloning all the active stuff on one device over to the other device.
Now, I have it on a NAS because of the situation that you just
described, right? One NAS completely blows up. I can't afford that time of, you know,
kind of going back to other hard drives. I need something that can just plug and play and it's
ready to go, right? But that right there is a cost versus convenience decision, right? So the further
we move down this data life cycle, the more sacrifices we can make to the performance of the device.
So you might not have the, you know, budget to double your storage just to have a clone.
But guess what?
You could buy a cheap RAID array with slower drives that's direct attached that costs a fraction of what your NAS does.
That's what you do, right?
Exactly.
I've got a five or six drive RAID 5 external Thunderbolt chassis, way less cost than my NAS, but also, you know, less interactive performance.
That's fine.
You know, you just make a decision based on, when I get into a disaster recovery mode,
how quickly, and how much performance, do I need to have immediately?
And just for those of you are wondering, like,
what's the reality that this is going to happen
and something's going to break?
I'm going to give you a perfect example of this.
About a month ago, I'm sitting here and I'm grading a show
and all of a sudden, like, hitting playback, I'm like,
gosh, that really should be playing back.
Like, why is it dropping frames or whatever?
And next thing you know, I get a message,
an email message from my main NAS that says,
hey, look, performance has been degraded on the array
because you have lost a drive
and turns out a second drive is actually failing, right?
So after a mini panic attack,
I was like, well, this is really bad.
So I just happened to have some spares.
So I popped in those spares, but guess what?
Cool.
Popped in the spares, the whole system has to rebuild that data.
So that's actually a really dangerous time, by the way,
to be working off of something when something's rebuilding that.
So what did I do?
I said, cool.
I'm just going to unmount that volume from my machine,
and I'm just going to let it do its thing,
and I'm just going to relink things over to this other array,
and like nothing ever happened to me, and I could wait the time.
So, as I said at the beginning of this,
it's not a matter of if, it's just a matter of when.
In that case, that redundancy of having two online systems kind of greatly helped me.
Okay, so client brings stuff in.
You transfer it to online storage.
We're making a backup of that, ideally,
onto something that's similarly equipped in terms of...
Two backups, one off-site.
Right, exactly.
So if something should go wrong,
Joey's house burns down, my house burns down,
RAID blows up,
we have that data, and we're not going to lose it.
I want to make another point there that I think I see too often,
especially in message boards with this.
And I understand it's a cost factor,
and I understand not everybody can do this,
but it makes me
gasp every time I look into it.
And that is people describing working off of their client drives as their primary storage to do something,
right?
That is an absolute recipe for disaster.
That is a hard never for us.
Hard never, ever, ever do.
You have to think of that client supplied hard drive or that client supplied link as like something
out of like Mission Impossible, right?
It will self-destruct, right, at any given point in time.
And your first responsibility to that client is integrity of their data.
And that means I'm going to say never.
I'm going to really underline that and bold it.
Just never work off their storage, okay?
You need to be transferring things to your own storage.
And we have a whole episode talking about conforming and offline/online with client drives
and that kind of stuff that I think is coming out soon.
And we'll talk about that more in that episode.
But that's a big never, never do.
Okay.
All right.
So we do our project,
everything is hunky-dory, drives work flawlessly, and I'm done with the project.
Client has paid their bill.
They're giving me a high five.
What next?
Well, if you're like most of us, right, you're not convinced that a project is done when the client says,
oh, thanks, we're done, and I paid you, right?
Inevitably, a week or two, a month, maybe two or three months later,
hey, you know, I was just looking at this more
and I think we want to make that title change
or I got a new mix.
Could you remarry that and just do a new output for me?
Okay.
The thing about online storage
is that it's going to be your most expensive storage
and it could also potentially be your lowest capacity storage.
You know, because if you're wanting to have super, super high performance
and let's say you're using SSDs,
well, SSDs are really expensive.
So it's really difficult to build up, you know,
hundreds of terabytes of online storage unless you have unlimited funds, right?
So that brings us kind of to this next idea that I want to differentiate from backing up,
and that is the idea of nearline storage, right?
This is something that you often hear talked about, and to me,
the differentiating factor between nearline and online storage
is mainly a factor of speed and connectivity, right?
Yeah, time and speed.
Time and speed, right?
So in my mind, a perfect nearline option is high capacity, but not necessarily high bandwidth or high speed, right?
I want to have a lot of space that I can just kind of throw things over to this storage that I may or may not need to access, but I'm not ready to put it in, as we'll discuss in a minute, offline or long-term archive.
I just need it around, but I don't need it around taking up space on my main storage, right?
Yeah. And again, we got to look at the pros and cons here because if we take it off of our main storage, we're losing one layer of redundancy.
So if we go to a separate nearline storage or if we use the same clone volume as our nearline storage, you do need to make sure there's some other level of redundancy there.
Because if it's only sitting on a single RAID now, where you're between your online and your final archive, well, you're in a really dangerous spot
because now you could lose that data.
My personal kind of mentality on this
is that I like to go to the offline,
to what we're going to talk about in a little bit,
which is a full tape archive.
I go to that the same time I go to the nearline.
So my nearline is a convenience.
Basically, when a project is done,
it goes to tape for my final archive,
and then it goes off
of the main project folder that syncs to the three locations and syncs to my clone drive.
So in nearline world, I can still get to it if I need to, but if it gets blown up and I have to blow it away, I've got it on tape.
Right. So it's just a matter of time versus expense versus speed.
Right, right, exactly. And to be clear, nearline storage, as we're describing it, is a convenience.
It's, you know, I consider a good online storage and a good archive solution and a good backup solution to be necessities, right?
Nearline is a convenience to, oh, the client comes back, now I don't need to restore from tape, or I don't need to keep this online forever.
I agree.
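One way to keep the life cycle straight is to write down where copies of a project live at each stage. The Python sketch below is one reading of the workflow described in the conversation, not a definitive scheme: the stage names, the location labels, and the helpers `copies_for` and `has_redundancy` are all hypothetical.

```python
from enum import Enum


class Stage(Enum):
    ACTIVE = "active"        # being worked on right now
    DELIVERED = "delivered"  # finished, but revisions may still come in
    DORMANT = "dormant"      # nobody expects to touch it again soon


# Where copies of a project live at each stage (labels are illustrative).
COPIES = {
    Stage.ACTIVE: ["online", "local clone", "off-site sync"],
    Stage.DELIVERED: ["nearline", "tape archive"],
    Stage.DORMANT: ["tape archive"],
}


def copies_for(stage: Stage) -> list[str]:
    """Return the locations holding a copy of the project at this stage."""
    return list(COPIES[stage])


def has_redundancy(stage: Stage) -> bool:
    """True when losing any single copy still leaves another one."""
    return len(COPIES[stage]) >= 2
```

Listing the stages this way makes the gap visible: once a project drops off nearline, a single tape is a single copy, so any redundancy at that stage has to come from the archive itself.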
And on a certain level, your backup of your online storage is kind of your nearline storage for all intents and purposes, right?
Yeah.
We kind of made it out to say, like, oh, well, we have the super, I mean, that's a best case scenario, that you have something
as high performing as your online storage as your backup for that or your nearline.
But in reality, you know, the case that I see a lot of people having is,
okay, I have everything on my fast NAS and, you know, high capacity NAS.
And now I went out and bought a Thunderbolt RAID as you have.
And that's my, that's my nearline, right?
It's just, and I agree with the idea that it's kind of a temporary location, right?
It's kind of a nice-to-have thing.
But, you know, the other way of looking at nearline, too, is that,
you know, if you have high enough capacity online storage and you have an archiving solution that you
trust, you might be able to, in the truest sense, forgo nearline, but your nearline essentially
just becomes your online backup at that point, right? It's one and the same thing. So in my
case, I'm actually kind of backwards. My clone volume is smaller than my big NAS because of the
order in which I bought things. So my nearline solution
is a folder on my main NAS,
where it doesn't get cloned
to the essential backup area, right?
But it's still accessible,
so I don't have to restore it from tape
if I need to.
Cool. All right.
So online, nearline as a convenience factor,
and then finally we're going to arrive at offline, right?
And offline, as its name implies,
is something that we just don't really need
to access all that often.
This is the archive.
This is the cold storage.
There's a lot of different ways people describe this, right?
But generally speaking, offline storage is going to be the slowest, least performing storage that you are going to utilize, right?
It is going to be something just because of that speed, it's not going to be readily accessible in the same sense of just like plugging in an SSD and ready to go, right?
So you have to factor in time to get stuff
back from it. And third, offline storage is a little bit of a choice of how you want to go with it,
because there are now actually really two main paths of archiving as far as I'm concerned.
And that is first local offline storage, which for a lot of you
consists of buying some hard drives and putting them on a shelf. And I just want to go on the record and say,
this is the most horrible idea that you've had.
Do not do it.
Don't even think about doing it.
It's a waste of money.
I understand that you look at archiving solutions and go,
oh, this service or this drive is expensive,
but it is not worth putting a hard drive on a shelf
and just praying that it stays.
And this is going to be one of the,
this is the only other thing that I'm going to say in this episode
that is a hard, hard, absolute never.
A drive sitting on the shelf
is the technological equivalent of a drive sitting at the bottom of the ocean.
Just assume that it is dead.
And by drive, I mean anything.
I mean RAID array.
I mean SSDs.
Any magnetic media.
All of them have a higher likelihood of failure when they start up,
which means it will sit on a shelf forever.
You plug it in.
The second all those drives spool up,
two or three of them will fail.
All your data is gone.
Now, I'm not saying that happens every time, but it will always happen eventually.
Again, hard drives and SSDs have a 100% failure rate.
It is just a matter of time.
So if you spend money buying drives and putting them on a shelf for an archive, because the other solutions are too expensive, you're throwing away money to avoid spending money.
So I had a filmmaker a year or two ago that was local here to the DC area and I went over to their house to
have a meeting about a film we were going to work on.
And I walked into their work area, their studio area.
And there was this big double set of IKEA bookshelves with probably, no joke,
300, 400 of those LaCie orange Rugged drives.
And I literally had a panic attack because I was just sort of like,
they're not that rugged.
They're not that rugged.
And the money spent on those hard drives could have bought a far more robust solution.
So, yes. Now, I want to go back to one thing we said at the very beginning here, I consider backup to be the mandatory responsibility of the post-production vendor.
Archive doesn't have to be. Archive has a hard cost associated with it and a hard responsibility of keeping these things over time.
If a client is not willing to pay you for that archive, there is nothing wrong as long as you give clear communication about the responsibility to the client to say, hey,
here is all of your project back.
Permanent archive is now your responsibility because this is your intellectual property.
If you need to come back to me, you need to bring it back to me.
That's a perfectly valid thing.
I agree.
If you can't afford a long-term archive solution.
And we'll hit on some of that business stuff in just a minute.
But I did mention that there's two main paths of this now that we've cleared away,
don't ever put it on some extra drives you have lying around.
And those two main paths are LTO
and some sort of in-the-cloud cold storage, right?
The in-the-cloud stuff, honestly, I don't recommend a lot.
I know there's Backblaze,
there's Amazon Glacier, there's, I mean,
there's a lot of those services that are out there
to kind of back up, you know, very, very, very cost effectively.
We're talking about, you know,
a cent per gigabyte or something like that on some of these.
The downside of going to a cloud cold storage archive like that or offline storage is it is going to take, in our workflows, it is going to take a long time to potentially get there and even longer to come back from there, right?
That's one of the reasons that it is so cheap, because these data centers have, you know, a Mount Everest's worth of drives sitting in a rack,
and these are something that they treat just like we're describing.
It is offline storage for them.
So to get something back from there, it requires, you know, a lot of effort on the data
center's part or whatever to get it back.
And I just, I've just never found it workable for our volume of media.
You know, any given project might be four, five, six terabytes and, you know, times 20
at a time in a month, you know, you got 20 terabytes, 30 terabytes to back up.
And the hard part too is also, you know, your data size needs for archive never shrink, right?
So the more you archive, the more your monthly bill goes up and it'll just never go down.
You're only adding cost to a recurring monthly fee.
And if you're not recouping that from your client, then it's just not economically viable.
So those solutions, like I said, Amazon Glacier, Backblaze, and the like,
I think they're valuable.
I think they're valuable to a point and for certain types of data, for sure, smaller files, project files,
you know, Photoshop files, you know, graphics, some of that kind of stuff.
Yeah, of course, that's going to be quick, easy to get up there, long-term storage.
But for moving terabytes or petabytes of data, not going to be on the top of my list.
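The point about an archive bill that only ever goes up can be sketched in a few lines of Python. The figures are placeholders: the price uses the "cent per gigabyte" ballpark from the conversation and is assumed here to be charged per month, the monthly volume is in the range mentioned above, and retrieval or egress fees are left out entirely. The function name `monthly_bills` is hypothetical.

```python
def monthly_bills(added_tb_per_month: list[float], price_per_gb: float) -> list[float]:
    """Monthly storage bill when data is only ever added, never removed."""
    bills = []
    stored_gb = 0.0
    for added_tb in added_tb_per_month:
        stored_gb += added_tb * 1000  # decimal terabytes to gigabytes
        bills.append(round(stored_gb * price_per_gb, 2))
    return bills


if __name__ == "__main__":
    # Placeholder inputs: 25 TB archived every month at $0.01 per GB per month.
    bills = monthly_bills([25.0] * 12, 0.01)
    print("first month:", bills[0])
    print("twelfth month:", bills[-1])
    print("total for the year:", round(sum(bills), 2))
```

Because nothing is ever removed, each month's bill is the previous month's plus the new material, so the yearly total grows much faster than the monthly upload volume suggests.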
Especially not for a smaller studio, right?
For a large company that has existing cloud contracts and, you know, a normal monthly cloud
spend that scales up and down with their needs, you know, the cloud archive is a fantastic solution.
I think for businesses of Robbie's and my size,
a local LTO is better in most cases.
Okay, so LTO, or Linear Tape-Open,
which is the most obtuse name like ever, right?
People look at it and I say to clients all the time,
like, oh, we're just going to put that back on tape.
And they look at me like I have two heads.
I thought tape was dead.
Yeah, I thought tape was dead.
What are you talking about?
Okay, LTO is a data tape format.
It has its roots, and we're probably dating ourselves
a little bit here with, you know, going back to the days of DAT and WORM and all these kind of tape technologies that were out there.
LTO has become a standard archiving medium that literally everybody involved in big data uses, right?
These tapes, a single tape has a shelf life of 20, 30 years.
It's, you know, think about it this way.
It's what your bank uses.
It's what, you know, governments use.
It's, you know, it is a verified, bona fide archiving format that is designed for longevity and for, you know, future restoring, right?
And there's a couple things you need to know about LTO besides just the name of it.
Number one, there's two components of an LTO setup, or really three components.
There is the drive itself, right?
This is a physical piece of hardware that has connectivity on it sometimes, and we'll put this in the show notes.
There's a couple different ones out there.
Could have, you know, a Thunderbolt backplane where it's Thunderbolt connectivity.
It could have SAS connectivity right in the backplane.
They're all SAS.
They're all SAS.
All LTO drives are SAS.
There's a couple vendors that make really convenient enclosures that have things like Thunderbolt or USB to make it more accessible to desktop computers.
Yeah.
And these days, you know, there's still, there's only a few companies that are actually making the actual drives.
IBM comes to mind.
HP, I think.
IBM, Quantum,
HP, and I believe there's one more.
The LTO stands for Linear Tape-Open.
It is a consortium of those companies that make the drives, and they all are the same
standard.
So there's no IBM version or HP version or Quantum version.
Right, right, right.
They're all cross-compatible.
So there's the drive itself.
There's the tape, which is the physical medium that gets stored on.
We'll get back to that in a second.
And then there is the archiving software that you optionally can use.
And I say optionally because on most of the platforms, you know, macOS or whatever,
you can install a tool set to kind of just mount that LTO drive directly in your OS and treat it like any other drive,
you know, just to drag things back.
Now, to be clear, it's not like any other drive.
It's not as fast as any other drive.
It's not, you're not going to have that interactivity.
But you can, through a technology or a standardized format called LTFS,
mount that just like any other drive and copy it back.
You may want to consider, and there's various tools out there, we'll link to some of them in the show notes, but there are archiving pieces, LTO archiving software.
Think about it as a piece of software that builds a database about the content that's on that tape, right?
Yeah, it tells you what tapes have what files.
And more importantly, the LTO software is usually designed to make your copies and interactions efficient.
You know, LTFS as a thing, it presents to the operating system as a regular file system.
You just open up Explorer or Finder and start navigating around.
But when you start navigating around, like in Finder, if you open up a folder,
it's going to look at all the files to try to make thumbnails.
Well, if all those files are on tape, that tape drive is going to be spinning up and down like crazy.
You're going to get a beach ball and be like, what's going on.
Yeah, yeah, yeah.
Right.
Your normal file system interaction on the computer is not designed for tape.
And there are programs available that, yes, they copy via LTFS,
but they make it more of a tape-based interaction so you're not waiting on the tape drive.
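The "database about the content that's on that tape" idea can be illustrated with a tiny catalog. This is a minimal Python sketch using the standard library's `sqlite3`, not how any particular archiving product works. The table layout, the tape labels, and the helper names `open_catalog`, `record_file`, and `tapes_containing` are all hypothetical.

```python
import sqlite3


def open_catalog(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a catalog of which files live on which tape."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        "tape_label TEXT NOT NULL, "
        "file_path TEXT NOT NULL, "
        "size_bytes INTEGER NOT NULL)"
    )
    return db


def record_file(db: sqlite3.Connection, tape_label: str, file_path: str, size_bytes: int) -> None:
    """Note that a file was written to a given tape."""
    db.execute(
        "INSERT INTO files (tape_label, file_path, size_bytes) VALUES (?, ?, ?)",
        (tape_label, file_path, size_bytes),
    )


def tapes_containing(db: sqlite3.Connection, name_fragment: str) -> list[str]:
    """Return the labels of tapes holding any file whose path matches."""
    rows = db.execute(
        "SELECT DISTINCT tape_label FROM files "
        "WHERE file_path LIKE ? ORDER BY tape_label",
        (f"%{name_fragment}%",),
    )
    return [row[0] for row in rows]
```

The catalog lives on ordinary disk, so finding out which tape to pull never touches the tape drive at all, which is the kind of tape-friendly interaction described above.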
Yeah, and they do a lot of things, including formatting, erasing tapes, they can add some metadata to the tape, you know, that kind of stuff.
And there's a lot out there.
Hedge comes to mind, YoYotta, yada yada.
You know, there's various platforms out there.
And you just have to compare the features. This is not a show about which features are which.
But the thing that is the most confusing about those three components, you know, drive, tape, and software, I think, to people is the generational numbering of LTO, right? So you see LTO 5, 6, 7, 8, I think we're up to 9 now, right? And essentially, this is the way it works, right? Every generation that comes out is a new tape format, and it is a new drive that supports that tape format. And essentially what you get over every successive generation is usually two things. Increased capacity on that tape, right? So, you know, going from a couple terabytes, five or six. I forget how many terabytes 8 and 9 are now, but it's a lot.
LTO 8 is 12.
12, right, okay. And then, so more capacity on a single tape, and then generally speaking, some speed improvements writing back to each tape, right? Now, those generations are really important because any given LTO drive, so let's say you go out and buy an LTO 8 drive. The way that the standard works is that an LTO 8 drive will read and write 8 tapes. It will read and write the previous generation. So if I've got an 8 drive, it will read and write 8, and it will read and write 7, but only at the specs that 7 was originally capable of. And it will read only another generation back. So in the case of LTO 8, read and write 8, read and write 7 at 7 speeds and capacity, and read only from 6. You have an LTO 4 tape? Guess what? You either need to get an LTO 5 or an LTO 4 drive to be able to support that. You cannot read that.
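The rule of thumb described here (read and write the drive's own generation and one back, read-only two back) can be written down as a small Python sketch. The function name drive_access is hypothetical, and the code only encodes the rule as stated in the conversation. Treat it as an assumption and check the compatibility chart for your actual drive: published specs for LTO 8 and later drives list narrower backward compatibility than this rule implies.

```python
def drive_access(drive_gen, tape_gen):
    """Classify what a drive can do with a tape, per the episode's rule of thumb.

    Assumption: same generation or one back is read/write, two back is
    read-only, anything else is unsupported. Real drives may differ.
    """
    gap = drive_gen - tape_gen
    if gap < 0:
        return "unsupported"  # a drive cannot use tapes newer than itself
    if gap <= 1:
        return "read/write"
    if gap == 2:
        return "read-only"
    return "unsupported"
```

Under this rule, drive_access(5, 4) comes back as read/write and drive_access(8, 4) as unsupported, which matches the LTO 4 example above.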
And the reason for this is that this kind of technology is a very long-term investment with a very long life cycle, right?
Like Robbie said, these tapes can last 20 to 40 years.
So it is not something like the latest iPad where a new generation of tape comes out and you need to jump on and buy the new version to stay up to date.
No, you make an investment in your archiving solution, and that is an investment that lasts years and years and years and years.
There's no reason why if you have an LTO 7 drive,
you need to move to an LTO 8.
It's just that you have to buy more tapes.
I'm on 7, and I've probably been on 7 for five or six years,
and I'm really not running into any limitations
other than that I have to use more tapes than you do because you have an 8 drive, right?
So that is something to consider.
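The "I just have to use more tapes" trade-off is simple arithmetic, sketched below in Python. The 12 TB figure for LTO 8 comes from the conversation; the 6 TB figure for LTO 7 is the published native capacity and is an addition here. The function name tapes_needed and the copies parameter are hypothetical, and the estimate ignores file-system overhead and compression, so treat the result as a lower bound.

```python
import math

# Native (uncompressed) capacity per tape, in terabytes.
# 8 -> 12 TB is from the episode; 7 -> 6 TB is an assumption added for illustration.
NATIVE_TB = {7: 6.0, 8: 12.0}


def tapes_needed(project_tb, generation, copies=1):
    """Estimate how many tapes a project needs, ignoring overhead and compression."""
    per_copy = math.ceil(project_tb / NATIVE_TB[generation])
    return per_copy * copies
```

For example, a 30 TB project works out to 5 tapes on LTO 7 versus 3 on LTO 8, and double that if you write everything in duplicate.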
I generally find with LTO, I think probably every six or seven years,
I consider an upgrade, and by that time, you know, if it's a huge jump in capacity or a huge jump in speed, maybe.
But that investment...
And part of this also is, you know, the cost of entry, the biggest cost of entry is the drive, right?
The drives range from like three to six or even seven or eight thousand dollars.
Oh, yeah, exactly.
You can save money by buying a generation back.
Oh, yeah.
And the tapes are cheaper, too.
So that's a generational thing about LTO.
When it comes to the software, again, LTO software, mainly think about it as a database for being able to quickly find and restore things, because great, you got it on tape, but now who knows what you have? That's a big thing. But the other thing it helps you do is things like make redundant backups, which I'm going to talk about in a second, so you can actually have multiple tapes be written at the same time, especially if you have dual-drive systems. So one of the things to understand about LTO software, and this is a change that happened, what do you think, probably around LTO 4, LTO 5, was that prior to that, yes, LTO was an open standard, but there were a lot of people that were doing their own kind of proprietary ways to write that tape, right?
Their own...
My old Quantel eQ system literally plugged SCSI into an LTO drive and archived in
completely its own format.
And that's how it worked, right?
Only another Quantel could pull those tapes.
Right.
So that's not a concern for modern archiving, but it is something to pay attention to if you have older tapes. You'd be surprised.
You know, clients say, whoa, I have this project.
I have an LTO.
I have a project from 12 years ago, and I want you to restore this, right?
And you plug it into your drive and it looks like the tape is dead.
First thing is it might be one of those proprietary formats.
And so unfortunately, with one of those proprietary formats,
the only thing you're going to be able to do is get the piece of software that
tape was originally written with to be able to read it, which brings us to the change
that happened right around then,
and it's important for people to understand the technology of LTFS.
Essentially, LTFS is an open file system that is shared between manufacturers and is used by pretty much everybody now.
But what this means is that I can make an LTO archive on my end and bring that, assuming that they have the correct hardware to support that generation of tape,
I can bring that tape anywhere and they can mount it.
So for interoperability and delivery, there was a period for a while where a lot of networks, like Discovery Channel, for example, were asking for deliverables on LTO as an LTFS tape, right?
And they had some specifics about metadata and stuff.
But the beauty is you can make archives and deliverables like that.
So that raises the question that we've been hinting at and going around over the course of this episode: what is the business model in post-production for doing this, right?
I want to clarify something, in case it's not abundantly clear about Joey's and my
outlook on this. I want to make it perfectly clear.
We archive for sanity.
And what I mean by that is that we, the feeling that we get of data going, you know,
hitting the delete button is just not something that's in our DNA, right?
And so we invested in these solutions long before we were offering it as a service to our clients.
To me, an LTO drive is like anti-anxiety medication.
I need to tell an anecdote here, and sorry, this is not meant to poke fun at you, and I feel you on this. I remember last summer, Joey was about to go on vacation with his family, right? And you could tell he was rushing when I talked to him on a phone call. What are you doing? He's like, I'm just trying to write another tape. I'm just trying to write another tape. He literally could not leave the house for vacation without archiving some stuff because of the anxiety associated with data.
And I'm right there with you, man.
I feel the same, the same way.
So it's one of those things that, like, if you're obsessive-compulsive like us and you're
scared of data loss like us, it's probably something you want to just think about doing anyway.
But, as Joey pointed out a few minutes ago, you know, you could back up
that online storage, do the process,
deliver the project, give it back to the client, and just wash your hands of it. That's totally an option.
So it brings up the idea of do we charge people for this? And I think it really kind of depends, right? I try to do it more and more and more and more.
We have rates for long form and short form archival. And we have rates for restoration, which is an important part of this. Years later, they come back to us.
We've got to find that tape. We've got to restore those files back from tape
to nearline or online storage.
And there's time involved in that, right?
So we have rates for long form and short form for that.
What that rate might be is something that I don't think there's probably much standardization on.
I think it varies, you know, depending on what people want to do.
It's also something that I would just, I would urge you to include in your project proposals and bids on everything.
Just have that fee, make it transparent to the client, right?
And I think the one important distinction to make to the client, if you're going to charge for this, is we are archiving the media that you delivered to us and we are archiving the work that we did.
It does not mean that you are archiving everything for that project for the client.
Yeah, we're not becoming an archive vendor.
Right.
If you wanted to offer that as a service, great.
It seems like a headache to me to take mountains of data from clients to archive,
and it's a lot of wear on your tapes and drives.
But I think that's something important
just to make sure, like, hey, I'm not backing up
all your camera originals. I'm just backing up
the conform that you gave me
or something like that, right?
Yeah, and I think the last thing I want to talk about
on the business and billing side of this
is, you know,
like Robbie said, we archive just about everything, right?
For our own sanity.
That doesn't mean that we nickel and dime
every client on every single archive.
If it's a 30-second spot,
we're probably not going to put a line item on the bid of tape archiving, right?
If it's a two-hour feature, yes, we're going to talk to the client in advance,
explain to them what the process of LTO archiving is, and say this is what it costs to archive it.
But if a client comes back two years later and says, I need my 30-second spot back because I deleted
it off my system, sure, we can say, of course, but it'll be an hour of archive retrieval time
to do that.
Yeah, totally.
And I think, you know, at the end of the day, that discussion is something your
clients will ultimately appreciate, you know, knowing that, oh, thank God, I don't have
to spend megabucks on, you know, crappy hard drives to put on a shelf. Or even if I do, when they blow
up, I've got a backup.
Right. And the last thing I'll add to the business part of this is, I think it's important that clients walk with their data as well, right?
There's no guarantee that they're going to come back to you. So one of the
things that I would consider is, when you're making that archive of a client's project,
that you think about doing it in duplicate. And we didn't really touch on this, but
duplicate LTO authoring is something that, from a pure data integrity point of view,
you probably always want to do. Just like the backup of the online storage,
we're backing up our archive to two different
tapes should something go wrong with one, and ideally putting one of the copies
in a separate location from the other.
So, you know, whatever, office burns down.
We're not losing two.
But what I tend to think about like that is,
hey, I'm going to make two copies,
one for us to keep and one for, you know,
a deliverable for the client to have.
So I am going to charge them not just the archiving and restoration fee,
but the nominal cost, you know, the 50, 60, 70 bucks
or whatever it is for, you know, LTO 7 or so tape
and charge that as a line item as well,
just so they have their own copy of it.
And as LTFS, they can bring it anywhere they want
if they need a restore.
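One way to gain confidence that two tape copies really hold the same data is to compare checksums. The Python sketch below is a minimal illustration, not the method any particular LTO application uses. It assumes each copy is reachable as an ordinary directory (for example a mounted LTFS volume), and the function names checksum_manifest and copies_match are hypothetical.

```python
import hashlib
from pathlib import Path


def checksum_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    root = Path(root)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256()
            with path.open("rb") as handle:
                # Read in 1 MiB chunks so large media files do not fill memory.
                for chunk in iter(lambda: handle.read(1024 * 1024), b""):
                    digest.update(chunk)
            manifest[path.relative_to(root).as_posix()] = digest.hexdigest()
    return manifest


def copies_match(source_root, copy_root):
    """Return True when both trees contain the same files with the same contents."""
    return checksum_manifest(source_root) == checksum_manifest(copy_root)
```

Reading every file back from tape this way is slow, and walking files in name order is not necessarily the order they sit on the tape, so in practice you might build the manifest once from the source storage and compare each tape against it.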
So let me just wrap this all up.
I'm going to try to wrap this up into one nice little bundle here because I know we've
talked about a ton of things.
First, backup is different than archive.
Backup is an absolute requirement.
Archive is negotiable with your client.
We've talked about very expensive solutions from beginning to end, right?
Every single thing that we've mentioned has a cost associated with it.
So you've got to look at your data life cycle
and where it's required to be redundant, where it's required to be fast, where it's required to not be fast,
and come up with your own strategy that works for your workflow and your budget.
But like anything else, there are ratios of cost to benefit, and you need to analyze that yourself
and figure out what makes the most sense.
But having that plan, having real backups in place, having an understanding
of what RAID can and can't do, what backup is, what archive is, I think is just absolutely
essential to anybody working in post-production. And when you look at those costs, right,
when you look at that initial investment of I need to buy extra drives, I need to buy extra tapes,
I need to buy a tape drive. You know, Joey's telling me to spend thousands of dollars on this thing
when I've never had a problem of just plugging in my SSD. Well, let me just tell you this.
I have worked in post-production for almost 25 years.
Between 20, 25 years?
You're getting old, man, yeah.
I have been around computers my entire life.
I have been around computers in post-production for literally the entire time computers
have been used in post-production as data devices.
I have never, ever once lost an essential piece of customer data.
Yep.
I have endured hundreds of hardware failures
and software failures and temporary losses of data, of many different things.
I have never, using these philosophies that we've talked about, I have never lost a customer
project.
It's one of those things where people are cavalier about it because they haven't had the pain,
the guilt, the agony of losing.
And all it takes is one time for somebody to be
a true believer, but I think if you really digest what we talked about on this episode,
the time to become a believer is not in the middle of it happening, but ahead of time.
And yeah, I would concur that, you know, spending two, three, four, five, six grand,
whatever it is for an LTO solution or an additional RAID or whatever, seems like a lot of money
up front. And it is, it is. But in the grand scheme of things, think about the detriment
to your business, your reputation, all of that kind of stuff.
If in the middle of a super important project, things go wrong.
And I've seen those arguments, right?
Like, you know, the operator blames the client for not having done their own backup.
Like, you just have to go into it assuming that the client doesn't care or understand
how to maintain data integrity.
And as we said at the top of this episode, that should be besides the amazing creative work
you're going to do.
Data integrity should be your, you know, it's the one B to the one A of being creative, right?
That data integrity throughout the entire pipeline and life cycle is something you should consider.
So, awesome.
I think this is a great talk.
If you're new to LTO, new to archiving, feel free to leave us some comments.
As always, we really appreciate you listening to and watching the show.
As a reminder, the show is available on all major podcast platforms, including Spotify and Apple Music.
You can also get an RSS feed right off of our site.
If you go to YouTube, you can find the show there
where you can actually watch a video of this episode
and listen to Joey and me
and watch us just gesticulate with our hand-talking
that we always do.
And if you do like the show,
please give us a thumbs up and a like wherever you've seen it
and subscribe wherever you're seeing it.
And if you are able to do a review, even better,
those reviews really help us gain traction with the show.
So for The Offset Podcast, I'm Robbie Carmen.
And I'm Joey Deanna.
Thanks for listening.
