Storage Developer Conference - #117: Developments in LTO Tape Hardware and Software
Episode Date: January 7, 2020
Transcript
Hello, everybody. Mark Carlson here, SNIA Technical Council Co-Chair. Welcome to the
SDC Podcast. Every week, the SDC Podcast presents important technical topics to the storage
developer community. Each episode is hand-selected by the SNIA Technical Council from the presentations at our annual Storage
Developer Conference. The link to the slides is available in the show notes at snia.org/podcasts.
You are listening to SDC Podcast, Episode 117.
Thanks for coming. My name is Takeshi Ishimoto of IBM. Today I'm going to talk about tape.
That's why I was asking whether you joined the previous talk by Fujifilm. There is
quite a lot of overlap, so you may see very similar charts in my slide deck. Even though I'm part of IBM's storage development team,
I'm going to talk about our activity in the SNIA TWG.
We have the SNIA LTFS TWG.
If you attended the opening session today,
the executive talk mentioned our standard, the Linear Tape File
System (LTFS) Format Specification.
So we will talk about the new specification, version 2.5, which we
released this year.
And I'm with Dr. David Pease, who invented LTFS.
We have been working on LTFS productization
for the last 10 years.
He will be talking about the new updates
to the LTFS format.
So as I said, and as the title says,
I'm talking about tape.
Tape is a very exotic topic here.
If you look at the agenda, it is full of NVMe and persistent memory;
those are the most popular topics,
and they are all about speed, right?
Performance is most critical there. If your budget can afford
to store all of your data
on those devices, that's fine.
But if you have hundreds of petabytes,
or exabytes, or zettabytes,
then cost becomes the limiting factor.
That's where tape fits in.
Historically, tape started very early, in the 1950s, more than 60 years ago,
with drives the size of a large refrigerator.
Now, LTO-8, shown in the middle,
is the most recent
technology we have.
The cartridge is about the size of your palm,
roughly four inches square,
and it can hold 12 terabytes.
The speed is growing too; it can reach 360 megabytes per second with a single tape drive and a single tape cartridge.
And if you look at the future, the exact numbers might differ, but an IBM Research prototype in 2017 proved that we can store more than 300 terabytes in the
same form factor.
That will affect the cost of storage, and that's why IBM keeps developing tape.
But let me talk about the device first.
The name is very obvious:
Linear Tape File System.
It's a file system for tape.
Right?
And it supports multiple technologies.
It's technology neutral, even though you
do need a tape drive.
There are currently three different tape technologies available
from various vendors.
The most popular one is Linear Tape-Open (LTO).
IBM and Oracle also have their own proprietary formats
with more enterprise features.
Oh, let me go back.
This file system requires multi-partition support:
you need to split the tape into two logical partitions
in order to store files. The first choice is Linear Tape-Open.
LTO is jointly developed by IBM, HP, and Quantum,
and the technology is provided to multiple manufacturers
using the same standard.
So you can buy a tape from one vendor, and you can use that tape on a different vendor's drive.
The tape is about half an inch wide.
As the previous speaker mentioned, it's about one kilometer long, more than
half a mile. You could walk to the stadium, come back,
and go again and come back.
It's very long.
And the recording is linear,
which is why "linear" is at the beginning of the name.
It's not the helical scan method
you see in video tapes,
so the recording method doesn't degrade the quality of the tape.
The first generation came out in the year 2000,
and the technology has been updated every two to three years.
Now we have the LTO-8 generation. As for LTFS, it is supported from generation 5,
in 2010. You have already seen this chart:
as I mentioned, LTO started in 2000,
and LTO-8 actually came in 2017.
And the roadmap goes out to generation 12,
promising that we will keep delivering higher capacity.
There are two numbers here.
The 12 terabytes is the native capacity of LTO-8.
With the compression capability in the tape drive,
you can store 30 terabytes
at a 2.5:1 compression ratio.
And when generation 12 arrives,
it will reach 192 terabytes native
in the same form factor.
And with compression, 480 terabytes
on the same single tape.
That's pretty amazing.
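To make the arithmetic concrete, here is a minimal sketch of how native capacity and the drive's assumed compression ratio combine; the numbers are the ones quoted above, and the 2.5:1 ratio only holds for data that actually compresses that well.

# Rough capacity arithmetic for the generations quoted in the talk.
# The compressed figures assume the data compresses at 2.5:1,
# which is a planning assumption, not a guarantee.
generations = {
    "LTO-8": 12,               # native terabytes
    "LTO-12 (roadmap)": 192,
}
COMPRESSION_RATIO = 2.5

for gen, native_tb in generations.items():
    compressed_tb = native_tb * COMPRESSION_RATIO
    print(f"{gen}: {native_tb} TB native, ~{compressed_tb:.0f} TB compressed")
# LTO-8: 12 TB native, ~30 TB compressed
# LTO-12 (roadmap): 192 TB native, ~480 TB compressed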
So it has a long history of achievement and also a future commitment from the vendors.
LTFS itself has been available since 2010,
the same time we got the LTO-5 tape drive,
and it has already been adopted by various hardware vendors and software companies.
I'm showing the list of their names here.
It actually comes from the LTO consortium.
You can see that IBM is listed,
as well as the other vendors.
I think Fujifilm is listed too.
So you can see that it is already widely adopted.
And to enable this kind of adoption,
SNIA has been discussing this standard
and has made two specifications available.
One is the LTFS Format Specification.
The other one is the LTFS Bulk Transfer specification.
We worked together with the Cloud Storage TWG
to make data portable from one cloud to another using tape. We'll get to the details
of the format specification soon.
So here's the chronology of the format specification,
starting from version 1.0.
It was originally developed by IBM,
and David was the lead designer of that software.
IBM donated that standard to SNIA,
and it became the version 2.2 standard in 2013.
Later, this version became an ISO standard,
so it's now available under the ISO number shown here.
Just like LTO makes tapes and tape drives compatible
between companies,
LTFS is designed to make
the data on one tape
accessible from different software.
And since then,
we have kept updating it, adding more functionality
to the file system.
Version 2.3 came in 2015 with support for additional characters,
plus the other features listed here.
And this year, 2019, we have version 2.5.
This is going to be submitted to ISO,
so it will become an international
standard as well. Okay. So, why tape? I think you have also seen this
chart. The key benefit, as I said: if you think about a large amount of data that you want to store for many years,
it's not only the cost of the media itself.
Your data center space and electricity
also affect the total cost of ownership.
Right?
And of course it's highly available.
It's highly reliable.
The other interesting capability is portability.
You can ship one cartridge to another location, and the other party can read it using the same format.
Here's a little illustration of the tape; it's very conceptual.
Basically, you have two reels, a motor moves the tape,
and the head moves along the y-axis. Even though the tape is only half an inch wide,
it actually has more than
6,000 tracks.
Can you imagine how narrow each track is?
It requires very
high precision control of the tape movement and head movement inside the tape drive so that we don't miss any data.
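A quick back-of-the-envelope calculation makes the point about track width; it assumes roughly 6,656 data tracks (LTO-8's figure, which the talk only states as "more than 6,000") and ignores edge guard bands.

# How narrow is one track on a half-inch tape?
TAPE_WIDTH_MM = 12.65          # half-inch tape, nominal width in millimeters
TRACKS = 6656                  # assumed LTO-8 track count ("more than 6,000")

track_pitch_um = TAPE_WIDTH_MM * 1000 / TRACKS
print(f"~{track_pitch_um:.1f} micrometers per track")   # roughly 1.9 micrometers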
There is a lot of functionality in the tape drive itself, including encryption, compression, and other functions.
As for the drive interface, there are
Fibre Channel and SAS interfaces, and others as well; it depends on the vendor.
What else?
Just for context,
and I'm not sure if there are any HDD vendors here, tape basically uses shingled recording.
The tracks are written
partially overlapping so that we can write more data.
That makes it
a little difficult
to update data in the middle.
So basically,
tape is an append-only device.
You can keep adding more data,
but you cannot update in the middle,
until you reformat the tape entirely.
I'll talk later about why we need dual partitions for the file system.
You have seen this chart already, right?
You may ask why this affects cost: it shows
the areal density trend of different storage devices.
The blue and red ones are the HDDs; they are saturating,
and these green lines are tape.
The difference between the red line and the green line gives us room to improve tape capacity now.
And this blue line shows demonstrations
done by research organizations.
So the gap actually tells you how soon
you will get the same capacity in the future.
It's about 7 to 10 years of
difference. So we are preparing now for 10 years
later, so that we can deliver
this capacity improvement.
If the cost of a tape stays the same,
this will dramatically reduce the
cost of storage, and there is room to do it.
Okay, so now the LTFS format. The format specification defines how compatible software has to record data on the tape.
There are two main things. One is how we describe the metadata of a file, its timestamp, and other attributes.
We put it all into one big XML document, the index,
and record the most recent update at the end.
We have two partitions.
One is the smaller one, called the index partition.
The other one is the data partition.
The index partition is about 3 percent of the tape capacity,
and the remaining 96 percent or so is used for data recording.
The index partition basically holds only one index,
always overwritten with the most recent index.
The data partition is an interleave of user data and indexes,
so that you can keep adding more data to the tape.
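To give a feel for what "one big XML" means, here is a deliberately simplified, hypothetical index document built in Python; the element names are illustrative only and do not reproduce the exact schema defined in the LTFS Format Specification.

import xml.etree.ElementTree as ET

# Toy index: one directory containing one file with a single extent.
# Element and attribute names are illustrative, not the official LTFS schema.
idx = ET.Element("ltfsindex", version="2.5")
ET.SubElement(idx, "generationnumber").text = "3"
ET.SubElement(idx, "updatetime").text = "2019-09-24T10:15:00Z"
d = ET.SubElement(idx, "directory", name="project")
f = ET.SubElement(d, "file", name="scene01.mxf")
ET.SubElement(f, "length").text = str(4 * 1024**3)              # 4 GiB
ET.SubElement(f, "extent", partition="b", startblock="1200")    # where the data lives

print(ET.tostring(idx, encoding="unicode"))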
So here's more illustration about the format.
The IP, the index partition, as I said, is used in
overwrite-only mode, and the
other one, the data partition, is in append-only mode
until you reformat the tape completely.
We place the most recent index at the beginning of
the index partition, so that when the tape drive mounts the tape,
you can get the contents of the tape very easily
instead of scanning through it.
The data partition has multiple indexes;
they are like snapshots. At every snapshot point, we generate the complete index
and write it at the end of the data written so far.
Then we start writing the new files,
and then write another index.
That's how the LTFS tape is formatted.
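As a rough mental model of the two partitions, here is a tiny, hypothetical simulation; it ignores blocks, file marks, and the real on-tape layout, and just captures "data partition appends, index partition keeps only the latest index".

# Hypothetical model: the DP is an append-only list, the IP holds only the newest index.
data_partition = []      # append-only until the tape is reformatted
index_partition = None   # holds only the most recent full index
all_files = []

def write_files_and_snapshot(files, generation):
    global index_partition
    all_files.extend(files)
    data_partition.extend(files)                     # user data appended to the DP
    index = {"generation": generation, "files": list(all_files)}
    data_partition.append(index)                     # index interleaved in the DP
    index_partition = index                          # IP keeps only the newest index
                                                     # (written at unmount in real LTFS)

write_files_and_snapshot(["a.txt", "b.txt"], 1)
write_files_and_snapshot(["c.txt"], 2)
print(index_partition["generation"], index_partition["files"])
# 2 ['a.txt', 'b.txt', 'c.txt']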
Now, for the key topic, I'm going to turn the mic over to David
to cover the changes we made in the latest release.
Thank you.
Thank you. Can you all hear me?
So I'm going to be talking about, as Ishimoto-san said, the latest changes to LTFS.
Before I jump into this, Ishimoto-san has given a nice overview of LTFS.
Does anybody have any general questions about LTFS before we jump into what's new?
Okay, so a picture that Ishimoto-san showed a few slides back is a picture of how indexes or indices,
we've had multiple discussions of which is the proper usage,
and the dictionary says either works,
are laid out on an LTFS tape.
As Ishimoto-san said, at the time a tape is unmounted,
the index partition always holds the most recent index,
which has everything that's on the tape, so that it's very quick to find at mount time.
Because you think of this as a file system; it gets mounted and unmounted just like a disk or a CD would, and you want to be able to read what would be the inodes and such in a disk file system.
And so in the case of LTFS, you read the index partition, you find the index,
and you build your in-memory data structures that know where to find data on the tape.
My original concept of LTFS was that the data partition would have nothing but data blocks in it, but it turned out that that was impractical. So in fact, as we record data in the data partition,
and Ishimoto-san's last slide showed this better than mine does,
we write data blocks, then periodically record an index, and then more data blocks. The
indexes are written in the data partition for two reasons.
One is a recovery or a sync point.
So as you're writing your files in the file system
and you're writing along on the tape, and tape is append only,
and it comes time for a file system sync,
maybe the application does a sync call,
or we have some policies in LTFS that you can set for
don't write more than this amount of data before a sync point, or don't go more than this amount of time.
You can even set it at the close of every file.
If you were a media and entertainment company who wrote multi-hundred gigabyte files,
and every time you wrote one of these, you wanted to make sure that it was committed.
You could have a policy that said, I want a sync point at every file close.
So it depends on how you use it.
But in any case, at various points, we write an index.
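Here is a minimal sketch of the kind of sync policy being described, with hypothetical thresholds; real LTFS implementations expose their own configuration options for this.

import time

# Hypothetical sync-policy check: sync after N bytes written, after T seconds,
# or on every file close, whichever the administrator configured.
MAX_BYTES_BETWEEN_SYNCS = 10 * 1024**3    # 10 GiB
MAX_SECONDS_BETWEEN_SYNCS = 300           # 5 minutes
SYNC_ON_FILE_CLOSE = False

def should_write_index(bytes_since_sync, last_sync_time, file_closed):
    if SYNC_ON_FILE_CLOSE and file_closed:
        return True
    if bytes_since_sync >= MAX_BYTES_BETWEEN_SYNCS:
        return True
    return time.time() - last_sync_time >= MAX_SECONDS_BETWEEN_SYNCS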
Now, you say, well, you could go back and write the index on the index partition,
but that would have several bad effects.
One is that you'd be rewinding the tape to the beginning,
switching partitions and overwriting the index partition,
and then having to seek back to where you were and continue,
and that would be horribly inefficient.
And the other is that you would actually eventually wear out that section of the tape.
So that's not a good plan.
So instead, we periodically write these indexes in the data partition,
and that has a couple of added benefits.
If you happen to be writing data out here,
and the tape drive, let's say, loses power due to a data center power outage,
at the time you mount the tape later,
the system will immediately recognize that it wasn't unmounted cleanly
due to some things that we store in a small non-volatile memory,
and it will go search backwards on the tape for a file mark
and find an index and be able to recover to that last sync point. It also has the interesting capability of giving you a rollback point,
kind of like Time Machine on an Apple,
where you can actually go back and say,
well, I want to find the closest index to this date and time
and go back and open the tape as of that index
and recover an old version or a lost
version of a file or that kind of thing.
So it's just kind of an added benefit of having
indexes in the
data partition
because tape is append only. You never
overwrite. You never delete.
So that data is always still there. You can go back
and find old data if you need to.
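The rollback described above amounts to picking the newest index whose timestamp is not after the requested point in time; here is a small, hypothetical illustration of that selection.

from datetime import datetime

# Hypothetical list of (timestamp, index_id) pairs found on the data partition.
indexes = [
    (datetime(2019, 9, 1, 10, 0), "gen1"),
    (datetime(2019, 9, 10, 14, 30), "gen2"),
    (datetime(2019, 9, 20, 9, 45), "gen3"),
]

def rollback_point(target_time):
    """Return the newest index written at or before target_time, if any."""
    candidates = [(ts, gid) for ts, gid in indexes if ts <= target_time]
    return max(candidates)[1] if candidates else None

print(rollback_point(datetime(2019, 9, 15)))   # gen2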
Is that by design or is that by policy? That it never overwrites?
That's by tape architecture.
So what Ishimoto-san was explaining is that tape is written in a shingled fashion,
so you can't, I mean, you could back up to here
and overwrite everything forward,
and in fact we support that,
but you can't just go and say,
I'm going to go rewrite this block on that data file. That's because you'd be destroying data that follows.
Are there still WORM tapes?
There are WORM tapes, yes.
Sorry, that's not the one I wanted to ask. You can only write once?
That's right. Write once, read many. Yes, there are WORM tapes.
Okay.
And actually, LTFS has to run in kind of a special mode with WORM,
because it doesn't allow you to overwrite.
So actually, you might have more than one index in the index partition in that case.
You did say that the index partition is overwritten?
Yes. The data partition is append-only,
and except in the case of WORM,
the index partition is overwritten,
starting not from the volume labels,
but just from the actual index itself.
We actually have standard VOL1, old-fashioned 80-character IBM tape volume labels on here, if you can believe it.
Speaking of old-time tapes, I'll just comment that when I started in computing,
tape was also half an inch wide, but instead of 6,000 tracks, it had nine.
It's just an astounding number. In any case,
the other thing about these indexes in the data partition, and finding these earlier indexes,
is that we have this unbroken chain of index back pointers. Whenever we write an index,
we write into it a back pointer to the prior index.
Whenever we unmount a tape,
we write the last index here, at the end of the data partition,
and set its back pointer.
We then write that index again in the index partition,
with a pointer here,
and that's how indexes are chained.
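A compact, hypothetical sketch of that unmount sequence and the back-pointer chain; the index IDs and field names are invented for illustration.

# Hypothetical unmount: append the final index to the data partition with a
# back pointer to the previous index, then copy it into the index partition.
data_partition = [
    {"id": "gen1", "back": None},
    {"id": "gen2", "back": "gen1"},
]
index_partition = None

def unmount(new_index_id):
    global index_partition
    final_index = {"id": new_index_id, "back": data_partition[-1]["id"]}
    data_partition.append(final_index)     # last index at the end of the DP
    index_partition = dict(final_index)    # same index copied into the IP
    return final_index

unmount("gen3")
print(index_partition)   # {'id': 'gen3', 'back': 'gen2'}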
So this is all well and good, except that we have a problem.
Tape capacities since LTO-5,
when LTFS was first released,
have gone from about 3 terabytes...
I'm sorry,
1.5 terabytes.
That's uncompressed, though, right?
So it was 3 terabytes compressed.
I'm going to use compressed numbers because the tape drive compresses internally.
Unless you turn it off specifically, you're going to get the 3 terabytes kind of number.
At some point along the way, the compression ratio also changed, right?
Right. We went from 2 to 2.5.
We changed the compression algorithm and the,
they don't call it a buffer size,
but there's a term for it that I can't remember.
In any case, we've gone to about an order of magnitude larger
and we're looking at another order of magnitude
over the next 10 years.
And so tape capacities are growing
essentially exponentially.
That increases the number of files on a tape,
thus the index size,
because every file takes a certain amount of space
in the index.
Plus, in many cases,
people are actually, oddly enough,
using smaller rather than larger files,
which also increases the amount of space that the index takes.
So we're getting this corresponding increase in the overhead of recording this index in the data partition.
It takes time to go through and build the index into an XML structure,
because obviously it's not stored as XML in memory.
It takes time to write that on the tape, and of course it takes space on the tape.
So this has gotten to the point now
where that's become an issue in some installations,
and so we set out to solve that problem.
And the way we decided to solve it is probably kind of obvious.
We decided that, well, rather than writing a full index
every time we write an index on the tape,
we'll write what we've decided to call an incremental index,
which is just the incremental changes from the last index.
So with version 2.5 of LTFS, we now have these incremental indexes
that record only the changes to the file system since the prior index.
That prior index could be a full index or it could be an incremental index,
but whatever's changed since the last
index gets written in the next incremental index.
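Conceptually, reconstructing the file system state from indexes means starting at a full index and replaying the later incremental ones; here is a small, hypothetical sketch of that replay (the change-record format is invented).

# Hypothetical replay: a full index is a complete name -> metadata map; an
# incremental index lists only additions/updates and deletions since the prior index.
full_index = {"a.txt": {"size": 100}, "b.txt": {"size": 200}}

incrementals = [
    {"changed": {"c.txt": {"size": 50}}, "deleted": []},
    {"changed": {"a.txt": {"size": 150}}, "deleted": ["b.txt"]},
]

def rebuild(full, increments):
    state = dict(full)
    for inc in increments:
        state.update(inc["changed"])
        for name in inc["deleted"]:
            state.pop(name, None)
    return state

print(rebuild(full_index, incrementals))
# {'a.txt': {'size': 150}, 'c.txt': {'size': 50}}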
So there are some rules about incremental indexes.
First of all, they only appear in a data partition, and they only get written as a result of one
of these sync points.
They never get written at unmount time.
So whenever you unmount the tape, you're writing a full index so that you end
up with the same index at the end of the data and index partition, just like always. And
of course, the index partition can only contain a full index. But incremental indexes can
be interspersed with full indexes in the data partition.
And again, they would only be written as a result of this kind of a sync point function, right?
Calling sync or having the policy for when you do synchronization trigger an index in the data partition.
So we expect that full indexes would be written periodically,
and we're saying kind of as a rule of thumb,
we would think that every 5 to 10 indexes,
you'd probably want to write a full index,
because if you needed to recover the tape for some reason,
you wouldn't want to write one full index
and then have the rest of your tape be full of incremental indexes.
You'd have to go back to the beginning of the tape and roll forward. It's kind of like an I-frame in MPEG, and then the
other frames that follow it. So periodically, we expect that you'll still write a full index, but
maybe only every five to ten or so indices on the tape would be a full index. And then from that
point forward, you can have incremental indexes. Although an implementation can always choose to write a full index, let's suppose,
for instance, that there have been a huge number of changes to the file system since the last
incremental index, and the implementation says, oh, I've got so many changes that it doesn't make
sense to write an incremental. I'll just write a full. Or maybe that implementation, as mine did,
keeps a log of changes, and finally you say,
my log's too big, throw it away,
and mark it as a full index next time.
So anyway, those are some rules about incremental indexes.
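Here is one way such a policy could look, as a hedged sketch; the thresholds and the idea of tracking a change log come from the discussion above, but the specific logic is hypothetical, not what any particular implementation does.

# Hypothetical decision: write a full index if too many incrementals have been
# written since the last full one, or if the in-memory change log has grown too big.
FULL_EVERY_N_INDEXES = 10          # "every five to ten or so"
MAX_CHANGELOG_ENTRIES = 100_000

def next_index_kind(incrementals_since_full, changelog_entries):
    if incrementals_since_full >= FULL_EVERY_N_INDEXES:
        return "full"
    if changelog_entries > MAX_CHANGELOG_ENTRIES:
        return "full"          # cheaper to dump everything than replay a huge log
    return "incremental"

print(next_index_kind(3, 500))        # incremental
print(next_index_kind(12, 500))       # full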
As I talked about before,
we have this chain of back pointers.
We kind of saw this picture basically before.
But now we have two kinds of indexes, so we actually need to have two kinds of back pointers.
And this is important for backwards compatibility.
And my next slide is going to be all about backwards compatibility because that was a big issue for us, of course.
So we now have two kinds of indexes.
We have full indexes and incremental indexes.
And you can see a picture here of a tape that is in a consistent, what we call a consistent
state, meaning it's ready to be unmounted.
And so we have here a full index, of course, in the index partition, pointing to a full index at the end of the data partition,
but a couple of incremental indexes and then a full index.
By the way, when the tape gets formatted,
empty full indices are written at the beginning of each partition.
So that's kind of the starting state of the tape.
You do a file system create, of course.
So now we have these two different back pointers.
We maintain what I have in red here, the red back pointers. So you can always follow the full index
chain from this full index to this full index, and if there were more, continuing back. But now
we also have an incremental chain, which I've shown in blue.
So if there is a prior incremental index, there will also be a pointer backwards to the prior incremental index until we don't find one, and then we use the back pointer. So a version 2.5
and later implementation would always look for a blue pointer,
and then if they don't find one, use the red pointer.
If you'll excuse my using the colors instead of the terminology.
But it'll always follow the incremental chain back, unless there was some reason it wanted to go all the way back to the full index
and look forward for some reason.
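The chain walk being described can be sketched as follows; the pointer names mirror the "blue" incremental pointer and the "red" full-index pointer from the slide, and the structure is illustrative only.

# Hypothetical index records: prev_incremental is the "blue" pointer,
# prev_full is the "red" pointer. A 2.5-aware reader prefers the blue one.
indexes = {
    "full1": {"prev_incremental": None,   "prev_full": None},
    "inc1":  {"prev_incremental": None,   "prev_full": "full1"},
    "inc2":  {"prev_incremental": "inc1", "prev_full": "full1"},
    "full2": {"prev_incremental": "inc2", "prev_full": "full1"},
}

def walk_back(start):
    chain, current = [start], start
    while current is not None:
        rec = indexes[current]
        current = rec["prev_incremental"] or rec["prev_full"]
        if current:
            chain.append(current)
    return chain

print(walk_back("full2"))   # ['full2', 'inc2', 'inc1', 'full1']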
Something we didn't talk about is that these file marks delimit the
indexes in the partitions. They're not used to separate data blocks and files; those are
just all interleaved, as they would be on a disk, without any file marks. Old-fashioned
tapes used to use file marks to separate files. We don't do that. So you could actually always
get the tape drive to scan forward to the next file mark
and find the next index if you wanted to do that kind of processing.
But as I mentioned earlier, one of the really important considerations in this design,
because this was kind of a major overhaul, actually, of our format was backwards compatibility.
We did not want to create a format of tape
that could no longer be used in earlier versions of LTFS.
We wanted to create tapes that, when they were complete,
could be sent to any other LTFS installation and still read.
So what we call a consistent volume
is a volume that's ready for unmount.
It's had its final index written in the data partition
and copied to the index partition,
and the chain's been built
so that it's ready to be unmounted and read.
So consistent volumes always end with a full index.
Therefore, they can be mounted and used
without any difference by earlier versions of LTFS.
The only thing an earlier version would do is if you were saying, I want to roll back to an
earlier version of an index on a tape, an earlier implementation of LTFS would only see the full
indexes and would miss the incrementals, but it would still operate perfectly correctly.
What about inconsistent volumes?
What about a volume where you were writing incremental indices
and power dropped?
Well, it turns out that inconsistent volumes
are recovered using a check utility; ours is called ltfsck, LTFS check.
An older version of the ltfsck utility would probably fail horribly
if it encountered an incremental index where it expected to find a full index,
because it wouldn't know what to do with it.
An incremental index would just look like data to it,
and it would say, well, I looked at both sides of the tape mark,
I couldn't find an index, and throw up its hands. However, what's the likelihood
you're going to take a tape that's been not correctly unmounted on a 2.5 system and decide
to send it to somebody who's running 2.3? Pretty unlikely. If you did work in an environment
where for some reason you had two different versions of LTFS running,
what you'd want to do is take the newer ltfsck version and run it at the older installation and use it for your recovery.
And then you could still recover the newer versions of the tape. I have some slides on what it took to do this in terms of LTFS tags and that kind of thing.
But rather than going through the gory details, since we have less than 10 minutes left, I think
I'll stop and ask questions.
And if your questions involve the gory details, then we can go to those slides.
So, yes?
So that's an interesting question.
The tape drive, by its very nature, does shingled writing.
So in Ishimoto-san's picture, it showed this, but he didn't really go into the detail of a wrap.
The way the tape drive writes is called serpentine recording.
Let's go one more.
No, I'm sorry.
Somewhere in there you had wraps, and I didn't.
In any case, serpentine recording means you write down the tape, you move over a little bit, you write back, you move down, you write, you write back.
Serpentine, this is...
Yeah, jeez.
208 wraps, and so it turns at both ends, it does this 208 times.
As we're writing one of these wraps, we're overwriting the next
pair of wraps, interestingly. So now if you would go forward, Ishimoto-san. Thank you.
So basically, we have two wraps here. We have two more wraps that are called a guard band,
and then we have all these other wraps. So although we are doing shingled writing here,
because of this guard band
it isn't actually overwriting any of our data,
and that's how we get away
with dual-partition tapes.
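As a quick consistency check on those numbers, the track count and wrap count quoted in the talk imply how many tracks the head writes per pass; the 6,656 figure is an assumption (the talk only says "more than 6,000 tracks").

# Sanity check: total tracks divided by the number of wraps gives tracks written per pass.
TOTAL_TRACKS = 6656     # assumed LTO-8 track count; the talk says "more than 6,000"
WRAPS = 208             # quoted in the talk

tracks_per_wrap = TOTAL_TRACKS // WRAPS
print(tracks_per_wrap)  # 32 -> the head writes 32 tracks on each pass down the tape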
Thanks.
Any other questions?
Gory details? Anything
like that?
Yeah.
I haven't heard anything about...
Oh, I'm sorry, that's a hardware statement. This has nothing to do... Can you repeat it?
The question was about backwards compatibility of generations of LTO.
So LTO...
Tell me if I get this wrong, but I think I've got it right.
The LTO standard says...
So remember that as we're going to different generations of LTO,
the tracks are getting closer and closer together,
smaller, and therefore the read-write
heads are changing size.
So the
rule for LTO is that the hardware will always be able to read one or two generations back.
One or two back?
Two back to read, and one back to write. So with LTO-8, we can read and write the previous generation.
Seven?
Right.
But not six.
And that's a hardware statement.
That has nothing to do with this backwards compatibility in the format that I've just talked about.
That's because the hardware changes to the point where it's impossible to read those wider tracks any longer. So the answer is no, but that's a great question. You can keep the old drive to read the old-generation tapes,
and later you may decide to copy the old-generation tapes to the newer generation by using two tape drives.
That's how you can migrate to the new generation.
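A tiny sketch of that read/write compatibility rule as a lookup; the pre-LTO-8 "read two back, write one back" convention and the LTO-8 exception are as described above, and anything beyond LTO-8 is deliberately not covered.

# Hypothetical compatibility check for LTO generations up through 8.
# Historically a drive could read two generations back and write one back;
# LTO-8 drives only handle LTO-7 and LTO-8 cartridges.
def can_read(drive_gen, cart_gen):
    if drive_gen == 8:
        return cart_gen in (7, 8)
    return drive_gen - 2 <= cart_gen <= drive_gen

def can_write(drive_gen, cart_gen):
    if drive_gen == 8:
        return cart_gen in (7, 8)
    return drive_gen - 1 <= cart_gen <= drive_gen

print(can_read(8, 6))   # False -> an LTO-8 drive cannot read LTO-6
print(can_write(8, 7))  # True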
Yeah.
But the use of...
My background is, we have a site with the older library, generations 5 and 6. So my upgrade path is from there to 8, and I also have to have 7 in between. It's a lot to keep current.
Yeah.
Yeah.
And basically, I think Haru kind of alluded to this as well
when he talked about reclamation, and he said,
well, you would do reclamation when you did this kind of migration
to the next generation, combining two tapes into one
and reclaiming dead space at the same time.
That was kind of the same idea.
Yes?
What is the latest generation currently available?
Currently it is 8.
Yeah, and that came out last year.
So we would expect 9 to come out maybe next year.
It's usually every 2 to 3 years.
So 2017, 2018, 2019, 2020... you might expect something. I have no idea.
I actually am not involved in the hardware,
so anything I say is non-binding.
Excuse me?
Yeah, it could be, yeah.
Any other questions?
Okay, well, thank you very much, and enjoy the rest of your afternoon.
I guess we'll get out a little early.
Thanks for listening. If you have questions about the material presented in this podcast,
be sure and join our developers mailing list by sending an email to developers-subscribe@snia.org.
Here you can ask questions and discuss this topic further with your peers in the storage developer
community. For additional information about the Storage Developer Conference, visit www.storagedeveloper.org.