Storage Developer Conference - #117: Developments in LTO Tape Hardware and Software
Episode Date: January 7, 2020
Transcript
Hello, everybody. Mark Carlson here, SNIA Technical Council Co-Chair. Welcome to the
SDC Podcast. Every week, the SDC Podcast presents important technical topics to the storage
developer community. Each episode is hand-selected by the SNIA Technical Council from the presentations at our annual Storage
Developer Conference. The link to the slides is available in the show notes at snia.org/podcasts.
You are listening to SDC Podcast, Episode 117.
Thanks for coming. My name is Takeshi Ishimoto of IBM. Today I'm going to talk about tape.
That's why I was asking whether you joined the previous talk by Fujifilm. There is
quite a lot of overlap, so you may see very similar charts in my slide deck. Even though I'm part of IBM's storage development team,
I'm going to talk about our activity in the SNIA TWG.
We have the SNIA LTFS TWG.
If you attended the opening session today,
the executive talk mentioned our standard, the Linear Tape File
System (LTFS) Format Specification.
So we will talk about the new specification, version 2.5, which we
released this year.
And I'm with Dr. David Pease, who invented LTFS.
We have been working on LTFS productization
for the last 10 years.
He will be talking about the new updates
to the LTFS format.
So as I said, and as the title says,
I'm talking about tape.
Tape is a very exotic topic here.
If you look at the agenda, it is full of NVMe and persistent memory;
those are the most popular topics,
and they are all about speed, right?
Performance is most critical there. If your budget can afford
to store all of your data
on those devices, that's fine.
But if you have hundreds of petabytes,
or exabytes, or zettabytes,
then cost becomes the limiting factor.
That's where tape fits in.
Historically, tape started very early, in the 1950s, more than 60 years ago,
with drives the size of a large refrigerator.
Now, LTO-8, shown in the middle,
is the most recent
technology we have.
The cartridge is about the size of your palm,
roughly four inches square,
and it can hold 12 terabytes.
The speed is growing too; it can reach 360 megabytes per second with a single tape drive and a single tape cartridge.
And if you look at the future, the exact numbers might differ, but an IBM Research prototype in 2017 proved that we can store more than 300 terabytes in the
same form factor.
That will affect the cost of storage, and that's why IBM keeps developing tape.
But let me talk about the device first.
The name is very obvious:
Linear Tape File System.
It's a file system for tape.
Right?
And it supports multiple technologies.
It's technology neutral, even though you
do need a tape drive.
There are currently three different tape technologies available
from various vendors.
The most popular one is Linear Tape-Open (LTO).
IBM and Oracle also have their own proprietary formats
with more enterprise features.
Oh, let me go back.
This file system requires multi-partition support:
you need to split the tape into two logical partitions
in order to store files. The first choice is Linear Tape-Open.
LTO is jointly developed by IBM, HP, and Quantum,
and the technology is provided to multiple manufacturers
using the same standard.
So you can buy a tape from one vendor, and you can use that tape on a different vendor's drive.
The tape is about half an inch wide.
As the previous speaker mentioned, it's about one kilometer long, more than
half a mile. You could walk to the stadium, come back,
and go again and come back.
It's very long.
And the recording is linear,
which is why "linear" is at the beginning of the name.
It's not the helical scan method
you see in video tapes,
so the recording method doesn't degrade the quality of the tape.
The first generation came out in the year 2000,
and the technology has been updated every two to three years.
Now we have the LTO-8 generation. As for LTFS, it is supported from generation 5,
in 2010. You have already seen this chart:
as I mentioned, LTO started in 2000,
and LTO-8 actually came in 2017.
And the roadmap goes out to generation 12,
promising that we will keep delivering higher capacity.
There are two numbers here.
The 12 terabytes is the native capacity of LTO-8.
With the compression capability in the tape drive,
you can store 30 terabytes
at a 2.5:1 compression ratio.
And when generation 12 arrives,
it will reach 192 terabytes native
in the same form factor.
And with compression, 480 terabytes
on the same single tape.
That's pretty amazing.
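To make the arithmetic concrete, here is a minimal sketch of how native capacity and the drive's assumed compression ratio combine; the numbers are the ones quoted above, and the 2.5:1 ratio only holds for data that actually compresses that well.

# Rough capacity arithmetic for the generations quoted in the talk.
# The compressed figures assume the data compresses at 2.5:1,
# which is a planning assumption, not a guarantee.
generations = {
    "LTO-8": 12,               # native terabytes
    "LTO-12 (roadmap)": 192,
}
COMPRESSION_RATIO = 2.5

for gen, native_tb in generations.items():
    compressed_tb = native_tb * COMPRESSION_RATIO
    print(f"{gen}: {native_tb} TB native, ~{compressed_tb:.0f} TB compressed")
# LTO-8: 12 TB native, ~30 TB compressed
# LTO-12 (roadmap): 192 TB native, ~480 TB compressed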
So it has a long history of achievement and also a future commitment from the vendors.
LTFS itself has been available since 2010,
the same time we got the LTO-5 tape drive,
and it has already been adopted by various hardware vendors and software companies.
I'm showing the list of their names here.
It actually comes from the LTO consortium.
You can see that IBM is listed,
as well as the other vendors.
I think Fujifilm is listed too.
So you can see that it is already widely adopted.
And to enable this kind of adoption,
SNIA has been discussing this standard
and has made two specifications available.
One is the LTFS Format Specification.
The other one is the LTFS Bulk Transfer specification.
We worked together with the Cloud Storage TWG
to make data portable from one cloud to another using tape. We'll get to the details
of the format specification soon.
So here's the chronology of the format specification,
starting from version 1.0.
It was originally developed by IBM,
and David was the lead designer of that software.
IBM donated that standard to SNIA,
and it became the version 2.2 standard in 2013.
Later, this version became an ISO standard,
so it's now available under the ISO number shown here.
Just like LTO makes tapes and tape drives compatible
between companies,
LTFS is designed to make
the data on one tape
accessible from different software.
And since then,
we have kept updating it, adding more functionality
to the file system.
Version 2.3 came in 2015 with support for additional characters,
plus the other features listed here.
And this year, 2019, we have version 2.5.
This is going to be submitted to ISO,
so it will become an international
standard as well. Okay. So, why tape? I think you have also seen this
chart. The key benefit, as I said: if you think about a large amount of data that you want to store for many years,
it's not only the cost of the media itself.
Your data center space and electricity
also affect the total cost of ownership.
Right?
And of course it's highly available.
It's highly reliable.
The other interesting capability is portability.
You can ship one cartridge to another location, and the other party can read it using the same format.
Here's a little illustration of the tape; it's very conceptual.
Basically, you have two reels, a motor moves the tape,
and the head moves along the y-axis. Even though the tape is only half an inch wide,
it actually has more than
6,000 tracks.
Can you imagine how narrow each track is?
It requires very
high precision control of the tape movement and head movement inside the tape drive so that we don't miss any data.
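A quick back-of-the-envelope calculation makes the point about track width; it assumes roughly 6,656 data tracks (LTO-8's figure, which the talk only states as "more than 6,000") and ignores edge guard bands.

# How narrow is one track on a half-inch tape?
TAPE_WIDTH_MM = 12.65          # half-inch tape, nominal width in millimeters
TRACKS = 6656                  # assumed LTO-8 track count ("more than 6,000")

track_pitch_um = TAPE_WIDTH_MM * 1000 / TRACKS
print(f"~{track_pitch_um:.1f} micrometers per track")   # roughly 1.9 micrometers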
There is a lot of functionality in the tape drive itself, including encryption, compression, and other functions.
As for the drive interface, there are
Fibre Channel and SAS interfaces, and others as well; it depends on the vendor.
What else?
Just for context,
and I'm not sure if there are any HDD vendors here, tape basically uses shingled recording.
The tracks are written
partially overlapping so that we can write more data.
That makes it
a little difficult
to update data in the middle.
So basically,
tape is an append-only device.
You can keep adding more data,
but you cannot update in the middle,
until you reformat the tape entirely.
I'll talk later about why we need dual partitions for the file system.
You have seen this chart already, right?
You may ask why this affects cost: it shows
the areal density trend of different storage devices.
The blue and red ones are the HDDs; they are saturating,
and these green lines are tape.
The difference between the red line and the green line gives us room to improve tape capacity now.
And this blue line shows demonstrations
done by research organizations.
So the gap actually tells you how soon
you will get the same capacity in the future.
It's about 7 to 10 years of
difference. So we are preparing now for 10 years
later, so that we can deliver
this capacity improvement.
If the cost of a tape stays the same,
this will dramatically reduce the
cost of storage, and there is room to do it.
Okay, so now the LTFS format. The format specification defines how compatible software has to record data on the tape.
There are two main things. One is how we describe the metadata of a file, its timestamp, and other attributes.
We put it all into one big XML document, the index,
and record the most recent update at the end.
We have two partitions.
One is the smaller one, called the index partition.
The other one is the data partition.
The index partition is about 3 percent of the tape capacity,
and the remaining 96 percent or so is used for data recording.
The index partition basically holds only one index,
always overwritten with the most recent index.
The data partition is an interleave of user data and indexes,
so that you can keep adding more data to the tape.
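To give a feel for what "one big XML" means, here is a deliberately simplified, hypothetical index document built in Python; the element names are illustrative only and do not reproduce the exact schema defined in the LTFS Format Specification.

import xml.etree.ElementTree as ET

# Toy index: one directory containing one file with a single extent.
# Element and attribute names are illustrative, not the official LTFS schema.
idx = ET.Element("ltfsindex", version="2.5")
ET.SubElement(idx, "generationnumber").text = "3"
ET.SubElement(idx, "updatetime").text = "2019-09-24T10:15:00Z"
d = ET.SubElement(idx, "directory", name="project")
f = ET.SubElement(d, "file", name="scene01.mxf")
ET.SubElement(f, "length").text = str(4 * 1024**3)              # 4 GiB
ET.SubElement(f, "extent", partition="b", startblock="1200")    # where the data lives

print(ET.tostring(idx, encoding="unicode"))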
So here's more illustration about the format.
The IP, the index partition, as I said, is used in
overwrite-only mode, and the
other one, the data partition, is in append-only mode
until you reformat the tape completely.
We place the most recent index at the beginning of
the index partition, so that when the tape drive mounts the tape,
you can get the contents of the tape very easily
instead of scanning through it.
The data partition has multiple indexes;
they are like snapshots. At every snapshot point, we generate the complete index
and write it at the end of the data written so far.
Then we start writing the new files,
and then write another index.
That's how the LTFS tape is formatted.
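As a rough mental model of the two partitions, here is a tiny, hypothetical simulation; it ignores blocks, file marks, and the real on-tape layout, and just captures "data partition appends, index partition keeps only the latest index".

# Hypothetical model: the DP is an append-only list, the IP holds only the newest index.
data_partition = []      # append-only until the tape is reformatted
index_partition = None   # holds only the most recent full index
all_files = []

def write_files_and_snapshot(files, generation):
    global index_partition
    all_files.extend(files)
    data_partition.extend(files)                     # user data appended to the DP
    index = {"generation": generation, "files": list(all_files)}
    data_partition.append(index)                     # index interleaved in the DP
    index_partition = index                          # IP keeps only the newest index
                                                     # (written at unmount in real LTFS)

write_files_and_snapshot(["a.txt", "b.txt"], 1)
write_files_and_snapshot(["c.txt"], 2)
print(index_partition["generation"], index_partition["files"])
# 2 ['a.txt', 'b.txt', 'c.txt']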
Now, for the key topic, I'm going to turn the mic over to David
to cover the changes we made in the latest release.
Thank you.
Thank you. Can you all hear me?
So I'm going to be talking about, as Ishimoto-san said, the latest changes to LTFS.
Before I jump into this, Ishimoto-san has given a nice overview of LTFS.
Does anybody have any general questions about LTFS before we jump into what's new?
Okay, so a picture that Ishimoto-san showed a few slides back is a picture of how indexes or indices,
we've had multiple discussions of which is the proper usage,
and the dictionary says either works,
are laid out on an LTFS tape.
As Ishimoto-san said, at the time a tape is unmounted,
the index partition always holds the most recent index,
which has everything that's on the tape, so that it's very quick to find at mount time.
Because you think of this as a file system; it gets mounted and unmounted just like a disk or a CD would, and you want to be able to read what would be the inodes and such in a disk file system.
And so in the case of LTFS, you read the index partition, you find the index,
and you build your in-memory data structures that know where to find data on the tape.
My original concept of LTFS was that the data partition would have nothing but data blocks in it, but it turned out that that was impractical. So in fact, as we record data in the data partition,
and Ishimoto-san's last slide showed this better than mine does,
we write data blocks, then periodically record an index, and then more data blocks. The
indexes are written in the data partition for two reasons.
One is a recovery or a sync point.
So as you're writing your files in the file system
and you're writing along on the tape, and tape is append only,
and it comes time for a file system sync,
maybe the application does a sync call,
or we have some policies in LTFS that you can set for
don't write more than this amount of data before a sync point, or don't go more than this amount of time.
You can even set it at the close of every file.
If you were a media and entertainment company who wrote multi-hundred gigabyte files,
and every time you wrote one of these, you wanted to make sure that it was committed.
You could have a policy that said, I want a sync point at every file close.
So it depends on how you use it.
But in any case, at various points, we write an index.
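Here is a minimal sketch of the kind of sync policy being described, with hypothetical thresholds; real LTFS implementations expose their own configuration options for this.

import time

# Hypothetical sync-policy check: sync after N bytes written, after T seconds,
# or on every file close, whichever the administrator configured.
MAX_BYTES_BETWEEN_SYNCS = 10 * 1024**3    # 10 GiB
MAX_SECONDS_BETWEEN_SYNCS = 300           # 5 minutes
SYNC_ON_FILE_CLOSE = False

def should_write_index(bytes_since_sync, last_sync_time, file_closed):
    if SYNC_ON_FILE_CLOSE and file_closed:
        return True
    if bytes_since_sync >= MAX_BYTES_BETWEEN_SYNCS:
        return True
    return time.time() - last_sync_time >= MAX_SECONDS_BETWEEN_SYNCS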
Now, you say, well, you could go back and write the index on the index partition,
but that would have several bad effects.
One is that you'd be rewinding the tape to the beginning,
switching partitions and overwriting the index partition,
and then having to seek back to where you were and continue,
and that would be horribly inefficient.
And the other is that you would actually eventually wear out that section of the tape.
So that's not a good plan.
So instead, we periodically write these indexes in the data partition,
and that has a couple of added benefits.
If you happen to be writing data out here,
and the tape drive, let's say, loses power due to a data center power outage,
at the time you mount the tape later,
the system will immediately recognize that it wasn't unmounted cleanly
due to some things that we store in a small non-volatile memory,
and it will go search backwards on the tape for a file mark
and find an index and be able to recover to that last sync point. It also has the interesting capability of giving you a rollback point,
kind of like Time Machine on an Apple,
where you can actually go back and say,
well, I want to find the closest index to this date and time
and go back and open the tape as of that index
and recover an old version or a lost
version of a file or that kind of thing.
So it's just kind of an added benefit of having
indexes in the
data partition
because tape is append only. You never
overwrite. You never delete.
So that data is always still there. You can go back
and find old data if you need to.
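The rollback described above amounts to picking the newest index whose timestamp is not after the requested point in time; here is a small, hypothetical illustration of that selection.

from datetime import datetime

# Hypothetical list of (timestamp, index_id) pairs found on the data partition.
indexes = [
    (datetime(2019, 9, 1, 10, 0), "gen1"),
    (datetime(2019, 9, 10, 14, 30), "gen2"),
    (datetime(2019, 9, 20, 9, 45), "gen3"),
]

def rollback_point(target_time):
    """Return the newest index written at or before target_time, if any."""
    candidates = [(ts, gid) for ts, gid in indexes if ts <= target_time]
    return max(candidates)[1] if candidates else None

print(rollback_point(datetime(2019, 9, 15)))   # gen2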
Is that by design or is that by policy? That it never overwrites?
That's by tape architecture.
So what Ishimoto-san was explaining is that tape is written in a shingled fashion,
so you can't, I mean, you could back up to here
and overwrite everything forward,
and in fact we support that,
but you can't just go and say,
I'm going to go rewrite this block on that data file. That's because you'd be destroying data that follows.
Are there still WORM tapes?
There are WORM tapes, yes.
Sorry, that's not the one I wanted to ask. You can only write once?
That's right. Write once, read many. Yes, there are WORM tapes.
Okay.
And actually, LTFS has to run in kind of a special mode with WORM,
because it doesn't allow you to overwrite.
So actually, you might have more than one index in the index partition in that case.
You did say that the index partition is overwritten?
Yes. The data partition is append-only,
and except in the case of WORM,
the index partition is overwritten,
starting not from the volume labels,
but just from the actual index itself.
We actually have standard VOL1, old-fashioned 80-character IBM tape volume labels on here, if you can believe it.
Speaking of old-time tapes, I'll just comment that when I started in computing,
tape was also half an inch wide, but instead of 6,000 tracks, it had nine.
It's just an astounding number. In any case,
the other thing about these indexes in the data partition, and finding these earlier indexes,
is that we have this unbroken chain of index back pointers. Whenever we write an index,
we write into it a back pointer to the prior index.
Whenever we unmount a tape,
we write the last index here, at the end of the data partition,
and set its back pointer.
We then write that index again in the index partition,
with a pointer here,
and that's how indexes are chained.
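A compact, hypothetical sketch of that unmount sequence and the back-pointer chain; the index IDs and field names are invented for illustration.

# Hypothetical unmount: append the final index to the data partition with a
# back pointer to the previous index, then copy it into the index partition.
data_partition = [
    {"id": "gen1", "back": None},
    {"id": "gen2", "back": "gen1"},
]
index_partition = None

def unmount(new_index_id):
    global index_partition
    final_index = {"id": new_index_id, "back": data_partition[-1]["id"]}
    data_partition.append(final_index)     # last index at the end of the DP
    index_partition = dict(final_index)    # same index copied into the IP
    return final_index

unmount("gen3")
print(index_partition)   # {'id': 'gen3', 'back': 'gen2'}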
So this is all well and good, except that we have a problem.
Tape capacities since LTO-5,
when LTFS was first released,
have gone from about 3 terabytes...
I'm sorry,
1.5 terabytes.
That's uncompressed, though, right?
So it was 3 terabytes compressed.
I'm going to use compressed numbers because the tape drive compresses internally.
Unless you turn it off specifically, you're going to get the 3 terabytes kind of number.
At some point along the way, the compression ratio also changed, right?
Right. We went from 2 to 2.5.
We changed the compression algorithm and the,
they don't call it a buffer size,
but there's a term for it that I can't remember.
In any case, we've gone to about an order of magnitude larger
and we're looking at another order of magnitude
over the next 10 years.
And so tape capacities are growing
essentially exponentially.
That increases the number of files on a tape,
thus the index size,
because every file takes a certain amount of space
in the index.
Plus, in many cases,
people are actually, oddly enough,
using smaller rather than larger files,
which also increases the amount of space that the index takes.
So we're getting this corresponding increase in the overhead of recording this index in the data partition.
It takes time to go through and build the index into an XML structure,
because obviously it's not stored as XML in memory.
It takes time to write that on the tape, and of course it takes space on the tape.
So this has gotten to the point now
where that's become an issue in some installations,
and so we set out to solve that problem.
And the way we decided to solve it is probably kind of obvious.
We decided that, well, rather than writing a full index
every time we write an index on the tape,
we'll write what we've decided to call an incremental index,
which is just the incremental changes from the last index.
So with version 2.5 of LTFS, we now have these incremental indexes
that record only the changes to the file system since the prior index.
That prior index could be a full index or it could be an incremental index,
but whatever's changed since the last
index gets written in the next incremental index.
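Conceptually, reconstructing the file system state from indexes means starting at a full index and replaying the later incremental ones; here is a small, hypothetical sketch of that replay (the change-record format is invented).

# Hypothetical replay: a full index is a complete name -> metadata map; an
# incremental index lists only additions/updates and deletions since the prior index.
full_index = {"a.txt": {"size": 100}, "b.txt": {"size": 200}}

incrementals = [
    {"changed": {"c.txt": {"size": 50}}, "deleted": []},
    {"changed": {"a.txt": {"size": 150}}, "deleted": ["b.txt"]},
]

def rebuild(full, increments):
    state = dict(full)
    for inc in increments:
        state.update(inc["changed"])
        for name in inc["deleted"]:
            state.pop(name, None)
    return state

print(rebuild(full_index, incrementals))
# {'a.txt': {'size': 150}, 'c.txt': {'size': 50}}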
So there are some rules about incremental indexes.
First of all, they only appear in a data partition, and they only get written as a result of one
of these sync points.
They never get written at unmount time.
So whenever you unmount the tape, you're writing a full index so that you end
up with the same index at the end of the data and index partition, just like always. And
of course, the index partition can only contain a full index. But incremental indexes can
be interspersed with full indexes in the data partition.
And again, they would only be written as a result of this kind of a sync point function, right?
Calling sync or having the policy for when you do synchronization trigger an index in the data partition.
So we expect that full indexes would be written periodically,
and we're saying kind of as a rule of thumb,
we would think that every 5 to 10 indexes,
you'd probably want to write a full index,
because if you needed to recover the tape for some reason,
you wouldn't want to write one full index
and then have the rest of your tape be full of incremental indexes.
You'd have to go back to the beginning of the tape and roll forward. It's kind of like an I-frame in MPEG, and then the
other frames that follow it. So periodically, we expect that you'll still write a full index, but
maybe only every five to ten or so indices on the tape would be a full index. And then from that
point forward, you can have incremental indexes. Although an implementation can always choose to write a full index, let's suppose,
for instance, that there have been a huge number of changes to the file system since the last
incremental index, and the implementation says, oh, I've got so many changes that it doesn't make
sense to write an incremental. I'll just write a full. Or maybe that implementation, as mine did,
keeps a log of changes, and finally you say,
my log's too big, throw it away,
and mark it as a full index next time.
So anyway, those are some rules about incremental indexes.
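Here is one way such a policy could look, as a hedged sketch; the thresholds and the idea of tracking a change log come from the discussion above, but the specific logic is hypothetical, not what any particular implementation does.

# Hypothetical decision: write a full index if too many incrementals have been
# written since the last full one, or if the in-memory change log has grown too big.
FULL_EVERY_N_INDEXES = 10          # "every five to ten or so"
MAX_CHANGELOG_ENTRIES = 100_000

def next_index_kind(incrementals_since_full, changelog_entries):
    if incrementals_since_full >= FULL_EVERY_N_INDEXES:
        return "full"
    if changelog_entries > MAX_CHANGELOG_ENTRIES:
        return "full"          # cheaper to dump everything than replay a huge log
    return "incremental"

print(next_index_kind(3, 500))        # incremental
print(next_index_kind(12, 500))       # full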
As I talked about before,
we have this chain of back pointers.
We kind of saw this picture basically before.
But now we have two kinds of indexes, so we actually need to have two kinds of back pointers.
And this is important for backwards compatibility.
And my next slide is going to be all about backwards compatibility because that was a big issue for us, of course.
So we now have two kinds of indexes.
We have full indexes and incremental indexes.
And you can see a picture here of a tape that is in a consistent, what we call a consistent
state, meaning it's ready to be unmounted.
And so we have here a full index, of course, in the index partition, pointing to a full index at the end of the data partition,
but a couple of incremental indexes and then a full index.
By the way, when the tape gets formatted,
empty full indices are written at the beginning of each partition.
So that's kind of the starting state of the tape.
You do a file system create, of course.
So now we have these two different back pointers.
We maintain what I have in red here, the red back pointers. So you can always follow the full index
chain from this full index to this full index, and if there were more, continuing back. But now
we also have an incremental chain, which I've shown in blue.
So if there is a prior incremental index, there will also be a pointer backwards to the prior incremental index until we don't find one, and then we use the back pointer. So a version 2.5
and later implementation would always look for a blue pointer,
and then if they don't find one, use the red pointer.
If you'll excuse my using the colors instead of the terminology.
But it'll always follow the incremental chain back, unless there was some reason it wanted to go all the way back to the full index
and look forward for some reason.
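The chain walk being described can be sketched as follows; the pointer names mirror the "blue" incremental pointer and the "red" full-index pointer from the slide, and the structure is illustrative only.

# Hypothetical index records: prev_incremental is the "blue" pointer,
# prev_full is the "red" pointer. A 2.5-aware reader prefers the blue one.
indexes = {
    "full1": {"prev_incremental": None,   "prev_full": None},
    "inc1":  {"prev_incremental": None,   "prev_full": "full1"},
    "inc2":  {"prev_incremental": "inc1", "prev_full": "full1"},
    "full2": {"prev_incremental": "inc2", "prev_full": "full1"},
}

def walk_back(start):
    chain, current = [start], start
    while current is not None:
        rec = indexes[current]
        current = rec["prev_incremental"] or rec["prev_full"]
        if current:
            chain.append(current)
    return chain

print(walk_back("full2"))   # ['full2', 'inc2', 'inc1', 'full1']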
Something we didn't talk about is that these file marks delimit the
indexes in the partitions. They're not used to separate data blocks and files; those are
just all interleaved, as they would be on a disk, without any file marks. Old-fashioned
tapes used to use file marks to separate files. We don't do that. So you could actually always
get the tape drive to scan forward to the next file mark
and find the next index if you wanted to do that kind of processing.
But as I mentioned earlier, one of the really important considerations in this design,
because this was kind of a major overhaul, actually, of our format was backwards compatibility.
We did not want to create a format of tape
that could no longer be used in earlier versions of LTFS.
We wanted to create tapes that, when they were complete,
could be sent to any other LTFS installation and still read.
So what we call a consistent volume
is a volume that's ready for unmount.
It's had its final index written in the data partition
and copied to the index partition,
and the chain's been built
so that it's ready to be unmounted and read.
So consistent volumes always end with a full index.
Therefore, they can be mounted and used
without any difference by earlier versions of LTFS.
The only thing an earlier version would do is if you were saying, I want to roll back to an
earlier version of an index on a tape, an earlier implementation of LTFS would only see the full
indexes and would miss the incrementals, but it would still operate perfectly correctly.
What about inconsistent volumes?
What about a volume where you were writing incremental indices
and power dropped?
Well, it turns out that inconsistent volumes
are recovered using a check utility; ours is called ltfsck, LTFS check.
An older version of the ltfsck utility would probably fail horribly
if it encountered an incremental index where it expected to find a full index,
because it wouldn't know what to do with it.
An incremental index would just look like data to it,
and it would say, well, I looked at both sides of the tape mark,
I couldn't find an index, and throw up its hands. However, what's the likelihood
you're going to take a tape that's been not correctly unmounted on a 2.5 system and decide
to send it to somebody who's running 2.3? Pretty unlikely. If you did work in an environment
where for some reason you had two different versions of LTFS running,
what you'd want to do is take the newer ltfsck version and run it at the older installation and use it for your recovery.
And then you could still recover the newer versions of the tape. I have some slides on what it took to do this in terms of LTFS tags and that kind of thing.
But rather than going through the gory details, since we have less than 10 minutes left, I think
I'll stop and ask questions.
And if your questions involve the gory details, then we can go to those slides.
So, yes?
So that's an interesting question.
The tape drive, by its very nature, does shingled writing.
So in Ishimoto-san's picture, it showed this, but he didn't really go into the detail of a wrap.
The way the tape drive writes is called serpentine recording.
Let's go one more.
No, I'm sorry.
Somewhere in there you had wraps, and I didn't.
In any case, serpentine recording means you write down the tape, you move over a little bit, you write back, you move down, you write, you write back.
Serpentine, this is...
Yeah, jeez.
208 wraps, and so it turns at both ends, it does this 208 times.
As we're writing one of these wraps, we're overwriting the next
pair of wraps, interestingly. So now if you would go forward, Ishimoto-san. Thank you.
So basically, we have two wraps here. We have two more wraps that are called a guard band,
and then we have all these other wraps. So although we are doing shingled writing here,
because of this guard band
it isn't actually overwriting any of our data,
and that's how we get away
with dual-partition tapes.
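As a quick consistency check on those numbers, the track count and wrap count quoted in the talk imply how many tracks the head writes per pass; the 6,656 figure is an assumption (the talk only says "more than 6,000 tracks").

# Sanity check: total tracks divided by the number of wraps gives tracks written per pass.
TOTAL_TRACKS = 6656     # assumed LTO-8 track count; the talk says "more than 6,000"
WRAPS = 208             # quoted in the talk

tracks_per_wrap = TOTAL_TRACKS // WRAPS
print(tracks_per_wrap)  # 32 -> the head writes 32 tracks on each pass down the tape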
Thanks.
Any other questions?
Gory details? Anything
like that?
Yeah.
I haven't heard anything about...
Oh, I'm sorry, that's a hardware statement. This has nothing to do... Can you repeat it?
The question was about backwards compatibility of generations of LTO.
So LTO...
Tell me if I get this wrong, but I think I've got it right.
The LTO standard says...
So remember that as we're going to different generations of LTO,
the tracks are getting closer and closer together,
smaller, and therefore the read-write
heads are changing size.
So the
rule for LTO is that the hardware will always be able to read one or two generations back.
One or two back?
Two back to read, and one back to write. So with LTO-8, we can read and write the previous generation.
Seven?
Right.
But not six.
And that's a hardware statement.
That has nothing to do with this backwards compatibility in the format that I've just talked about.
That's because the hardware changes to the point where it's impossible to read those wider tracks any longer. So the answer is no, but that's a great question. You can keep the old drive to read the old-generation tapes,
and later you may decide to copy the old-generation tapes to the newer generation by using two tape drives.
That's how you can migrate to the new generation.
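A tiny sketch of that read/write compatibility rule as a lookup; the pre-LTO-8 "read two back, write one back" convention and the LTO-8 exception are as described above, and anything beyond LTO-8 is deliberately not covered.

# Hypothetical compatibility check for LTO generations up through 8.
# Historically a drive could read two generations back and write one back;
# LTO-8 drives only handle LTO-7 and LTO-8 cartridges.
def can_read(drive_gen, cart_gen):
    if drive_gen == 8:
        return cart_gen in (7, 8)
    return drive_gen - 2 <= cart_gen <= drive_gen

def can_write(drive_gen, cart_gen):
    if drive_gen == 8:
        return cart_gen in (7, 8)
    return drive_gen - 1 <= cart_gen <= drive_gen

print(can_read(8, 6))   # False -> an LTO-8 drive cannot read LTO-6
print(can_write(8, 7))  # True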
Yeah.
But the use of...
My background is, we have a site with the older library, generations 5 and 6. So my upgrade path is from there to 8, and I also have to have 7 in between. It's a lot to keep current.
Yeah.
Yeah.
And basically, I think Haru kind of alluded to this as well
when he talked about reclamation, and he said,
well, you would do reclamation when you did this kind of migration
to the next generation, combining two tapes into one
and reclaiming dead space at the same time.
That was kind of the same idea.
Yes?
What is the latest generation currently available?
Currently it is 8.
Yeah, and that came out last year.
So we would expect 9 to come out maybe next year.
It's usually every 2 to 3 years.
So 2017, 2018, 2019, 2020... you might expect something. I have no idea.
I actually am not involved in the hardware,
so anything I say is non-binding.
Excuse me?
Yeah, it could be, yeah.
Any other questions?
Okay, well, thank you very much, and enjoy the rest of your afternoon.
I guess we'll get out a little early.
Thanks for listening. If you have questions about the material presented in this podcast,
be sure and join our developers mailing list by sending an email to developers-subscribe@snia.org.
Here you can ask questions and discuss this topic further with your peers in the storage developer
community. For additional information about the Storage Developer Conference, visit www.storagedeveloper.org.