Grey Beards on Systems - 45: Greybeards talk desktop cloud backup/storage & disk reliability with Andy Klein, Director Marketing, Backblaze
Episode Date: May 11, 2017. In this episode, we talk with Andy Klein, Dir of Marketing for Backblaze, which backs up desktops and computers to the cloud and also offers cloud storage. Backblaze has a unique consumer data protection solution where customers pay a flat fee to backup their desktops and then may pay a separate fee for a large recovery.
Transcript
Hey everybody, Ray Lucchesi here with Howard Marks.
Welcome to the next episode of Greybeards on Storage, a monthly podcast show where we
get Greybeards storage and system bloggers to talk with storage and system vendors to
discuss upcoming products, technologies, and trends affecting the data center today.
This is our 45th episode of Greybeards on Storage, which was recorded on May 2, 2017.
We have with us here today Andy Klein, Director of Product Marketing at Backblaze.
So Andy, why don't you tell us a little bit about yourself and your company?
Hey, how are you doing, Howard and Ray? Thanks for inviting me today. Andy Klein, I've
been doing this for I don't know how many years between storage and computer security. Too many,
enough to have a gray beard. How's that? Good for you. We are always happy to have a new member of
the Greybeards Club. Good to have you, Andy. Well, thanks. Thanks. And so a little bit about
Backblaze. We just actually celebrated our 10th year of service.
Well, that's pretty good.
We've been in business for about 10 years now.
Our anniversary, of all things, is on 4-20.
Speaking from Colorado.
But of the group, only Ray is in Colorado.
Very good. So we started out doing cloud backup, online backup, as it was known back in the day of PCs and Macs.
Started, I think, with Macs, then PCs next, and we did that for a number of years.
Charged a good reasonable rate for that, $50 a year for unlimited storage, and that was great. People really liked it because what
it did is it just backed up everything on their computer, their data on their computer automatically.
They didn't have to set anything. They didn't have to drag and drop anything. They didn't have to
tell us what to back up. We just went and found it and backed it up. And that was, it's been a great
business and still is to this day. The interesting part about that, though, is we had a lot of storage.
Over the years, we had built our own storage systems.
We call them storage pods.
And we actually open sourced the design of those things.
Oh, my goodness.
Back in 2009 now.
Yeah.
I remember back in 2009, our friend Robin Harris wrote about your storage pods.
Yeah.
And I looked at it, and I saw there were two power supplies, but each one was a single point of failure.
Yes.
And the drives were standing on their backs, held in by a well-engineered rubber band.
Well-engineered, that's correct.
And we've gotten a little better.
And I looked at it, and I said, what a piece of crap.
That was back in 2009, right, Howard?
Well, A, it was back in 2009, but B, the point is that that level of, excuse me, crap is perfectly fine if your software knows that it's running on
something of that quality. Yeah, that's actually a really good point. I mean, a lot of folks
miss that. I mean, we'll talk later about drives and drive stats and all
of that fun stuff. But a lot of people miss that point. Now, over the years, obviously,
we've replaced the rubber bands with brackets and so on
and so forth. But you're right about the software. The software knows that failure is a possibility
and just lives with that. The other part that you didn't mention was that we not only did all of
that, but we used consumer drives. So the ones that you put in your tower sitting at home or on your external drive or whatever,
we used those versus enterprise drives.
And that really got a lot of people interested in what we were trying to do.
One of the things we found out was that for us, anyway, consumer drives just made much more sense.
They were much more economical.
We could control, like I said, the failure of all of that. And it was just a
whole lot less expensive and allowed us to grow our storage farm and still make money doing
this. You know, that's where we started back in 2009. We've had, I think, six or seven different
iterations now where we've made continual improvements. And each time we publish the specs out there.
And I have looked at the later ones and would not use the term crap to describe them.
I appreciate that. They're certainly... They are minimalist.
Minimalist. I think that's pretty fun. Pretty good. You know, it was interesting. I just got
back from a show in Las Vegas called NAB, National Association of Broadcasters, right?
And I ran into a vendor there who was, among other things, was selling storage systems.
And lo and behold, there was a storage system there that was 60 drives in a chassis, which our new ones are, standing on end, just like that, all plugged into backplanes,
and with a back end that held all of the pieces, and then had a single power supply back there.
Oh, God.
And I went, my goodness, that looks familiar.
And the guy says, yeah, yeah, we borrowed the idea from this company named Backblaze.
Oh, you're kidding me.
And I flipped my badge around, and we had a good laugh.
But did you say, and we'd like it back now?
No, no, that's the deal of open source.
It's actually been great over the years because lots of folks gave us lots of great ideas.
Even initially, even those ideas, the rubber bands, by the way, came from
somebody as we started to put this model out there. Somebody said, you know,
you really need to do this for vibrations.
We've had lots of suggestions from the community
over the years on how to make it better. They're always receptive
when we publish the new specifications.
We publish a bill of materials and everything.
And we basically say, here, you can go build it yourself.
And a handful of companies over the years, one that comes to mind is 45 Drives.
They've actually taken the design and made it a little more sturdy, I guess is the best way to put it.
And they've been doing quite well selling what they call their Storinator product out to folks.
So it's great.
It's always great to see a good idea taken to the next level.
But what it allowed us to do, like I said, is store a lot of data,
store it inexpensively. And about, oh, maybe three or so years ago, we started saying, well,
maybe we could offer that up to, you know, to folks to use through some type of a CLI,
command line interface, or an API, or even direct web access. And that's where
the most current product that we have available,
it's cloud storage. It's called B2, right? And that's what that does. It, you know, somebody
like a CloudBerry or a Synology or some of the vendors like that, Retrospect, just to bring up
somebody from the past there, have all created interfaces to it. So you just use their product
and then it stores the data in B2.
The nice part about that, of course, is that they're paying a whole lot less for storage then,
whether it's them directly or whether it's their consumers,
versus connecting it up to one of our competitors out there these days.
So that's what we've been doing.
That's what Backblaze is all about.
It's hard to talk about that kind of product without mentioning one of your competitors.
So B2 is an object store and it's API compatible with the object store from the vendor named after a river in South America?
It is not API compatible. Actually, ours is a little bit simpler
because you don't have to do a whole bunch of negotiation up front
to get to the device that you want to store your data on.
So our API is actually a little bit more straightforward.
If you've coded one, you can certainly code in ours.
But it is not like just swap the code, you know, swap the S3 out for B2 and get everything working.
Okay, so it's your own API.
It's our own API.
That's correct.
So how many storage pods do you currently run?
I guess I have no idea.
Do you have multiple data centers as well?
Yeah, we have a couple of different data centers now.
One which has been old reliable out in Rancho Cordova, which is just outside of Sacramento, California.
And then more recently, a data center in Phoenix that we're just in the process of opening and getting all that announced and everything like that.
Storage pods. So, my goodness, it's certainly north of a couple of thousand these days.
The biggest ones are now storing about 480 terabytes of data, or up to about that much,
because you can get 60 8-terabyte drives in there.
So you can do the math.
We probably have north of 300 petabytes of data that we're managing these days.
So we're getting pretty good at this. I think after 10 years and that kind of data, we're doing okay.
But that's what we do for a living, is store people's data.
Your point earlier about the software is exactly right, and it's what most people miss.
If you're in this business, you cannot be afraid of a drive failing.
Now, at your scale, it's not if it's going to happen.
It's just how frequently.
Yeah, that's exactly right. And so the software works with that. We originally started out with a RAID 6 configuration,
and that was pretty good, and that worked really well. And then we
introduced what we call Backblaze Vaults. Vaults are just a collection of 20 storage pods, with each storage pod basically being 1/20th of where the data goes.
So a piece of data comes in and gets spread across 20 different devices.
Then we wrote our own encoding algorithms, Reed-Solomon type things, to do the management of all of that. So it's set up in a 17-data-shard and 3-parity-shard kind of thing, spread across those 20 devices.
Got it.
That parity, of course, rotates around 20 pods and all that stuff.
So you can actually handle a three-pod failure, I guess, in this scenario
and still continue to operate without data loss, which is quite impressive.
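For the curious, the 17-plus-3 arithmetic Andy describes works out as follows. This is a minimal sketch, not Backblaze's actual code; the shard counts come from the episode, and the helper names are invented for illustration.

```python
# Sketch of the 17-data / 3-parity vault layout described above.
# Shard counts are from the episode; everything else is illustrative.

DATA_SHARDS = 17                              # shards carrying file data
PARITY_SHARDS = 3                             # Reed-Solomon parity shards
TOTAL_SHARDS = DATA_SHARDS + PARITY_SHARDS    # 20 storage pods per vault

def is_recoverable(surviving_shards: int) -> bool:
    """Reed-Solomon can rebuild the data from any 17 of the 20 shards."""
    return surviving_shards >= DATA_SHARDS

overhead = TOTAL_SHARDS / DATA_SHARDS
print(f"storage overhead: {overhead:.2f}x raw-to-usable")   # ~1.18x
print(is_recoverable(TOTAL_SHARDS - 3))   # True: three pods down, still readable
print(is_recoverable(TOTAL_SHARDS - 4))   # False: a fourth loss would mean data loss
```

Note the overhead: roughly 1.18x raw capacity per usable byte, versus 2x for simple mirroring, while still tolerating three simultaneous failures.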
It's really nice because what we do is we put each pod in a different rack
and then we spread them across the data center.
There are actually no real issues with spreading them out like that.
When we get big enough, you could put them in 20 different data centers if you wanted to.
That's not – the software doesn't care.
Right.
And the workload you're dealing with, either the backup or the object storage, is much – it's large streams.
Yes.
And that makes you much less latency sensitive and makes life much simpler.
I want to get back to this $50 a year for backing up everything on my computer.
Is that today's price?
Yeah.
Yeah.
Today's price.
Yep.
Oh, my God.
I'm thinking I might have to move over here.
You know, for a PC or a Mac, obviously, if you've got a Linux box, then the alternative
for those folks who want to do that or have servers
and stuff is to use the B2 software with one of our, you know, our integration partners,
like I mentioned earlier, and just use them as a front end. So buy the package that you like.
You can, in fact, do it with a drag and drop interface as well as a web UI, which is kind of,
which is nice. I actually like that a lot. But for backup and things like
that and making it automatic, it's just really easy to use it that way. So if you had a Linux
box, that works out great. But for a PC or a Mac, what we find is it's a great set and forget kind
of thing. And you get your data back when you need it back. That's the best part.
I mean, over the years, we've restored something north of 22 billion files for people.
I'm sorry, 22 billion?
22 billion, yeah.
And only half of them were cat pictures.
That's exactly right.
That's on our Cat Blaze site.
That's where we store all the cat pictures.
Yeah, exactly right.
Backblaze is all about that.
And, you know, like I said, over the years, we've always tried to be an open company.
And it started with posting this storage pod blog post and basically saying,
you know, this is what we built and how we built it. Go ahead, build your own. And, you know,
it started there. But we've been, we went through a process where at one point early on, somebody
had tried to acquire us. And we detailed that process without naming names. But just to explain to people what
that's like to go through, we explained the process. We took a little bit of funding. We've
received over the years about $5 million in funding. Yeah, that's good. That's
great. Talk about lean startup. My God. And so we detailed that process of how we went through it and described it to people.
You know, we've done that.
Back in 2011, 2012, there was the Thailand drive crisis, where hard drives suddenly got really hard to get.
We went through a process we called at the time drive farming.
I read that.
I was pretty impressed.
You sent everybody out to go out and buy them.
That's exactly right.
External drives, and you popped them open, or you plugged them into USB?
I have no idea.
Yeah, we did.
What you'd find is that if you opened up the shell of an external drive,
the hard drive inside was the same as if you just bought the naked internal.
But we couldn't get internals through the normal channels.
Everything had just dried up, right?
And so we did exactly that.
Like you said, we went out to the retail channel.
Employees were buying hard drives.
And then various places, you know, Costco and such,
started limiting the purchase of them to two drives each.
And so we suddenly reached out to all of our friends and family and said, would you please go down to your local Costco and buy two drives?
And started smurfing hard drives like pseudoephedrine.
And it was enough to keep us in business, which is, you know, that's what you got to do.
You know, and we didn't want to give up anything else.
We didn't want to, you know, have to cut back on customers or put a limit on storage or anything like that.
So, you know, we just did that.
And then we detailed that process and we told people about it, said this is what it was.
I remember that.
And I've got a couple of questions for you about that process.
Sure, go ahead.
Because that was about the same time that a couple of the drive vendors were starting to use SMR drives in their USB devices,
figuring the USB bottleneck was so bad that if the drive was slow, it didn't matter.
Right.
Did you see that as you were smurfing drives? And does SMR create a problem for your back end?
That's a really great question. We suspected that SMR wasn't going to be a good fit for us.
And, as a matter of fact, I'll talk to this. In the middle of the month here, it's May, I'm going to say May 9th, we'll do our next drive
stats blog post, but in the interim, one of the things we're going to talk about in there is
our experience with SMR drives. So I'll share that with you right now.
Okay, I just realized we should define it. Oh, yeah.
Yeah, shingled magnetic resonance or something. Recording. Thank you. Basically, it's the idea
of overlapping tracks, like shingles, on a single platter, a single side
of a platter, actually. And so what happens is writing is pretty efficient.
Okay, you write, and then you write over the top, and then you write over the top again,
and there's overlapping, but everything works fine because there's plenty of room there.
Well, as long as you're writing sequentially. Writing sequentially, that's right. And for us,
what we did is we did an experiment here in what we call a mini vault, which in our case, instead of 20 storage pods, is six.
And so we did that in our Backblaze Labs groups.
And they wrote just like that.
And the writing was pretty good.
It wasn't stellar, by the way.
It wasn't as fast as you would think it could be.
But it was pretty good.
And then we went through the process of deleting files and
trying to reuse the space. And when you go to delete something, you know, the standard delete
process is really easy. Just forget about it, right? But then when you want to recover the
space, what you actually have to do is lift up everything that's on top of it, clean that space
up, write what you're going to put in there, and then put everything back down on top.
Yeah, for SMR drives.
Basically, the SMR drives, there's some set of tracks that makes up a zone,
and you have to rewrite the whole zone.
You can't just rewrite pieces of it.
There's more to it than that, Howard, but yeah, I agree.
Yeah, that's a good simple...
Yeah, oversimplified, but yeah.
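To make the rewrite penalty concrete, here's a toy model of a shingled zone; it is not drive firmware, and the zone size is an invented illustrative number.

```python
# Toy model of the SMR read-modify-write penalty discussed above.
# Tracks in a zone overlap like shingles, so rewriting track i forces
# every later track in the zone to be lifted and rewritten as well.

TRACKS_PER_ZONE = 256   # illustrative; real zone sizes vary by drive

def tracks_rewritten(track_index: int) -> int:
    """Rewriting one track drags along everything shingled on top of it."""
    return TRACKS_PER_ZONE - track_index

print(tracks_rewritten(TRACKS_PER_ZONE - 1))  # 1: appending at the end is cheap
print(tracks_rewritten(0))                    # 256: updating the front rewrites the zone
```

That blow-up on in-place updates is the five-or-six-to-one hit on the second write that Andy describes next.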
But for us, what it meant was it was like a five or a six to one hit in performance
on that second write, effectively, that delete and write again.
So drives that were performing reasonably well,
suddenly just became bottlenecks, or would have become bottlenecks as we were monitoring them.
And our model, you know, is we write stuff and then people either, you know, maybe somebody
decides to leave us as a customer or whatever the case may be, and we reclaim that space or
somebody deletes some files and we reclaim that space. And it may take a few months, but when we go to reclaim that space,
we can't have a bunch of drives doing that.
For us, it wasn't a good fit.
I can see the applications.
One of the ones they talk about is video recording or archiving,
where you're going to record something once.
Those are solid applications for SMR, but in our particular instance, it just wasn't going
to work. So we backed away from ordering those in any quantity. And I think we still got whatever
six times 20, 120 of them out there, wrong math, six times 45, 270 of them out there that we use for other
purposes. But for us, it wasn't a good fit.
And you mentioned you use consumer drives rather than enterprise class drives.
Could you characterize the differences between them and what you found out about them?
Yeah, sure.
From most of our experiences with consumer drives, right,
and we have a handful of enterprise drives that we use for some other purposes
and different systems.
And what we found is the consumer drives perform for us about as well as the enterprise ones.
The enterprise drives are sometimes a little faster, usually are a little faster in recording and things of that nature.
They also come with longer warranties.
They come with five-year warranties typically, whereas the consumer ones come with anything from one to three years, depending on, I'm not sure, the day of the week, the size of the drive, I don't know.
Which vendor is trying to advertise a longer warranty as their advantage, which week?
You know, so, for us, warranty is not an interesting thing unless something
crashes right out of the box, right? Because of the expense of keeping track of a drive that's under
warranty after two and a half years, pulling it out, putting it back in a box, sending it off,
just to save what is effectively, all you're going to get back is a
remanufactured drive. You're never going to get back a new drive.
It almost doesn't make sense for us to worry about warranty at that point. But we've seen
consumer drives,
which, you know, let's say if three years is the warranty, that you would expect some fall off
after two or three years after that warranty period. We've seen some consumer drives go for
as long as six or seven years and have annualized failure rates of less than 2%. We saw some Western Digital one terabyte drives that were
six plus years old before we finally took them out so we could upgrade them because,
you know, I could replace them with sixes or eights and get a whole lot more space.
So those were the drives you really did send back to the farm?
So we actually took those and they were recycled. The drives we pulled out, yes, exactly, sent back to the farm.
I'm a little slow.
That's the other thing we've been doing that we've been open about is all of the drive stats.
And I can quote them here because I am eminently grateful for that. For years, all we had to
go by was the manufacturer's stated MTBF, which became such a ridiculous number. I was glad some
of the vendors changed to AFRs, because when you say it's got a million hours MTBF, some people believe
it should run for a million hours.
Yeah.
So what's the difference here between AFR and mean time between failures?
Annual failure rate, right?
Yeah.
Yeah.
Yeah, it's just math.
I know it's just math.
So annual failure rate is the percent of the number of drives that would likely fail in one year of operation?
A year of operation is 24 by 7 by 52, something like that?
Yes.
Okay.
Now, we do a slightly different calculation.
We call it an annualized failure rate because we can do failure rates even over three months.
And what we do is we count drive hours, not drives themselves.
And we just summarize.
So if there are 3,000 drives, they're all going to have different numbers of hours they were in operation. So we just add up all of those hours and then take the number of failures of that particular group over the period of time
we're evaluating, and that's how we get our annualized failure rate. So it allows us to
look at something, whether it's three months or one year or 18 months. And just this April,
we crossed over four years of drive stats. Probably very typical of a lot of places,
we collected drive stats.
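In code, the drive-hours calculation Andy describes is straightforward; this is a sketch with invented numbers, not Backblaze's actual reporting pipeline.

```python
# Annualized failure rate from drive hours, per the description above:
# failures divided by cumulative drive-years of operation.

HOURS_PER_YEAR = 24 * 365  # 8,760

def annualized_failure_rate(total_drive_hours: float, failures: int) -> float:
    """Failures per drive-year of operation, expressed as a percentage."""
    drive_years = total_drive_hours / HOURS_PER_YEAR
    return failures / drive_years * 100

# Illustrative: 3,000 drives observed for one quarter (~2,190 hours each)
# with 15 failures in that window.
total_hours = 3_000 * 2_190
print(f"{annualized_failure_rate(total_hours, 15):.2f}% AFR")  # 2.00%
```

Because it normalizes by hours rather than drive count, the same formula works whether the observation window is three months, a year, or 18 months.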
Drive stats are nothing more than the SMART stats that are collected with the smartmontools. Once a day, we run it against our entire farm, and we produce a gigantic log file of all of the SMART stats for each of those drives,
and any other related information we think is interesting, and we dump it into a log file. We'd been doing that since the beginning because it was a great tool, but after 30 days
we were cycling that data out, because, you know what, it was a lot of data. So about four years ago...
This is the internet era and the long tail, and old data is all valuable.
Well, thank you. Yeah. So back in April of 2013, we started
saving it. And that became the genesis of the data that we use for the drive stats reports that we do
each quarter. And we do exactly that: here's a model, here's the failure rate for that model over a given period of time, and what are our observations about that. We took a little heat at the beginning
of it when we first started doing it because we got a little marketing aggressive and started
saying things like, hey, what's the most reliable hard drive out there? Some people took exception
to that and they said, well, it's the most reliable hard drive in your data center, but it doesn't necessarily apply anywhere else.
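The daily sweep Andy describes might look roughly like this. smartctl is the query tool from the smartmontools package he mentions; the device glob and log path here are assumptions for the sketch, and a real farm would enumerate its drives more carefully.

```python
# Sketch of a once-a-day SMART collection pass, as described above.
# Uses smartctl from smartmontools; paths and device names are illustrative.
import datetime
import glob
import subprocess

LOG_PATH = f"/var/log/drivestats-{datetime.date.today()}.log"

with open(LOG_PATH, "a") as log:
    for dev in sorted(glob.glob("/dev/sd?")):
        # smartctl -A prints the drive's SMART attribute table
        # (reallocated sectors, command timeouts, and so on).
        result = subprocess.run(["smartctl", "-A", dev],
                                capture_output=True, text=True)
        log.write(f"=== {dev} ===\n{result.stdout}\n")
```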
One of the things I like about your stats is I know yours is not the easiest environment in the world for a hard drive to live in.
Yeah, it's not bad.
The data center is not a bad place. I mean, it's kept in a nice... But I also know from experience from talking to some vendors that the more expensive mechanical engineering does pay off; to what degree is questionable.
Yep.
And leaving drives sitting on their tails means there's going to be a bit more vibration because there's a rigid connection between those drives.
So if a given model has a 4% annualized failure rate in your data center,
I'm likely to believe that that's a really good test, that it wasn't under perfect conditions.
No, it isn't under perfect conditions, but it's a little bit better than stashing it in a tower and then dropping the tower under your desk and then letting cockroaches crawl through it.
Yes, but we are data center guys.
Yeah, yeah, yeah.
And we have big cockroaches.
We do have certain minimum standards.
Yeah, yeah.
The cockroaches have to be employees then, okay.
They have to be union members. Union members, there you go.
Let's not go there. You know, so we do the drive stats and, you know, like I said, we report on the data themselves and compare them and contrast them and do a little analysis on them and, you
know, just try to give people the
information that's in there, you know, that we've seen. Your point is well taken, which is there's
not a lot of sources outside of the manufacturer's specs. There are occasionally with some sites,
you know, Tom's Hardware and a few others like that, that tried to break them down a little bit
and put them... Very small sample sizes.
Yeah.
Yeah.
But it's something.
It's some information.
And so, you know, you always appreciate that.
And all we hope is that people look at the data and they take it in, you know,
with all the rest of the information that they might want to glean and move forward.
You know, the other reality too is we buy drives based primarily on price.
So people ask us all of the time, why don't you do this kind of a drive? Why don't you do that
kind of a drive? Because it's too darn expensive. Right. Because you're buying hundreds at a time,
if not thousands, and 10 bucks a piece adds up. Well, that's exactly right. You know, and when we do the math,
we, you know, we have wonderful little formulas that we do that make that decision for us.
You know, a good example: the drives that performed
quite exceptionally well were the HGST, or the Hitachi drives, as they were known at the time. But they were more expensive
than, for example, a comparable Seagate drive or a Western Digital drive.
And so was it worth paying that extra $10? To your point, it adds up. And, you know,
it depends on the failure rates we were seeing and how much time it took for somebody to replace
a drive and so on and so forth.
It was worth it for a while there to spend that little bit of extra on those drives.
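Backblaze hasn't published those "wonderful little formulas," but the shape of the trade-off is easy to sketch; every number below is invented for illustration.

```python
# Hedged sketch of the buy-decision math alluded to above: does a price
# premium for a lower-failure-rate drive pay for itself in avoided swaps?
# All numbers are invented for illustration.

def expected_cost(price: float, afr_pct: float, years: float,
                  swap_labor: float) -> float:
    """Purchase price plus expected drive-replacement labor over the service life."""
    expected_failures = afr_pct / 100 * years
    return price + expected_failures * swap_labor

cheap  = expected_cost(price=120, afr_pct=4.0, years=5, swap_labor=25)
pricey = expected_cost(price=130, afr_pct=1.5, years=5, swap_labor=25)
print(f"cheap drive: ${cheap:.2f}, premium drive: ${pricey:.2f}")
# cheap drive: $125.00, premium drive: $131.88 -- on labor alone the
# premium loses; it wins only if failures carry extra costs (rebuild
# traffic, vault risk), which is presumably what the real formulas weigh.
```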
Now HGST is only making enterprise drives, and they're reasonably expensive.
Okay, they're quite expensive.
So we work with the other vendors that are out there who, to your point, will sell us several thousand drives at a really good price, and we move forward.
And then the software, as we pointed out earlier, is where your data is going to be spread out over, in this case, 20 things.
And the system is designed so that when a single drive fails in a unit, it's like a leaf falling off a tree.
Certainly, we want to replace it as quickly as possible, but at no point, just because the drive failed, is somebody's data in danger.
You know, and then we have a whole set of protocols that we go through.
So, for example, a second drive fails, and now we know we're getting close to that limit of three.
So now what do we do? Well, as soon as that happens, the whole setup goes
into a read-only mode. And we start to figure out, you know, if we drop a third drive,
we have to get the data off the other 17. So then we get into that whole process. I mean,
so we have a whole set of protocols that our ops team goes through. A lot of it's automated as well
that just makes sure that, you know, at the end of the day, you don't lose customers' data because that's, you know, that's what we're here for.
That's what we do.
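Laid out as a state machine, the escalation Andy just walked through looks something like this; the thresholds follow the episode (read-only after a second concurrent failure, evacuation pressure at the third), and the function and state names are invented.

```python
# Sketch of the vault failure-escalation protocol described above.
# With 17+3 coding, three lost shards is the hard limit; the episode
# says the vault goes read-only at two. State names are invented.

def vault_state(failed_shards: int) -> str:
    if failed_shards == 0:
        return "healthy"
    if failed_shards == 1:
        return "degraded: replace the drive, data not in danger"
    if failed_shards == 2:
        return "read-only: stop new writes, rebuild immediately"
    if failed_shards == 3:
        return "critical: copy data off the surviving 17 shards"
    return "beyond tolerance: data loss possible"

for n in range(5):
    print(n, "->", vault_state(n))
```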
And, you know, we've always been that way.
I mean, we've always been open about what we're doing.
You know, we try to tell people what's going on inside the company.
It's always been that way from the CEO on down. And that sometimes is hard because you make some really hard decisions about,
are we willing to go talk about this?
I know I've had conversations with a drive vendor who shall remain nameless,
where I asked if I could talk to somebody who could explain to me what the difference was
between that enterprise and desktop one terabyte drive.
And the response I got was, nobody who understands that has been press trained.
Yeah, yeah. Okay, but they don't want them to be press trained either, right?
Well, yes, that's kind of like, we arranged it so that we could say no. I've been in larger companies,
certainly beyond Backblaze,
and I understand that mentality.
I'm not saying I'm a fan of it,
and I hope we hold off from doing that kind of stuff
for as long as possible.
We greatly appreciate the fact that you're yes unless there's a reason to say no,
not no unless there's a reason to say yes.
Yeah, that's a good way of putting it. I mean, we let our
CTO, one of our co-founders, Brian Wilson, go out and comment on
blog posts, Reddit posts, and things of that nature. So we know we're walking a very thin line with that one.
Because Brian, and a lot of this comes from him as well as the rest of the founders,
but he's one of those people who says we should just tell everybody everything.
We're going to do a good job.
We're going to try to treat our customers right.
And if somebody wants to give us money for that, that's great. That's the kind of money we want.
Something that's surprising me is you guys have moved to 8-terabyte drives.
These are 8-terabyte consumer drives.
Is that the sweet spot at this point from your perspective?
It is.
So one of the things we saw when we first got into the business a long time ago, it was like 11,
it might even have been higher than that, cents a gigabyte or something like that for the early one terabyte drives, the one and a halves. That number kept coming down. So,
you know, the threes were a little less expensive and then the fours and so on.
And then we had the whole Thailand drive crisis and that messed up the curve for a while.
But then it started to look like it was coming back in where the price of drives was literally
going to go to zero at some point, right?
We knew that wasn't going to be the case, but we weren't sure where it was going to get flat.
Eventually, the price per drive remains constant and the capacity just increases over time.
Yeah, and we're seeing that now.
We're seeing it get down to roughly two and a half cents a gigabyte, and then a model changes.
So sometimes they don't even go up in size.
They just change a model, and it bumps back up to about four,
and then it floats down to about two and a half,
and then maybe there's another larger size,
and that pushes it back up to five, and then it goes through that process again.
So it looks like right now the floor is someplace between two and two and a
half cents a gigabyte is what we're seeing. But eight terabyte ones, to answer the question,
is right where that sweet spot is. So we're paying something between two and a half and
three cents a gigabyte by the time you do the math and all that and buy in quantity. You can't go down to
your local Best Buy and purchase it for that number. But you can get close, interestingly
enough. I mean, you can go down there and get an eight terabyte drive and you might be able to pick
one up on sale for $269 or something like that. And that's pretty darn close to three cents a gigabyte.
So even the consumer channel is pretty inexpensive.
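The retail numbers check out; drive capacities are quoted in decimal units, so 8 TB is 8,000 GB.

```python
# Checking the retail example above: a $269 8 TB external drive,
# with capacity in the decimal gigabytes drive vendors use.
price_usd = 269.0
capacity_gb = 8_000
print(f"{price_usd / capacity_gb * 100:.2f} cents per GB")  # 3.36 cents/GB
```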
So yes, 8 is the current size.
We're looking at some 10s.
I can't remember if we have them on order or not.
The problem is they just haven't come down yet at all.
They're still up in the $400-plus kind of range.
Didn't somebody just announce a 12-terabyte drive?
It's a helium drive.
I guess it's HGST.
It must be Enterprise, huh?
Yeah.
Yeah, chances are.
And, you know, one of the things that's interesting is, and we start to think about a little bit too, is do consumers actually need a 10 terabyte drive?
Yeah.
It's an interesting question.
Yeah, you do reach the point of diminishing returns.
And so you're afraid that the larger drives will just be enterprise products
and you'll lose the price curve?
I suppose that's possible.
I'd like to tell you that all of the drive manufacturers keep us up to date on their product plans and roadmaps,
but that doesn't quite happen.
We think about that from time to time because we do work with consumer drives.
And is there a place where they suddenly say, you know, for a while we're just going to stop.
We're just going to stop at six or eight or five or whatever the number is going to be. And then at some point, we'll probably make some
bigger drives, but maybe by then it'll be SSD time, or maybe it'll be, you know, maybe there'll
be 16 terabyte drives out there, and the technology will allow those to be made for a penny and a half so
they can sell them to consumers. So, you know, one of the questions I would have: do you think Google
and those other big guys are using consumer drives or enterprise drives in their cloud capabilities?
It seems like to me they're using consumer drives as well, right?
You know, I don't know the answer to that.
I wish I did.
I could – I've heard rumors on both sides.
Okay.
Yeah, yeah. The best story I've heard is that it's the consumer drive, in that the enterprise nearline drives have slightly stronger magnets in the positioner, so they settle faster, but the web-scale guys burn their own firmware that's kind of halfway between consumer and
nearline in terms of things like how fast it times out.
You know, I suppose. Big as we are, at 300 petabytes or whatever the case may be, the Googles of the
world are certainly just a little bit bigger than that. And they can probably dictate almost what should be built.
Now, there have been, I did see a study, not a study, well, it was a paper,
it was a positioning paper more than anything that Google put out,
maybe it's a year ago now.
Yeah, we talked about that, yeah.
Saying, you know, this is what you should be building, drive manufacturers.
So I don't know if that's a, you know, left hand doesn't know what the right hand's doing,
or if, you know, they are in fact, you know, just having to, you know, take whatever the
drive manufacturers give them. My friends at the drive manufacturers tell me that
the big five have more influence than just taking what they're given.
Ah, it makes sense. I mean, you know, it does certainly make sense. And then,
you know, all of our drives to date are three and a half inch drives.
Oh, yeah. Okay.
You know, so I imagine, you know, those guys might be working with two and a half inches as well.
And I've heard Facebook also has a whole data center of SSDs out there.
So when you have a few billion dollars laying around, you can do some pretty interesting things.
I haven't had a few billion dollars lying around for a while.
Yeah, I lost it the other day. Since last week or something like that.
Yeah, I completely understand.
You know, so we just – we do, like I said, we work right now within the parameters we get.
Your point really early on, like I mentioned, is it's really the software that makes the magic go.
And, you know, you can – I don't want to say you can program yourself around anything,
but for the purposes at hand here, whether it's a consumer drive or whether it's an enterprise,
no difference, no real difference.
You know, unless you get some absurd failure rates in a consumer drive, for example,
you're not going to really have anything noticeable from a bottom line point of view.
You know, what's the difference between a 1% and a
2% failure rate over the course of a year in a drive farm? Three drives. Oh, my goodness.
It's not a big deal. No, but you do have to have some significant scale to make the kind of things
you do work. Yeah, that's correct. You're talking about, okay, we have 20 pods is a vault,
and the unit of management is the vault.
And that means that we should compare a vault to a disk array,
not a pod to a disk array.
And it's more than 1,000 drives in a vault, you know?
Yeah, yeah.
I mean, some of the biggest enterprise-scale devices have maybe 1,000 drives.
If you have very large numbers of things,
and your software can deal with failures, the quality of the individual things is less important.
That's very well put.
That's really well put.
And that's been the basis of it. We finally got big enough to make it work.
I think probably two vaults is big enough to make it work.
About half our farm right now, a little bit more than that now, is all in the vault configuration.
And then the rest of them are standalone storage pods as we described them.
We slowly migrate over.
No reason to create more work for yourself than you need to, but there's reasons, whether it's the chassis getting old or, for example, when we migrated off our two terabyte
drives to eights, that was a great time to start to say, okay, so all of the eights are going to
be put into vaults, and then we're going to migrate from storage pods to vaults.
Since we have to move this data anyway, let's move it where we want it to be.
Yeah, exactly, exactly.
So that's what we do.
Now, the good part about all of that is the technology that we use
and the fact that we do things inexpensively
but still are able to deliver a good quality product
is the B2 storage
product, right? The cloud storage product. We charge, it's a half a penny a gigabyte a month
to store data, which is, you know, anybody else is out there storing it at five, six, nine, ten,
twelve cents. Pick your favorite thing. We store it very inexpensively.
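At the rates quoted in the episode, the monthly difference is easy to see; the competitor figures below are just the numbers Andy rattles off, not a current price survey.

```python
# Monthly cost of storing 1 TB (1,000 GB) at the per-GB rates above.
B2_RATE = 0.005                                    # half a penny per GB-month
COMPETITOR_RATES = [0.05, 0.06, 0.09, 0.10, 0.12]  # the figures Andy lists

gigabytes = 1_000
print(f"B2: ${B2_RATE * gigabytes:.2f}/month")      # $5.00
for rate in COMPETITOR_RATES:
    print(f"at {rate * 100:.0f} cents/GB: ${rate * gigabytes:.2f}/month")
```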
Time for me to redirect my CloudBerrys.
Yeah.
Okay.
Do you guys do any compute?
Do you have any compute capabilities while you're there and stuff like that?
No.
We do not.
We do not.
And so people have asked us about that, and it's certainly one of the things we want to consider
but the truth is we need to make sure
we get the storage business right
before we introduce that.
Just two or three weeks ago
we cut the price of our download product
the same thing in B2
so it used to be a nickel a gigabyte, like everybody else, or thereabouts.
So we cut it back to two cents a gigabyte. So you can get your data back for two
cents a gig. And the nice part about both of those numbers is those are the numbers. There's no
tiering, there's no funny business, there's no if you, you know, like I said, if you use this
service or move it over here, you know, tuck it under the covers, you get a better price or whatever.
Those are the two numbers.
That's going to make it very hard for you to succeed in large enterprises where purchasing agents are evaluated by the percentage discount column on the PO.
We tell them that we'll raise the price for them so they can discount it back
Yeah, yeah. Okay, whatever works. All right, gents, we're getting to almost the end of the
podcast. Andy, do you have anything you want to say to our listening audience before we leave?
Just thank you. You know, I know many of the folks who listen to your podcast
have certainly paid attention to us over the years and probably chimed in on some of the comments and things like that, good or bad.
So we genuinely appreciate that.
We appreciate the dialogue.
We appreciate the comments, pluses and minuses, the criticisms and the pats on the back. And, you know, whether you use the service or not, you know,
we still appreciate you contributing and reading along with us
and going along for the ride.
So, hey, thanks.
Okay.
Howard, any last questions?
No, I just want to thank Andy for being with us
and thank Backblaze for being so open with the disk performance stuff.
It really makes life easier to have that data.
And the only question I would have, and this is a long,
this is probably another 50-minute podcast,
but it would be interesting to see some of the IOs per second
and latencies and stuff like that that you're seeing across your disk drive base
because it's obviously going to be quite variable and stuff like that.
And that might be of interest to listeners and your readers.
Yeah, but remember, it's a very specific workload.
Oh, I know, I know. But still, you know, you're still doing reads, you're still doing writes,
and you're still doing things of that nature. And that sort of thing would be of interest. But
anyway, this has been great. Really appreciate it. Andy, thanks very
much for being on our show today. Well, thank you, gentlemen, for having me and letting me
rattle on for the better part of an hour nearly. So I appreciate that. Next month, we'll talk to
another startup storage technology person. Any questions you want us to ask, let us know. That's
it for now. Bye, Howard. Bye, Ray. Until next time.