Grey Beards on Systems - 81: Greybeards talk cloud storage with David Friend, Co-founder & CEO, Wasabi Technologies

Starting point is 00:00:00 Hey everybody, Ray Lucchesi here with Matt Leib. Welcome to the next episode of Graybeards on Storage podcast, a show where we get Graybeards storage bloggers to talk with system vendors to discuss upcoming products, technologies and trends affecting the data center today. This Graybeards on Storage episode was recorded on March 21st, 2019. We have with us here today David Friend, CEO of Wasabi. So David, why don't you tell us a little bit about yourself and your company? Sure. I'm a serial entrepreneur. My most recent company is a company called Carbonite that probably most of

Starting point is 00:00:45 your listeners have heard of. It's a big backup company. And the new company is called Wasabi, and we're out to disrupt the cloud storage market with a product that is really identical to Amazon S3, but one-fifth the price and up to six times faster. You can store a petabyte of data in Wasabi for less than the annual maintenance on a petabyte of, say, EMC or NetApp storage too. Yeah, so, I mean, it's a cloud storage solution. So do you have, like, data centers throughout the world and all that stuff? We have three data centers at this point, one on the East Coast, one on the West Coast, and one in Europe. And, you know, there'll be more online before the end of this year.

Starting point is 00:01:34 We have over 6,000 paying customers, about 4,000 more in trial. We've been growing 15 to 20 percent month over month. And so we're well over 100 petabytes of storage installed at this point and growing very quickly. And you use like the S3 protocol to access the storage and that sort of thing? Yep. We very carefully did a very exact replica of the S3 API so that any software that works with Amazon S3 will work with Wasabi. So we have hundreds of software partners already that we have certified that work with Wasabi. I'm wondering how the application interacts with that data. Is the

Starting point is 00:02:22 application going to reside in AWS, or do you put that application somehow on Wasabi? No, we don't run, we don't do anything except the storage part. And so you can continue to run your apps in-house, which is what a lot of people do, because most of our business is coming from companies that have, you know, a lot of in-house storage and it's five years old getting to end of life.

Starting point is 00:02:57 And, you know, a hardware salesman is standing there with a purchase order for a million bucks and customers saying, maybe I don't want to do that when I can move all that data to the cloud and at a price that's less than the maintenance on this hardware that they're trying to sell me. Sometimes people run their apps in EC2, in Amazon's cloud, but they store the data in Wasabi. Sometimes they take a second copy of what they have in Amazon and store it in Wasabi so that they're not locked into one vendor. Sometimes they run their apps in other people's clouds. Like we have a partnership with a company called Packet that only does compute in the cloud. We only do storage in the cloud. We have an agreement whereby you can move data back and forth between Wasabi and Packet with no egress charges.

Starting point is 00:03:47 We don't charge egress anyway. I mean, that's one of the things people hate. That's the other thing that makes S3 expensive. Getting data in is easy, but getting it out is a bit of a challenge cost-wise. My biggest pain for my customers is explaining what egress charges actually mean. It seems to me as if it's an ideal candidate for a cloud backup environment. Yeah, we partner with Commvault, Veeam, many, many other, pretty much all the major backup companies. And it is, it's a perfect place to store your backups.

Starting point is 00:04:24 And almost everybody's backup software today talks S3. And so it's very simple to set it up with Wasabi. And, you know, in the knowledge base on the Wasabi website, there are plenty of videos and documents that tell you how to use Wasabi with Veeam, how to use Wasabi with Commvault, and so forth. So, yeah, it works great for that. And one of the things that Wasabi has that the other guys don't have is the notion of immutable buckets, and it's kind of like WORM. You know, having come from the backup world,

Starting point is 00:05:01 I know that most of the time when people lose data, it's because they made a mistake, pushed the wrong button, or there's been a hacker. Certainly, that's what my problems are. Yeah. Yeah. And so we have this notion of immutable buckets that says, if you know you're going to want to keep this data for a year or five years, put it in an immutable bucket, and then nobody can delete it, nobody can change it until that clock runs out. And so what that means is if somebody comes in, you've got a hacker or you've got somebody you've fired who wants to destroy data on the way out, it's safe. So do you support things like, I call it replication between data centers and that sort of thing, or taking replicas within a data center?

Starting point is 00:05:48 Well, you know, we have the same 11 nines of durability that Amazon and Microsoft have. 11 nines? Yeah, yeah, it is. I actually did the math on this once. And if you gave me a million objects to store, statistically, I would lose one every 659,000 years. 11 nines. I've never heard 11 nines before. Five, six, seven, maybe, but never 11. But that's an enterprise. Amazon, Microsoft, and Wasabi are all 11 nines. And basically you get that by taking the data and spreading it over 20 separate drives. And you'd have to lose five of those. Well, we actually

Starting point is 00:06:33 spread it across 20 servers because that makes it far more durable. But you'd have to lose five of those within a very short window of time. And statistically, the chances of that happening are almost nil. But some people want to have the data replicated in another geography as well. And we can do that as well. So if you can store data in Wasabi in Virginia and say you want to replicate it somewhere else, it'll just automatically in the background replicate itself to our data center in Oregon. And, you know, so that's available as an option. That's great. So as you grow, though, I assume that a user is going to have control over where that data sits in terms of data sovereignty or potentially GDPR type of security?
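The durability argument above (data spread across 20 servers, with loss requiring five of them to fail within a short window) can be illustrated with a small binomial calculation. This is only a sketch under stated assumptions: failures are treated as independent, and the per-server failure probability within the window (`p_window`) is a hypothetical number, not a figure from the conversation.

```python
from math import comb


def prob_at_least_k_failures(n: int, k: int, p: float) -> float:
    """Probability that at least k of n independent servers fail
    within one window, when each fails with probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical assumption: a 0.1% chance that any one server fails
# before a failed peer has been repaired or rebuilt.
p_window = 1e-3

# 20 servers, data is lost only if 5 or more fail in the same window.
loss_probability = prob_at_least_k_failures(20, 5, p_window)
print(f"chance of losing 5+ of 20 servers in one window: {loss_probability:.3e}")
```

The point of the sketch is only that the tail probability collapses very quickly as the number of simultaneous failures required goes up; real durability figures depend on repair times and correlated failures, which this toy model ignores.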

Starting point is 00:07:30 Yeah, absolutely. And that's why we have a data center in Amsterdam, because when GDPR came in, we had about 20% of our business actually was coming from outside the US. And it was very clear that if we wanted to retain that business, we were gonna have to have a data center in the GDPR to meet the compliance. And we are GDPR certified as well as HIPAA and FINRA and a dozen other certifications. So you mentioned immutable data buckets

Starting point is 00:08:02 or buckets and things of that nature. Do you support things like data compression or encryption and that sort of stuff? Or how does that work in that environment? at rest because we don't want to have anybody, you know, somebody did steal credentials and broke in or something like that, you know, all they'll find is encrypted data. So data is encrypted in transport using, you know, HTTPS and it's encrypted at rest so that even people at Wasabi, you know, can't see what the customer's data is. I get, yeah. So who has the keys in that situation?

Starting point is 00:08:51 Well, the customer, the customer has the keys, right? Now, if they want to give us custodial copies of the keys, we will do that. We do that with a third party. But that's up to the customer. And certainly, you know, most of the big customers who have enough processes in place that they aren't going to lose their keys, you know, would prefer to keep their own keys. And we'd prefer that too, quite frankly. And do you support, I don't know what they call it anymore, but it used to be you could send a disk to one of these cloud companies and they would suck the data off the disk, and that way you wouldn't have to, you know,

Starting point is 00:09:34 move it over the network and that sort of thing? Yeah, that's a data seeding process. Yeah, Amazon has something called a Snowball and we have a Wasabi Ball, which is basically the same thing. Yeah, but it's a little spicy, right? It's hot. It's hot. Well, I think it was Azure where you could actually send them a disk as long as it was, you know, a SCSI attached disk or a SATA attached disk, it would work or something like that. Yeah. Yeah.

Starting point is 00:10:02 Now, these little appliances, it's surprising. I mean, we have a ton of them out there now because, you know, you get customers, even if they have the bandwidth, it takes a fair amount, if you're moving three or four petabytes of data, something like that, you know, you need it. It takes forever.

Starting point is 00:10:18 And even people who do have, you know, 10 gigs or more to the internet, they don't want to dedicate it all to moving this data. They've got work to do. And so these appliances are actually pretty handy. You know, they show up at the back door. You plug them in. You load them up with data.

Starting point is 00:10:35 And you put them on a truck and send them back. So what's a typical size for one of these appliances? Like a 100-terabyte thing? Yeah, 100 terabytes. And, you know, if someone has got a petabyte to move, we'll send them five of these little appliances. They'll load them up, send them back. They're easy to handle. They're fairly lightweight. And then we'll send them another five, and they'll load those up, and then the job is done.
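The arithmetic behind the seeding appliances can be sketched as follows. The 10 Gb/s link and the 100-terabyte appliance size come from the conversation; decimal units and a fully dedicated, fully utilized link are simplifying assumptions, and the function names are illustrative only.

```python
import math

TB = 10**12  # decimal terabyte, in bytes
PB = 10**15  # decimal petabyte, in bytes


def transfer_days(data_bytes: float, link_bits_per_sec: float,
                  utilization: float = 1.0) -> float:
    """Days needed to push data_bytes over a link at the given utilization."""
    seconds = data_bytes * 8 / (link_bits_per_sec * utilization)
    return seconds / 86_400


def appliances_needed(data_bytes: int, appliance_bytes: int) -> int:
    """Number of seeding appliances needed to carry data_bytes."""
    return math.ceil(data_bytes / appliance_bytes)


# Moving 4 PB over a 10 Gb/s link that does nothing else.
print(f"4 PB over 10 Gb/s: {transfer_days(4 * PB, 10e9):.1f} days")

# Same move if only a quarter of the link can be spared (assumed share).
print(f"at 25% of the link: {transfer_days(4 * PB, 10e9, 0.25):.1f} days")

# One petabyte in 100 TB appliances: two shipments of five.
print(f"appliances for 1 PB: {appliances_needed(1 * PB, 100 * TB)}")
```

Even with the whole link dedicated, a multi-petabyte move takes on the order of weeks, which is why shipping appliances is attractive.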

Starting point is 00:11:01 That's great. And then I assume there's some sort of a syncing mechanism after the initial seed has taken place? Yeah. From that point on, the customer's sending the incrementals just over the internet or through a dedicated pipe, if that's what they want to do. That's an application and or a customer-based process rather than something Wasabi would do automatically, I guess, right? Yeah, same as it would be with Amazon or anybody else. So, you know, usually people are, even people who are sending us, you know, five petabytes of data, their daily updates might be a few terabytes. And, you know, that's easy to send over the wire. And those buckets effectively are accessible anywhere in the world with internet access

Starting point is 00:11:55 and things of that nature? Yeah. I mean, that's the good thing about being in the cloud, which is that, you know, it is accessible from anywhere. And you don't charge any egress charges. So where do you make all your money? The storage stuff? Well, if you look at an Amazon bill,

Starting point is 00:12:14 it's probably got 15 or 20 line items. They charge you for every little thing. They charge you for egress. They charge you for all the API calls, put, get, delete operations. And while Jeff was busy designing all this software, I was out talking to customers. And the very first thing I heard was, we hate all this stuff on the bill because I run an IT shop.

Starting point is 00:12:38 I've got a budget. I know how much data I have to store, but I have absolutely no idea how much egress I use or how many times I touch the data because with my in-house storage, I don't have any way to meter that stuff. And so it makes the bill completely unpredictable. And I know I've seen plenty of customers whose Amazon bill is double what they thought it was going to be because they didn't take into account the amount of egress that they're going to use. So customers hate that unpredictable nature. And so we just decided to not charge for that. So we make all our money on the storage. A Wasabi bill literally has one line item, which is how many terabytes of storage do I use this month? And, you know, we'll have a

Starting point is 00:13:28 few customers who might abuse it, you know, who will take advantage of the free egress. And, you know, we might lose money on 1% of our customers, but it's so worth it to give everybody else this much more friendly experience. Our customers love it. I mean, it's like the first thing that they'll tell you is, I just love the fact that it's so simple. So maybe I can ask an architectural question. I had a customer talking to us about Wasabi recently. And what they were looking to do is remove all storage from their home environment

Starting point is 00:14:08 and point... I know, I know. And pointing to Wasabi as their primary file store, as, you know, literally everything within their data center in terms of storage. Is that a viable option? And how would that work? Yeah.

Starting point is 00:14:28 Oh, David, come on. No, no way. Well, you know what? Before Wasabi came online, my whole desk was cluttered with NAS devices and all kinds of junk. And I got rid of all of that stuff. In fact, I took a little piece of software called CloudBerry Cloud Drive, and I put it on my PCs and it presents Wasabi. It presents

Starting point is 00:14:55 an object store as a mounted drive. So I basically have a W drive on my computers. And that W drive looks just like the NAS that used to sit on my desk, but it's actually all the data stored in the cloud and it works great. And you can do that for target mechanisms for your databases or whatever. So we got, okay, so we got to talk performance. We have to talk eventual consistency. We have to talk, you know, how these sorts of things work in a block environment, as well as a file environment, those sorts of things. So it actually works pretty well. I mean, one of the things that Wasabi does that Amazon does not do is we're immediately consistent. So if you write an object to Wasabi, it's there instantly, you know, whereas Amazon's eventually consistent. Got to be careful here.

Starting point is 00:15:58 If I'm going to replicate that data across multiple regions, eventual consistency says it'll eventually get there. But immediate consistency says before you respond to that REST request, I guess, that data is everywhere. Well, the replication to the second site, you're not accessing that. You're only accessing the primary site. The only time you'd ever access the mirror would be if the first site fails. And that, so far, we've never had that happen. So, you know, that's kind of a backup scenario.

Starting point is 00:16:34 But if you're writing to a Wasabi bucket, and then one millisecond later, you try to look to see if that data is there. It's there. And you don't get an acknowledgement from Wasabi saying that we've received the data until it's actually available. And so it is immediately consistent in the same way that it would be if you were running an EMC storage server right next to your computer or something like that. It's protected across the 20-server environment at that point. Now, you know, physically, the data is probably still in Flash because it maybe hasn't been written out to disk yet. But from the user standpoint, you wouldn't care about that because, you know, data comes into Wasabi at such an enormous rate, you can't write it directly to disk, you have to buffer it in flash memory.

Starting point is 00:17:33 So, but it's available instantly. And that's another one of those things we learned because when we went out talking to programmers, you know, they hate the eventually consistent thing because then they have to build these sort of timing loops into their applications saying, I write this data, now I got to go see if it's there. If it's not there, I have to wait a second, then I have to try again. And you don't know when the heck it's going to show up. It could be milliseconds, it could be literally seconds. And, you know, so we fix that in Wasabi because we know that that's just a bad design aspect of it. But anyway, to your point, you know, there's dozens and dozens of software programs that talk the S3 protocol. And I was just mentioning CloudBerry because it's kind of a low-end product that works

Starting point is 00:18:28 really, really well. And it's suitable for a small server or individual PCs. And you can get, we have a similar kind of thing that we offer for free on the Wasabi website. You can just, there's a Mac version. It's called Wasabi for Mac and Windows. To be honest, that's not where most of our business comes from. Most of our business comes from large enterprises who are moving very large amounts of data and have much more sophisticated ways of doing it. Sure.
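The "timing loop" that programmers end up writing against an eventually consistent store, described a little earlier, might look roughly like the sketch below. The store here is a toy in-memory stand-in, not Wasabi's or Amazon's API; the class name, the `lag` parameter, and the retry settings are all assumptions made for illustration.

```python
import time


class ToyObjectStore:
    """In-memory stand-in for an object store. A write becomes visible
    to reads only after `lag` seconds; lag=0 models immediate consistency."""

    def __init__(self, lag: float):
        self.lag = lag
        self._data = {}  # key -> (value, time at which it becomes visible)

    def put(self, key, value):
        self._data[key] = (value, time.monotonic() + self.lag)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None or time.monotonic() < entry[1]:
            return None  # not written, or written but not yet visible
        return entry[0]


def write_then_wait(store, key, value, retries=50, delay=0.01):
    """Write, then poll until the value is readable.
    Returns how many failed reads happened before it showed up."""
    store.put(key, value)
    for failed_reads in range(retries):
        if store.get(key) == value:
            return failed_reads
        time.sleep(delay)  # wait a bit, then try again
    raise TimeoutError(f"{key!r} not visible after {retries} reads")


print("immediate store, failed reads:", write_then_wait(ToyObjectStore(0.0), "k", b"v"))
print("lagging store, failed reads:  ", write_then_wait(ToyObjectStore(0.05), "k", b"v"))
```

With an immediately consistent store the first read succeeds and the loop is dead weight; with a lagging store the caller has to guess at retry counts and delays, which is the complaint being relayed.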

Starting point is 00:19:00 But in the goal of removing, you know, physical devices from a network, in this case, SAN or NAS, it's viable. Well, yeah, even our big customers, you know, who maybe have a petabyte or more, they may be shutting down a whole room full of, you know, NetApp NAS devices, you know, or storage servers. So that is the goal in many cases. The only time, you know, I would advise against it is, you know, if you have to look at the latency to the Wasabi data center. And, you know, I don't know where you guys are, but, you know, from Boston here, yeah, so in Boston, we're like 15 milliseconds from the Virginia data center.

Starting point is 00:19:57 And that's fast enough for anything I would want to do. But, you know, we have a couple of people who are doing like live video editing and things like that, and they really need sub one millisecond. And so the solution for people like that is run your app in the same data center cluster down there in Virginia that we're in, where you are under a millisecond. And, you know, your terminal may be further away, but that data stream can move quick enough. So you can run apps in your data center? Well, there's lots of other vendors.

Starting point is 00:20:34 We don't provide compute in the cloud. Ah, you're not the only data center. Okay, I got you. You know, if you came to me and said, I want to do this, I would recommend Packet or any number of a few other companies that are offering very good, very high quality compute in the cloud services that we have interconnected to. So that we do get this very, very fast sub millisecond kind of latency. Yeah. So what about a block versus a file? I mean, a file typically is, you know, is a bucket, you know, like element, but, or an object rather, but a block you can be five, you know,

Starting point is 00:21:18 4k or something smaller and that sort of stuff. If you have those sorts of size, if you're going to convert a block storage device to an object storage back end. I wouldn't do that. I think you're right about that. The, you know, and we don't really care about block storage stuff that much because it tends to be small. You know, it tends to be row and column kind of oriented data. It tends to be alphanumeric data, which generally doesn't amount to a hill of beans. The kind of stuff that we like are, you know, clearly media, like movies, TV shows, surveillance cameras, genomics. You know, every time somebody gets their genome sequenced today, it's terabytes of data. You know, we get a lot of scientific stuff, satellite imagery, you know, outputs from telescopes, particle colliders.

Starting point is 00:22:19 So a bunch of unimportant stuff, basically. Yeah, but that stuff is big. It's file-oriented. It's unstructured. And all those things, they're typically either read or written, but they're not updated kinds of stuff. That's right. That's right. If you're talking about a transactional system, something that's being used for trading stocks or order entry, A, the performance would not be good if very small relative to the amount of in and out, you know, so it doesn't really fit with our business model. So on another architect, because I'm very nuts and vol a big part of the storage mode that you're using. which is where we eventually store the data, you really have to stage it before you write it to disk because, first of all, you know, over the Internet, things don't come in in sequence and so forth.

Starting point is 00:23:51 So you have to reassemble your object somewhere before you write it to disk. So you're basically just buffering as stuff comes in the door in very high-speed memory, and then you're organizing it. And, you know, we actually, I mean, some of the key to the Wasabi cost and speed is we actually write the software that goes and controls the movement of the heads on the disk drives. So we don't rely on Linux or Windows or any other operating system to do that kind of stuff. It's very... You got your own disk driver? I'm impressed.

Starting point is 00:24:26 Yeah, it's the only way you can do what we do. And, you know, and Linux and Windows are block oriented file systems, and you will never get either the cost or the speed that we achieve by using a block oriented file system. So, you know, this is stuff that we've learned how to do over 15 years of doing it at Carbonite and now doing it at Wasabi. And it's an arcane end of software. You probably will never meet a software engineer who's ever had to worry about the physical placement of bits on disk. Trust me, I've been there.

Starting point is 00:25:05 You know, so it's not something you're going to really learn how to do in college or something like that. So it takes a lot of experience and know-how. And we've, you know, developed all these algorithms for how to do it to minimize head movement and maximize throughput and all these other things. So the challenge with doing something like that, David, is that, you know, different drives come along. They've got either more heads or less heads. Their organization is, you know, multi-head per platter versus single head per platter. There's shingled magnetic recording coming out versus conventional. I mean, so there's a lot of sophistication in these devices as they start to emerge.

Starting point is 00:25:46 That sort of thing would have to be almost tailored to every one of those drives. Yeah. You know, we started out with SM, has to be certainly tailored to the kind of drive that you're using. But, you know, since we've been in business, we started out with four terabyte drives, then we started getting eight terabyte drives. Now we've got 15 terabyte drives. And Western Digital and Seagate are waving 20 terabyte drives now in test. And one of the vendors has even demonstrated in the lab 100 terabyte drive. And what happens is every year the cost per bit drops by 20% or so.

Starting point is 00:26:47 And we just ride that curve. We've been riding it since we started Carbonite when a 500 gigabyte drive was a big deal. Yeah, yeah, yeah. No, I understand. Another question I would have is other vendors seem to have multiple tiers of object storage. And I'm not sure object storage is the right term for it, but, you know, frozen tier versus non-frozen tier kinds of things. Do you guys support, you know, different costs, levels of object storage? I happen to think that's a really dumb approach.

Starting point is 00:27:26 I actually would agree with you, David. Well, look, when you're talking about this price point, and not to interrupt, but when you're talking about this price point and how much less it is than AWS or really most any S3

Starting point is 00:27:42 provider, you're really talking about, why should I architect an additional system to support instead of, what is it, 0.006% of AWS to 0.007? I think that's throwing engineering in the wrong place. Yeah. I mean, I guess my approach is, you know, Wasabi is cheaper than Glacier when you take into account typical egress, and it's faster than S3. So why do you need the seven other tiers that are in between? And, you know, I went to a Gartner conference and the Gartner storage analyst gets up and says,

Starting point is 00:28:24 well, if you're going to move your data to the cloud, the first thing you need is a $200,000 head count to sit down and figure out what data ought to be in which tier. And there's a whole industry of consultants out there who literally their payment is going to be the amount of money that they save you by figuring out cheaper ways for you to move your data from one tier to the next. I mean, it's just not. You don't have three plugs in your wall for great electricity, okay electricity, and crappy electricity. No, actually, I do have for, you know, different device appliances and stuff like that.

Starting point is 00:29:00 But I mean, you know, there'll be edge cases, but for the most part, you know, what people want is good, solid, you know, storage as a utility. It's fast, it's cheap, and it does 99% of all see that that's going to be the way storage is going to be in five years or 10 years from now. That's interesting. You shot down the whole model, I guess. Sounds like you're arguing, Ray. You think that's it? It's a question of petabytes, exabytes, yottabytes. You know, it's how much storage you want to talk about. And every little cent here is going to add up. If we're talking about exabytes of storage, it's a lot of money. Yeah, but, you know, still, you know, you can store an exabyte in Wasabi for the same price or less than you can store an exabyte in Glacier. So why do you need all these tiers?

Starting point is 00:30:10 It's all it is, is it's another level of complexity and you spend a heck of a lot of time just shuttling data from one thing to another. And let's also add to it that when you want to retrieve data from Glacier, you've got a lag time that's absolutely absurd. It's not live. Yeah, you get hours. I mean, it's just not live. And everything you're storing is live data that can be extracted at any moment's notice and for no egress charges. I think that model works. I was just going to say, there's almost no reason why you would want to use a product like Glacier if you have an alternative like Wasabi. Let's say you were able to take your technology and you've already got multiple tiers, just add another dozen tiers or something behind that, and then supply that storage at one-tenth the cost of normal Wasabi. You don't think customers would buy it? How are you going to store it? I mean, ultimately you have to store that data somewhere. And, you know, and we get our efficiency by

Starting point is 00:31:11 more efficient. I mean, we've used 99.5% of the physical bits on disk. If you're using a block oriented file system, you're probably going to maybe get 60 or 70% of the physical bits on there. If you're lucky. So, you know, if there were a way to store data at a 10th of the price of Wasabi, it would have to be on some completely different medium. And, you know, I don't know what that doesn't exist today. And if it does exist, we'll certainly be the first ones to embrace it. Because that's our business. Our business is to be the Walmart of

Starting point is 00:31:45 cloud storage. There's no point in shopping around because the Wasabi brand stands for always the high quality storage at the lowest price and nobody will ever beat us. Well, I was going to say, how are you going to shave your margins any further to support something like a Glacier in an effort to support the, like you referred to it, an edge customer who wants that sort of more archival type storage. It sounds to me like your margins are as lean as they can be. Yeah, the only thing that's cheaper on a per bit basis is putting stuff on a tape, putting it in a cardboard box, and then sticking it in a warehouse somewhere. And it's only cheaper if you're never going to look at it again, because the cost of taking a tape out of a box, out of a warehouse, is pretty high. But if your goal is for some regulatory reason, I have to keep all this data

Starting point is 00:32:46 for the next 20 years and I'm probably never going to look at any of it again. Still, the cheapest way to do that is put it on a tape, put the tape in a cardboard box and stick it in a warehouse. Yeah, but if you need it 20 years later, are you going to still support the same tape drive? No, no. Those tape drives will all be on a junk heap in China somewhere. Well, you guys haven't convinced me yet, but I'll agree to let this one lie to some extent. But so what are some of your bigger software partners that customers might recognize? Because, I mean, it's really, do a lot of your customers directly access Wasabi or do they use some of your software partners to access it?

Starting point is 00:33:31 Well, you have to talk to Wasabi across an API. So you've got to have some piece of software, you know, sitting in front of it. So, you know, that could be anything from backup systems like Veeam and Commvault to video editing software. It could be archival software. There's a lot of software out there that we partner with that is content management software that put MRIs and X-rays and CAT scans and all kinds of other stuff up there. There's software that takes surveillance cameras and puts all the video up in the cloud. So we have partners in every conceivable industry. I mean, it's been really easy for us to develop those because most of the time when somebody's developing a new product today that requires storage, one of the options always is an S3 interface.

Starting point is 00:34:38 And so, you know, we have yet to find one that didn't work with Wasabi. So, you know, we do have a lab here where we actually validate and test everything. And I think we have, if you go on our website, I think there's like 240 partners now on the website that are certified to work with Wasabi across a wide range of applications and industries. So you're more of a horizontal play rather than vertical play and let your software partners do the vertical sales kinds of things? Is that how you would say it? Yeah. I mean, we see storage as a commodity like bandwidth or electricity.

Starting point is 00:35:17 It's something that's there. And with the disruptive price that Wasabi has, you know, people are coming up with all kinds of new things that didn't used to make sense because the storage costs were too high. So, you know, anytime you cut the cost of something by almost an order of magnitude, it's going to unleash a lot of creativity. Yeah, you should see my DVD library. I'm thinking about trying to get that online someplace, but the cost would be exorbitant. For me, it's music. You can get a terabyte on Wasabi for five bucks a month. I'm not sure we want that. So the other use of cloud storage I've seen is for things like sharing files and things of that nature or collaboration services with files. You know, my problem is I've got a laptop, I've got an iPad, I've got, you know, a desktop. I'm trying to get those same sorts of files across all those.

Starting point is 00:36:17 Do you have software partners that support that sort of service? Yeah, there are. And, you know, a lot of people, when they think about cloud storage, they think, oh, well, you must be talking like Dropbox or something like that. And, you know, when Dropbox first came on the market, they stored all their data in Amazon S3. And, you know, nobody knew that, nobody cared. But that's the position we're in. So there are other companies out there that make products that are similar to Dropbox that are storing their data in Wasabi. And so, you know, I think if you want that kind of thing, you might go on the website and look and see what we have for partners. You mentioned performance is another differentiator. You want to

Starting point is 00:37:03 just tell us a little bit or give us a little hint on how that's done in your environment versus the competition? Well, the way we store data on disk, it's a sequential file system, meaning that if you give me an object, I'm going to write the entire object in one go. A block-oriented file system spreads your data out in available blocks all over the disk. So if you have a disk and, you know, the manufacturer of the disk says, okay, the theoretical throughput of this drive is, say, 400 megabits per second or something like that. With a block-oriented file system like Linux or Windows, you're probably lucky if you get 25% or 30% of the theoretical throughput because the head is flying all over the drive, picking up pieces of your object, right? And you have to, each time you move to a different track on the drive, you have to wait for the

Starting point is 00:38:07 disk to spin around until the bit you want to read is actually under the head. Rotational latency, seek latency, those sorts of things. Yeah, yeah. Exactly. So with Wasabi, when we write an object to disk, we write it all in one big long track. And if the object is bigger than one track on the drive, we move to the adjacent track. So there's almost no head movement. So this allows us to read and write data to the drive at very close to the theoretical throughput of the drive itself. The other thing is we spread that data over large arrays of drives,

Starting point is 00:38:46 so that we're reading and writing from large arrays of drives simultaneously. And so you take the throughput that we get on the individual drive, and you multiply that by the fact that we're reading and writing in parallel across a large number of drives, and you can get enormous throughput. And that's how we get the speed. So we really architected it very differently from the beginning to get the kind of speed because, you know, people have things that are generating data at very high rates. The surveillance cameras, you know, those things generate lots of storage bandwidth requirements. And you can feed a surveillance camera series into Wasabi directly almost?

Starting point is 00:39:33 Straight into Wasabi, even at 4K or 8K resolution. And, you know, and that's a big differentiator. I mean, you can't – we publish – if you look on our website, there's actually some test suites that we ran that are kind of industry standard test suites for evaluating cloud storage. And we publish the specs on GitHub so you can actually – the code actually for the tests on GitHub. So you can run these tests yourself if you want to validate the results. But we've published, we've run a bunch of tests on, you know, how many threads, how many objects, what size objects, and all these different combinations and permutations.
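
The layout argument David makes above (write each object in one contiguous run, then read and write across many drives in parallel) can be illustrated with a rough back-of-the-envelope model. This is only a sketch under stated assumptions, not Wasabi's implementation: the function names, the 200 MB/s drive rating, the 8 ms seek time, the 7200 RPM spindle speed, the 64 MB object, and the fragment counts are all hypothetical values chosen for illustration, and the model ignores caching, queuing, and network effects.

```python
# Toy model of disk throughput: contiguous vs. fragmented object layout.
# All numbers and names here are illustrative assumptions, not measured
# values and not taken from Wasabi's published benchmarks.

def effective_throughput(object_mb, fragments, drive_mb_per_s, seek_ms, rpm):
    """Modeled MB/s for reading one object stored in `fragments` pieces.

    Each fragment costs one seek plus, on average, half a disk revolution
    of rotational latency before the data passes under the head.
    """
    avg_rotational_ms = 0.5 * 60_000 / rpm
    positioning_s = fragments * (seek_ms + avg_rotational_ms) / 1000
    transfer_s = object_mb / drive_mb_per_s
    return object_mb / (positioning_s + transfer_s)


def striped_throughput(per_drive_mb_per_s, drives):
    """Ideal aggregate MB/s when `drives` disks are used in parallel."""
    return per_drive_mb_per_s * drives


DRIVE_MB_PER_S = 200.0   # assumed rated sequential throughput of one drive
SEEK_MS = 8.0            # assumed average seek time
RPM = 7200               # assumed spindle speed
OBJECT_MB = 64.0         # assumed object size

# One contiguous run: the head positions once, then streams the object.
sequential = effective_throughput(OBJECT_MB, 1, DRIVE_MB_PER_S, SEEK_MS, RPM)
# Same object scattered across 64 separate locations on the disk.
fragmented = effective_throughput(OBJECT_MB, 64, DRIVE_MB_PER_S, SEEK_MS, RPM)

print(f"sequential: {sequential:.1f} MB/s "
      f"({sequential / DRIVE_MB_PER_S:.0%} of rated)")
print(f"fragmented: {fragmented:.1f} MB/s "
      f"({fragmented / DRIVE_MB_PER_S:.0%} of rated)")
print(f"10 drives in parallel, sequential layout: "
      f"{striped_throughput(sequential, 10):.0f} MB/s")
```

With these assumed numbers the fragmented layout lands in the same rough range as the 25% to 30% of theoretical throughput David mentions, while the contiguous layout stays close to the drive's rated speed. The parallel figure is an ideal upper bound that assumes every drive is kept equally busy.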

Starting point is 00:40:17 And, you know, that's all published on the website. And, you know, there are use cases where the write speeds on Wasabi are seven or eight times faster than S3. And you would say that's because of the way you've decided to, you know, understand the disk layout and lay the data out in an optimum fashion on those disks and that sort of thing. Yeah, we architected it for speed from day one. And we knew how to do this from Carbonite because Carbonite, you know, when I left Carbonite, we were taking in a half a billion files every day. And, you know, it was a struggle

Starting point is 00:40:58 over the 10 years I was there. It was a struggle to stay ahead of the fire hydrant of data that was coming in. And so, you know, we learned, developed and learned a lot of these techniques. Okay. Hey, Matt, any last questions for David? No, I think David's covered this beautifully. Far more information than I was able to cull from the website, to be honest with you. And I was really happy to be able to ask some questions that had arisen. Okay, David, anything you'd like to say to our listening audience before we close? No, it's just, you know, if you're running a lot of in-house storage and the cost of that is a pain point for

Starting point is 00:41:40 you, it's worth taking a look at moving that data to the cloud even before that equipment wears out, because, putting aside the depreciation, you're probably paying annual maintenance that may be more expensive than what it would cost you to store your data in the cloud. Well, this has been great. Thank you very much, David, for being on our show today. Yeah, pleasure. Next time, we'll talk to another system storage technology person. Any questions you want us to ask, please let us know.

Starting point is 00:42:12 And if you enjoy our podcast, tell your friends about it, and please review us on iTunes and Google Play, as this will help get the word out. That's it for now. Bye, Matt. Bye, Ray. Bye, David. Okay, take care, guys.

Starting point is 00:42:25 Until next time.
