Grey Beards on Systems - GreyBeards talk global storage with Ellen Rubin CEO & Laz Vekiarides CTO, ClearSky Data

Episode Date: October 16, 2015

In this edition we discuss ClearSky Data’s global storage service with Ellen Rubin (@ellen_rubin), CEO & Co-Founder, and Laz Vekiarides (@lazvek), CTO & Co-Founder of ClearSky Data. Both Ellen and Laz have been around the IT industry for decades, and Laz in particular was deeply involved in the development of EqualLogic storage systems both at Dell and at …

Transcript
Hey everybody, Ray Lucchesi here and Howard Marks here. Welcome to the next episode of Greybeards on Storage, a monthly podcast show where we get Greybeards storage and system bloggers to talk with storage and system vendors to discuss upcoming products, technologies, and trends affecting the data center today. Welcome to the 25th episode of Greybeards on Storage, which was recorded on October 8, 2015. We have with us here today Ellen Rubin, CEO and co-founder, and Laz Vekiarides, CTO and co-founder of ClearSky Data. Why don't one of you tell us a little bit about yourselves and ClearSky Data? Good morning. How are you doing? We're good. That was pretty good, Ray. I have to say that was
pretty close on the last name. You know, it's always a challenge. Thank you for having us join you guys today. So I'll just give a quick overview, and then let's just do it as interactive, with as many questions, as you guys want. Laz and I are co-founders, and we started the company about two years ago. Both of you know that because I think we were definitely hanging out with you at VMworld. We launched the company back at the end of August, and then we were at VMworld exhibiting and speaking, and it's just a very exciting time for the company.
Starting point is 00:01:16 So what we have is a global storage network that is a service for enterprises with large amount of primary storage. So traditionally, they would be EMC, NetApp, Dell types of customers. And what we're doing, we believe, is offering kind of a radically different approach to how enterprise storage can be consumed and delivered. So the idea of a global storage network is that we will always be able to deliver the customer storage to them wherever they are and very,
Starting point is 00:01:45 very close within a very, very small amount of latency at the edge next to where they're located. And the philosophy that we have is that we really want to deliver the performance and availability and security and latency of a traditional storage array that would be sitting in a customer's data center, but we're doing it as a fully managed service, SLA-based, with the scalability and the economics of the cloud. And I'm sure we'll talk a lot about how we do it, but we think it's kind of a unique and a differentiated approach where essentially the goal is to always be within two milliseconds of our customers
Starting point is 00:02:18 and to handle the full lifecycle of the data from primary to backup to disaster recovery, where we can guarantee five nines of availability, hundreds of thousands of IOPS, and very low latency that the customers need, so that they could run traditional enterprise workloads or could run workloads in the cloud, whatever they need to do. And we do it at a third the cost of what they would be spending today. Gosh, it looks like you're trying to take on the whole industry in one shot. Well, I'll let Laz talk about his background because that's not new for him. Yeah, I guess. I had early days at Natiza where we challenged a bunch of incumbents as well. But Laz, you want to chat a little bit about taking on the storage industry?
Starting point is 00:02:54 Well, he's certainly not lacking for ambition, but if you're going to do something, go big, right? Oh yeah, yeah, I agree. So you're working under the pinky in the brain theory? What are we going to do tonight to take over the world? Perhaps you have to start thinking like a Bond villain. I have a white cat in case you were wondering, Howard. That explains much. I guess. The fact is that one of the problems that most of the customers that we talk to, especially at the size that we're talking about, that they have, is really just the ongoing lifecycle management of all this gear. And if you look at what's new in storage, you really just have more of the same thing.
Starting point is 00:03:39 And I keep harping about that in casual conversations. Anyone who will listen to me, people just want to build new boxes that do something slightly incrementally better. So you constantly have that. The real fundamental problem is that your data footprint continues to grow. Storage, it really should be a service if you think about it, because all this gear, it just more or less expires after a couple of years anyway. So you're constantly in this treadmill of replacing it. And so if you think about it that way, the storage industry has not changed ever, really. And a lot of vendors are enjoying basically selling the same thing over and over and over again for the same capacity for only incremental new benefits to the customer. And I think what we're proposing
Starting point is 00:04:21 here is a completely different way to consume storage, basically acknowledging the fact that you have these lifecycle issues with your data. Your data is immortal. Your gear is not. Data is immortal. Gear is not. I like that. I wish it weren't true because the solution to so many data problems is really deciding to throw it away. But it is true.
Starting point is 00:04:44 It's true. If you think about it that way, we offer something that's very, very different from an economic and operational standpoint. So you don't really deal with all this gear. It's all hidden from you. It's inside of our network and it's in the cloud. And after all, you know, the big new innovation of the last five years that everyone is excited about. I'm here at Amazon Reinvent, and we have 20,000 people almost here. You never would have guessed that five years ago. $7 billion business for Amazon alone. Jesus, yeah. That's right. And people want cloud models for IT infrastructure.
Starting point is 00:05:22 That is what the main theme of everything that's going on here is. And so this is... But if that's entirely true, then why do I want storage on-premises? You know, I want to just move everything to AWS, don't I? Not true. You know, so there's this notion of latency that you have to deal with. And as you know, Howard, we've been working very hard at increasing the speed of light to no avail. 186,000 miles per second. It's not just a good idea.
Starting point is 00:05:51 It's the law. It's exactly. We're going to live with it forever. Right. And so you can, for example, in Boston, where we are, and be able to access remote storage over the internet. It's just not feasible for any type of primary workload where you are expecting 10 millisecond or less in latency or, you know, if you have a
Starting point is 00:06:13 flash array, you want, you know, sub millisecond latencies. You can't do that with remote storage. And what we do here at ClearSky is to apply our networking and caching technologies to make that possible. So you don't have to have storage on-premises in order to have high-performance workloads running locally. And that's what the global storage network really does. So Howard, I wanted to comment as well, because I think, especially this week, the week of reInvent, it's a fair thing to ask, well, isn't everything going to the cloud? Isn't everything going to be in the cloud? So know, so who cares anymore about customers having data centers or, you know, connectivity and stuff? And I'm a cloudy from 2008. You know, my last company
Starting point is 00:06:52 was totally focused on hybrid cloud and, you know, very much, you know, a believer. But I think what's true for the class of customers that we're interested in that are these, you know, sort of high end of medium up into large enterprise is they got data centers and they're trying to get out of their data centers and they're trying to embrace cloud. But if you really look at what's going on, a very tiny percentage of what they have has made its way to the cloud. You know, so they've got a little bit of SaaS. They've got, you know, some footprints, you know, some test devs, some stuff that's just getting thrown out there because people decided to do it. But the core IT infrastructure and applications that are running, which could be, you know, hundreds, it could be dozens, really the majority is still some combination of, you know, VMware or, you know,
Starting point is 00:07:35 private cloud and traditional models. And what we're really doing is sort of honing in on customers who really are going to live in that type of a world for the foreseeable future and helping them as they more and more put new things into the cloud. That's where we're targeted. So how does this all work, Laz? I mean, you know, if you're going to have data that's sitting in the cloud and a data center that's sitting at, let's say, Boston and the cloud is in Virginia, for instance, let's say Amazon, how does all this hang together with sub two millisecond response time? Well, you have to avoid talking to the cloud as much as possible. And if you do talk to the cloud, you have to talk to the cloud over a dedicated connectivity that is very low latency. So we do
Starting point is 00:08:15 both. We have built a caching network where we have lots of very small footprint points of presence in various metros. And in those points of presence, we keep caching infrastructure and a relatively small amount of high endurance flash durable storage. So what we've done in our architecture is to minimize the amount of durable storage that we need to keep. Remember, the really vexing thing about primary storage is that you have to keep all that data durable. You can't lose a disk and lose data. I keep trying to explain that to the VMware guys. All the VX VMware guys are here this week,
Starting point is 00:08:58 so I can relay that message to them if you want. So the big trick that we've applied to bend the economics is that the amount of durable storage in our network is very, very minimal. It's a write-back cache that sits in that point of presence, and it's made of... A write-back cache, not read. That's interesting. That's right. The read is sitting in the edge, which is separate from the pop. When you say edge, you're talking about the data center itself. Yes, the data center itself. We also do have in that POP a large warm cache, which is basically an overflow for the edge. And it's also multi-tenant. So we're trying to bend the economics there as well by not having to provision storage for any one particular customer. And so you have this bucket of cache and you have this
Starting point is 00:09:47 small bit of durable storage, which is the write-back. Everything else, all the rest of the data that's sitting in our network is cache data and it's backed by the cloud. So eventually all your data goes to the cloud and the cloud, if nothing else, has amazing durability. So six or seven copies, 11 nines of durability, you're not going to get that in a physical array that you have on premise unless you really want to have six or seven copies or some really elaborate erasure coding that might be able to approximate that for you. So Laz, you have devices at the edge as well as the pop, and then the storage is backed by the cloud itself. Is that how I understand this? Yes. And then one last key component of the architecture is backed by the cloud itself. Is that how I understand this? Yes. And then one last key component of the architecture is the networking. We build and run our own network. It's a private network with private connectivity down to, in the east, it's Ashburn, Virginia for Amazon East.
Starting point is 00:10:40 And then we're going to have a similar connection arrangement when we open our Las Vegas pop momentarily, I guess, with Amazon West. And so in those cases, you basically can get to Amazon within a couple of milliseconds, 10 milliseconds in Boston. And then from the pop to the edge, this is where the sub-two millisecond thing comes in. So because we're in metros, we can reach out and touch pretty much any data center with a private line in Boston, Philadelphia. And we have certainly in Boston, we're actually at sub millisecond latencies. And these are all private lines, metro Ethernet, which is very abundant and surprisingly cheap. And it's all included in the service. So the connectivity is something that we provide for the customers without them
Starting point is 00:11:25 really having to manage it. We provision it, we deal with it, we deal with the carriers, we make sure that there's diversity, et cetera, et cetera. That's it. That's the architecture. So how much cash is at the edge versus the pop? In historical days, there would be some sort of a percentage that would be cash versus backend storage. Do you have that sort of a percentage that would be cash versus back-end storage. Do you have that sort of relationship or ratio in your system? Yeah, we do, actually. So we have these rules of thumb, but they're going to evolve over time as we learn more and more about how data behaves over extended periods. So we've been in beta for a while.
Starting point is 00:11:58 We sized everything according to these guidelines. So the edge we wanted to be roughly 10% of the overall active footprint, so online lunge that a customer is using. And then the middle tier shouldn't be more than 30% or so. And that is sort of the sizing guidelines. Now, it turns out that we tend to over-provision a little bit just because we can, and it doesn't hurt. Flash is cheap. Reputation is expensive. Exactly. And so, when you're looking at our edge devices that we provide our customers, it's six to eight terabytes in general to start. All our beta customers are in that range, and they're doing really, really well
Starting point is 00:12:45 with that. So they're getting excellent, very, very close to flash performance. When you say six to eight terabytes, you're talking DRAM cache or flash? It's flash. Flash, flash. And then there's some sort of DRAM cache in there as well, I assume. Oh, yes. So the Edge appliance is actually a very interesting box. It is a 2U. It's a storage array chassis, but it's just a cache. So we don't have to deal with RAID. And because of that, we can optimize for capacity, which is really what you want with a cache.
Starting point is 00:13:16 With 24 slots, we have the potential to get pretty large, obviously. We keep all the metadata in RAM, so there's a ton of RAM on that box, and there's a ton of compute. The other thing that we do in order to optimize the network is to compress and deduplicate the data before we send it out to the network. We also encrypt it, of course, obviously, if it's going to go out of premise. Our customers expect sort of rock-solid security and encryption. But that box actually does all of that and does all the coordination. So it's really a compute-heavy workload at the edge. So when you say encrypt, is it SSL kinds of encryption across the network or data at rest encryption or both? Well, I keep calling it belt and suspenders because it really is both. So we have self-encrypting SSDs, first of all, so we encrypt at rest.
Starting point is 00:14:06 The minute we ingest a piece of data after we hash it and match it suitably, before it goes out the back into the network, it gets encrypted with AS256. Using keys that are actually physically present, we actually are using the TPM technology that intel motherboards have so we have tpm modules uh on both ends of this wire so both entities can identify each other using that technology and just to be even more paranoid even though it's a private line we're using uh you know tls to encrypt the entire communication channel between the edge and the POP. And in fact, even beyond that, the networking itself, once you get into the POP, every customer is isolated. And so there's
Starting point is 00:14:54 networking, network level layer to isolation as well. So it's a very, very sort of paranoid, isolated, very secure environment where each customer only gets to see very, very specific things. And we make sure that there's no crosstalk between customers. I got a couple of questions. So the data is encrypted via AES before you send it to the POP. And I, as the customer, own that key and you don't know it? That's exactly right. We don't want to know it. Yeah. I don't want you to and you don't know it? That's exactly right. We don't want to know it. Yeah, I don't want you to and you don't want to.
Starting point is 00:15:29 That's right. But that would mean that in the POP device, which is multi-tenant, you can't dedupe across multiple customers because they're encrypted with different keys, right? That's absolutely correct. And that's always been a stated non-goal for this company. Each company's dedupe domain is their own. And that's established at the edge then? Yes. So can I jump in for one second? So there's such an extensive set of things that enterprise customers need from us as a service provider, taking their data outside the firewall. And
Starting point is 00:16:03 a lot of them can be addressed with the key management and encryption. That's like, you know, table stakes and critical. And then there's a whole set of things, which I'm sure, you know, we won't have time to get into a lot of detail on that have to do with just, you know, physical security and operational security and personnel security and all that kind of stuff that. That's why the guys at the super nap like to show off their assault weapons. Exactly. There you go. It makes everybody feel safer. That's how I like to describe it.
Starting point is 00:16:28 Oh, God. Assault weapons don't make me feel safer. Yeah, this is like our customers are already in beta. They were already compliance and regulatory sensitive, right? We have financial services and healthcare and biopharma already and just a lot of, you know, just that's it. You don't get to work with customers on that type of data until you've proved that you've pretty much gone, you know, through the, you know, the heavy lift and the belt and suspender stuff that Les was talking about. So we just did it straight on from the beginning.
Starting point is 00:16:58 The elephant in the room is the eventual consistency. How is that being dealt with in your system, Les? So the POP actually does that. What we're doing is, you know, this is why we have a write-back cache. We accumulate writes, and then we push out a whole bunch of large amounts of data, really, into the cloud at predefined points in time. And we use—we actually— there are ways to do this. You can never rewrite an object and that's how you get tripped up with
Starting point is 00:17:29 eventual consistency. Instead, we've used other tricks like versioning in order to make that possible. And we've handled the problem, but we're definitely not going to be, we're not going to try and do something crazy like being synchronous to the cloud. We're point in time consistent to the cloud, which is one of the things that I expect.
Starting point is 00:17:46 We're here at reInvent. I keep explaining that to customers. You're synchronous to the POP and point-in-time to the cloud. And that's really how we solve it with our architecture. So when you blast out this periodic destage to the cloud effectively, you're intermingling all the data from that POP, I guess, and from all the edge systems out there. Is that how I read that? Each customer has his own software workload that does that independently for them. And then we size
Starting point is 00:18:16 the infrastructure, including the networking, so that they could all be on at the same time, pushing their images out at exactly the same time. But it's done on a customer-by-customer basis. So like a bucket at Amazon would be associated with one customer effectively? Yes, exactly. That's the sort of thing. In fact, Amazon doesn't support that many buckets, so we wish there was an architecture like that. But we're the namespace that suitably separates everything, and then we added access controls,
Starting point is 00:18:43 just customer stuff. So is it iSCSI or Fiber Channel at the edge? Well, today, in our first release, we're iSCSI, but in the first year, we definitely have plans to extend this to Fiber Channel and NFS, and
Starting point is 00:19:00 SMB as well is in there. So we expect this to be a multi-protocol edge. Anything that talks SAN protocols or file protocols will be included inside the box. Okay, now you got me wanting sync and share from the pop. Sync and share. I've gotten to the point where I think of sync and share
Starting point is 00:19:19 as a protocol, not an application, but I'm weird, so. It's a Dropbox infection here or something. If you're going to support SMB at the edge, and so everybody, when they're in the office, can access all those files, then it's already at the pop. Yeah. Encryption and all that stuff
Starting point is 00:19:37 needs to be carried through the sink and share. It's non-trivial. Absolutely. Absolutely. You know, it's great. Every time I talk to Howard, I get a new requirement. Yeah, this could be a problem. That's why Ellen doesn't let me talk to Laz more than once a month.
Starting point is 00:19:55 Exactly. You have to talk to me. That way we can talk about all of the cool use cases that you think we could be tackling. I think the real issue, of course, for us as a startup is, you know, we're dealing with large enterprise, like they want fiber channel, right? That's something that's pretty urgent versus maybe when we were starting, we thought, oh, we can hold off on that one, for example, for a little while. We're just, you know, we're just kind of taking it as it comes from the customer input. Yeah, no, I think that's very clear. I think Nimble demonstrated that,
Starting point is 00:20:23 you know, very dramatically. As soon as they introduced Fibre Channel, their large customer sales went up dramatically. Exactly. So you mentioned Philly and Boston and Vegas. Are there other pop locations currently available? Or I know you probably have a roadmap. Yeah, not yet. The launch was for the three initial pops, but the plan is to be in every major city in the United States. And, you know, the good news for us, of course, is that things tend to cluster, right? You know, there are sort of the obvious places you'd want to be, you know, New York and San Francisco and Seattle and Denver and Texas and a couple of different locations.
Starting point is 00:20:59 Like, those are all obvious and definitely on plan. But the other thing is, you know, it's a global storage network. So even in 2016, I think we'd like to, you know, sort of be out with at least an initial location in Europe. And, you know, the nature of our customers is that many of them are at least multi-site, if not multinational. And, you know, it just, we need to be where the customers are. Okay. Let's talk for a minute about how the system works multi-site. Because it seems relatively clear that, you know, I've got an edge device that I connect my servers to iSCSI and it talks to the POP. What happens when I have, well, what happens when I fill that edge device up? Let's start with the easy one.
Starting point is 00:21:40 You know, knowing where Laz came from at Equalogic logic i suspect i know the answer to this well essentially we started expunging data um and destaging more yeah yeah we basically move all of you so all of your data you know it flows it it's sort of like a spill and fill it flows out into the um into the pop and so um what ends up happening uh in POP is you have an even bigger cache where you can... I actually meant more, you know, what happens when I need more cache at the edge. Oh, well, when you need more cache at the edge, we have a couple of different strategies for you. Obviously, we have a lot of extra slots in these boxes if you have only 6 to 8 terabytes. So we can simply just walk up to one of them and put in more devices.
Starting point is 00:22:23 Of course, we wanted to be able to scale past that. So we also have scale out at the edge, which means bringing another device and having it sort of transparently cluster with the first device so you can manage them together as a single logical entity. So we have scale out. And when you say scale-out, Laz, you mean the LUNs can actually span both edge devices, let's say. Exactly. So the edge devices, all it does is there's a private network, obviously, that goes to the POP. But on the back end, outside of the SAN network, these devices talk to one another. So effectively what you're doing is you're creating an even bigger tier of cache on the edge, and the devices talk to each other. So effectively what you're doing is you're creating an even bigger tier of cache on the edge
Starting point is 00:23:05 and the devices talk to each other. So the miss path changes to be more interesting, whereas one device, if it is a cache miss, it'll go ask its friend before it goes across the wire. Right. And then if I have multiple offices in the same metro, I understand it does interesting things. Yes. So it's very easy to exploit the fact that the POP is sort of, as you say, RPO0. Because there's only one metadata master in a metro area for every LUN, that means that all these caches are effectively synchronously replicated to one another. This means you can do things like synchronous replication, geoclustering. Even with something like vVols, which is really interesting, effectively we create this logically consolidated metro-wide storage array where vVols can surface in any edge location. So you can
Starting point is 00:24:01 basically shut VM off in one place and bring it up in another place without losing any data. But in this case, it's going through the POP. So it's the edge to the POP and the POP back to the other edge, I guess. That's right. That's right. The trick there is that you have all of the metadata coordinated in the POP. And so everyone is seeing the same image of the LUN. So hence, you know, it's all synchronized. And you mentioned vVols.
Starting point is 00:24:28 Do you guys support vVols? That's right. We do. I like to say that I've managed to implement vVols twice in my career, which is twice as many times as most people. Yeah, good thing. I think. And then we came here and we did it and in fact you know we are uh we're in the process of getting certified right now before we ga so
Starting point is 00:24:50 we're uh we're quite proud of the fact that we managed to pull that off our team did that once uh at equalogic and we're doing it here again and in this case we're doing it in a in a much more scalable cloud-like fashion which it really fits the original architecture of vVault. Oh, good, because the Equalogic implementation is not one of my favorites. Sorry. Hey, Howard, you know what I like to say? I like to say that this company is so Laz can fix anything he didn't like about the
Starting point is 00:25:17 things that he did in Equalogic. Laz Company, yeah. He always can improve, right? Yeah. It's all fun. So if I have, you know, my development office in Brooklyn where rents are cheap and my main office on Wall Street, then when an application is ready for dev, I can just vMotion it, not even storage vMotion it from one site to another. That's right. All right.
Starting point is 00:25:40 So on the request list that you're building, I want a pre-worm function. Yes, yes, I know. Yes, so do our customers. I want to say, I'm going to move this on Wednesday and have the system be smart enough to figure that out. Right. There's some interesting effects of caching, especially in a dedupe cache, that as long as you have active workloads in both locations, it's a large amount of similarity between what's being cached in the two edges just because there's a similarity in the workloads
Starting point is 00:26:13 if everything's a Windows virtual machine or a Linux virtual machine. But yes, pre-warmed caches are certainly on the roadmap. We were thinking about that as another feature that I think can be very, very interesting for customers. Even if we try to do a cross-metro, not just intra-metro, but also inter-metro, pre-warming is a very, very useful thing to be able to do. Whoa, whoa, whoa. All right. So intra and inter metro, so across metropolitan areas between Philly to Boston, something like that?
Starting point is 00:26:50 It's, you know, remember the origin of all your data. It's all cache. It's all cache. And your canonical copy of your metadata is actually sitting in the cloud with point-in-time consistency. So there are use cases where that's very possible. So you could just move things across the country. Say
Starting point is 00:27:11 it's not Brooklyn, Howard, because Brooklyn and Manhattan have the same weather, and the developers might like Miami. So it's almost a disaster recovery service as well, to some extent. Exactly. So when we think of the lifecycle, we're really thinking about not just the lifecycle of the equipment, but the lifecycle and all the care and feeding of all your data. So disaster recovery is built into this architecture. In fact, that's one of
Starting point is 00:27:45 the things you don't have to worry about. You get synchronous replication to our metro pop, and then you get point-in-time consistency to the cloud. And so you have a whole set of disaster recovery scenarios that are covered by that. What's the point-in-time granularity? Obviously, in the pop, the point-in-time granularity um, well, there is no granularity. That's every transaction. Yeah. In, uh, in the, uh, in the cloud we are, so you set up snapshot policies for your, your LUNs and VMs and your VBALs, uh, as you ordinarily would. Uh, and we. The thing that is interesting is that we push all that stuff out every 10 minutes. So you have all of your snapshots up to about 10 minutes ago,
Starting point is 00:28:34 or perhaps less, depending on when the last time the last push completed. Okay. But that would enforce a 30-minute RTO SLA. Right. Which for a lot of applications is fine. That's right. Yeah. If I can make a quick comment just before you guys move on, which is, you know, I think when we were starting this and we said, okay, well, primary data obviously is hardest thing, has, you know, the most, you know, heavy requirements. But if we don't really tackle some of the additional parts of the lifecycle with backup and DR, in a way we've left the customer still
Starting point is 00:29:04 handling infrastructure. And our goal is for them really not to have to handle infrastructure at all. That requires us to really think pretty holistically about this. And, you know, over time, the hope is that these customers will say, yeah, we don't, you know, we don't need a separate set of software or gear or whatever it is to do the other pieces of this because it's just automatically in the service. And the backup solution is a snapshot versioning kind of thing? Exactly, exactly. So we have a VMware vCenter plugin that does VM consistent snapshots, and we expect to be delving further and further into the application stacks that are living inside those VMs in order to manage their state when that snapshot is taken.
Starting point is 00:29:47 Yeah, once we start talking about backups, there's the state information problem, and then there's the catalog problem. That's right, that's right. And both those things are on the table for ClearSky. We're going to start picking away at that. The problem, from my perspective, with backups to the cloud and primary data in the cloud, and all this stuff is flowing to the cloud is that it's all in the same medium.
Starting point is 00:30:10 If Amazon goes down like they just did, you know, your data is sort of not there anymore. Well, so there's more than one way to get to Amazon. And Amazon, one of the things that it does, I think, really well is that it scatters your data across multiple facilities as well. So we certainly have options for customers that want to deal with that problem with making copies in different regions. So we can actually move copies of your data cross-country to West, if that's what you want. And that would usually when I haven't seen an outage where all of Amazon across the entire country goes down. So you have this east-west thing. And again, a lot of this can be solved
Starting point is 00:30:57 with networking. What we're designing is a system where, you know, things like data centers and buildings are also part of the redundancy schemes that we have to think about in order to keep the availability up. Well, it's certainly simpler than storage networks who would buy any array you wanted and put it in the cloud data center next to yours. That's right. Actually, that was the cloud in 1999. So when you say your POP devices are obviously high availability, high capacity. You mentioned the Edge devices, 6 to 8 terabytes, typical. What's a Boston POP look like device or set of devices look like today? So in the POP, it's really not a lot of gear.
Starting point is 00:31:44 So it's a half rack to start gear uh so it's a half rack to start maybe actually not even a half rack um in fact i don't think we're still yet at a full rack of gear at the pop and we have quite a few customers in beta trials right now and so you know it's it's basically commodity servers two of them you know some uh JBot. And then we have the caching infrastructure, which is also increments of 2U. So you can get quite a bit of cache that represents a huge amount of data in that footprint. But it's all commodity stuff, and it's also highly available. So we have two of everything there. We have a shared storage architecture very similar to what's happening in the edge.
Starting point is 00:32:30 It's happening in the pop, except it's denser because it's multi-tenant. And we're taking advantage of commodity economics a little bit more there for a number of reasons, operational most of them. But basically, ClearSky is building is a software company. And if you actually just looked at the rack, you'd say, oh, that's servers and JBODs. Shouldn't be surprising, but, you know, it's all... All that storage gear looks alike nowadays. Yes.
Starting point is 00:32:59 How is it priced? I mean, does the customer have to pay for your service and then Amazon services separately, or is all kind of buried within your bill, I guess? Yeah, we feel like it's really important that it be one integrated price for the customer and that they never have to be thinking about the pieces and parts that we've put together for this. You know, in the end, we're an SLA to the customer. So, you know, it's capacity-based, you know, per gig per month. We have minimum buy-in in terms of, you know, kind of 20 terabytes and, you know, a year commit. But we feel like the customer should just see that as, you know, any piece of, you know, the architecture that's involved in terms of the edge appliance or the networking or the use of Pop or Cloud or, you know, any of that stuff. That's just all in as well as the operations and support.
Starting point is 00:33:46 You know, a lot of the goal here is the simplification of the model for consumption of storage. So from our perspective, like if we have to, you know, whatever it is, you swap out, you know, drives or, you know, expand or scale or patch or all that stuff, that just automatically happens for the customer versus them ever having to be involved with that. Interesting. So, and behind the S3, I mean, it seems like Amazon has multiple tiers now of storage capabilities. Well, right. And so we're using S3.
Starting point is 00:34:19 We're using the basic S3 service. And, you know And we didn't want the reduced durability stuff. Obviously, durability is the reason we use the cloud in the first place. There may be some use cases for some of the newer things. They have this infrequent access
Starting point is 00:34:36 form of S3, which may be out there, but we're still looking at it. All these things, because we're basically a single price per gig per month. You know, all these things are sort of included in our pricing model. So it's kind of, it's opaque to the customer. But as our costs go down, this is one of the great things about being a service.
Starting point is 00:34:59 As our costs go down, the price to the customer just automatically ratchets down over time anyway. And so I could see Ellen Shutter from here at that thought. Well, we live in a storage reality, which is that everyone assumes that costs will continue to go down. And we're right, you know, look, we're right in the commodity curve, just like everybody else is. So, you know, that's a given. You know, that's one of the bad things that comes along with buying huge amounts of physical storage and putting it in a data center is you're paying today's prices for tomorrow's capacity when you know tomorrow it's going to be cheaper.
Starting point is 00:35:36 You just know that because the history is self-evident, right? Yeah, that's one of the things that always annoys me when I see somebody say, yeah, we bought that VMAX fully populated so that we don't have to touch it for three years. Exactly, exactly. And so in three years, you basically depreciated a lot more money than you had to in order to accomplish that goal. Yeah, the sad part is that I've been in cases where at the end of the three years, they still hadn't put any data on it. Wonder about those guys. So you're obviously tied to Amazon pretty tight. You know, the way we think about the cloud is that just like, you know, Flash and just like Metro Ethernet and just like other things that we're using, it's a component. And, you know, that's something that, you know, Lats has really stressed very highly in terms of the way we've architected stuff. And so, you know, my feeling about this is there are
Starting point is 00:36:28 reasons why a different backend cloud will make sense for different, you know, customers, regions, times in the, you know, evolution of the company and stuff like that. And so we never want to be in a situation where we've, you know, sort of tightly bound ourselves to anybody because I am sure over time we'll be using multiple clouds as backend, you know, even just in terms of, hey, there are parts of the country or the world where the best and closest, you know, public storage cloud may not be Amazon, and we can't worry about that. So... As a truly paranoid customer, I want my data sent to two clouds. For sure. Or we've had a couple of retailers tell us that they would not like their data to go to the Amazon cloud because Amazon is a competitor.
Starting point is 00:37:09 So how about that? Well, and that was the AES-256 is for? Among other things. I actually just have to make the disclosure that I spent some time up in the Clear Sky offices in July and tested the system in one of its beta states. And how did that run? It ran as expected, which, you know, for early beta is always a good sign. Yeah, I would have to say. That, you know, read performance felt like an all-flash array because I was hitting the edge cache, and write performance felt like it was arrays doing synchronous replication
Starting point is 00:37:47 so i you know had that additional latency but it was consistent additional latency you know we're not i'm not saying anything about this particular performance because it was an early beta version and as i was walking out the door last gave me 10 minutes on the next beta version that was a lot faster so it just doesn't make any sense but you know that this if you think of the system as you know a virtual array that synchronously mirrors back to the pop that's how it behaves and you know I can see a lot of use cases you know kind of the whole city of Greenwich and Stanford with all those little hedge funds seems like a great place for this kind of thing. And, Howard, it's time for you to come back and visit us again because, you know, obviously a lot has evolved since you were here in July. And, you know, we're coming to the end of our beta testing completely and ready to head this thing out the door.
Starting point is 00:38:42 So, you know, we look forward to you doing some to you doing some more checking in on how performance looks. Yeah, a whole bunch of questions I would have in the networking, but I'm not a networking expert, so I'm not even sure I'd understand what your answers are. But it's a private network at the metro level, and then it's also a private network to the cloud? Right. That's pretty impressive. Yeah, so it's AWS Direct Connect from the POP to AWS East.
Starting point is 00:39:08 And then it's just amazing how many vendors many cities have for Metro Ethernet. I mean, when I was ending my consulting career in New York, you had your choice of Time Warner Telecom and Con Ed Telecom and Verizon and three or four other vendors in a lot of buildings in Midtown and downtown, you know, in the basement, there were multiple sets of fiber you could connect to. And this is a data center customer where you could have hundreds of carriers, right? You know, we're not doing office buildings. So, it just that that's really um available to us and it's a great component yeah okay uh howard any final questions then no i think we got it ellen last is there anything you'd like to add to the discussion no i always feel with you guys we really uh you know get right into the heart of stuff so i think i think we're uh pretty pretty well covered and just really happy that we could spend time today. We have to admire how Howard doesn't hold back any.
Starting point is 00:40:09 We know Howard's going to tell it like it is, so that's always good. I got a reputation to uphold. Well, this has been great. It has been a pleasure to have Ellen and Laz here with us on our podcast. Yes, it has. Next month, we'll talk to another startup storage technology person. Any questions you want us to ask, please let us know. That's it for now.
Starting point is 00:40:30 Bye, Howard. Bye, Ray. Until next time, again, thanks, Ellen and Laz. Thanks. Thanks for having us.
