Grey Beards on Systems - 47: Greybeards talk Storage as a Service with Lazarus Vekiarides, CTO & Co-Founder ClearSky Data
Episode Date: June 13, 2017. Sponsored By: In this episode, we talk with ClearSky Data's Lazarus Vekiarides, CTO and Co-founder, who we have talked with before (see our podcast from October 2015). ClearSky Data provides a storage-as-a-service offering that uses an on-premises appliance plus point of presence (PoP) storage in the local metro area to hold customer data and offloads this data …
Transcript
Hey everybody, Ray Lucchesi here with Howard Marks.
Welcome to a sponsored episode of Greybeards on Storage, a monthly podcast show where we
get Greybeards storage and system bloggers to talk with storage and system vendors to
discuss upcoming products, technologies, and trends affecting the data center today.
This Greybeards on Storage podcast is brought to you today by ClearSky Data
and was recorded on May 25, 2017. We have with us here today
Lazarus Vekiarides, CTO and co-founder of ClearSky Data.
So, Lazarus, why don't you tell us a little bit about yourself and your company?
Sure. So I'm the CTO, co-founder, and chief bartender at ClearSky Data.
We have been in business now since the beginning of 2014,
and we built a storage service, which is backed by object storage.
In the initial iteration, it's object storage that's actually in the public cloud.
And we provide, through a network, very high-performance, low-latency block on the front end.
And the back end durability is done via object storage.
We have all of the capabilities that you'd expect for a high-end enterprise class
storage array, except we're really a network and we do all our magic through caching and a network
of POPs and private lines that go back and forth between the cloud and the customer's data center.
You mentioned that you've got a backing store, which is the cloud, and your front end is an appliance at
the data center. And then there's a POP. What's a POP?
It's a point of presence. Back in the day when we were first pitching the idea,
we were calling ourselves the Akamai of private data storage for obvious reasons, which is that Akamai does the same thing.
That would make sense to a VC.
Yes. You have to be the something of something in order to get funding.
The Uber for dog walkers or something like that, right?
That's a good idea.
I think it's already been done, so don't take it too seriously.
I don't live in New York anymore. There aren't enough dog walkers. Yes. So, you know, the point of presence, we set it up inside of essentially a tier three,
tier four data center in the city where we're actually selling the service.
And it's intended to be a latency backstop. So I think one of the lesser-known secrets of what we do is that once you're inside a highly connected building in a metro and you're using private lines for roughly 100 miles around you, you can get very, very low latency connectivity via these private lines.
So in Boston, for example, we are sub millisecond to all our customers who are averaging between 500 microseconds and 700 microseconds.
Wait a minute. Wait a minute. You can't do sub-millisecond timing across a network like that?
Sure you can. This is how it works.
There's a lot of fiber under those streets that got laid before the economy turned south.
That's right.
Oh, I got you.
Believe it or not. Now, you can't do this on the internet, because the internet necessitates hops, and you have peering points and a lot of uncertainty. But if you have a private line, you can do this.
So in my data center,
you know, since you're using POP, which is a data comm term, I've got some customer premises equipment.
Yes.
That acts as a cache?
That's right. It's a highly available appliance. To you, it's all flash. It has 24 slots. A single appliance can support up to about 32 terabytes of flash.
Until somebody makes bigger SSDs for you.
Yeah, actually you can't fit that much flash in it right now,
but I'm told that sooner or later that's going to happen,
so we kind of plan for the future.
And we can scale these guys up, so we can have up to four of them
if you really have that much hot data,
but it's really intended to be the amount of data
that you touch over the course of a week.
So it's a substantially large hot cache,
but what it does effectively is to give you the performance of an all-flash environment.
But the durability comes from the data protection that is inherent in our network and the cloud.
So that flash is really just a cache.
We don't have to do RAID.
We don't have to worry about losing data on-prem.
If you sledgehammer that edge, all your data is safe in the POP, and it's also safe in the cloud.
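A rough back-of-the-envelope sizing sketch of the working-set idea Laz just described: only the 32 TB-per-appliance figure and the "week of hot data" rule come from the conversation; the capacity and touch-rate numbers are assumptions for illustration.

```python
# Rough sizing sketch: the edge cache should hold roughly the data you touch
# in a week. The capacity and daily-touch figures below are assumed, not ClearSky data.
total_capacity_tb = 200          # assumed: total provisioned capacity
daily_touch_rate = 0.02          # assumed: 2% of capacity touched per day
days_in_window = 7               # "the amount of data that you touch over the course of a week"

# Upper bound: real working sets overlap day to day, so this overestimates a bit.
working_set_tb = total_capacity_tb * daily_touch_rate * days_in_window
per_appliance_tb = 32            # per-appliance flash ceiling from the episode
appliances_needed = -(-working_set_tb // per_appliance_tb)   # ceiling division, max four per site

print(f"Weekly working set: ~{working_set_tb:.0f} TB "
      f"-> {int(appliances_needed)} edge appliance(s) of {per_appliance_tb} TB")
```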
So when you say all of my data is safe in the POP,
that would imply to me that it's a write-through cache, not a write-back cache.
Absolutely. So we basically write through to the POP, and we're taking advantage of this low latency that we're getting through the private lines.
And one of the side benefits of this is that if you have two data centers in a metro connected to the same POP, you're effectively at RPO zero.
And this has been one of the interesting selling points. We have a number
of customers that are taking advantage of this to do DR across data centers in a metro.
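A minimal sketch of the write-through ordering just described, assuming hypothetical class and method names rather than anything from ClearSky's code: the edge appliance acknowledges the host only after the metro POP has durably accepted the write, which is why every acknowledged write is already safe off-premises.

```python
# Minimal sketch (not ClearSky code) of write-through to the POP.
class Pop:
    """Stand-in for the metro point of presence."""
    def __init__(self):
        self.journal = []                 # durable write journal at the POP

    def accept_write(self, lba, data):
        self.journal.append((lba, data))  # journaled first, flushed to object storage later
        return True                       # ack back to the edge

class EdgeAppliance:
    """On-premises all-flash cache in front of the POP."""
    def __init__(self, pop):
        self.pop = pop
        self.flash_cache = {}

    def host_write(self, lba, data):
        acked = self.pop.accept_write(lba, data)   # write through to the POP first
        if acked:
            self.flash_cache[lba] = data           # then warm the local cache
        return acked                               # host is acked only after the POP has the data
```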
This was one of the things I liked best about your solution because it's metro cluster without
all of the detritus that you have to build around it. I know, I don't need to buy VPLEXes.
I don't need to manage the lines.
That's right.
I just have two offices.
How does this RPO zero work?
So I'm data center one.
I'm reading and writing data, and that data is automatically being written through to the POP in the local metro.
And I've got data center two.
Are you replicating the data to data center two in real time as well?
No, no, because data center two is just another cache, but the source of truth is that POP.
And so as that POP acknowledges writes to the original writer, it logs them in the journal and then flushes them out to the cloud every few minutes.
But it actually has sort of the metadata map for a particular LUN.
So if you were to do a failover to the other data center, that metadata map, which is actually less than 1% of the overall data footprint of that LUN,
and we're thin provisioned, so it's not a whole lot of data.
That could get quickly sent through these lines to the other side,
and you're up and running.
So the recovery point is zero.
The recovery time is actually very quick as well, even if it were a cold cache.
What our customers do is to swap back and forth
between the two data centers from time to time
to keep both caches reasonably warm,
and then your recovery time is also helped by that.
Okay, hold on there.
So I've got two data centers in the same metro,
and the appliances in both those data centers look like one array to the servers in those data centers?
Absolutely.
To the extent that I can run a clustered file system like VMFS on top of them?
Yes, and it really just looks like one array.
So all I'm failing over is the compute?
That's right. So they literally vMotion workloads back and forth just when they're not busy.
Wait, wait, wait, wait.
So the cluster file system across the two data centers,
the two appliances, and the single POP would imply that the write data
that data center one is writing is being replicated
to data center two's caching appliance.
No, the POP is the final fount of truth.
So in actuality, they're both just windows to the POP,
I'll call it the POP array at this point.
Exactly.
That is very interesting.
Yeah, the only hard part is letting location two know that something in its cache is stale because location one overwrote it.
Yes, and that's actually not hard to do because we just have to invalidate the metadata maps, and we do that very quickly.
Right.
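A hedged sketch of the invalidation idea: the POP owns the authoritative metadata map for a LUN, each edge tags cached blocks with the map version it last saw, and a write from one edge bumps the version so the other edge refetches. The data structures and names are invented for illustration; only the metadata-map invalidation concept comes from the conversation.

```python
# Sketch of metadata-map invalidation between two edges sharing one POP.
pop = {"blocks": {}, "map_version": 0}

def pop_write(lba, data):
    pop["blocks"][lba] = data
    pop["map_version"] += 1          # any change invalidates older map snapshots (coarse, for illustration)
    return pop["map_version"]

def edge_read(edge_cache, lba):
    entry = edge_cache.get(lba)
    if entry is None or entry["version"] < pop["map_version"]:
        # Stale or missing: refetch from the POP and remember the current map version.
        entry = {"data": pop["blocks"].get(lba), "version": pop["map_version"]}
        edge_cache[lba] = entry
    return entry["data"]

edge1, edge2 = {}, {}
pop_write("lba-100", b"v1")
print(edge_read(edge2, "lba-100"))   # b'v1'
pop_write("lba-100", b"v2")          # edge 1 overwrites the block via the POP
print(edge_read(edge2, "lba-100"))   # b'v2' -- edge 2 notices its entry is stale and refetches
```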
I was almost going to say, why couldn't you do this over longer than metro distances?
But the challenge there is having this
metadata map in a single array.
And you add the write latency.
And all that stuff across multiple pops is a different game.
As Howard always reminds me, speed of light is not just a good idea.
It's the law. So there are other use cases that we can implement across the network where the latency is much higher.
But you could still have an RPO of zero in this case because the data is sitting in the POP.
Right.
That's right.
And it's backed in the cloud.
It's just the recovery time is going to be longer, and you couldn't have the same array across multiple data centers beyond that one pop.
Well, here's what happens.
When you go, if you're going to go across metros,
like say it's Boston to Ashburn, Virginia,
which is something that we're talking a lot about now because we have this notion of a
cloud edge. Gee, I wonder who's in Northern Virginia. Yeah, take a guess. Okay, go ahead.
You know, say you were to fail over to the cloud. What would happen is that we would go back to your
originating pop, if it's still alive, and pick up the metadata from that.
And we could just pick up from that same recovery point zero, but you couldn't have anything online at that remote POP.
That's one of the latency sort of gotchas that you have to deal with. So you have to be active, but you have to coordinate the metadata
in such a way so that you're not reaching across long distances very frequently. Otherwise,
things slow down, things get ugly. So you don't really want to do a stretch cluster from Washington,
D.C. to Boston. It's not a good idea. No, but what you're really saying is when we stretch the
cluster to continental sizes that there's an explicit storage failover.
That's right.
From POP, you know, we're going to failover from POP1 to POP3.
And, you know, that's true for almost everything.
That's right. That's right.
You have to do that.
And, you know, we do have some interesting caching tricks so that you can take a snapshot of a piece of a LUN and have it available independently in another location. But you can't have the same LUN.
But the RPO zero across continental distances is extremely unusual, isn't it?
Yeah.
I think RPO zero as part of just a standard offering is a bit unusual, as far as we can tell. The implication there is synchronous write activity, and that usually means you're within 100 kilometers, whereas here you're going across 1,000 kilometers or more.
It's only RPO zero over that distance.
If the origin pop still exists.
Exactly.
Exactly.
It's RPO zero.
It's not RTO zero.
I understand that.
And there's a lag,
while the POP you wrote to has to write back to the object store.
Right.
But it's seconds of lag.
Yes, it's not hours.
Right.
When we say RPO0, we're thinking synchronous replication,
and so we're also thinking that there's no lag.
And this is an interesting midpoint.
Yeah, it gives you a lot of the benefits without any of the cost.
And so it changes the calculus for DR and also gives you this.
And your cost, call it monthly because you're a service, not a purchase, is going to be roughly equal for equal capacity for your service
and for an on-premises array of similar capacity?
That's correct.
So there's a slight upcharge for an extra access point,
but it kind of goes away as you have larger capacities. So if you're in the
hundreds of terabytes, it's de minimis.
Okay. Yeah. But you're saying that I have metro cluster-level protection, you know, if I'm going to recover to the cloud and the cloud edge virtual appliance, I've got RPO comparable to synchronous replication, and not only do I not need the VPLEXes and all the add-ons to make it MetroCluster, I don't even have to pay for the second array.
Not only that,
you don't have to pay the write penalty.
Exactly.
Which is the other side of this coin.
I mean, we're paying some of it.
I don't know.
Well, writes aren't acked until they're at the POP.
So that –
Yeah, okay.
That's right.
So that one millisecond on the Metro E3.
500 microseconds was the word that was used earlier.
It's a 500 – I'm actually stating my own survey of –
So in fact, the lowest we've gotten in Boston Metro is 400
microseconds. It's Boston. Yes, but that's your office where I can see the pop out the window.
Well, no, no, actually our office is lower than that, but I don't consider that an actual
customer site. Boston Dunedin is 400 microseconds. Round trip? Round trip. Wow. That's like,
you know, when I did DR consulting in New York, it was, you know, okay, you have to get across a
river. Wait, wait, wait, wait, wait. Let's talk. So the data has to get from the host to your
appliance and from the appliance to the POP. You mean the time from when the host issues the write until the POP acknowledges it is 400 microseconds?
The time we're measuring is from our appliance to the pop.
That's the round trip.
So anything else is additive.
But if you think about that, I've gone through all the math for the latency of, like, a typical network switch. It's not a significant add on top of what you'd
ordinarily see in a reasonably complex network. So it's not noticeable. Let's put it that way.
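To put numbers on that claim, here is a back-of-the-envelope write-latency budget. Only the 400-700 microsecond appliance-to-POP round trip comes from the episode; the host-side and POP-processing figures are assumptions added purely for illustration.

```python
# Back-of-the-envelope write-latency budget (illustrative, not measured).
host_to_appliance_us = 100       # assumed: host, switch, and edge-appliance overhead
appliance_to_pop_rtt_us = 500    # quoted: metro private-line round trip, 400-700 us
pop_journal_us = 50              # assumed: journaling the write at the POP

total_us = host_to_appliance_us + appliance_to_pop_rtt_us + pop_journal_us
print(f"Estimated acknowledged-write latency: {total_us} us ({total_us / 1000:.2f} ms)")
# ~0.65 ms in this sketch, which is why the extra metro hop reads as "not noticeable"
# next to the millisecond or more a typical hybrid array already takes.
```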
I was going to say, I think you guys ought to go to NVMe here.
NVMe in the pop?
Well, yeah. Well, NVMe, not in the pop, but in the appliance.
So that's definitely a roadmap item.
It's interesting where the server roadmaps have been going and the architectures have been going
because that appliance is a clustered appliance.
It's highly available.
And to do the same thing with NVMe, you need a shared NVMe environment.
Yeah.
Those chassis are just coming to market.
Yes.
I've seen a couple.
But the dual-ported NVMe SSDs only have two lanes per port,
and so the advantage starts to fade with limited bandwidth.
That's correct.
But we are waiting for the ability to sample some of that stuff.
It does sound intriguing.
And one of the great things about being a software company, really, is that we'll just port it over and it should just run and run faster.
Yeah, and it should be great for things like bad SQL queries.
Exactly.
That are heavily read intensive and they'll just hit the cache.
It's often easier to just throw money at your hardware than to fix your SQL queries.
Yes.
Okay, so now we've got this stuff
and you've got the standard set of data services like snapshots, right?
That's correct. So you can do the things that you would get in a primary storage array,
but also, because we're backed by the cloud,
you don't really run out of space. So you don't have the space constraints that you would in a closed system. So you can keep snapshots for indefinite periods. Hence, you have sort of
this implied backup. We are essentially leveraging that for some of our backup functionality that we
recently announced. So you can set policies to keep snapshots for extended periods of time,
you know, one year, two years.
And we keep a catalog of what we know is in the snapshot if it's VMs.
And that is one of the benefits of this.
The other interesting benefit is that in the...
Okay, wait, wait, run that back a second, Laz.
Yes.
You keep a catalog of VMs?
Yes. So is this
via, you know,
vVols and VASA support, so you know what all
the VMs are?
We use the...
Actually, it's a combination of all
the vSphere APIs, so we have a
plug-in, and when
we take snapshots of VMFS LUNs, we keep track of
everything that's inside of them. Okay. And that's searchable somehow? Yes. It's a database I can get
to? Yes. You can go into vCenter, and we keep a number of different metadata fields: the name of the VM, the time the snapshot was taken, all manner of interesting things.
And so you can search by all these criteria.
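A hypothetical sketch of what searching that catalog could look like. ClearSky exposes this through its vCenter plug-in; the dictionaries, field names, and function below are illustrative assumptions, not a documented ClearSky API.

```python
# Illustrative snapshot-catalog search: one entry per VM per VMFS LUN snapshot.
from datetime import datetime

catalog = [
    {"vm_name": "sql-prod-01", "lun": "vmfs-datastore-03",
     "snapshot_id": "snap-0412", "taken_at": datetime(2017, 5, 20, 2, 0)},
    {"vm_name": "sql-prod-01", "lun": "vmfs-datastore-03",
     "snapshot_id": "snap-0117", "taken_at": datetime(2016, 11, 1, 2, 0)},
    {"vm_name": "web-frontend-07", "lun": "vmfs-datastore-01",
     "snapshot_id": "snap-0409", "taken_at": datetime(2017, 5, 19, 2, 0)},
]

def find_vm_snapshots(catalog, vm_name):
    """Return catalog entries for one VM, newest first."""
    hits = [e for e in catalog if e["vm_name"] == vm_name]
    return sorted(hits, key=lambda e: e["taken_at"], reverse=True)

for entry in find_vm_snapshots(catalog, "sql-prod-01"):
    print(entry["snapshot_id"], entry["taken_at"], entry["lun"])
```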
And all the snapshots are really residing in the cloud someplace, right?
That's right.
So you have this location independence.
So it's not just restoring it to the original location.
But if you wanted to do DR in the cloud with that cloud edge, you could do it that way as well.
What's the cloud edge?
So if we take that edge software and we run it in that aforementioned Northern Virginia location, you can essentially create an iSCSI SAN inside your Amazon VPC and connect EC2 instances to it.
Oh my Lord, this is cloud DR.
Yes.
Well, it's cloud DR, and it's really cool for cloud bursting.
Yes. Because your data gets generated by your primary database, which runs on-premises because you have 4,000 applications that need it. Take a snapshot, mount it in EC2, and shove it over to Elastic MapReduce to do analytics on it, all in maybe 40 lines of script to take the snapshot and mount it over there.
Absolutely.
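To make that flow concrete, here is a hedged sketch of the "snapshot, mount in EC2, analyze" sequence. The clearsky client object and its methods are entirely hypothetical and the device path is illustrative; only the overall sequence comes from the conversation, and the iSCSI login uses standard open-iscsi commands.

```python
# Hypothetical cloud-bursting sketch: snapshot an on-prem LUN, present it
# through the cloud edge in the VPC, mount it on EC2, hand it to analytics.
import subprocess

def burst_to_cloud(clearsky, lun_id, cloud_edge_portal, initiator_iqn):
    # 1. Snapshot the production LUN; primary I/O keeps running on-premises.
    snap = clearsky.create_snapshot(lun_id, name="analytics-burst")

    # 2. Present the snapshot as an iSCSI target on the cloud edge in the VPC.
    target_iqn = clearsky.export_snapshot(snap, initiator_iqn=initiator_iqn)

    # 3. From the EC2 instance, log in to the target and mount it read-only.
    #    (Standard open-iscsi syntax; the /dev/sdb1 path is illustrative.)
    subprocess.run(["iscsiadm", "-m", "node", "-T", target_iqn,
                    "-p", cloud_edge_portal, "--login"], check=True)
    subprocess.run(["mount", "-o", "ro", "/dev/sdb1", "/mnt/burst"], check=True)

    # 4. Hand /mnt/burst to the analytics job (Elastic MapReduce, Spark, etc.).
    return "/mnt/burst"
```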
And, you know, bursting is one use case.
Cloud DR is another use case.
If you actually even wanted to migrate, you could do that.
Someone would want to migrate off of ClearSky?
Why would they do that?
No, they wouldn't do that because we offer them data services that are second to none.
So that's a lot of the story.
We've been busy building a lot of the application orchestration infrastructure, and we're going to continue to do that.
But some of the recent announcements have been around the plug-in for VMware to automate
all these things.
And the other thing that we're waiting for, which will make it sort of the missing link
in the solution, which will make everything much easier, is when you can actually have
a bare metal vSphere server or ESX server sitting in the cloud.
Coming, I hear.
Right.
You may have heard about that.
But what this does is it makes it possible for you to pay for your data once and then use it in the cloud and on-prem.
And so definitely big economic
benefits. So there's a lot of interest from the field right now for folks that are relatively
small that would like to not have a second data center and would prefer to use Amazon
as their DR strategy. And this would make it very easy to do.
So the pricing for your service is on, I guess it's on a monthly basis, and is it capacity?
Yes, yes.
It's some cents per gig per month.
And, you know, overall, if you do a 36-month depreciation of a typical hybrid storage array,
and perhaps even some of the all-Flash players
were maybe a little bit cheaper.
And then certainly a lot cheaper
if you count the fact that
you don't have to buy two of them and connect them.
I don't have to buy two
and I don't have to pay $1,500 to $3,000 a month a rack
for colo space to hold the second one.
Exactly.
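A hedged sketch of the economics being compared here: the episode gives only "some cents per gig per month," a 36-month depreciation period, and the $1,500 to $3,000-per-rack colo figure, so every other number below is an assumption chosen to echo the "roughly equal for one array, much cheaper than two" claim.

```python
# Illustrative cost comparison, not ClearSky pricing.
capacity_gb = 100_000                 # assumed: 100 TB usable
service_cents_per_gb_month = 3        # assumed service price ("some cents per gig per month")
array_price = 110_000                 # assumed hybrid array price for one site
months = 36                           # depreciation period from the episode
colo_rack_per_month = 2_000           # midpoint of the quoted $1,500-$3,000 colo range

service_monthly = capacity_gb * service_cents_per_gb_month / 100
one_array_monthly = array_price / months
two_site_monthly = 2 * one_array_monthly + colo_rack_per_month   # second array plus DR colo

print(f"Service:            ${service_monthly:,.0f}/month")
print(f"One on-prem array:  ${one_array_monthly:,.0f}/month")
print(f"Two arrays + colo:  ${two_site_monthly:,.0f}/month")
```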
Speaking of colo space, where are you guys located?
So you've got POPs in Boston and Northern Virginia.
And New York, soon to have one in Northern California and Dallas.
And everyone always asks me, where does this stop?
We think there are about 15 or so major metros in the continental United States that make sense,
that would largely cover the market.
And that's what we're working towards as we grow the business.
Huh. That's very interesting. Well, we've come to just about the end of the show.
Laz, is there anything you'd like to say to our listening audience? Final thoughts?
Well, you know, we've really just scratched the surface on some of the things that are, you know, unique about the service.
But, you know, I invite anyone to just come to our website, www.clearskydata.com.
If you actually want to try the service, we have a hands-on lab that we allocate on a time-slice basis, a few days at a time, for people that want to, you know, try it out, see if it delivers the performance that they need for their primary or secondary storage,
and, you know, try it out and see how all these use cases work for them.
Just, you know, poke around the website, download the white paper, and try it out.
Howard, any last questions?
No, I think we've covered the basics.
You know, Dave McCrory has been talking for years about data gravity and how data attracts applications and it's hard to move.
And ClearSky is one of the few solutions to that problem for applications like cloud bursting.
I think it's great stuff.
Yeah, interesting.
Data anti-gravity. I don't think it's anti-gravity,
but it certainly takes a lot of
the weight out of data, let's say
that much. Well, this has been great,
Laz. Thanks for sponsoring our podcast today.
Our next monthly podcast, we will talk to
another startup storage technology person.
Any questions you want us to ask,
please let us know. That's it for now.
Bye, Howard.
Bye, Ray.
And thanks again, Laz.
Thank you, guys.
It was great talking to you.