Grey Beards on Systems - 34: GreyBeards talk Copy Data Management with Ash Ashutosh, CEO Actifio
Episode Date: July 18, 2016. In this episode, we talk with Ash Ashutosh (@ashashutosh), CEO of Actifio, a copy data virtualization company. Howard met up with Ash at TechFieldDay11 (TFD11) a couple of weeks back and wanted another chance to talk with him. Ash seems to have been around forever, the first time we met I was at a former employer and he …
Transcript
Hey everybody, Ray Lucchesi here, and Howard Marks here.
Welcome to the next episode of Greybeards on Storage, a monthly podcast show where we get Greybeard storage and system bloggers to talk with storage and system vendors to discuss upcoming products, technologies, and trends affecting data centers today.
This is our 34th episode of Greybeards on Storage, which was recorded on July 6, 2016.
We have with us here today Ash Ashutosh, founder and CEO of Actifio.
Please tell us a little bit about your company, Ash.
Hey, Ray. Hey, Howard. Thank you for this wonderful podcast.
I'll give you a quick summary of Actifio.
We were founded in 2009 to deliver on a very disruptive mission, and that was to make copy data a very valuable business asset. And over the last seven years now, we've managed to do that very well. Well over 1,200 of some of the largest enterprises, and 60 of the largest global service providers, are powered by Actifio.
We have changed the economics of how organizations make and manage data.
And then, more importantly, we have transformed many of these businesses to be digital enterprises.
So now the obvious question is, what is copy data?
And maybe you're going to ask me that next.
So what is copy data, and what is copy data management all about?
So go back to how business applications produce and process production data.
You have applications, your CRM, all kinds of business applications, that truly go out and produce data.
They process them.
Your business runs on that. And it turns out businesses also create many, many copies of this production data, sometimes
for protection, sometimes for availability, business resilience, compliance, developers
need copies, QA, tests, analytics folks.
In fact, the model is pretty simple, right?
Every time you want to run an application that requires data, you make an independent copy of it,
stand up an entire infrastructure, and then you basically run your application.
And the next result here is, according to IDC, there's about 13 to 150 copies of the exact same data
and all the infrastructure that supports these redundant copies, to the point where $48 billion last
year, 2015, $48 billion was spent on managing redundant data, which is a pretty interesting
opportunity to go back and tackle.
When I was at a prior company, we did a study on how many copies there were, and this was
back in the middle 90s.
And even at that time, there were nine copies of data, typically, on average.
Yeah, yeah.
And as businesses became more and more dependent on data, those copies increased,
partly because people are using copy data for a lot more things.
People are running businesses on almost hourly analytics now.
There's a lot more development going on now.
And this copy data problem became a massive problem.
And the old model of just making a copy and standing up infrastructure, in a day and age where everything needs to be instant and much more available, just did not work.
In the old days, these copies were typically spread across multiple devices.
There were backup copies.
There were storage copies.
There were disaster recovery copies.
Each of these was a separate and distinct product,
as far as I could tell.
Oh, yeah. That portfolio of products proliferated even more when developers wanted a copy of the database for development. Isn't that the problem snapshots were supposed to solve for us?
Well, I mean, snapshots are only part of the solution, right?
If a developer wants a copy, I'll just take a snapshot. It's just metadata. Who cares?
That's right. That's right.
And I think storage systems went some way toward fixing some of those issues, by having snapshots to provide better restores, for instance, to offload slow backups, and snapshots for dev and test.
But the reality was a couple of things.
As applications got more sophisticated, you're dealing with larger amounts of data. And access to data wasn't just making a copy of a volume or a snapshot of a volume; much more sophisticated scrubbing of the data is required, like the removal of sensitive information before I make it available to developers.
And sometimes we are sharing data with organizations that are not even part of the same company. And so this whole notion of better managing data became a big issue. And
really there were two constituencies. There was the constituencies of operations who were the
custodians of data, and their job was to meet the governance and compliance requirements, backup,
resilience, business continuance, and make sure it is highly available.
And then the whole new set of folks emerged who were the consumers of data.
And these were the developers, the analytics folks.
These were the people who were doing QA.
These were the people who were trying to meet compliance requirements.
As long as those requirements were not that big, it was okay 10 years ago.
Then Uber showed up to prove that you don't need cars
to be the biggest taxi company.
Airbnb showed up to show you don't need to own any hotels to be the biggest hotel company.
And next thing you know, digital businesses became
an absolute important part of any organization.
You see a sharing economy as part of the data world?
I mean, I don't understand that, but I can see it.
I was talking to a guy from Twitter, and they have effectively a data request service. You can ask for, give me all the Twitter data for this keyword over the last two years or something like that, and they'll look at it, and they may give it to you or not, but it's really a collaboration around part of their data.
Yeah, absolutely.
I absolutely believe that, at the end of the day, any application that anybody develops, if it's successful enough, is probably going to become open source. And the only thing an organization has as a sustaining competitive asset is the data, the data relevant to its domain.
In many cases, you look at some of the most successful companies today, they are run by
the fact that they have more data about you and about their customers than anybody else
does.
Oh, yeah.
Yeah.
Absolutely.
Right.
And so there are organizations that know more about me than I do because they have a lot
more data about me than I do.
And that's the reality.
And that's the reality of every organization.
Everybody's trying to get there.
Well, everybody wants to be Target
and send the pregnant girl a discount for her folic acid
before she tells dad that she's pregnant.
Exactly.
But that's where the data masking and context-sensitive nature of Actifio starts to come in.
That is the exact difference between taking a snapshot for backup and development versus Actifio copy data, where it's all about context.
So you guys actually scrub the data, scrub sensitive information out of data
and stuff like that?
Yeah, so at the end of the day, what we've done is there were three big tipping points
that happened sometime around 2008, 2009.
One was the adoption of virtualization on the server side, where VMware became a predominant way to consume the compute resource. And people weren't afraid to use virtual servers on even the most business-critical production applications.
Second was the replacement of tape. I think you had Brian Biles and Hugo Patterson earlier in one of your podcasts.
Those two folks pioneered the whole displacement of tape with disk.
And the last one was, you know, storage became an absolute commodity.
It was easy to come back and leverage disk in many different ways or random access media.
And we took the opportunity to introduce virtualization technology into these massive copy data silos and completely change them. And we began by starting at the root.
How do you capture copy data?
How do you manage it throughout its lifecycle with a single SLA?
Because ultimately businesses are trying to manage an SLA for the application data.
And how do you infinitely reuse the same data with the appropriate context?
Now, reuse doesn't mean just make a snapshot copy of it, but give it the appropriate context. When Ray is in Singapore, the data that Ray sees as a developer
would be very different than when Ray is in Denver, because the laws of Singapore are very different.
So that context awareness is from a user's perspective what makes it relevant, but from an
operations perspective, we've dramatically changed just the nature of what it means to manage this
massive amount of copies to something as simple as what VMware brought to servers.
So effectively, we've brought in the same paradigm.
This is sort of, I'm not sure if the term is data governance, but there's, you know, like you can't take German data outside of Germany, and, you know, English data outside of England, stuff like that.
Absolutely. And we do that.
And that's part of the SLA.
The service level agreement for an application defines a lot of things.
It defines, obviously, the frequency at which the data needs to be captured, and how long I need to retain it.
But more importantly, what boundaries can I cross?
And who are the people who can access this?
And when somebody accesses it, what are the components of it that I need to mask or scrub
based on either just the nature of the data or the nature of the location or the nature
of the person?
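To make that concrete, here's a minimal sketch of what such an SLA profile might look like as data. Every field name and value here is hypothetical, illustrating the idea rather than Actifio's actual schema:

```python
# Hypothetical sketch of an SLA profile of the kind described above: capture
# frequency, retention, location boundaries, access roles, and masking rules.
from dataclasses import dataclass, field

@dataclass
class SlaProfile:
    app_name: str
    capture_every_hours: int                              # how often data is captured
    retain_days: int                                      # how long copies are kept
    allowed_locations: set = field(default_factory=set)  # boundaries copies may not cross
    allowed_roles: set = field(default_factory=set)      # who may access copies
    mask_fields: set = field(default_factory=set)        # columns scrubbed before reuse

    def may_access(self, role: str, location: str) -> bool:
        return role in self.allowed_roles and location in self.allowed_locations

sla = SlaProfile(
    app_name="crm-prod",
    capture_every_hours=1,
    retain_days=2555,                                     # roughly 7 years for compliance
    allowed_locations={"Denver", "Singapore"},
    allowed_roles={"dba", "developer"},
    mask_fields={"ssn", "credit_card"},
)

print(sla.may_access("developer", "Singapore"))  # True
print(sla.may_access("developer", "Frankfurt"))  # False: boundary not allowed
```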
And I think there's a lot of this stuff about storage, the domain that we all used to live in and grew up in, that is done by operations people on a daily basis.
We just took what used to be the whole process of managing data, decoupled it from storage, and allowed people to come back and truly run the management of data as it is supposed to be, you know, through an SLA.
And it became even more important when cloud emerged.
At that point, all I know is there's an application running
that creates data, and I need to access and manage that data. I have no idea what storage it runs on. In fact,
I don't even know what data center it runs on. So does Actifio, I mean, like I said, I came from
the storage world, obviously, but copies were always on storage or on tape or, you know,
somewhere else. But I mean, does Actifio work with storage products and take copies of the data and scrub them and provide governance and compliance on top of them? And how does it work
with the cloud? Because the cloud is a whole different animal. Yeah, what we started out with
is the fact that there are only two things people care about, the applications that you're running
a business on and the data that these applications are consuming. The rest of everything is just an
API. The rest of the infrastructure is just an API.
And so what we did was to treat infrastructure like an API
and then be the middleware that captures data
directly from the application.
So we captured it from VMware, from Oracle, from SAP, from SQL.
We have no idea if this Oracle is running on a storage system or if it's running on a tin can connected to a string.
We have no idea if it's running on AWS.
We really have nothing.
We have no clue.
We just assume infrastructure underneath is an API
and that we really cannot access it.
We can access the application.
So it begins with capturing the data.
And this is the first change we made. When Brian and the Data Domain team decided that, hey, tape is no longer needed, we also realized, hey, the transformation from random access to streaming format is also not needed anymore. Let's capture data in its application-native format and keep it; instead of writing to disk all the time, you're capturing it every so often. And we are software that captures it. Underneath us, we have pools of storage.
So yes, we use storage, but we use any storage.
As the user, through the API, you might give us different classes of storage.
You might say for shorter-term retention or for people doing performance testing,
they need a fast pool, and you can add anything to that pool.
There's some for medium-term retention.
There may be some for a longer-term DR site and then eventually for the vault.
But you are now converting storage into quality-of-service pools that you can keep adding to and deleting from, and that I can access through an API.
We are the middleware between applications and the underlying infrastructure, managing the entire lifecycle.
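A rough sketch of that pool idea, with made-up names, just to illustrate grouping heterogeneous devices into QoS classes behind one placement call:

```python
# Minimal sketch of quality-of-service pools: devices grouped into named
# pools that can be grown or shrunk, with placement requested by QoS class
# rather than by device. All names are illustrative, not a real API.
class PoolManager:
    def __init__(self):
        self.pools = {}                      # pool name -> list of backing devices

    def add_device(self, pool: str, device: str):
        self.pools.setdefault(pool, []).append(device)

    def remove_device(self, pool: str, device: str):
        self.pools[pool].remove(device)

    def place(self, qos: str) -> str:
        """Pick a backing device for the requested QoS class."""
        devices = self.pools.get(qos)
        if not devices:
            raise LookupError(f"no pool for QoS class {qos!r}")
        return devices[0]                    # simplest possible placement policy

mgr = PoolManager()
mgr.add_device("fast", "flash-lun-7")        # flash pool for performance testing
mgr.add_device("medium", "nl-sas-array-2")   # medium-term retention
mgr.add_device("vault", "object-bucket-1")   # long-term vault
print(mgr.place("fast"))
```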
And when an application needs a certain copy, say you want SAP from three days ago, we make SAP appear. We know what SAP is because we've captured blocks of data from the application, from SAP in this case.
And then we added a whole bunch of contextual information.
We added information about the fact that this block belongs to this application.
It belongs to this application from this time, and
then there's another pointer that says what the SLA for this is. At the end of the day, what we look like is this massive distributed object file system where the objects are data surrounded by their entire context. In many ways, if you go back 20 or 30 years ago, when we all built storage systems and storage protocols, the one thing we didn't do was provide any context to the data.
And we didn't provide any routability to the paths.
Storage was always something that very tightly connected to servers.
And somebody else had the context.
We were responsible just for dumb blocks of files.
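As a toy illustration of "data surrounded by its context", assuming nothing about Actifio's real metadata layout:

```python
# Sketch of a context-tagged copy data object: captured blocks carry the
# application, the capture time, and a pointer to the governing SLA, so the
# system, not the caller, knows what the bytes mean. Hypothetical layout.
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class CopyDataObject:
    app: str                  # which application these blocks belong to
    captured_at: datetime     # when this image was taken
    sla_id: str               # pointer to the governing SLA profile
    blocks: tuple             # block IDs making up this image

catalog = [
    CopyDataObject("sap-prod", datetime(2016, 7, 3), "sla-gold", (1, 2, 3, 7)),
    CopyDataObject("sap-prod", datetime(2016, 7, 6), "sla-gold", (1, 2, 9, 7)),
]

# "Give me SAP from three days ago" becomes a metadata lookup, not a copy.
want = datetime(2016, 7, 3)
image = next(o for o in catalog if o.app == "sap-prod" and o.captured_at == want)
print(image.blocks)
```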
So Actifio is middleware in this environment.
Absolutely, we are middleware.
And you use various storage underneath? I mean, a POSIX-compliant storage system, or storage middleware?
Underneath us, we use any block storage or file storage. We virtualize the underlying storage as pools of storage for us to use.
The best analogy we've found to get everybody very quickly to understand Actifio is VMware for data.
You think about all the data that's out there, we virtualize all that stuff,
and then we use the underlying infrastructure in a virtualized form to not only place the data in the right place,
but also make it accessible to any application instantly. Because we know the context.
There's billions and billions of objects floating around, and they're going everywhere.
They're going between Germany and the EU, and now that Brexit happened,
maybe some of this data from the UK doesn't go to the EU.
Certainly some of the data from the EU won't go to the UK anymore.
That's probably true. In fact, it's a phenomenal opportunity for us, right, for people using Actifio. They just change the SLA to say, don't go to these locations or these data centers. We call them profiles. And suddenly, you know, the data stops flowing there. I'm not going out and changing a whole bunch of wires.
And this is the power of an underlying infrastructure of virtualization.
So we are a middleware.
We are, like I said, technically, if you want to get to the guts of it,
we are a massive distributed object file system with one single global namespace.
You can reach into Actifio from any place, call an API or call a UI, and you want to get 14 copies of your database that is 50 terabytes big in some remote location. Within a few minutes, you see the entire virtual copies of the database show up, or you want to bring up the entire data center. In fact, we power one of the world's fastest business resiliency solutions, which SunGard has been shipping for almost 18 months now: bring up the entire data center in 20 minutes or less, where it used to take them weeks.
This ability to virtualize data, make it completely independent of the underlying infrastructure,
manage the operations, the entire lifecycle with a single application.
But data always has gravity. I mean, you can't move terabytes of data around in a minute.
You could virtualize it maybe, I suppose.
Yeah, yeah.
So that's a very good point.
So let's take the example of a database with 100 blocks.
You are at location A.
First time we capture all the 100 blocks, and you say, hey, here's my SLA.
The SLA is that after four months, really, you don't need the data locally, and it needs to move off to the remote side. Now, month five comes along and you
say, I really want to go back to that old five-month-ago database that's 100 blocks big.
Normally, what you would do is you would have to fetch the entire 100 blocks back from the remote
side. Remember, we are a distributed object file system,
and we have one big namespace.
What we do is we know that five-month-old image of that database
has 100 blocks.
It turns out 75 of those blocks are still here
because they're still changing, and they're still local.
All I have to do is fetch the remaining 25 blocks.
And it's this notion, it might not be just one location,
it might be other locations.
The ability for us to truly have one big namespace that allows us to recreate the object that you
want, whether it's a data center, a file system, or a database, by just fetching the least amount.
And that is the one thing you'll see with Actifio. Every part, whether it's how we capture
data, how we move it on the network, how we restore it, how we reuse it.
There is nothing more efficient because you just put it under one big namespace
and we know the context of every one of those locations that we're managing.
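The 100-block example reduces to simple set arithmetic; a tiny sketch:

```python
# Sketch of the minimal-fetch restore described above: the five-month-old
# image has 100 blocks, 75 are still resident locally, so only the missing
# 25 cross the wire from the remote side of the namespace. Illustrative only.
def blocks_to_fetch(image_blocks: set, local_blocks: set) -> set:
    """Return only the blocks not already present locally."""
    return image_blocks - local_blocks

image_blocks = set(range(100))      # the old database image: 100 blocks
local_blocks = set(range(75))       # 75 blocks still live locally
remote_fetch = blocks_to_fetch(image_blocks, local_blocks)
print(len(remote_fetch))            # 25: all that has to move
```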
Okay.
So, gosh, this is pretty complex.
Well, ask Howard.
First, it's the idea, and there are some implementation issues, and there might be a couple of other ways to solve the problem. But it's about moving completely away from the concept of we're backing up to tape, because unfortunately, for 90% of the world, even the smarter end of the market at some point goes, oh, right, this is that disk thing that's pretending to be a tape, let me create a tarball. By moving the protection concept up to the data, now everything is just deltas and metadata.
It's not just protection. This is real storage. They're talking about the application getting its data through Actifio.
Absolutely. Copy data.
It's not just copy data. It's the primary data, right?
No, no, no. This is important, Ray.
So we're not touching anything that is production data.
So we're basically saying your SAP or Oracle production system keeps running as it is. At least that's what we're saying. But the reality is, when 400
developers or 8,000 developers now that we have at this bank are developing their applications,
we are the production data, but it is still not the real production data. It's not what your
business is running on. We're not signing up to say, please run your business on Actifio. We're
saying, guys, there is more business happening on copy data than on the production data.
There are more users running on copy data inside the organization.
There are more people running analytics.
There are more expensive people doing development, a whole bunch of people doing QA, test, and
backup.
Okay, okay, okay.
So it's really protection data that's being managed as copies of data, but with much more sophisticated management.
This whole compliance scrubbing and all the governance stuff is really unusual, right?
Yeah, and it doesn't have to be protection.
We now have probably some of the largest financial institutions who don't buy Actifio for backup at all. They buy it purely as a way to allow their developers to deliver Oracle as a service, to deliver MongoDB as a service, to deliver SQL as a service. So think about the developer. The developer comes in,
logs into this portal and says, okay, I want to check out my source code. And I've integrated
that whole checkout process in GitHub with Actifio.
So for the first time,
you're checking out your source code,
and automatically your data that goes with that version
also gets checked out.
Then when you build your stuff and check it back in,
continuous integration begins,
where all the data required to run the compilations at night,
all of that integration starts.
Then there's an entire test process.
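A sketch of what that Git integration could look like from the outside; the endpoint, payload, and service are hypothetical stand-ins invented for illustration, not Actifio's actual API:

```python
# Sketch of wiring data checkout into source checkout, in the spirit of the
# GitHub integration described above. Could run as a git post-checkout hook.
import json
import subprocess
import urllib.error
import urllib.request

def current_branch() -> str:
    """Ask git which branch was just checked out."""
    return subprocess.check_output(
        ["git", "rev-parse", "--abbrev-ref", "HEAD"], text=True
    ).strip()

def checkout_matching_data(branch: str, app: str = "orders-db") -> None:
    """Ask a copy-data service to mount the dataset tagged for this branch."""
    payload = json.dumps({"app": app, "tag": branch}).encode()
    req = urllib.request.Request(
        "https://copydata.example.com/api/mounts",   # hypothetical endpoint
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req) as resp:
            print("mounted:", resp.read().decode())
    except urllib.error.URLError as err:
        print("copy-data service unreachable:", err)

if __name__ == "__main__":
    checkout_matching_data(current_branch())
```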
And this is a case where there's nothing to do with data protection; this is about making data the new API that...
Wait a minute, you guys are plugged into GitHub?
I mean, you're plugged into Jenkins and all the DevOps testing and all that shit?
You're kidding me.
Absolutely.
We have solutions where the entire development team can run the development chain without ever calling an operations guy or asking for more storage. This is actually what we all want at the end of the day. If you assume that the storage box is an API where I can just store blocks, what you need is something that is actually aware of the data. Like Howard and I talk about, there's a reason they call it a data center, because that's where the data is. They don't call it a storage center. They don't call it a compute center.
Well, we used to.
We used to, yeah. That's right.
Ash, you're clean-shaven
but, you know.
You've been around a little while.
I'm pretty confident that there's some
white going to come in there should you let it grow.
It does.
It does.
It does.
Not a gray beard, but yeah, white.
So I think what we've done is to make data the new infrastructure.
And because of its API, because it's context-aware, I can just integrate with anything.
And this is the reason why people are able to
build their own entire orchestration tools. There are three new startups now that are built on top of Actifio because you can just call an Actifio API and get any data, any application, any system
of record for the enterprise. There's a new security company that's built around the concept
that I don't need to
secure firewalls and networks. I'm just going to protect data. I'm just going to call Actifio and use that as a single source of truth for all the data in the enterprise.
One of our customers has built an integrity check service. He sells an integrity check service that validates that all your applications are truly recoverable seven years from now and that you're able to meet the compliance requirements. This is outside the US, obviously, in Europe.
And so you have all these people who are now building applications, knowing that data is available through an API.
Very similar to think about how wonderful it is for all these guys who go to AWS, call an API, and just build their application.
Now combine that with they call an API and get the data they want,
production data or copies of production data.
That's what we've done.
It is.
It is fascinating.
I mean, we haven't been sitting on our hands for seven years.
No, no, I understand.
You haven't, but you have been somewhat quiet.
We have. That is definitely...
I think Howard and team were giving us feedback, Ray,
that, hey, guys, there's some excitement going on here.
If we look at things just on the protection side,
and certainly protection is the least interesting thing you do,
although we will get to some parts of what you do for protection that I really am fond of.
Yeah.
But if we look just at the protection side, 10 years ago, I would get up at seminars, you know, when I was doing backup school for TechTarget, and say that the backup application is the stickiest thing in the data center. Absolutely. That nobody wants to change and people use the backup application they know and loathe.
Yep.
But over the past five years or so, you've come around, you're doing, you know, you're
a private company and not saying exactly how much business you're doing, but you're doing
hundreds of millions of dollars a year.
Veeam is doing hundreds of millions of dollars a year.
So people are a lot more willing to make a change there
than they were in the past.
Yeah.
Because changing from NetBackup to NetWorker was like changing from a '57 Chevy to a '57 Pontiac; it didn't make a lot of difference.
I think there was an item on Storage Newsletter today about Cohesity doing real well.
Yeah.
And so, yeah, there's plenty of players in this space that have taken on this data protection side of things.
But, I mean, this is beyond that.
Well, it's now that, you know, we have to make an external copy for protection.
Yes, yes.
And so when, you know, when people talk about copy data management, there's the external copy data management that Actifio does, which solves the protection problem as well. And then there's the in-place data protection, which is, I'll take advantage of snapshots.
Yeah.
And if I am a clever guy and I have Jenkins and I have a disk array that has a Cinder driver, then I could set it up so that when my developer logs into GitHub, he can make a clone of the snapshot of the Oracle database from last night and get similar functionality.
Certainly not the PII filtering and the like.
But I still need to make a separate backup because if that whole array blows up, I'm a dead man.
Exactly.
And that's why I think there are two different approaches. The approach we took was to go after the root cause, the reason the copy data problem originated to begin with,
because there were multiple different sources of where these copies were being created.
So we attacked right at the source.
Now you have two different approaches.
You have the storage guys saying, just like what you said,
hey, I'll give you a snapshot.
And, of course, multiple attempts have been made to redefine snapshot as a backup,
now as a DevOps, and the problem persists,
which is when the production system goes down,
so do the 8,000 developers twiddling their thumbs
waiting for the snapshot to come back up.
Yeah, and the other side of that coin is you're responsible for performance
for all my developers now.
Yeah, that is the other big part.
So, you know, I've got some storage devices behind your appliances, but you're in-band, and how fast it actually works depends on how fast your metadata object store is.
Yep, I agree.
And I think, Ray, to your point about why all these backup companies are coming up, and they really are, Howard's point was very well taken, which is, for a while there were about five companies that owned the market share, and they kind of just had a very steady market share, because I swapped this one out for that one, the other guy swapped that one out for another one, and it stayed pretty much the same.
And everybody fought for the guy who was outgrowing Backup Exec.
Exactly, exactly.
And now what happened was, the sheer size of the data just cannot support the old backup model. In fact, I have an article, Howard, I wanted to see if you or Frank maybe would co-author it here, tipping the hat to Brian Biles and Hugo Patterson. The article says, dot, dot, dot, and now backup sucks.
Well, I mean, here's the thing I was most impressed by at Tech Field Day
was when you guys said you could do change block tracking backups
without being in a VMware environment.
Absolutely.
Because when I spent a lot of my life in the backup world,
and when I talked to Curtis Preston,
who still spends his whole life in the backup world
because it's kind of all he can do,
there were two intractable problems.
The first was I have a NetApp filer.
It has six million files on it.
Walking the file system to figure out what's included in the incremental takes four days.
Yeah, what's changed, yeah.
Yeah, just the determination of what's changed took two days.
And the other problem was, I have a four petabyte Oracle database.
You just can't copy it.
There is no window big enough to make a four petabyte oracle database you just can't copy it there is no window big enough to make a
four petabyte copy yeah and change block tracking is just the right way to address both of those
problems yeah by saying i'm going to do change block tracking on a file system i don't need to
walk the file system anymore i just back up the change blocks now all of a sudden i can do
incremental backups on my databases where you know database vendor said you can do an incremental backup,
but it was really just a log dump restoring from it.
And apply.
Yeah.
Restoring from it was okay.
So we need the storage guys and the DBAs all night with a lot of pizza.
Talk about how you manage that a little. You're not doing continuous data protection per se.
No, we are not.
You're doing change block tracking.
So at the point in time when they take a copy, the first copy, you create an object element for this storage copy.
And then as further copies are created, you are doing a delta of that to determine the change block.
Is that how this works?
It does.
But the question I think Howard was going after is how do you efficiently do those deltas
because the data size gets big.
And what we've done is to combine two things. One is you need a way to have a consistent copy made of the application, which means meeting all the requirements of quiescing the application, making sure, okay, I'm now going to make a copy of it. But there are various ways to do that. You have to do it with VSS and all kinds of other mechanisms. But the important part is, okay, now that I have a consistent copy of this, what in this copy changed? So what we have is, we combine that with a bitmap that we keep of blocks that are changing. And it's more of flipping a bit rather than CDP,
which would be extremely expensive.
You're capturing every transaction.
And so combining those two,
a bitmap that's constantly being flipped each time a block is written,
and that consistent copy of the data allows us to be very, very efficient.
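A minimal sketch of that bitmap-style change tracking, purely illustrative:

```python
# Sketch of the change block bitmap described above: each write flips a bit
# for its block, so 400 writes to block 7 still cost one bit. At capture
# time the bitmap is drained and reset. Not any real product's layout.
class ChangeBlockBitmap:
    def __init__(self, n_blocks: int):
        self.bits = bytearray((n_blocks + 7) // 8)

    def record_write(self, block: int):
        self.bits[block // 8] |= 1 << (block % 8)      # idempotent bit flip

    def drain(self):
        """Return changed block numbers and reset, as a capture cycle would."""
        changed = [i for i in range(len(self.bits) * 8)
                   if self.bits[i // 8] & (1 << (i % 8))]
        self.bits = bytearray(len(self.bits))
        return changed

cbt = ChangeBlockBitmap(1024)
for _ in range(400):
    cbt.record_write(7)        # hammered block still yields a single entry
cbt.record_write(9)
print(cbt.drain())             # [7, 9]
```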
So, wait, wait.
You're not just a backup product here.
You're actually storage middleware that's maintaining a bitmap of change blocks.
You're allowing the stuff to go to primary storage without further ado,
but if it needs to be copied, that's when you get invoked?
Yeah.
The way I understand it, there's an agent that sits on my database server.
That agent builds a change block journal.
Not the data, just block seven changed.
And, of course, if block seven changed 400 times, it's still one entry because block seven changed.
Then we quiesce the application, and we take a storage system snapshot.
And then we mount the storage system snapshot and grab the blocks that are in the journal.
And that's our copy.
The agent's really lightweight.
All it has to do is keep the change block journal.
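Putting Howard's description together as a small Python sketch, with stand-in functions for the storage calls; none of these names are a real API:

```python
# Sketch of the incremental capture flow just described: keep a change
# journal, quiesce, snapshot, mount the snapshot, copy only journaled blocks.
def quiesce(app): print(f"quiescing {app}")
def snapshot(volume): print(f"snapshotting {volume}"); return f"{volume}@snap"
def read_block(snap, lba): return f"<data {snap}:{lba}>"

def incremental_capture(app, volume, journal):
    quiesce(app)                       # application-consistent point
    snap = snapshot(volume)            # cheap storage snapshot
    copied = {lba: read_block(snap, lba) for lba in sorted(journal)}
    journal.clear()                    # start the next journal epoch
    return copied

journal = {7, 9, 42}                   # LBAs the lightweight agent recorded
print(incremental_capture("oracle-prod", "vol1", journal))
```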
So it's not in the data path.
This agent's not in the data path as much as sitting alongside the data path monitoring change block traffic.
It's in the data path because it's the only way it can get the LBAs,
but it's grab request, save LBA, pass request on.
The latency implications are really minimal.
Right.
So we're not terminating a SCSI request or a file request.
We are watching a block go by and say, there you go, that's block seven, flip that one.
Okay, there's block nine.
We're not adding any latency by terminating a request and restarting a new request
as if you were an in-band, in-the-path driver.
And I think that is important, because ultimately there are enough tools. And a lot of things have changed. Applications have become much more manageable. The awareness that applications need to be better protected and better backed up has spread, even to NoSQL databases now. As they've come out, they've realized the need to be protected better right from the start, right? People have realized that data is important. If you go back 20 years ago, you had to work through a lot of issues, because backup was the last thing people considered.
It's a brilliant concept
that the people developing
database engines consider
that they have to be backed up.
Yeah, yeah, exactly.
You're very facetious.
Well, no, no, I'm not.
I'm remembering
the days when
open file management
was a problem because somebody left an Access database application open overnight and couldn't back that file up because it was locked.
So to be able to go, gee, the database guys actually thought about backups, and their only thought wasn't, the DBA will periodically perform an ASCII dump to disk, increasing the amount of disk needed by a factor of four.
And the backup guys can back that up.
I like that idea a lot.
Yeah, and then if you look at what's happened, Ray, over the last, you know, six years, out of those six years, let's say four and a half is when we really started shipping in volume. The first four years were shooting fish in a barrel,
going after the entire data protection market.
It was just basically saying,
hey, how big is your database?
10 terabytes? How long does it take you to restore?
How about... A week.
A week to eight days?
How about I bring it up to you in like two minutes?
Are you kidding me?
No, let me show you.
And you lost the sale because he couldn't believe
you and you went back a couple of months later and said,
we can do it in half an hour.
There is some of that.
There were customers where we'd say, hey, we can instantly make the entire data set available to users. And the guy would say, no, can you slow it down so you can only do it four times a day, because I don't want you to just bring it up a hundred times all over the place?
Or disaster recovery, same thing.
For us, this notion of disaster recovery or business resilience is nothing more than I've got this distributed file system.
Data is flowing from one location to another.
You're just accessing data or reorchestrating an entire data center or specific applications on the remote side instantly.
It's not like you need a PhD. You don't need a PhD anymore to go back and run this.
And by the way, this goes beyond just VMware. A lot of the companies you see today in the copy data space jump straight to VMware, including, you were mentioning, some of these new companies that have come along. A lot of them jump straight to VMware because everybody uses the VMware API.
The VMware API makes life a lot easier
except for the fact that they keep finding bugs in it.
Amazing number.
Not only bugs, there are lots of, you know,
stun issues, application stun issues,
database stun issues,
almost making it impossible to use.
Yeah, but I mean, we're not talking about gnats. We're talking about giant ants from nuclear tests, like in Them!
Oh, yeah, absolutely.
Absolutely.
I think we've made our contributions over the years on some very intractable issues
that have become major issues in the customer base. So I think the important part was to make sure that we just didn't cover VMware only.
It was important that the last AS/400 sitting in the back room was also managed the exact same way. That old machine with AIX was also managed, and those four HP-UX machines.
Okay, I can't say anything about the HP-UX machines.
Absolutely. With all the time I spent in the casino business, those AS/400s and RS/6000s ain't going away.
Yeah, absolutely, they're not.
And little old ladies get upset when the slot machines don't update.
Right, right. So let me try to understand. So the agent is sitting there, I understand it can maintain change block tracking, but let's say I want to do a restore at this point.
So at that point, the agent becomes more or less the primary access point to this virtualized storage device, right?
No. So the agent has nothing to do with the restore part.
So now what happens is, the agent is there to capture the data, give me the bitmap. From then on, think of us like a virtual storage system.
We have this entire system of record history for the last seven years of every application,
every change that has been captured.
Now we have billions of objects floating around across multiple locations.
You take one of those access points, and what we've done is not try to go back and do the
same mistake backup guys did. Instead we provided a standard IO interface,
a fiber channel, iSCSI or an NFS interface or a NAS interface
into Actifio. So you go to that Actifio interface point
and say, I need this Oracle database from yesterday.
But think of, it's no different than accessing a storage system except the
volume or the data that you see on the other side of the wire is something we synthesize immediately at that point.
So we just took those.
So you basically say, present me a LUN that is this Oracle, the LUN, this Oracle database was on as of this backup point.
Yeah, and then we synthesize that.
It doesn't exist, right?
Right.
It doesn't exist in its full format,
so we synthesize it, we put it back into...
It exists as much as it does in any deduplicated data store.
Yeah, yeah.
It's all about metadata magic and assembled pieces.
Exactly, and then because we've kept it
in application-ready format,
we're able to present it instantly, as opposed to the old tape business of converting it back and putting it back into some other format.
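A toy sketch of that synthesis step, resolving each block to its newest captured version at or before the requested time; the data layout here is hypothetical:

```python
# Sketch of synthesizing a point-in-time virtual volume: for each LBA,
# pick the newest captured version not after the requested timestamp, so a
# full image can be presented without ever rebuilding it on disk.
def synthesize(versions: dict, at_time: int) -> dict:
    """versions maps lba -> [(timestamp, data), ...] in capture order."""
    image = {}
    for lba, history in versions.items():
        eligible = [(ts, data) for ts, data in history if ts <= at_time]
        if eligible:
            image[lba] = max(eligible)[1]   # newest version not after at_time
    return image

versions = {
    0: [(10, "a0"), (30, "a1")],
    1: [(10, "b0")],
    2: [(20, "c0"), (40, "c1")],
}
print(synthesize(versions, at_time=30))     # {0: 'a1', 1: 'b0', 2: 'c0'}
```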
And so this is where we become the production system.
At that point, the VM, the server, is actually talking directly to Actifio.
And this is where it becomes very, very interesting.
So now, at that point, people are beginning to pick and choose which pools I want this restore to happen in.
We have SAP developers.
They're doing dev and test and QA.
They have one pool.
This SLA is set up to say, go to these pools of storage, which are lower performance.
And pre-prod actually goes to a flash pool of storage, what they call high performance.
It could be any flash.
In fact, I could combine 10 terabytes of XtremIO, and then Pure gave me a phenomenal deal, so I had 20 terabytes of those guys, and somebody else came in. Another guy gave me a phenomenal deal and gave it to me for free, and I've combined that all into a pool.
To us, it's a pool with a certain quality of service.
But based on the role, so Ray comes in as the person who's going to do performance testing
on pre-prod right before you release.
Suddenly the data that Ray sees, you won't even know,
but it shows up in this pool because that's where the operations guys
have set the SLA to.
Data shows up there, you're running your tests, and you're off.
And we've synthesized the data in the right places.
I mean, underneath this, we talk about a distributed file system, but this is more than a traditional file system as we knew it. Typically, the file systems we knew were all about how you lay out data.
But this includes a whole bunch of workflow
of moving data around, mobility.
There's an entire mobility capability around a protocol called Dedup Async Replication, which is very, very efficient.
It gives you the same capability as async replication from a storage system, except it never puts a block that's already on the other side on the wire, and it collapses what used to be three different networks. If you
think about an array replication, application replication, and a backup replication, you
combine all those three into one. And then from a user's perspective, it looks like I'm able to take my application
that's running on AWS and somehow magically make it come up on an EMC system on my site.
Or who cares?
Whatever.
We'll go from one cloud to another.
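The never-send-a-block-that's-already-there idea behind that Dedup Async Replication discussion reduces to a fingerprint exchange; a small sketch, with deliberately simplified hashing and transport:

```python
# Sketch of dedup-aware replication: exchange block fingerprints first,
# then ship only blocks the other side doesn't already hold.
import hashlib

def fingerprint(block: bytes) -> str:
    return hashlib.sha256(block).hexdigest()

def blocks_to_send(local_blocks: list, remote_fingerprints: set) -> list:
    """Return only the blocks whose fingerprints are absent remotely."""
    return [b for b in local_blocks
            if fingerprint(b) not in remote_fingerprints]

local = [b"alpha", b"beta", b"gamma"]
remote = {fingerprint(b"alpha"), fingerprint(b"gamma")}   # already there
print(blocks_to_send(local, remote))                      # [b'beta'] on the wire
```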
And this is where the hybrid cloud part comes in, and this is the demo that we were showing Howard and others, where people are able to just magically move applications, restore data, and they have no idea which underlying platforms they're running on, which is great for developers and people consuming data.
I'm skeptical. But the problem isn't the back end. The problem is the limitations of the cloud platforms and the unique snowflakishness of enterprise applications.
You say, we can send this to AWS.
You go, yeah, but that application
requires four Ethernet adapters on the
VM.
AWS only allows one.
Yeah, so to the extent that the underlying infrastructure...
Yeah, if the applications haven't been snowflaked to death,
it's a great idea.
Yeah.
I'm also constantly afraid of the,
oh, we'll set up hybrid cloud and run applications
and then you ignore the data gravity problem
and start running compute in AWS
and storage in your data center.
This morning we had a service provider.
We have over 60 of some of the largest service providers in the world.
We have one of these service providers who's been a customer of ours for a while now.
They've been selling disaster recovery business continuity services.
Classic story. Today
we're working with another very large
system vendor who is their supplier
and we're going to be announcing
a DevOps as a service.
This is for an entire user base, a mid-range user base: 50 to 100 developers who don't own any of their own
data centers. Going to Amazon becomes
very, very expensive,
especially if you're running on a 24 by 7 kind of a mechanism.
It's better to run on-site,
but I really don't want to run on-site.
So now they have the exact same technology,
business resilience, business continuity as a service.
They're going to charge, I don't know,
20 more cents per user or whatever metric. In fact, that's the other part that they're going to charge, I don't know, 20 more cents per user or whatever metric. In fact,
that's the other part that they're doing.
People are moving towards per-user-based pricing, as opposed to per terabyte or per CPU or any of that stuff, because in the DevOps world, I know I have 100
developers. Let me charge
per user annually or per
month. And the
phenomenal part is these guys
leveraging the exact same investment
with no additional infrastructure,
are able to deliver more services
to the same customers,
increasing their margin,
speeding up development for the end user.
Just overall, win-win.
It's the same phenomenon. We get excited because it brings about the same phenomenon, in fact, more than what VMware did on the server side, if you have data available for all kinds of things.
Now, there is one more service provider.
We may have mentioned this, Howard.
They work in the financial services business.
In addition to doing business resilience,
they also have been selling a monthly federal stress test service.
Because I have your data.
Why do you want to run it on your expensive storage system?
In fact, why do you want to over-provision those?
Let me run it here and deliver to you a monthly report that says you comply with the federal stress test.
And it's fascinating what people are doing.
People are running antivirus stuff as part of the service.
People are running compliance checks. Like this guy I told you about, one of the customers actually is, as part of our marketplace that we're going to be announcing, people are building applications on top of this platform to give you more things you can do. In fact, I think one of the things, at some point I'll come back to you off the air, is some other interesting stuff on the analytics side that the CIOs want: hey, if I had a system of record...
I was just thinking that.
Yeah.
Data Gravity did some interesting things, but they've fallen on hard times.
Oh, you're talking about the company Data Gravity?
Yeah.
Oh, yeah, okay.
And Qumulo's doing some interesting things. But you guys are like in the perfect position to do that kind of file system analytics-based stuff from the copy, where it won't affect the performance of the primary.
And I don't care that it's based on midnight yesterday.
Absolutely.
And the good news is, we have retail companies and a rail car logistics company that not only do analytics for yesterday, they actually do the analytics on the data from the entire week, the data for the entire year. So this becomes kind of the source of a data warehouse. I can literally rewind the data for the entire year in one place in Actifio and say, show me this application for the whole year, and let me run the analytics in this place.
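A sketch of that rewind-the-year pattern, with hypothetical mount and analyze stand-ins for whatever a real copy-data API would provide:

```python
# Sketch of running analytics across a year of virtual copies: walk the
# image catalog for an application, mount each point-in-time copy, run the
# analysis off the copy (never production), unmount, and move on.
from datetime import date

catalog = {date(2016, m, 1): f"sap-image-2016-{m:02d}" for m in range(1, 13)}

def mount(image): return f"/mnt/{image}"          # virtual, near-instant
def unmount(path): pass
def analyze(path): return len(path)               # placeholder metric

results = {}
for when, image in sorted(catalog.items()):
    path = mount(image)
    results[when] = analyze(path)
    unmount(path)
print(results)
```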
God, I think we could talk hours about this technology.
I mean, its potential here is amazing.
And to Howard's point, I need some help on figuring out
how we get this out in a simpler way than how long it took
for two of the smartest guys. Well, we can talk about that offline. We're both in this business and we'd be happy to
help, but we are kind of running out of time. Howard, are there any last questions you would like to ask Ash?
No, I got mine in. I've probably got a dozen more questions, but I'm not sure any one would specifically work.
Ash, is there anything you'd like to say to our audience?
I think the key here is to think about data as the new lifeblood.
Data is a valuable business asset. And if you're in the business of transforming your business to
a digital enterprise, then the only two things that matter are how do I deliver applications
faster and how do I make sure my data is available to my users
and to my internal constituency as fast as possible?
And rather than keep mucking around with infrastructure, we just tell people: just Actifio it.
Okay. Well, this has been great.
It's been a pleasure to have Ash here with us on this podcast.
And next month, we'll talk to another startup storage technology person.
Any questions you want to ask, please let us know.
Please, if you're interested, you could review us on iTunes and let us know how we're doing.
That's it for now.
Bye, Howard.
Bye, Ray.
And until next time, thanks again, Ash.
Thank you, Ray.
Thank you, Howard.