Grey Beards on Systems - 34: GreyBeards talk Copy Data Management with Ash Ashutosh, CEO Actifio

Episode Date: July 18, 2016

In this episode, we talk with Ash Ashutosh (@ashashutosh), CEO of Actifio, a copy data virtualization company. Howard met up with Ash at Tech Field Day 11 (TFD11) a couple of weeks back and wanted another chance to talk with him. Ash seems to have been around forever; the first time we met I was at a former employer and he …

Transcript
Starting point is 00:00:00 Hey everybody, Ray Lucchesi here, and Howard Marks here. Welcome to the next episode of Greybeards on Storage, a monthly podcast show where we get greybeard storage and system bloggers to talk with storage and system vendors to discuss upcoming products, technologies, and trends affecting data centers today. This is our 34th episode of Greybeards on Storage, which was recorded on July 6, 2016. We have with us here today Ash Ashutosh, founder and CEO of Actifio. Please tell us a little bit about your company, Ash. Hey, Ray. Hey, Howard. Thank you for this wonderful podcast.
Starting point is 00:00:42 I'll give you a quick summary of Actifio. We were founded in 2009 to deliver a very disruptive mission, and that was to make copy data a very valuable business asset. And over the last seven years now, we've managed to do that very well. We're well over 1,200 of some of the largest enterprises, and 60 of the largest global service providers are powered by Actifio. We have changed the economics of how organizations make and manage data. And then, more importantly, we have transformed many of these businesses to be digital enterprises.
Starting point is 00:01:20 So now the obvious question is, what is copy data? And maybe you're going to ask me that next. So what is copy data, and what is copy data management all about? So go back to how business applications produce and process production data. You have applications, your CRM, all kinds of business applications that truly go out and produce data. They process them. Your business runs on that. And it turns out businesses also create many, many copies of this production data, sometimes for protection, sometimes for availability, business resilience, compliance; developers
Starting point is 00:01:56 need copies, QA, tests, analytics folks. In fact, the model is pretty simple, right? Every time you want to run an application that requires data, you make an independent copy of it, stand up an entire infrastructure, and then you basically run your application. And the net result here is, according to IDC, there's about 13 to 150 copies of the exact same data and all the infrastructure that supports these redundant copies, to the point where $48 billion last year, 2015, $48 billion was spent on managing redundant data, which is a pretty interesting opportunity to go back and tackle.
Starting point is 00:02:33 When I was at a prior company, we did a study on how many copies there were, and this was back in the middle 90s. And even at that time, there were nine copies of data, typically, on average. Yeah, yeah. And as businesses became more and more dependent on data, those copies increased, partly because people are using copy data for a lot more things. People are running businesses on almost hourly analytics now.
Starting point is 00:02:59 There's a lot more development going on now. And this copy data problem became a massive problem. And the old model of just making a copy, running infrastructure in a day and age where everything needs to be instant in a much more available environment just did not work. In the old days, these copies were typically spread across multiple devices. There were backup copies. There were storage copies. There were disaster recovery copies. Each of these was a separate and distinct product,
Starting point is 00:03:30 as far as I could tell. Oh, yeah. That portfolio of products proliferated even more when developers wanted a copy of the database for development. Isn't that the problem snapshots were supposed to solve for us? Well, I mean, snapshots are only part of the solution, right? If a developer wants a copy, I'll just take a snapshot. It's just metadata. Who cares? That's right. That's right. And I think storage systems went some way toward fixing those issues by having snapshots: to provide better restores instead of slow backups, snapshots for dev and test. But the reality was a couple of things.
Starting point is 00:04:02 As applications got more sophisticated, you're dealing with larger amounts of data. And access to data wasn't just making a copy of a volume or a snapshot of a volume, but actually much more sophisticated scrubbing of data that's required, the removal of sensitive information before I made it available to developers. And sometimes we are sharing data with organizations that are not even part of the same company. And so this whole notion of better managing data became a big issue. And really there were two constituencies. There was the constituencies of operations who were the custodians of data, and their job was to meet the governance and compliance requirements, backup, resilience, business continuance, and make sure it is highly available.
Starting point is 00:04:46 And then the whole new set of folks emerged who were the consumers of data. And these were the developers, the analytics folks. These were the people who were doing QA. These were the people who were trying to meet compliance requirements. As long as those requirements were not that big, it was okay 10 years ago. Then Uber showed up to prove that you don't need cars to be the biggest taxi company. Airbnb showed up to show you don't need to be owning
Starting point is 00:05:13 a hotel to be the biggest hotel company. And the next thing you know, digital businesses became an absolutely important part of any organization. You see a sharing economy as part of the data world? I mean, I don't understand that, but I can see it. I was talking to a guy from Twitter, and they have effectively a data request program. You can ask for, give me all the Twitter data for this keyword over the last two years or something like that, and they'll look at it, and they may give it to you or not,
Starting point is 00:05:41 but it's really a collaboration on part of their data. Yeah, absolutely. I absolutely believe, at the end of the day, any application that anybody develops, if it's successful enough, is probably going to become open source. And the only thing an organization has as a sustaining competitive asset is the data, the data relevant to the domain. In many cases, you look at some of the most successful companies today; they are run by the fact that they have more data about you and about their customers than anybody else
Starting point is 00:06:12 does. Oh, yeah. Yeah. Absolutely. Right. And so there are organizations that know more about me than I do because they have a lot more data about me than I do. And that's the reality.
Starting point is 00:06:23 And that's the reality of every organization. Everybody's trying to get there. Well, everybody wants to be Target and send the pregnant girl discount for her folic acid before she tells dad that she's pregnant. Exactly. But that's where the data masking and context-sensitive nature of the Actifio starts to come in.
Starting point is 00:06:42 That is the exact difference between taking a snapshot for backup and development versus Actifio copy data, where it's all about context. So you guys actually scrub the data, scrub sensitive information out of data and stuff like that? Yeah, so at the end of the day, what we've done is... there were three big tipping points that happened sometime around 2008, 2009. One was adoption of virtualization on the server side. On the server side, VMware became a predominant way
Starting point is 00:07:07 to consume the compute resource. And people weren't afraid to use virtual servers for even the most business-critical production applications. Second was the replacement of tape. I think you had Brian Biles and Hugo Patterson earlier in one of your podcasts. Those two folks pioneered the whole displacement of tape with disk. And the last one was, you know, storage became an absolute commodity.
Starting point is 00:07:30 It was easy to come back and leverage disk, or random access media, in many different ways. And we took the opportunity to introduce virtualization technology into these massive copy data silos and completely change them. And we began by starting at the root. How do you capture copy data? How do you manage it throughout its lifecycle with a single SLA? Because ultimately businesses are trying to manage an SLA for the application data. And how do you infinitely reuse the same data with the appropriate context? Now, reuse doesn't mean just make a snapshot copy of it,
Starting point is 00:08:05 but give the appropriate context. When Ray is in Singapore, the data that Ray sees as a developer would be very different than when Ray is in Denver, because the laws of Singapore are very different. So that context awareness is from a user's perspective what makes it relevant, but from an operations perspective, we've dramatically changed just the nature of what it means to manage this massive amount of copies to something as simple as what VMware brought to servers. So effectively, we've brought in the same paradigm. This is sort of this, I'm not sure if the term is data governance, but there's a, you know, like you can't take German data outside of Germany and, you know, English data outside
Starting point is 00:08:42 of England, stuff like that. Absolutely. Absolutely. And we do that. And that's part of the SLA. The service level agreement for an application defines a lot of things. It defines, obviously, what is the frequency with which the data needs to be captured, how long I need to retain it.
Starting point is 00:08:56 But more importantly, what boundaries can I cross? And who are the people who can access this? And when somebody accesses it, what are the components of it that I need to mask or scrub based on either just the nature of the data or the nature of the location or the nature of the person? And I think there's a lot of this stuff about storage, the domain that we all used to live in and grew up in. There's a lot of this stuff that is done by operations people on a daily basis.
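Ash's SLA, capture frequency, retention, placement boundaries, and per-accessor masking, can be pictured as a small policy object. This is a rough sketch only; the class, field names, and masking rule are illustrative assumptions, not Actifio's actual API:

```python
from dataclasses import dataclass, field

@dataclass
class SLA:
    capture_every_hours: int                              # how often data is captured
    retain_days: int                                      # how long copies are kept
    allowed_locations: set = field(default_factory=set)   # boundaries copies may cross
    mask_fields: dict = field(default_factory=dict)       # role -> fields to scrub

def can_place(sla, location):
    """A copy may only land in a data center inside the SLA's boundary."""
    return location in sla.allowed_locations

def view_for(sla, role, record):
    """Return the record with fields masked for the accessing role."""
    hidden = sla.mask_fields.get(role, set())
    return {k: ("***" if k in hidden else v) for k, v in record.items()}

# A German application's (invented) profile: data stays in Germany, and
# developers never see customer identifiers.
sla = SLA(capture_every_hours=1, retain_days=120,
          allowed_locations={"frankfurt", "munich"},
          mask_fields={"developer": {"ssn", "card_number"}})
```

On this sketch, asking to place a copy in a data center outside the profile is simply refused, and two people reading the same record through different roles see different fields.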
Starting point is 00:09:22 We just combine what used to be the whole process of managing data and decoupled it from storage and allowed people to come back and truly run the management of data as it is supposed to be, you know, through an SLA. And it became even more important when cloud emerged. At that point, all I know is I know there's an application running that creates data, and I need to access and manage that data. I have no idea what storage it runs on. In fact, I don't even know what data center it runs on. So does Actifio, I mean, like I said, I came from
Starting point is 00:09:54 the storage world, obviously, but copies were always on storage or on tape or, you know, somewhere else. But I mean, does Actifio work with storage products and take copies of the data and scrub them and provide governance and compliance on top of them? And how does it work with the cloud? Because the cloud is a whole different animal. Yeah, what we started out with is the fact that there are only two things people care about, the applications that you're running a business on and the data that these applications are consuming. The rest of everything is just an API. The rest of the infrastructure is just an API. And so what we did was to treat infrastructure like an API and then be the middleware that captures data
Starting point is 00:10:33 directly from the application. So we captured it from VMware, from Oracle, from SAP, from SQL. We have no idea if this Oracle is running on a storage system or if it's running on a tin can connected to a string. We have no idea if it's running on AWS. We really have nothing. We have no clue. We just assume the infrastructure underneath is an API
Starting point is 00:10:52 and that we really cannot access it. We can access the application. So it begins with capturing the data. And this is the first change we made when Brian and the data domain team decided that, hey, tape is no longer needed, we also realized, hey, transformation from random access to streaming format is also not needed anymore. Let's capture data in its application native format, keep it, and instead of writing to disk all the time, you're capturing it every so often. And we are a software that
Starting point is 00:11:22 captures it. Underneath us, we have pools of storage. So yes, we use storage, but we use any storage. The user, through the API, might give us different classes of storage. You might say, for shorter-term retention or for people doing performance testing, they need a fast pool, and you can add anything to that pool. There's some for medium-term retention. There may be some for a longer-term DR site and then eventually for the vault. But you are now converting storage into quality-of-service pools that you can keep adding to and deleting from, and that I can access through an API.
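The quality-of-service pools Ash describes amount to registering devices by class of service rather than by identity. A minimal sketch, with invented pool names and a deliberately trivial placement policy:

```python
class PoolManager:
    """Storage devices registered into named quality-of-service pools;
    callers ask for a class of service, never for a specific device."""

    def __init__(self):
        self.pools = {}   # service class -> list of backing devices

    def add(self, service_class, device):
        self.pools.setdefault(service_class, []).append(device)

    def remove(self, service_class, device):
        self.pools[service_class].remove(device)

    def place(self, service_class):
        devices = self.pools.get(service_class)
        if not devices:
            raise LookupError(f"no storage registered for {service_class!r}")
        return devices[0]   # simplest possible policy: first registered device

pools = PoolManager()
pools.add("fast", "all-flash-array-1")    # performance testing, short-term
pools.add("medium", "hybrid-array-1")     # medium-term retention
pools.add("vault", "object-store-1")      # long-term DR and vault
```

Adding or deleting a device changes only the pool membership; nothing that asks for "fast" or "vault" storage has to know what is behind the name.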
Starting point is 00:11:57 We are the middleware between applications and the underlying infrastructure, managing the entire lifecycle. And when an application needs a certain copy, say you want SAP from three days ago, we make SAP. We know what SAP is because we've captured blocks of data from the application, from SAP in this case. And then we added a whole bunch of contextual information. We added information about the fact that this block belongs to this application, that it belongs to it from this time, and
Starting point is 00:12:25 then there's another pin that says what's the SLA for this. End of the day what we look like is this massive distributed object file system where the objects are data surrounded by its entire context. In many ways if you go back to 20-30 years ago when we all built storage systems and storage protocols, the one thing we didn't do was we didn't provide any context to the data. And we didn't provide any routability to the paths. Storage was always something that very tightly connected to servers. And somebody else had the context. We were responsible just for dumb blocks of files. So Actifio is middleware in this environment.
Starting point is 00:13:05 Absolutely, we are middleware. And you use various storage and... do you need, I mean, a POSIX-compliant storage system under your storage middleware? Underneath us, we use any block storage or file storage, whatever it is; we virtualize
Starting point is 00:13:21 the underlying storage as pools of storage for us to use. The best analogy we can get everybody very quickly to understand Actifio is VMware for data. You think about all the data that's out there, we virtualize all that stuff, and then we use the underlying infrastructure in a virtualized form to not only place the data in the right place, but also make it accessible to any application instantly. Because we know the context. There's billions and billions of objects floating around, and they're going everywhere. They're going between Germany and the EU, and now that Brexit happened,
Starting point is 00:13:56 maybe some of this data from the UK doesn't go to the EU. Certainly some of the data from the EU won't go to the UK anymore. That's probably true. In fact, it's a phenomenal opportunity for us, right, for people using Actifio. They just update the SLA to say, don't go to these locations or these data centers. We call them profiles. And suddenly, you know, the data stops. I'm not going out and changing a whole bunch of wires. And this is the power of an underlying infrastructure of virtualization.
Starting point is 00:14:29 So we are middleware. We are, like I said, technically, if you want to get to the guts of it, a massive distributed object file system with one single global namespace. You can reach into Actifio from any place, call an API or call a UI, and say you want 14 copies of your database that is 50 terabytes big in some remote location. Within a few minutes, you see the entire virtual copies of the database show up, or you want to bring up the entire data center. In fact, we are one of the world's fastest business resiliency solutions, which Sungard has been shipping for almost 18 months now: bring up an entire data center in 20 minutes or less, where it used to take them weeks before. This ability to virtualize data, make it completely independent of the underlying infrastructure,
Starting point is 00:15:17 manage the operations, the entire lifecycle with a single application. But data always has gravity. I mean, you can't move terabytes of data around in a minute. You could virtualize it maybe, I suppose. Yeah, yeah. So that's a very good point. So let's take the example of a database with 100 blocks. You are at location A. First time we capture all the 100 blocks, and you say, hey, here's my SLA.
Starting point is 00:15:42 The SLA is that after four months, really, you don't need the data locally and it needs to be moved off to the remote site. Now, month five comes along and you say, I really want to go back to that five-month-old database that's 100 blocks big. Normally, what you would do is you would have to fetch the entire 100 blocks back from the remote site. Remember, we are a distributed object file system, and we have one big namespace. What we do is we know that five-month-old image of that database has 100 blocks.
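The restore Ash is setting up here, recreating an old image under one namespace by reusing whatever blocks are still held locally and fetching only the rest, reduces to a few lines. The block IDs and the fetch callback below are illustrative assumptions:

```python
def restore(image_blocks, local_blocks, fetch_remote):
    """Rebuild an old image: reuse blocks still present locally and fetch
    only the ones that have aged off to the remote site."""
    restored, fetched = {}, []
    for block_id in image_blocks:
        if block_id in local_blocks:
            restored[block_id] = local_blocks[block_id]
        else:
            restored[block_id] = fetch_remote(block_id)
            fetched.append(block_id)
    return restored, fetched

# The 100-block database: 75 blocks still live locally, 25 only remote.
local = {i: f"data-{i}" for i in range(75)}
remote = {i: f"data-{i}" for i in range(75, 100)}
image, fetched = restore(list(range(100)), local, remote.__getitem__)
```

The namespace bookkeeping is the whole trick: because the system knows which blocks of the five-month-old image are still local, only the minority has to cross the wire.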
Starting point is 00:16:13 It turns out 75 of those blocks are still here, because they're still changing, and they're still local. All I have to do is fetch the remaining 25 blocks. And it might not be just one location; it might be other locations. The ability for us to truly have one big namespace is what allows us to recreate the object that you want, whether it's a data center, a file system, or a database, by fetching the least amount. And that is the one thing you'll see with Actifio. Every part, whether it's how we capture
Starting point is 00:16:40 data, how we move it on the network, how we restore it, how we reuse it: there is nothing more efficient, because you just put it under one big namespace and we know the context of every one of those locations that we're managing. Okay. So, gosh, this is pretty complex. Well, ask Howard. First, it's the idea, and there are some implementation issues; there might be a couple of other ways to solve the problem. But by moving completely away from
Starting point is 00:17:15 the concept of we're backing up to tape... now, if we're not backing up to tape, then unfortunately, for 90% of the world, even the smarter end of it, at some point somebody goes, oh, right, this is that disk thing that's pretending to be a tape, let me create a tarball. By moving the protection concept up to the data, now everything is just deltas and metadata. And it's not just protection. This is real storage. We're talking about the application getting its data through Actifio. Absolutely. Copy data. It's not just copy data. It's the primary data, right? No, no, no. This is important, Ray. So we're not touching anything that is production data.
Starting point is 00:18:18 So we're basically saying your SAP on Oracle keeps running as it is. But the reality is, when the 400 developers, or the 8,000 developers we now have at this bank, are developing their applications, we are the production data for them, but it is still not the real production data. It's not what your business is running on. We're not signing up to say, please run your business on Actifio. We're saying, guys, there is more business happening on copy data than on the production data. There are more users running on copy data inside the organization.
Starting point is 00:18:49 There are more people running analytics. There are more expensive people doing development, a whole bunch of people doing QA, test, and backup. Okay, okay, okay. So it's really protection data that's being managed as copies of data, but with much more sophisticated management. This whole compliance scrubbing and all the governance stuff is really unusual, right? Yeah, and it doesn't have to be protection. We now have probably some of the largest financial institutions who don't buy Actifio for backup at all. They buy it purely as a way to allow their
Starting point is 00:19:26 developers to deliver Oracle as a service, to deliver MongoDB as a service, to deliver SQL as a service. So that I literally have... think about the developer. The developer comes in, logs into this portal and says, okay, I want to check out my source code. And I've integrated that whole checkout process in GitHub with Actifio.
Starting point is 00:19:53 So for the first time, you're checking out your source code, and automatically the data that goes with that version also gets checked out. Then when you build your stuff and check it back in, continuous integration begins, where all the data required to run the compilations at night, all of that integration, starts. Then there's an entire test process. And this is a case where it has nothing to do with data protection; this is about making data the new API that... Wait a minute, you guys are plugged into GitHub?
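A checkout that pairs code with its data could, in principle, hang off an ordinary checkout hook. Everything below, the commit-to-dataset catalog and the mount callback, is a hypothetical sketch, not Actifio's actual GitHub integration:

```python
def checkout_with_data(commit, catalog, mount):
    """On a source checkout, also mount the dataset version recorded for
    that commit, so code and data are always checked out together."""
    dataset = catalog.get(commit)
    if dataset is None:
        raise KeyError(f"no dataset recorded for commit {commit}")
    return mount(dataset)

# Hypothetical catalog kept by CI: commit hash -> virtual copy of the database
# that was current when the commit was made.
catalog = {
    "abc123": "oracle-snap-2016-07-05",
    "def456": "oracle-snap-2016-07-06",
}
path = checkout_with_data("abc123", catalog, lambda ds: f"/mnt/{ds}")
```

In a real pipeline the mount callback would be a call into the copy data platform, and the same catalog would drive the nightly continuous-integration runs Ash mentions.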
Starting point is 00:20:16 I mean, you're plugged into Jenkins and all the DevOps testing and all that shit? You're kidding me. Absolutely. We have solutions where the entire development team can run the development chain without ever calling an operations guy or asking for more storage. This is actually... at the end of the day, if you assume that the storage box is an API where I can just store blocks, what you need is something that is actually aware of the data. It's like what Howard and I talk about:
Starting point is 00:20:45 there's a reason they call it a data center because that's where the data is. They don't call it a storage center. They don't call it a compute center. Well, we used to. We used to, yeah. That's right. Ash, you're clean-shaven but, you know. You've been around a little while.
Starting point is 00:21:02 I'm pretty confident that there's some white going to come in there should you let it grow. It does. It does. It does. Not a gray beard, but yeah, white. So I think what we've done is to make data the new infrastructure. And because it has an API, because it's context-aware, I can just integrate with anything.
Starting point is 00:21:22 And this is the reason why people are able to build their own entire orchestration tools. There are three new startups now that are built on top of Actifio, because you can just call an Actifio API and get any data, any application, any system of record for the enterprise. There's a new security company that's built around the concept that I don't need to secure firewalls and networks; I'm just going to protect data. I'm just going to call Actifio and use that as the single source of truth for all the data in the enterprise. There's one of our customers who has built an integrity check service. He sells an integrity check service that validates that all your applications are truly recoverable seven years from now and you're able to meet the compliance requirements.
Starting point is 00:22:10 This is outside, obviously, in Europe. And so you have all these people who are now building applications, knowing that data is available through an API. Very similar to think about how wonderful it is for all these guys who go to AWS, call an API, and just build their application. Now combine that with they call an API and get the data they want, production data or copies of production data. That's what we've done. It is. It is fascinating.
Starting point is 00:22:37 I mean, we haven't been sitting on our hands for seven years. No, no, I understand. You haven't, but you have been somewhat quiet. We have. That is definitely... I think Howard and team were giving us feedback, Ray, that, hey, guys, this is some excitement going on here. If we look at things just on the protection side, and certainly protection is the least interesting thing you do,
Starting point is 00:23:04 although we will get to some parts of what you do for protection that I really am fond of. Yeah. But if we look just at the protection side, 10 years ago, I would get up at seminars, you know, when I was doing backup school for TechTarget, and say that the backup application is the stickiest thing in the data center. Absolutely. That nobody wants to change and people use the backup application they know and loathe. Yep. But over the past five years or so, you've come around, you're doing, you know, you're a private company and not saying exactly how much business you're doing, but you're doing hundreds of millions of dollars a year. Veeam is doing hundreds of millions of dollars a year.
Starting point is 00:23:51 Because changing from NetBackup to NetWorker was like changing from a '57 Chevy to another '57 model; it didn't make a lot of difference. I think there was an item on Storage Newsletter today about Cohesity doing real well. Yeah.
Starting point is 00:24:15 And so, yeah, there's plenty of players in this space that have taken on this data protection side of things. But, I mean, this is beyond that. Well, it's now that, you know, we have to make an external copy for protection. Yes, yes. And so when people talk about copy data management, there's the external copy data management that Actifio does, which solves the protection problem as well, and then there's the in-place data protection, which is: I'll take advantage of snapshots. Yeah. And if I'm a clever guy and I have Jenkins and I have a disk array that has a Cinder driver, then I could set it up so that when my developer logs into GitHub
Starting point is 00:25:01 he can make a clone of the snapshot of the Oracle database from last night and get similar functionality. Certainly not the PII filtering and the like. But I still need to make a separate backup because if that whole array blows up, I'm a dead man. Exactly. And that's why I think there are two different approaches. The approach we took was to go after the root cause, which is why the copy data problem originated to begin with, because there were multiple different sources of where these copies were being created. So we attacked right at the source.
Starting point is 00:25:37 Now you have two different approaches. You have the storage guys saying, just like what you said, hey, I'll give you a snapshot. And, of course, multiple attempts have been made to redefine snapshot as a backup, now as a DevOps, and the problem persists, which is when the production system goes down, so do the 8,000 developers twiddling their thumbs waiting for the snapshot to come back up.
Starting point is 00:25:57 Yeah, and the other side of that coin is you're responsible for performance for all my developers now. Yeah, that is the other big part. So you're, you know, I got some storage devices behind your appliances, but you're in-band and how fast it actually works depends on how fast your
Starting point is 00:26:15 metadata object store is. Yep, I agree. And I think, Ray, to your point about why all these backup companies are coming up, and they really are, Howard's point was very well taken, which is that for a while there were about five companies that owned the market share, and they kind of just had a very steady market share, because this one got swapped out for that one, and that one for another, and it stayed pretty much the same.
Starting point is 00:26:42 And everybody fought for the guy who was outgrowing Backup Exec. Exactly, exactly. And now what happened was the sheer size of the data just cannot support the old backup model. In fact, I have an article, Howard, I wanted to see if you or Frank maybe could co-author here,
Starting point is 00:27:02 tipping the hat to Brian Biles and Hugo Patterson. The article says, dot, dot, dot, and now backup sucks. Well, I mean, here's the thing I was most impressed by at Tech Field Day: when you guys said you could do change block tracking backups without being in a VMware environment. Absolutely. Because when I spent a lot of my life in the backup world, and when I talked to Curtis Preston,
Starting point is 00:27:34 who still spends his whole life in the backup world because it's kind of all he can do, there were two intractable problems. The first was I have a NetApp filer. It has six million files on it. Walking the file system to figure out what's included in the incremental takes four days. Yeah, what's changed, yeah. Yeah, just the determination of what's changed took two days.
Starting point is 00:27:58 And the other problem was, I have a four-petabyte Oracle database. You just can't copy it. There is no window big enough to make a four-petabyte copy. Yeah. And change block tracking is just the right way to address both of those problems. By saying I'm going to do change block tracking on a file system, I don't need to walk the file system anymore; I just back up the changed blocks. Now all of a sudden I can do incremental backups on my databases, where, you know, the database vendor said you can do an incremental backup, but it was really just a log dump
Starting point is 00:28:30 And apply. Yeah. Restoring from it was okay, but it kept the storage guys and the DBAs up all night with a lot of pizza. Talk about how you manage that a little. You're not doing continuous data protection per se. No, we are not.
Starting point is 00:28:45 You're doing change block tracking. So at the point in time when they take a copy, the first copy, you create an object element for this storage copy. And then as further copies are created, you are doing a delta of that to determine the change block. Is that how this works? It does. But the question I think Howard was going after is how do you efficiently do those deltas because the data size gets big. And what we've done is to combine two things.
Starting point is 00:29:15 And one is, you need a way to have a consistent copy made of the application, which means all the requirements of quiescing the application, making sure, saying, okay, I'm going to now make a copy of it. And there are various ways to do that; you do it with VSS and all kinds of other mechanisms. But the important part is, okay, now that I have a consistent copy of this, what in this copy changed? So what we have is, we combine that with a bitmap that we keep of blocks that are changing. And it's more flipping a bit rather than CDP, which would be extremely expensive; you're capturing every transaction.
Starting point is 00:29:53 And so combining those two, a bitmap that's constantly being flipped each time a block is written, and that consistent copy of the data allows us to be very, very efficient. So, wait, wait. You're not just a backup product here. You're actually storage middleware that's maintaining a bitmap of change blocks. You're allowing the stuff to go to primary storage without further ado, but if it needs to be copied, that's when you get invoked?
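The bitmap Ash contrasts with CDP can be sketched directly: one bit per block, so a write costs a single OR, and repeated writes to the same block cost nothing extra. The block count and layout here are made up for illustration:

```python
class ChangeBitmap:
    """One bit per block: a write flips a bit; CDP, by contrast, would
    record every transaction, which is far more expensive."""

    def __init__(self, nblocks):
        self.bits = bytearray((nblocks + 7) // 8)

    def mark(self, lba):
        self.bits[lba // 8] |= 1 << (lba % 8)   # flip the bit for this block

    def changed(self):
        """List every block whose bit is set, in LBA order."""
        return [i for i in range(len(self.bits) * 8)
                if self.bits[i // 8] >> (i % 8) & 1]

    def clear(self):
        """Reset after a consistent copy has been taken."""
        for i in range(len(self.bits)):
            self.bits[i] = 0

bm = ChangeBitmap(1000)
for lba in (7, 7, 7, 9):   # repeated writes to block 7 still cost one bit
    bm.mark(lba)
```

Pairing this with the quiesced, consistent copy described above is what makes the delta cheap: the bitmap says which blocks to read, and the consistent copy says what to read them from.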
Starting point is 00:30:22 Yeah. The way I understand it, there's an agent that sits on my database server. That agent builds a change block journal. Not the data, just block seven changed. And, of course, if block seven changed 400 times, it's still one entry because block seven changed. Then we quiesce the application, and we take a storage system snapshot. And then we mount the storage system snapshot and grab the blocks that are in the journal. And that's our copy.
Starting point is 00:30:52 The agent's really lightweight. All it has to do is keep the change block journal. So it's not in the data path. This agent's not in the data path as much as sitting alongside the data path monitoring change block traffic. It's in the data path because it's the only way it can get the LBAs, but it's grab request, save LBA, pass request on. The latency implications are really minimal. Right.
Starting point is 00:31:15 So we're not terminating a SCSI request or a file request. We are watching a block go by and saying, there you go, that's block seven, flip that one. Okay, there's block nine. We're not adding any latency by terminating a request and restarting a new request the way an in-band, in-the-path driver would. And I think that is important because ultimately there are enough tools. And a lot of things have changed. Applications have become much more manageable. The awareness that applications need to be better protected and better backed up now extends even to NoSQL databases.
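The "grab request, save LBA, pass request on" pattern Howard and Ash just walked through can be sketched in a few lines of Python. This is a hedged illustration of the described workflow, with made-up function names: the agent records only which block was touched (a set, so 400 writes to block seven are still one entry), forwards the I/O unchanged, and at backup time reads just the journaled blocks from a quiesced snapshot.

```python
# Illustrative sketch of the agent workflow described above (names assumed).
changed = set()          # the change-block journal: LBAs only, no data

def write_intercept(lba: int, data: bytes, backend_write) -> None:
    changed.add(lba)             # save the LBA; duplicates collapse for free
    backend_write(lba, data)     # pass the request straight through

def take_incremental_copy(snapshot_read) -> dict:
    # At backup time: the application is quiesced, a snapshot is taken,
    # and only the journaled blocks are read out of that snapshot.
    copy = {lba: snapshot_read(lba) for lba in sorted(changed)}
    changed.clear()              # fresh journal for the next cycle
    return copy

store = {}
write_intercept(7, b"v1", store.__setitem__)
write_intercept(7, b"v2", store.__setitem__)   # still one journal entry
write_intercept(9, b"x", store.__setitem__)
delta = take_incremental_copy(store.__getitem__)
print(sorted(delta))   # [7, 9]
```

Because the intercept does no work beyond one set insertion before forwarding, the latency cost per write is negligible, which is the point Ash is making about not being an in-band driver.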
Starting point is 00:31:44 As they've come out, they've realized the need to be protected better right from the start, right? People have realized that data is important. If you go back 20 years ago, you had to work around a lot because backup was the last thing people considered. It's a brilliant concept that the people developing database engines consider that they have to be backed up.
Starting point is 00:32:12 Yeah, yeah, exactly. You're very facetious. Well, no, no, I'm not. I'm remembering the days when open file management was a problem because somebody left an Access database application open overnight and you couldn't back that file up because it was locked. So to be able to go, gee, the database guys actually thought about backups, and their only thought wasn't the DBA will periodically perform an ASCII dump to a disk,
Starting point is 00:32:46 increasing the amount of disk needed by a factor of four. And the backup guys can back that up. I like that idea a lot. Yeah, and then if you look at what's happened, Ray, over the last, you know, six years, of which let's say four and a half is when we really started shipping in volume. The first years were shooting fish in a barrel, going after the entire data protection market. It was just basically saying,
Starting point is 00:33:10 hey, how big is your database? 10 terabytes? How long does it take you to restore? How about... A week. A week to eight days? How about I bring it up to you in like two minutes? Are you kidding me? No, let me show you. And you lost the sale because he couldn't believe
Starting point is 00:33:26 you and you went back a couple of months later and said, we can do it in half an hour. There is some of that. There are customers who used to say, hey, we can instantly make entire data available to users. And the guy would say, no, can you slow it down so you can only do it
Starting point is 00:33:41 four times a day, because I don't want to just bring it up a hundred times all over the place? Or disaster recovery, same thing. For us, this notion of disaster recovery or business resilience is nothing more than I've got this distributed file system, data is flowing from one location to another, and you're just accessing data or reorchestrating an entire data center or specific applications on the remote side instantly. You don't need a PhD anymore to go run this. And by the way, this is across VMware. A lot of the companies you see today
Starting point is 00:34:15 in the copy data space jump straight to VMware, whether it is, you were mentioning some of these new companies that have come along. A lot of them jump straight to VMware because everybody uses the VMware API. The VMware API makes life a lot easier except for the fact that they keep finding bugs in it. Amazing number. Not only bugs, there are lots of, you know, stun issues, application stun issues,
Starting point is 00:34:40 database stun issues, almost making it impossible to use. Yeah, but I mean, we're not talking about NATs. We're talking about giant ants from nuclear tests like in them. Oh, yeah, absolutely. Absolutely. I think we've made our contributions over the years on some very intractable issues that have become major issues in the customer base. So I think the important part was to make sure that we just didn't cover VMware only.
Starting point is 00:35:11 It was important that the last AS400 sitting in the back room was also managed the exact same way. That old machine with AIX was also managed, and those four HPUX machines. Okay, I can't say anything about the HPUX machines. Absolutely. With all the time I spent in the casino business, those AS400s and RS6000s ain't going away. Yeah, absolutely, they're not. And little old ladies get upset when the slot machines don't update. Right, right. So let me try to understand. So the agent is sitting there. I understand it can maintain change block tracking. But let's say I want to do a restore at this point. So at that point, the agent becomes more or less the primary access to this virtualized storage device, right?
Starting point is 00:35:56 No. The agent has nothing to do with the restore part. What happens is the agent is there to capture the data and give us the bitmap. From then on, think of us like a virtual storage system. We have this entire system-of-record history, for the last seven years, of every application, every change that has been captured. Now we have billions of objects floating around across multiple locations. You take one of those access points, and what we've done is not repeat the same mistake the backup guys made. Instead we provided a standard I/O interface, a Fibre Channel, iSCSI, or NFS or NAS interface
Starting point is 00:36:31 into Actifio. So you go to that Actifio interface point and say, I need this Oracle database from yesterday. Think of it as no different than accessing a storage system, except the volume or the data that you see on the other side of the wire is something we synthesize immediately at that point. So you basically say, present me the LUN this Oracle database was on as of this backup point. Yeah, and then we synthesize that. It doesn't exist, right?
Starting point is 00:37:06 Right. It doesn't exist in its full format, so we synthesize it and put it back together. It exists as much as it does in any deduplicated data store. Yeah, yeah. It's all about metadata magic and assembled pieces. Exactly. And then because we've kept it in application-ready format,
Starting point is 00:37:24 we're able to present it instantly, as opposed to the old tape business of converting it back and putting it back into some format. And so this is where we become the production system. At that point, the VM, the server, is actually talking directly to Actifio. And this is where it becomes very, very interesting. So now, at that point, people are beginning to pick and choose which pools they want this restore to happen in. We have SAP developers. They're doing dev and test and QA.
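The "metadata magic" of synthesizing a point-in-time LUN that never exists in full can be sketched like this. This is an illustrative model under simple assumptions (a base image plus per-snapshot deltas), not Actifio's actual on-disk format: a read resolves each block against the most recent delta that touched it, falling back to the base, so presenting yesterday's volume is instant.

```python
# Illustrative point-in-time volume synthesis (assumed layout, not Actifio's).
base = {0: b"A", 1: b"B", 2: b"C"}    # first full ingest of the volume
snapshots = [                          # incremental captures: changed blocks only
    {1: b"B1"},                        # capture 0: block 1 changed
    {2: b"C2"},                        # capture 1: block 2 changed
]

def synthesize(point_in_time: int):
    """Present 'the LUN as of this backup point' without materializing it."""
    def read(block: int) -> bytes:
        # Walk backward through the deltas up to the requested point in time.
        for snap in reversed(snapshots[:point_in_time + 1]):
            if block in snap:
                return snap[block]
        return base[block]             # untouched blocks come from the base
    return read

monday = synthesize(0)
print(monday(1), monday(2))   # b'B1' b'C' -- Monday's view, assembled on read
```

Nothing is copied at presentation time; the volume exists only as this lookup over metadata, which is why it behaves "as much as it does in any deduplicated data store."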
Starting point is 00:37:54 They have one pool. The SLA is set up to say, go to these pools of storage, which are lower performance. And pre-prod actually goes to a flash pool of storage, what they call high performance. It could be any flash. In fact, I could combine 10 terabytes of XtremIO, and then Pure gave me a phenomenal deal, so I had 20 terabytes of those guys, and somebody else came in and gave me a phenomenal deal and gave it to me for free,
Starting point is 00:38:17 and I've combined that into a pool. To us, it's a pool with a certain quality of service. But based on the role, so Ray comes in as the person who's going to do performance testing on pre-prod right before you release. Suddenly the data that Ray sees, you won't even know, but it shows up in this pool because that's where the operations guys have set the SLA to. Data shows up there, you're running your tests, and you're off.
Starting point is 00:38:40 And we've synthesized the data in the right places. I mean, underneath this, and we talk about a distributed file system, but this is more than a traditional file system as we knew it now. Typically, the file systems we knew were all about how do you lay out data. But this includes a whole bunch of workflow of moving data around, mobility.
Starting point is 00:38:56 There's an entire mobility layer around a protocol called Dedup Async Replication, which is very, very efficient. It gives you the same capability as async replication from a storage system, except it never puts a block that's already on the other side on the wire, and it collapses what used to be three different networks. If you think about array replication, application replication, and backup replication, you combine all those three into one. And then from a user's perspective, it looks like I'm able to take my application
Starting point is 00:39:27 that's running on AWS and somehow magically make it come up on an EMC system on my site. Or who cares? Whatever. We'll go from one cloud to another. And this is where the hybrid cloud part, and this is the demo that we were showing Howard and others, where people are able to just magically move applications, restore data, and they have no idea which underlying platforms they're running on, which is great for developers and people consuming data. I'm skeptical, though. But the problem isn't the back end. The problem is the limitations of the cloud platforms
Starting point is 00:40:08 and the unique snowflakishness of enterprise applications. You say, we can send this to AWS. You go, yeah, but that application requires four Ethernet adapters on the VM. AWS only allows one.
Starting point is 00:40:24 Yeah, so to the extent that the underlying infrastructure... Yeah, if the applications haven't been snowflaked to death, it's a great idea. Yeah. I'm also constantly afraid of the, oh, we'll set up hybrid cloud and run applications and then you ignore the data gravity problem and start running compute in AWS
Starting point is 00:40:45 and storage in your data center. This morning we had a service provider. We have over 60 of some of the largest service providers in the world. We have one of these service providers who's been a customer of ours for a while now. They've been selling disaster recovery business continuity services. Classic story. Today we're working with another very large system vendor who is their supplier
Starting point is 00:41:10 and we're going to be announcing a DevOps as a service. This is for an entire mid-range user base: shops with 50 to 100 developers that don't own any of their data centers. Going to Amazon becomes very, very expensive,
Starting point is 00:41:26 especially if you're running on a 24 by 7 kind of a mechanism. It's better to run on-site, but I really don't want to run on-site. So now they have the exact same technology, business resilience, business continuity as a service. They're going to charge, I don't know, 20 more cents per user or whatever metric. In fact, that's the other part that they're going to charge, I don't know, 20 more cents per user or whatever metric. In fact, that's the other part that they're doing.
Starting point is 00:41:48 People are moving towards per-user pricing as opposed to per terabyte or per CPU or any of that stuff, because in the DevOps world, I know I have 100 developers. Let me charge per user annually or per month. And the phenomenal part is these guys
Starting point is 00:42:04 leveraging the exact same investment with no additional infrastructure, are able to deliver more services to the same customers, increasing their margin, speeding up the development of the end user. Just overall, win-win. It's the same phenomenon.
Starting point is 00:42:21 We get excited because it brings about the same phenomenon, in fact a bigger phenomenon than what VMware did on the server side, when you have data available for all kinds of things. Now, there is one more service provider. We may have mentioned this, Howard. They work in the financial services business. In addition to doing business resilience, they also have been selling a monthly federal stress test service.
Starting point is 00:42:44 Because I have your data. Why do you want to run it on your expensive storage system? In fact, why do you want to over-provision those? Let me run it here and deliver to you a monthly report that says you comply with the federal stress test. And it's fascinating what people are doing. People are running virus, antivirus stuff as part of the service. People are running compliance checks. Like this guy told you, one of the customers is actually, as part
Starting point is 00:43:08 of our marketplace that we're going to be announcing, people are building applications on top of this platform to give you more things you can do. In fact, at some point I'll come back to you off the air with some other interesting stuff on the analytics
Starting point is 00:43:23 stuff that the CIOs want, saying, hey, if I had a system of record. I was just thinking that. Yeah. DataGravity did some interesting things, but they've fallen on hard times. Oh, you're talking about the company DataGravity? Yeah. Oh, yeah, okay.
Starting point is 00:43:40 And Qumulo's doing some interesting things. But you guys are in the perfect position to do that kind of file-system-analytics-based stuff from the copy, where it won't affect the performance of the primary. And I don't care that it's based on midnight yesterday. Absolutely. And the good news is, we have retail companies and a rail car logistics company that not only do analytics for yesterday, they actually do analytics on the data from the entire week, the data for the entire year. So this becomes kind of the source of a data warehouse.
Starting point is 00:44:24 I can literally rewind the data for the entire year in one place in Actifio and say, show me this application for the whole year, and let me run the analytics in this place. God, I think we could talk for hours about this technology. I mean, its potential here is amazing. And to Howard's point, I need some help figuring out how we get this out in a simpler way than how long it took for two of the smartest guys. Well, we can talk about that offline. We're both in this business and we'd be happy to
Starting point is 00:44:51 help, but we are kind of running out of time. Howard, are there any last questions you would like to ask Ash? No, I got mine in. I've probably got a dozen questions, but I'm not sure any one would specifically work. Ash, is there anything you'd like to say to our audience? I think the key here is to think about data as the new lifeblood. Data is a valuable business asset. And if you're in the business of transforming your business to a digital enterprise, then the only two things that matter are how do I deliver applications faster and how do I make sure my data is available to my users and to my internal constituency as fast as possible. And rather than keep mucking around with infrastructure, we just tell people: just Actifio it.
Starting point is 00:45:36 Okay. Well, this has been great. It's been a pleasure to have Ash here with us on this podcast. And next month, we'll talk to another startup storage technology person. Any questions you want to ask, please let us know. Please, if you're interested, you could review us on iTunes and let us know how we're doing. That's it for now. Bye, Howard. Bye, Ray.
Starting point is 00:45:55 And until next time, thanks again, Ash. Thank you, Ray. Thank you, Howard.
