Grey Beards on Systems - GreyBeards talk data-aware storage with Paula Long & Dave Siles, CEO&CTO DataGravity

Episode Date: April 7, 2015

In this podcast we discuss data-aware storage with Paula Long, CEO/Co-Founder and Dave Siles, CTO of DataGravity. Paula comes from EqualLogic and Dave from Veeam so they both have a lot of history in... and around the storage industry, almost qualifying them as grey hairs :/ Data-aware storage is a new paradigm in storage that combines …

Transcript
Starting point is 00:00:00 Hey everybody, Ray Lucchesi here and Howard Marks here. Welcome to the next episode of Greybeards on Storage, a monthly podcast show where we get greybeard storage and system bloggers to talk with storage and system vendors to discuss upcoming products, technologies, and trends affecting the data center today. Welcome to the 19th episode of Greybeards on Storage, which was recorded on March 24, 2015. We have with us here today Dave Siles and Paula Long, CTO and CEO slash co-founder of DataGravity. Why don't you tell us a little bit about yourselves and your company? Hi. So first, thank you for hosting this. We're looking forward to it, and we know it'll be fun.
Starting point is 00:00:52 So my name is Paula Long. Again, I'm the CEO and co-founder of DataGravity. I've been in storage almost as long as you guys, not quite, but I do get into the old part of the gray hair. I'm renaming you guys. Let me tell you a little bit about DataGravity. So DataGravity's mission is: you've got valuable assets in your storage, but you have no visibility into them. So our mission is to provide storage that can tell you about the data. It can help you maximize the upside and cap the downside. So what do I mean by that? Well, you can look at data governance and understand whether or not you've got people accessing data improperly, or whether you have
Starting point is 00:01:25 things like clear-text credit cards, so you're exposed from a data privacy perspective. We let you do search and discovery. So if you want to plot part numbers against sales orders against regions, you can start to look at your unstructured data and understand what's going on. And we built in data recovery, which has all the benefits of quick implementation and recovery of a snapshot, but with the benefits of a backup that give you fault isolation and a catalog. And that's all wrapped in a primary storage array. So we provide both block and file. And so what we really think is: you bought the storage, you should know what's in it, and it shouldn't cost you anything extra to know about it.
Starting point is 00:02:05 And with that, I'll turn it over to Dave to introduce himself. Absolutely. Again, yeah, pleasure to be here. Dave Siles, the CTO of DataGravity. And as Paula has laid out, we are bringing to market data-aware storage. It's a new category of storage that we've defined, and we're starting to see it resonate in the market space. You know, imitation is the sincerest form of flattery, and if anybody's watching the storage industry, there are more and more people talking about data-aware and data analytics at the point of storage,
Starting point is 00:02:37 and we're happy to be driving that forward. So how can you guys do all this stuff all at the same time? I mean, it used to be that being a primary storage device was a full-time occupation for compute and storage and stuff like that. You guys have enough compute power in your system to do this sort of thing? Absolutely. So a lot of your storage arrays, if you look at a Nimble or an EqualLogic or a Tintri, or probably even Pure, are running active-passive. So that second controller isn't doing a lot of heavy lift. It's hanging out waiting for a failure. In the case of DataGravity, that secondary controller is really the intelligence controller. So we've actually, you know, someone coined the phrase active intelligent.
Starting point is 00:03:22 And so we're actually utilizing all the compute and memory that's on that secondary controller. And we've re-architected storage so we can very efficiently get the data over there. Basically, we hijack the HA stream. A little techie there, but it's all about the architecture and being frugal in how you use the resources. Don't be afraid of being techie with us. We like it. Yeah, yeah, really. All right, so I'm going to totally nerd out then, as a warning.
Starting point is 00:03:51 So, SSD as well as disk, or just SSD only? How does that play out? It's a hybrid. So we actually have three tiers within the storage array. We have NVRAM, quite a bit of it, which is non-volatile RAM, so we can keep things in memory in a persistent way. We also have SSD, and we have spinning media. So we do the combination of all three. And you support file and block, so is that iSCSI as well as Fibre Channel? So, given my roots at EqualLogic, I don't acknowledge that Fibre Channel exists yet.
Starting point is 00:04:28 Exists yet? Well, it took Nimble a couple of years. In the mid-tier, they should just give it up. Ethernet won. Get over it. Oh, God. This could be a very interesting discussion. Paula, how could you say such a thing?
Starting point is 00:04:44 So, it's iSCSI, SMB, and NFS. At what level of SMB? 2.1. We don't do 3.0 yet, but it's, you know, it's a roadmap item. Yeah, absolutely. We're also virtualization aware, so we treat virtual machines as first-class objects inside of our storage platform, which is obviously a value add there for when we look at our intelligence and data protection, being able to give a VVOL type of experience to virtual machines today on data gravity. Yeah, so everybody talks about the urban sprawl of VMs, but nobody ever talks about VMs being overweight, right? So if you look inside a VM, we've had customers who said, well, there's nothing in there. And we say, well, there's 5 million files. Many of them are type X. And they're like, 5 million?
Starting point is 00:05:31 5 million? Where did we get 5 million? And so it's like, well, here's what they look like. So there's bloat as well as sprawl in the VM environment. So you crack open a VMDK and understand the files and the file content on the VMDK? Yep. Absolutely. That's impressive.
Starting point is 00:05:49 That's one of our unique value adds. My background: I come from the data protection space around virtualization, previously with Veeam Software. Almost Veeam-like, I was going to say. Yeah, yeah, yeah. It kind of is. At Veeam, we came up with a technology called Instant VM Recovery. Here, it's truly instant. But what we're doing is definitely pushing the envelope.
Starting point is 00:06:13 We do everything at the point of storage, though, so we don't need agents. We don't need to carry long-lived VMware snapshots. Some of the pain points that even a virtualization backup product has, we don't have in DataGravity's environment. So you mentioned data protection as well as data storage. Typically, data protection in our world has been off of the storage system, just because the storage system becomes a single point of failure. So how does that play out in your space? So basically, what we've done is we've partitioned the system, so you have redundant everything. And actually, we've reimagined snapshots. Snapshots haven't really been reinvented in the last 20 years. I mean, people compress them; I'm not sure that's reinvention.
Starting point is 00:07:00 What we've actually done is separated the snapshot from the primary storage, so it lives in its own pool, if you will, of disks. So if something happens to the primary pool of storage, the intelligence pool, where these reimagined snapshots live, which we call discovery points, is over there, and they're fault-isolated. So we're actually able to provide a very good no-single-point-of-failure story for the discovery points. And what makes a discovery point interesting is we not only keep a protected snapshot, if you will, within the discovery point, but we also catalog it. So we know every file that changed, whether the files are in a VM, in an iSCSI LUN, in an NFS mount point, or in an SMB share.
Starting point is 00:07:44 So we can look at the content and tell you, between any two points in time, what changed. So you no longer have to guess. What I call restoring from a snapshot is the biggest data loss I've ever heard of, but it's sanctioned, because you have no idea what changed between two snaps. We know exactly what changed. Unless, of course, you mount the two snaps, diff them for a day, try to comb through for the files you're looking for, and then find what you need. We'll just give you the list of what changed, and you can do, you know, granular recovery points. There really should be, you know,
Starting point is 00:08:17 recovery point objectives and recovery time objectives, but there should be recovery point granularity as well. And we get right down to the file, which people like. Paula, folks have been making claims that their snapshots will replace my backups for at least a decade. And I've always had two problems with that approach. The first is that snapshots are dependent, and it sounds like you solve that by moving the snapshots to another set of spindles. And the other problem is that every time I've ever worked the help desk, and I have to admit having spent more than my fair share of time as level 27 support, you discover some strange things when you actually take those calls, one of which is that no company on earth actually closes its quarter in SAP.
Starting point is 00:09:13 There's always an Excel spreadsheet involved. And that Jane, who does that, goes on maternity leave. And a call comes to the help desk that says: Jane's on maternity leave, and we're not sure what the name of the spreadsheet she uses is, and we can't find it, we think she deleted it. We know it was around on this date, because that's when she closed the last quarter, and we know it's in one of these five directories. Could you find it for us? Yeah, and that means I have to go to the catalog. So how extensive is this catalog?
Starting point is 00:09:50 So this catalog, I'm sorry, I was about to interrupt you. This catalog is fully searchable. So you could look for all the files that Jane wrote before she left. We could say, well, we know that it was Q2 2015, so you could do a search of the content Jane modified to find out what files Jane modified that had "Q2 15" in them. If you knew it was an Excel file, you could do a search to say, let's find the Excel files Jane modified, you know, within this time window. You get my drift.
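The kind of catalog query Paula sketches here, find what a given user modified in a time window whose content matches a phrase, can be illustrated against a toy metadata catalog. All names, fields, and sample data below are hypothetical, not DataGravity's actual API:

```python
from datetime import datetime

# Hypothetical catalog entries of the kind a discovery point might record:
# who wrote the file, when, and an extracted text snippet for search.
CATALOG = [
    {"path": "/finance/close-q2.xlsx", "user": "jane",
     "mtime": datetime(2015, 3, 20), "text": "Q2 15 quarter close workbook"},
    {"path": "/finance/notes.docx", "user": "jane",
     "mtime": datetime(2014, 11, 2), "text": "planning notes"},
    {"path": "/eng/build.log", "user": "bob",
     "mtime": datetime(2015, 3, 21), "text": "Q2 15 build output"},
]

def find_files(catalog, user=None, contains=None, since=None, until=None):
    """Filter catalog entries by author, content phrase, and time window."""
    hits = []
    for entry in catalog:
        if user and entry["user"] != user:
            continue
        if contains and contains.lower() not in entry["text"].lower():
            continue
        if since and entry["mtime"] < since:
            continue
        if until and entry["mtime"] > until:
            continue
        hits.append(entry["path"])
    return hits

# "What did Jane write with 'Q2 15' in it before she left?"
print(find_files(CATALOG, user="jane", contains="Q2 15",
                 since=datetime(2015, 1, 1)))  # prints ['/finance/close-q2.xlsx']
```

The point of the "Jane modified" part of the conversation is that the author facet comes from the storage layer itself, not from a separately maintained backup index.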
Starting point is 00:10:25 Oh, I really like the "Jane modified" part. Yeah, yeah. Because we don't get that in the normal backup application. All I could do there is name and location and date. Your analytics are going beyond just the file system metadata. You're getting into the audit logs as well, right? Well, we think audit logs are not the way to track information.
Starting point is 00:10:53 So basically, storage has always known who is reading and writing things, because it had to authenticate them. But once it authenticated them, it dropped that information. So what we've done is we've captured that metadata, and we basically send it over our B2B, our board-to-board path, for our active intelligence configuration. So we're getting the real-time activity streams as data is coming into the box, by user, all tied into Active Directory, LDAP, and NIS. So you can keep it much tighter than a log, and we put it in a searchable form. So we're collecting metadata along the way. So you'll see other people who talk about being data aware
Starting point is 00:11:35 will talk about clients. When they talk about clients, they mean IP addresses, so who mounted it. That's absolutely interesting, but you really want to know levels below that: who's reading and writing it at an individual-person level, not at a machine level. I want to get back to snapshots. Now, normal snapshots in the olden days became, over time, pointer manipulations to data sitting on disk that say this is an older track or this is an older block or
Starting point is 00:12:05 something like that. Now, in your case, if you're trying to put those snapshots on a different partition, are you cloning the data itself, or are you creating some sort of a metadata structure that still points to the primary partition? It's a clone of the data itself. It's like taking a snapshot when you first create a LUN: if you take a snapshot right after you create a LUN, then you get a peel-off of all the copy-on-write data. So if you think about it, it's sort of like one full backup, with virtual fulls the rest of the way. So we have one full copy of the data, and then everything else, we're deduping and compressing as well, so everything else
Starting point is 00:12:56 is incremental, but it looks like a full all the time. So we're always physically sparse after the first one, but look virtually full. So you don't have any of these problems with incremental versus full backups. Oh, that's good. That's good. I like that. And you're not copying the data from one set of spindles to another at the snapshot time. You're doing that continuously. Yes. So basically, if you think about how HA works in a storage array, your job is to mirror the writes to the other controller. So if the primary controller goes down, you have a copy of the writes.
Starting point is 00:13:31 It's a contract you have with the server that you won't acknowledge the write until you have it in two places. We take that mirrored copy of the writes in real time, and that's how we construct these new kinds of snapshots, or discovery points. We construct them from the fine-grained writes that happen as part of the HA stream. Well, I like this idea of using a passive controller as your intelligence system there. Not only is it maintaining the continuous data protection solution, but it's also doing metadata control and logging. And you've mentioned something about cracking the files themselves. Do you understand the text in some of these files? We do, yeah.
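The discovery-point construction described above, snapshots assembled from the mirrored HA write stream, can be modeled in miniature. Everything below (the class names, the dict-of-blocks store) is an illustrative sketch of the general idea, not DataGravity's actual internals:

```python
# Toy model of "active intelligent": the primary controller mirrors every
# write to the second controller (the normal HA contract), and that
# controller folds the mirrored stream into catalogued recovery points.

class IntelligenceController:
    def __init__(self):
        self.pending = {}           # writes seen since the last discovery point
        self.discovery_points = []  # fault-isolated, catalogued snapshots

    def mirror_write(self, block, data):
        self.pending[block] = data  # fed by the real-time HA stream, no rescans

    def take_discovery_point(self):
        # The snapshot *is* the accumulated write stream: physically sparse
        # (only changed blocks), plus a catalog of exactly what changed.
        dp = {"blocks": dict(self.pending),
              "changed": sorted(self.pending)}
        self.discovery_points.append(dp)
        self.pending = {}
        return dp

class PrimaryController:
    def __init__(self, partner):
        self.store = {}
        self.partner = partner

    def write(self, block, data):
        self.store[block] = data
        self.partner.mirror_write(block, data)  # two copies before the ack
        return "ack"

intel = IntelligenceController()
primary = PrimaryController(intel)
primary.write(7, b"hello")
primary.write(9, b"world")
dp = intel.take_discovery_point()
print(dp["changed"])  # prints [7, 9]
```

The design point being illustrated: because the mirrored writes already had to cross to the second controller for HA, building snapshots and a change catalog there adds intelligence without another pass over primary storage.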
Starting point is 00:14:14 We process about 400 different file types. When you look at it, it really covers about 90% of the human-generated unstructured content in the world. And we full-text index that. And as we're doing that, we also have a pretty intelligent rules engine inside of the DataGravity intelligence controller that looks for well-defined patterns. So, you know, Susie saves a spreadsheet
Starting point is 00:14:37 with everybody's social security numbers in the public drive, which happens to be on a virtualized file server running in a virtual machine. We know that within the process of that discovering point that that just happened, and we can alert that back to the business. Sure you don't have a cray or something for a controller here? You can't do all this stuff.
Starting point is 00:14:53 You can't. It's impossible. Tell me it's impossible. Xeons are getting powerful, Ray. Yeah. Really? Moore's law is still valid. When we first started out of the gate, it probably wasn't possible.
Starting point is 00:15:07 But as the product matured and we came to market, the folks at Intel have definitely kept up. The number of cores that you can put in a server nowadays makes it possible. All right. So you've got 24 cores and 7 terabytes of DRAM to do all this stuff. No, I don't believe it. You're missing the brain-share aspect of it. We have a lot of smart people that have invented what tomorrow looks like, today. Yeah, so basically we have people here from companies such as Endeca and Vertica and Netezza.
Starting point is 00:15:40 Oh, yeah, yeah. So we took storage people, which are blockheads, right? We took us blockheads and file heads and object heads, and we married them with smart, you know, data scientists, if you will. And, you know, the trick really is in how to figure out what you need to store and how to do the data manipulations. The other key is when you set up your structures, you can use the same data and transform it to represent different things. Like I can tell you who read and wrote the file, but I can use that for security, but I can also use that to see who's interacting or collaborating through the data and who's an expert from the data. So it's all data transformations.
Starting point is 00:16:21 The really surprising thing for me was, you know, we had come up with a data where storage, it has four facets, like we said, the one that our customers are really gravitating to are, you know, obviously, maybe not obviously, our file analytics, which tell you at a glance, you know, you just bought 10 terabytes of storage last year, and now it's gone. We can tell you where it went, who's using it, both from a space and from a reading and writing perspective. But we can also tell you if that data is clean or not, so if there's data privacy issues in it. And the data privacy is getting to be something IT and security professionals are very worried about. The other thing I learned, and I guess maybe because I'm old, I didn't know this, is this thing called data protection. And when you talk to a security person, they think about virtual data protection, about that it didn't leave the building,
Starting point is 00:17:12 that it didn't have data privacy issues. You talk to an IT person and they think about physical data protection. And so we're starting to map those, you know, you say tomato, I say tomato kind of things. And we're starting to find those definitions. And it's pretty interesting as you start to marry these two worlds, and they're going to have to be married. Because if you don't do this at the point of storage, you lose an opportunity to, you know, you don't protect the castle. You protect everything outside the castle, but you don't protect the castle.
Starting point is 00:17:39 And so when people storm in, you know, they can find stuff, and you don't even know they found it. You mean the alligator-filled moat isn't enough anymore not so much right speaking of florida all right let's go uh all right so you guys support like replication slash mirroring to other data gravity solutions at off-site locations or how does that work we absolutely support that. We do it through third-party partnerships right now, and we've been very successful with that.
Starting point is 00:18:11 And eventually you'll see us integrate it in the array as well. We tried to stop the world so we could get all the features into V1, but no matter how much I asked, it wouldn't stop. So you'll see that coming. And you mentioned deduplication and compression. Is that all that's done in the, I'll say the intelligence engine? Is that how it plays out? We do targeted deduplication because a lot of times it doesn't make sense to dedupe because we know about the content. We know what it makes sense to dedupe. It's both in the primary
Starting point is 00:18:38 side and on the intelligence side; we both dedupe and compress. Compression tends to be very, very good, but with dedupe, you know, if you take an SMB file share, the amount of deduplication you're going to see tends to be pretty small. Well, that all depends on how well those people run PowerPoint, because the guys who make a 100-meg logo on every page, that stuff dedupes.
Starting point is 00:19:11 But I'm struck by this intelligent deduplication. So are you guys leveraging your file knowledge about that, and figuring out that I've got 700 VMs, all of which are running Windows Server 2008 R2, and therefore winsock.dll is duplicated across all of them? I mean, are you really doing dedupe at that kind of level, or is it still some kind of block? It's block-level. It's block-level deduplication.
Starting point is 00:19:35 Well, we're knowledgeable about the content, so we can get to the VMs, and we have, so again, this is a roadmap item. We know about the content, and we can get more intelligent. You could imagine us, you know, if you were thinking about read caching, for example, if we know you're reading a document and you just read page two, we can guess what your next page is going to be. We're not doing that yet, but you could imagine us moving in that direction. But we really think that, you know, our real value add is in knowing the content and reporting on it
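The block-level deduplication the answer refers to is, generically, content hashing of fixed-size blocks: identical blocks are stored once and referenced by hash. This is a minimal sketch of the general technique, not DataGravity's implementation:

```python
import hashlib

# Minimal block-level dedup: split data into fixed-size blocks, hash each
# block, and keep each unique block exactly once in the store.
BLOCK = 4096

def dedupe_store(data, store):
    """Store data block-by-block; return the recipe of block hashes."""
    recipe = []
    for i in range(0, len(data), BLOCK):
        block = data[i:i + BLOCK]
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)  # identical blocks stored once
        recipe.append(digest)
    return recipe

store = {}
a = dedupe_store(b"A" * 8192, store)                  # two identical blocks
b = dedupe_store(b"A" * 4096 + b"B" * 4096, store)    # one dup, one new
print(len(store))  # prints 2 (two unique blocks, not four)
```

This is also why the winsock.dll example works without any file awareness: 700 copies of the same OS file produce mostly identical blocks, which collapse to one stored copy each.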
Starting point is 00:20:04 so that your data isn't, you isn't basically a ticking time bomb. And in some cases, it really is the amount of data privacy issues a lot of people have. So are you expecting to – yeah, I see this as like an e-discovery solution, significant version of that. But this is more than that. Are you going after primary storage? I guess that's the question. 100% of primary storage. I keep repeating this and people get it eventually. We believe you paid for the data. You should be able to look inside it. We also believe that having somebody
Starting point is 00:20:40 trying to figure out what's in your apartment in New York by standing on a building 100 yards away with binoculars trying to guess what's going on, while interesting, probably is maybe illegal or illegal, but not necessarily accurate. It all depends on whether you have a PI license. We actually believe we ought to invite you into the house and have you look around, right? Because you really want to see what's going on in the house. Being inside, you get the best view and the most efficient view. Yeah, it's interesting. You definitely have a better mousetrap, but point solutions for a lot of these problems have existed.
Starting point is 00:21:21 And the problem with them is that they have to scan the data periodically. And so, you know, you could go to Northern Park Life and get mediocre file system analytics, and you could go to somebody, Atronis, and get access log scanning, and then you'd make a backup. But that meant overnight, your poor net app had to deliver 100 of its data four times to four different data movers yeah you're absolutely right howard i mean the uh the one thing we hear from customers who bought those solutions is that when they're running those scans everybody in the organization knows that and so they defer those to the weekend when the users aren't there. And they now have
Starting point is 00:22:05 time-lapse photography to take a look at what's happening in their business where they want real-time agility. You know, with data gravity, you have a full motion video that's always on with no impact. Yeah, it's interesting that you guys are in the, you know, deep impact Armageddon or volcano Dante's peak great minds think alike moment. Because, you know, some word imitation and I, knowing how long these things take to develop, I find it hard to believe that, you know, Quadra or Cumulo looked at you guys a year ago and said, we're going to code this really quick. But it seems like... Excuse me? Their marketing team's got it real quick.
Starting point is 00:22:51 Imitation is their form of flattery. It's been fun reading press releases the last couple weeks. So tell me a little bit about the hardware solution that you're selling. It's a hardware, I'm assuming it's a hardware solution? Is that... It's a hardware? I'm assuming it's a hardware solution? Is that true? It's hardware. We're using enterprise proven hardware but we didn't build any of our own hardware.
Starting point is 00:23:11 We've got a 2U 2 controller head if you will or compute node and 4U 24 drive storage. So on the compute node, we've got SSD,
Starting point is 00:23:31 and on the 4U storage node, we've got spinning media. So it's near-line SAS. We stayed away from SATA because in past lives, SATA has been a miserable experience, and so we're sassy. All right, so is your system, does it scale up beyond the 24 drives, or do you cluster multiple units together, or how does that work?
Starting point is 00:23:57 So currently we do not, but you will see us soon providing expansion, and then you'll see us have a way to add more computes to the storage in the coming years. So I'm not – so I could be one of the inventors or co-inventors of scale-out or the scale-out people are doing today. I don't think that's the right way to do it in the future. And so you'll see us do something different. I'll just give a teaser there, but I think that's a very interesting thing to hear from you.
Starting point is 00:24:31 As everyone else starts to do scale out, it's very interesting to hear you say we did scale out. We don't think that's the right way to do it this time. It's another way. It's actually exciting to do it this time yeah there's another way actually i think to do in 2001 um but the problems you're solving in 2001 and 2003 um you have different problems now because now you want computes close to the data um and so you want to think about how you scale and it's also if you start to do the economics it's an you know having lived this it's an expensive way to scale right um it's also, if you start to do the economics, having lived this, it's an expensive way to scale.
Starting point is 00:25:07 It's also a network landmine as you start to get from 4 to end nodes when you start to look at, especially if the server network has to match the storage network. It's not that the switches are absolutely good enough. It's just you've got to have a really good network engineer architecting this thing to get the kind of bandwidth and latencies you need across the network, even with 40 gig. So the disk drives are like 4 terabyte or 6 terabyte disk? Go ahead. We do 2 and 4. I learned a long time ago you want to let any disk drive. So I'm going to get in trouble.
Starting point is 00:25:46 You can cut this if you want or keep it. I learned a long time ago staying a little bit behind the bleeding edge technology curves for disks is always a better plan from both price performance and just availability and stability. It's not a place where I want to innovate. I want someone else to approve and they work. And then I'll be a follower. Any SSD capacities that you install on the compute node? Right now it's 800 and 400 gig. Yeah, yeah, yeah.
Starting point is 00:26:19 Having done my penance with cases of disk drives fresh out of the... This is the first production run. I can understand how not wanting to be a pioneer because of arrow scars is a good idea. Oh, God. Let's not go there. So Equalogic came to be the first people when I was working there. This was before we got bought by Dell. We'd always be the first one out with a drive. And I have to tell you that was not always as as seamless
Starting point is 00:26:45 as you'd want because you'd be you'd be basically working with drive vendors to qualify it and that's not something we're going to get into okay i gotta ask the relative question here so uh hyper v support um it's coming it's absolutely coming in fact, I'll tip my hand a little bit. It's not shipping right now, and it won't be in our next release, but you'll see it. In fact, it was on my screen, and I was showing it to Dave. So I'm not good at the whole secret thing. So it's FileBlock. It's SMB and NFS. It supports VMware, VMDKs.
Starting point is 00:27:24 So VVOL support, I guess, is the other required question. I mean, honestly, if you look at our architecture, we have already essentially built VVOLs into the products. We don't have to rush to solve a legacy problem like a lot of other people have to by adopting VVOLs. We already give that per virtualization awareness. I think you'll see us close the loop with some of the tenants that support VVols, like Vaza and some of the VAI tenants that are on our roadmap we're closing the gap on. But most of our customers who see what we do today already feel they already got VVols.
Starting point is 00:27:59 And you guys are basically – Yeah, because, I mean, you do per VM snapshots. You don't currently do replication at all, so you can't do per VM snapshots. You don't currently do replication at all, so you can't do per VM replication. And the only other thing that I usually am yelling at people about for viewalls is support for VS API for data protection, but I'm not running conventional backups if I use you guys anyway. That's correct.
Starting point is 00:28:24 We don't even use the APIs for data protection at all. We're already storage. I already know what to change. I don't need to change block tracking. And I can – I still invoke VMware Snapshot, obviously, for quiescence, but we only carry it just long enough to get that quiescence happen. We don't have the committal tax problem of carrying a VMware Snapshot for a backup. Yeah, yeah.
Starting point is 00:28:45 So the other question, I guess, is VSS support. Yeah. You're probably reading our roadmap screen behind us. No, it's all right. It's just a natural evolution of us closing the gap. We do invoke VSS quiescence through VMware Tools and today for file system. Going up the application stack and handling application logs and log management is something you'll see from us here
Starting point is 00:29:12 in the second half of the year. Okay, you want to talk a little bit about your tiering capabilities? I mean, with disk and SSD, typically it's either a cache or a tier. How does it play out in your space? So, we're content-aware. So we actually try, on the write side, to always be writing full-stripe writes, and they're always sequential writes. So, you know, write caching doesn't get as much, you know,
Starting point is 00:29:47 it's not really caching. If you're doing some kind of a write log, it's just the acceleration isn't as big a win for us. And we've got read caching where we're smart about how we're creating the caches. But I want to make sure that when people talk about us, they realize that, yes, we have great storage, but there's lots of great storage out there. What I think our value we bring to customers is the insight into the storage,
Starting point is 00:30:11 specifically around, you know, in a lot of places like the state of Florida, you violate a data privacy, it can get really expensive really fast. I think it's like $1,000 a day for 30 days, and then it goes to $50 with a maximum of half a million bucks, if I'm remembering. Dave, am I remembering? I'm close. You're close. Right? And then you go look at the state of Massachusetts, it gets expensive. There's like 32 different states where somebody did something careless, cut and pasted something out of an Excel spreadsheet, dropped it in the public share, and then somebody hacked in and got it, it could start to cost you a fair amount of money. The other thing we've learned is
Starting point is 00:30:50 there's a ton of data that's dormant in arrays and no array vendor, but probably us tells you, you should be either moving stuff to the cloud or deleting it if you're not using it, but you have no visibility today. Do you know, do you have like a terabyte of data that the person doesn't even work there anymore and the stuff is video, right? So... Oh, I definitely do. ...by those services, right? And so, and you know, there's nothing good that can happen with storing stuff you don't
Starting point is 00:31:20 need to store because it's costly and there's probably something in there that's going to get you into trouble we have not we've probably scanned i don't know hundreds of millions of files you know probably close to um a petabyte of storage and you know we found most people you know that we've looked at their data have always found that basically the ROI is just what we found to kept them out of something that would have been expensive to mediate. So that's where I think where our, our biggest value is in the insights into the data. So we love talking about storage cause I'm a blockhead at heart,
Starting point is 00:31:57 moving to a file system head. Although I think I like blocks better than files, now that I've dealt with both, although I did file systems in a previous life too. But I think it's the intelligence that we bring to the storage that's really where people want to have the conversation with us. And it's interesting because, you know, we see the IT people, and sometimes in small companies the security people are the same people, sometimes they're different. When we started at EqualLogic, we said, well, now the network people and the storage people are going to have to talk. I wonder how that's going to work out for us. And it worked out pretty well. When we started
Starting point is 00:32:32 Data Gravity and started selling, we said, well, now the IT people and the security people are going to talk, have to talk. And that seems to be working out pretty well for us as well, because security is top of mind for most IT organizations today. And for us it's kind of, you know, it's sad but good, because there's a break-in at least once a month, and most of them, you know, people call it a breach, but it's a theft, right? Someone broke in, they took stuff. I call that a theft, right? A breach seems like someone got into the castle, but it was okay because we were playing some video game and you shot them and everything was good.
Starting point is 00:33:06 A theft, right? Somebody got in, stole stuff, and then used that stuff. So it's almost like a kidnapping because they got in, got your social security number, now they're you. So it's sort of a virtual kidnapping. Yeah. So let me try to go back to the data-aware stuff. So are you doing a full-text index of the files? Are you maintaining that, obviously, on your store?
Starting point is 00:33:30 The first question is, are you providing a full-text index? We are for the human-generated unstructured content that we can process. We do wrap that with a faceted search engine experience, which users get extremely quick with. I mean, one of the unique things about Data Gravity is if you look at our interface, it goes from the basement to the boardroom. If you log in as an IT professional, you get a rich storage administration experience. You get everything you expect from storage. We also extend this all the way to the end user. And when the end user logs in, they feel like they've just logged into Google.
Starting point is 00:34:04 If you ask a user today how they find information, they say, I fire up my web browser and I ask Google a question. We essentially allow them to do the exact same thing on the corporate assets, the enterprise data that's at rest. We let them ask those same questions and get that same information in that exact same paradigm they're already used to. And so that's driven out of our full-text indexing capabilities that are in the product. And the index is roughly, what, 10% of the storage that you're indexing? Is that how it plays out? It really depends on how much text, how much duplicate text. And so I'd say 10% is fair. I've seen less, I've seen more. But it really depends on the content
Starting point is 00:34:45 you're looking at. And that's also spread across SSDs and disk as well? No, that's just on spinning media. Okay. So basically everything that the backup or the intelligence controller does is spinning disk space, right? There is a metadata acceleration piece, and there's a bit of a read cache piece, but the rest of it is spinning media.
And the active controller that's doing the storage activity is primarily maintaining the SSDs. Is that how I read this? I'm not sure. Well, we've got mirrors. So when we look at our pools, the pools are mirrored so that the controllers can fail over, and also so you can use either pool as the primary in case of a disaster. So it's very much set up that way. It's sort of interesting.
Starting point is 00:35:38 It's the merger of what you would do for best practices for storage with what you would do for best practices for big data analytics. So you've done a merge in the environment. But you want to know that if you lose the primary pool, because somebody came and pulled all the drives out for whatever reason, or you triple-faulted, or you just wanted to run a test to see what would happen, you want to be able to run your primary storage off of that intelligence pool. Because as a storage person, right, the storage gods say primary storage must always run, right?
Starting point is 00:36:16 I don't care what the religion is, but the storage god always says primary storage must always run and must not be impacted. Yeah, yeah, yeah. Yeah. And the only cardinal sin is data loss. Only cardinal sin is data loss. When you get to the gates, they do not let you in if you've had that happen to you. All right, getting off the religious discussion. So I'm assuming you provide like a RAID 6 kind of solution in the back? It's RAID 6 for the HDDs. We do something interesting, because we went and asked customers,
Starting point is 00:36:50 so how much space do you want for your backup intelligence versus how much do you want for your primary? And the answer, I don't mean to be flippant, but maybe I do, was, gee, I don't know. How much do you use? I'm not sure. It varies. So what we ended up doing was creating dynamic space allocation. The primary pool and the intelligence pool start at a default size, and then there's a free pool so that you can grow based on capacity or performance needs. So you can see that those two pools can grow, and they can also shrink. So, for example, if you decide to delete a bunch of discovery points because you want space back, you can, and you can reclaim those disks. So we've done a dynamic space allocation, which made it easier for our customers to figure out how to size things.
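The dynamic space allocation Paula describes, two pools drawing whole drives from a shared free pool, growing on demand and shrinking when discovery points are deleted, can be sketched roughly as follows. The class name, default pool sizes, and drive counts are all hypothetical; this is a sketch of the allocation bookkeeping, not DataGravity's actual code.

```python
class DiskPoolManager:
    """Sketch of dynamic space allocation between a primary pool and an
    intelligence pool, backed by a shared free pool of whole drives.
    Default sizes are made up for illustration."""

    def __init__(self, total_drives, primary=4, intelligence=4):
        assert primary + intelligence <= total_drives
        self.pools = {"primary": primary, "intelligence": intelligence}
        self.free = total_drives - primary - intelligence

    def grow(self, pool, drives):
        # Either pool grows by claiming drives from the free pool,
        # based on capacity or performance needs.
        if drives > self.free:
            raise ValueError("not enough free drives")
        self.free -= drives
        self.pools[pool] += drives

    def shrink(self, pool, drives):
        # E.g. after deleting a bunch of discovery points, reclaimed
        # drives return to the free pool for either pool to reuse.
        if drives > self.pools[pool]:
            raise ValueError("pool has too few drives")
        self.pools[pool] -= drives
        self.free += drives

mgr = DiskPoolManager(total_drives=12)   # 4 primary, 4 intelligence, 4 free
mgr.grow("primary", 2)                   # primary needs capacity: 4 -> 6
mgr.shrink("intelligence", 1)            # discovery points deleted: 4 -> 3
print(mgr.pools, mgr.free)
```

The point of the sketch is the sizing answer Paula mentions: customers never have to pick the split up front, because both pools move against the free pool over time.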
Starting point is 00:37:33 Yeah, and it's thin provisioned to both LUNs and file systems, I guess. Yeah. Correct. So, wait, I'm just stuck on shrink, because everybody says they can grow, but shrinking. So after I finish a project and I – Go ahead. Well, I mean, it does. It does.
Starting point is 00:37:55 Absolutely. I agree. I agree. We've had thin provisioning for 15 years, but thin unprovisioning with TRIM or UNMAP is still questionable in many people's minds. So if I complete a project and I delete the
Starting point is 00:38:12 200 terabytes of VMs that I was using for that project, some of those drives could return to the free pool? That's cool. Absolutely. We're smarter than the average bear. Yeah.
Starting point is 00:38:29 Well, full-text indexing is interesting. Compression, deduplication, and thin provisioning are also interesting. Unified file storage is another. You know, you've got a lot of parts to this storage puzzle. Yeah. You know, at one point I made a list of the third-party products I would need to duplicate what Data Gravity is doing. And I came up with six that, you know, I would need. Yeah. And that's assuming that I had decent hybrid storage, but I would need, you know, another half dozen products to cover all the analytics pieces. So it's
Starting point is 00:39:08 actually quite interesting. So where do you see your market? I mean, is this solution for small, medium, mid-range shops, or is it high-end enterprise guys? I mean, where are you guys selling
Starting point is 00:39:23 this sort of thing? We designed for the mid-market, Ray. We came out looking at the mid-market as someone who wants this but is never going to obtain it on their own, because they're never going to buy those six products we were just talking about, and are never going to integrate them. But as we've been going to market, we built with a mid-market focus, we are getting pulled upstream. We're not taking down entire enterprises, but we're getting used departmentally.
Starting point is 00:39:48 The one thing we know about Data Gravity is that 100% of the customers that have tested us have found exposure. And so it kind of becomes the Trojan horse. One department brings it in for a certain set of data, and then it just starts landing and expanding. How often do you get drawn into the Fortune 500s by legal? Quite often. I think legal slash security slash audit slash compliance have definitely resonated with what we do. They spend hundreds of man-hours trying to post-analyze after something has happened. And when they realize that they can just search on that and quickly surface it, you know, it's almost the holy grail in their world.
Starting point is 00:40:29 And not all data is created equal. We don't, by any means, walk in the door and say, put everything on Data Gravity. But there is definitely a strong set of data that should live on us. And when they see that and they start aligning to it, we get that one use case and we grow from there. Yeah. So we have, you know, law firms, accounting firms, state and local government. We're in manufacturing. We're in small financials. We're getting traction in the bigger companies at the departmental level. But there are some common patterns, like, you know, somebody's about to leave, or just left, the company.
Starting point is 00:41:02 Okay, what did they take with them? Somebody can't find something, let's help: you know, they've got an end-of-quarter report they have to do, and they can't find the file. Where is it? Did they delete it? The favorite one is, you know, people think they copy, but they actually cut and paste. So they move stuff and now they can't find it. So now he or she doesn't have to go scrolling through trying to figure out what happened to the file. They can just go look at what that person just did and do the restore. Or the person could do the restore themselves, because we do have self-service restore, and some customers are using that. So I believe this, you know, just like with EqualLogic, I thought automated storage made sense, and everybody thought we were insane, and now everybody does it.
Starting point is 00:41:49 I do believe that, you know, data-aware storage, storage that knows about the data it's holding, is going to be the norm in five years. So you heard it here. By podcast 100, when we come back with you guys, everybody will be talking about it. So is that self-service restore integrated into the Windows Volume Shadow Copy for Shared Folders mechanism? So I can just right-click and say Previous Versions?
Starting point is 00:42:14 Or do I go to a web interface? Not today. It is through our web interface. But it's pretty interesting when you watch an end user, when you can say, here's everything that you recently deleted, and just let them undelete it. Yeah, I mean, same from the help desk perspective. You know, you don't have to play that game of, where did you save it, what was it called? All you have to do is search
Starting point is 00:42:33 on the user's name, we show you everything that they've recently done, you know, and you can practically read them back the list of files they recently deleted and just, uh, help them self-service themselves. And when end users log in, they can see things like who's been reading the stuff I wrote. So I joke around when we have meetings here: if everybody hasn't read the file, I'm not going. So if I put out something, I can say, well, I guess nobody's ready for this meeting yet. All right. Well, we're about at the end of our podcast. Howard, do you have any last questions?
Starting point is 00:43:03 Well, it's just, when do you put the sync and share interface on it so that I can let the sales guys on the road get to the data from their laptops? Oh, God. That's a roadmap item, right? Dave paid you for that. You can already do it.
Starting point is 00:43:19 There are some products that can sit on top of us. We'll be deploying that internally shortly, and then you could see that coming. The good thing about this is, if you're doing a normal storage array, the number of features you can do is about six, right? You can do better snapshots.
Starting point is 00:43:36 You can do better clones. You can add replication. And then you're about done. We're more holistic, so there are hundreds of things we can add. Now we just have to prioritize. Yeah, yeah. All right. Dave and Paula, any last comments?
Starting point is 00:43:50 Yeah. Yeah, I would just say that anybody who's listening out there that wants to experience it, we're not shy. Come knock on our door. We're happy to show you the value of data-aware storage in your environments, and we do a weekly webinar, so anybody who wants to see it in action, come join us. All right. Well, this has been great.
Starting point is 00:44:10 It has been a pleasure to have both of you here on our call. We've enjoyed it as well. Absolutely. Next month, we'll talk to another startup storage technology person. Any questions you want us to ask, please let us know. That's it for now. Bye, Howard. Bye, Ray.
Starting point is 00:44:24 Until next time. Thanks again, Dave and Paula. Yeah, thank you.
