Grey Beards on Systems - 111: GreyBeards talk data analytics with Matthew Tyrer, Sr. Mgr. Solutions Mkt & Competitive Intelligence, Commvault

Starting point is 00:00:00 Hey everybody, Ray Lucchesi here with Matt Leib. Welcome to the next episode of the GreyBeards on Storage podcast, a show where we get GreyBeards storage bloggers to talk with system vendors and other experts to discuss upcoming products, technologies, and trends affecting the data center today. This GreyBeards on Storage episode, brought to you today by Commvault, was recorded on December 1st, 2020. We have with us here today Matthew Tyrer, Senior Manager, Solutions Marketing and Head of Competitive Intelligence at Commvault. So Matthew, why don't you tell us a little bit about yourself and what's been going on with your Activate solution these days? Hey Ray, thanks for having me on the podcast. It's great to be able to hop on here and chat with you guys. So for me, I am what you would call a Commvault veteran being with the

Starting point is 00:00:56 company almost, well, over 12 years now. But prior to that, I had experience with a bunch of other tech companies, including Dell EMC, and I've worked around the world on different data center and cloud projects. Now, that's me. Activate is a timely solution these days. You know, given the current climate and all the remote work that we're going through and just data being much more dispersed, you know, at its core, Activate fundamentally helps users and customers to know more about the data that they have. And I've always looked at it as if you know more, you can do more. So Activate is a component of the Commvault solution set, I guess, or platform, and is focused on understanding the data in the data center, across data centers as well, or within the environments? Well, that's a great point to really highlight is the fact that something that Commvault always has prided ourselves on is that underneath all of the different functionality that we offer is a single platform, a single code base.

Starting point is 00:02:22 So this is just an extension of the Commvault data management platform, specifically to deliver features around data governance, compliance, e-discovery, analytics, and those types of data centric operations. And to your point, what kind of data can I manage? I can be looking across physical virtual hosts, either within a data center, across multiple data centers, in the cloud. So, and even within the backup set. So what's nice is, you know, I know a lot of people know

Starting point is 00:02:58 Commvault from, you know, our position, you know, as a leader within the backup and recovery space, but we don't have to be backing up the data to run the analytics on it. So you look at, you know, very heterogeneous environments and it's great that we can look at the live data that's out there and active in the environment, but also the historical stuff that we've captured through our own backup and recovery processes. So, I mean, was this the sort of stuff that admins would do on their own by hand through searching through catalogs and directories and stuff like that? In the storage space,

Starting point is 00:03:34 they always used Excel spreadsheets to try to manage their storage. But at some point, it got to the point where 15,000 tabs was not going to work anymore. Well, I mean, that's the big thing with so much data out there and the variance. You know, I mean, you've got workloads over here, over there. You mentioned the cloud, you mentioned data centers. I mean, it's everywhere, right? Well, and now we add to the mix just the proliferation of kind of the remote data set. You and I and Matt, we're all sitting here spread around different geographies and

Starting point is 00:04:05 the reality of it is we've got business critical data sitting on our remote desktops instead of what used to be on a shared drive probably in the data center just given the current climate. So there's all this data spread out and if you have to

Starting point is 00:04:21 manually be going through and understanding, well what is this? What is this? It makes that task a lot more of a challenge. It would take many years of effort to go find all this data and try to understand what the status is of it and that sort of thing, right? Well, that's why fundamentally we've tried to drive all this automation into it. So running the analytics on the back end, making sure that we've got all of that metadata and content specific data. So not just looking at, you know, the headers and the who, what, where, when kind of thing, but actually diving into the content itself to understand, well, what's contained within there? Is there, you know, maybe data related to privacy or maybe this is a very sensitive data set.

Starting point is 00:05:05 Yeah, so it automates kind of the collection and processing of that data. And, you know, of course, behind the scenes, there's a lot of machine learning to drive some of that process. Hmm, understanding of what the data is, how sensitive it is, and that sort of stuff. That's interesting. I was going to say, Matthew, you mentioned that it didn't necessarily have to be housed within a Commvault backup. So you're collecting metadata

Starting point is 00:05:36 against data that's not even being backed up. Potentially. Yeah, that's right. So you look at other products in that space. Some features from other vendors, like other backup vendors, look only at the data that they're protecting or only the data that they're archiving. Others that play purely on the analytics side are only capable of looking at what's living currently in the environment, but they have no view into the historical data. You know, what was there last year? What was there two days ago? So that would lie in some backup someplace, right?

Starting point is 00:06:17 Right. So what's really cool about the Commvault data services there is it does have that fairly unique ability to look at both sides. So looking at data that might be live in the environment, that maybe isn't, yeah, maybe it isn't even being backed up by Commvault for various reasons. Maybe it's being protected by another product or maybe it's. Well, that's where I was going to say, and that's part of kind of the feature set is to provide that risk analysis and profile to look at it and say, hey, you know, you've got this data over here that's not being protected. Do you want to do something about that? Right. That's exactly what I was thinking. And I think that I don't know that I've ever heard of a platform that covers unarchived data. Obviously, the data is the lifeblood that feeds your company. And we talk about it as the new currency. It's gold. That and your personnel are the things that you absolutely need in order to make your company work. But if you can do that and look

Starting point is 00:07:37 at, who knows, remote workstation data that's not being backed up or what have you, and being able to flag the administrator and say, hey, are you aware that this, what appears to be information you might want is not being backed up, you know, that's a massive step towards the integrity of the DR scenario, but so much more. Would that be considered risk mitigation kind of thing? You mentioned risk. I mean, you mentioned disaster recovery, Matt, but also when you look at the obvious risks around malware, insider threats, ransomware, you look at any of

Starting point is 00:08:27 these risks, inclusive of disaster recovery, and I look at those risks as, you know, an extension of anyone's disaster recovery plan. But I mean, you can't recover what you don't know about. And you look at, you know, prioritization of recovery. So it's like, you know, how can you recover your data when you don't even know what data you have? And so these analytics can help you to prioritize maybe the order of operations. Hey, you know what, we really need to recover these things first. But it can also help you identify maybe other data sets that are at risk, you know, the proverbial file with the payroll on it that's open 777 for everyone to see. So it's not just about seeing, okay, what data is being backed up and what data isn't. It can look at it and say, hey, this data contains sensitive information, maybe we should quarantine it or move it somewhere

Starting point is 00:09:21 you know on the whitelist servers or maybe we should delete it entirely. Maybe it's like a data spillage, data leak kind of thing, or completely orphaned. It's like, holy cow, we need to reassign ownership and permissions for this file so that it can be properly managed. So there's a whole myriad of things that you can do once you've kind of got that extra knowledge

Starting point is 00:09:44 around that data. Visibility and stuff. The other challenge that has emerged over the last, oh, I don't know, decade or so is that compliance regimens change depending on where you're at in the world, right? I mean, so if you're a customer that's got multi-site data centers located in Europe and Asia and America, et cetera, et cetera. I mean, the different compliance regimens would cause you some sort of heartache to try to do this by yourself, right? Oh, definitely. And that's really where the data governance component of

Starting point is 00:10:17 Commvault Activate comes in, is being able to build out algorithms and intelligence around looking specifically for data, you know, associated to these regulatory guidelines. So looking for personally identifiable information or PII. And again, you know, you can't properly manage it or be compliant to these different regulations if you don't know what data you've got under your custodianship. I heard that California's got their own flavor of GDPR. Yeah. Yeah. CCPA. I mean, there's, you know, it's a reality that all of these regulations are going to continue to come into force, just as people continue to put more and more scrutiny on the handling of their data. I mean, just look at the consumer side of it where, you know, people are like, wow, I don't want Facebook having that information about me or,

Starting point is 00:11:17 wow, I don't want these websites tracking my cookies. You start extending that, you know, into the enterprise and into the corporate space where it's like, hey, I need to properly manage, you know, my customers' data so that I'm not the next headline about a data leak, you know. Somewhere you mentioned privacy in this regard, too. So what's data privacy mean in an Activate sense? Well, it's more helping to support the implementation of kind of these data governance and data compliance initiatives within the company because it could change. I mean, you know, you look at kind of the standard data set, you know, associated with, you know, HIPAA, health care records or, you know, social insurance numbers or credit card numbers, those ones. But, you know, maybe for different businesses, different types of data have different sensitivities. Maybe an organization

Starting point is 00:12:11 needs to track specific contract numbers or trade secrets or IP and stuff like that. Yeah, patents or something like that. So they want to, you know, hey, I want to always know where all this information is. And if you find something somewhere that it shouldn't be, flag me or again, back to that automation side of it, flag it, but also automatically lock it down so that we can deal with it. And we've kind of mitigated the... Which brings up another question. The auditing factor, right? If you see somebody, a non-HR person, looking at credit card information or whatever PII might be sitting within the system, can it actually notify or look back and see who was the nefarious person who did open this and might actually have a copy of it? Well, that's kind of where you're bridging into the insider threat detection side of things.

Starting point is 00:13:17 So it can certainly keep track of who has access to what and who has been accessing what. So to your example, maybe it's perfectly fine that that HR person can look at that data, but it's not perfectly fine that that HR person has a copy of it in their home directory. So you could kind of start going from that and again, you know, drive action against it. So that's really one of the, again, a big differentiator there is it's not just reporting and giving you pretty dashboards and pie charts and, you know, dynamic graphics and stuff like that. It gives you the ability to actually drive action from those reports. So, you know, you're getting, you know, this input and, you know, it's like, hey, wow, I need to actually deal with this right now. You can, right from those dashboards, initiate action to either mitigate the threat, so maybe

Starting point is 00:14:11 it's a, oh, you know, that file definitely needs to get dealt with, or maybe it's something more mundane where it's like, oh wow, I just found that we've got 100 copies of that one database sitting around because people have been just cloning it for dev tests or DBAs doing the dumps every time they change a table. So being able to even just reduce some of that data sprawl by saying, wow, I don't need 50 copies of that. Let's get rid of all of them except for the one that we need. And that can really help with everything from migrating to the

Starting point is 00:14:52 cloud. Do I need to move all the garbage too? No. Or data consolidations. Maybe I want to move within the data center from all these other legacy servers into maybe an HCI platform or something. So again, there's just so many different things you can do from a data management perspective when you know more about that data. And Matthew, you mentioned earlier that the solution also cracks into the content. Do you want to talk a little bit more about that? I mean, it's actually a content index? Yes, yes. So looking beyond kind of that

Starting point is 00:15:27 top level metadata of, you know, who owns it, how big is it, what file type it is, you know, those types of things, it can go into the contents itself and understand it. So I'll use maybe an email example first. You know, the typical ones are to, from, attachments, subject. But I can start actually going in and seeing, okay, well, the email itself has got a discussion between Ray and Matt, it has to do with a contract, and there's an attachment, and in that attachment there's, you know, some personal information or something. So I can start, like, going into, you know, those different layers so that I'm not just managing it up at this top layer. You know, I've got that deeper insight into the data

Starting point is 00:16:16 and I can start looking at, you know, especially these days with emails, you can say, you know what, I need you to find me all the emails, you know, related to the conversations that Matt, Matt, and Ray had because I think they were doing something nefarious. And so you can go in and see it and find it. And the best part is, and this is a conversation I've had with IT people for years, is the best part, and this always perks their ears up, is the ability to delegate and define role-based access with this, especially when you get into the legal side of things. I mean, I remember when I was a help desk person working in the data center, it'd be

Starting point is 00:16:59 like, hey, Matt, can you find all the tapes that have these files on it? And it's like, ah, you know. Yeah, are you kidding me? It's a reality, though. I mean, these things happen all the time, right? Oh, exactly. Oh, yeah. But now I could actually set that up so that, hey, you know, the legal department or the CSEC team, you know, they've got their own access to look at that data.

Starting point is 00:17:24 And maybe I want them to look, but they can't export anything. So it's like, you know, go find what you're looking for and then come to IT, and I can present and export that data with the, you know, audit chain of custody and all that. This is brilliant. It could take or shift some of the burden off of IT for these tasks. Yeah. So I worked in e-discovery for a while and content management. And, you know, the idea of, say, a competitor is looking for, you know, digging through or has a mole inside an organization trying to get competitive information and a lawsuit gets brought up. The e-discovery on that can be really, really daunting. But you are really only digging through this massive, and I have to imagine that the database for the content filtering and the content management side is quite large.

Starting point is 00:18:38 But you're not digging through all the existing data. You're digging through the metadata to discover that kind of information. And if you have to then run to the data itself to do digging through, that same database tells you exactly where it is because that's part of the metadata. So that discovery function is going to be a far quicker process. And certainly when you bring in lawyers to do e-discovery, if they don't have to dig through every ounce of data that sits within your environment, then the rapidity at which they can access that information would reduce the overall e-discovery costs by some factor of major numbers.

Starting point is 00:19:30 Oh, definitely. Especially if companies are going to outside counsel or other parties to leverage those services. I mean, that can get real costly real fast. So having a mechanism to really quickly go in, find what you need, and then if needs be, even put it under a legal hold to make sure that, you know, let's say it is backup data, you don't want the retention to expire on that. So I can actually grab all the data specific to that search and put it into its own special bucket, if you will. You know, gone are the days where it's like, you know, a legal hold would come in and it's like, okay, well, put all the tapes in a box and let's buy some new tapes.

Starting point is 00:20:15 I don't need to throw the baby out with the bathwater. If I've got like five files on, you know, a multi-terabyte tape, yeah, I can just grab those files and I've still got the full chain of custody and say, okay, well, yeah, so those are the original files that came from here and here and here and here. And here's just those files. So it's pretty granular.

Starting point is 00:20:37 So Matthew, many of the analytics solutions that I'm aware of are focused on, I'll call it savings of storage and stuff like that by finding redundancy or finding files that no longer matter or just making sure that you have the files that do matter in the proper context and the proper backup sets. And if you're migrating from one site to another, you only have to migrate some of that stuff. Does Activate do those sorts of things as well? Yeah, that's a part of what we call file storage optimization. So that's, you know, looking at data duplicates. You know, it can even kind of run some what-ifs around archival policy.

Starting point is 00:21:25 So it's like, okay, well, let's look at this data set. You know, when's it last been accessed? When has it last been touched? Who owns it? How old is it? You know, how big is it? But then also, to your point, you know, how many copies of it do I have? And we've got a number of customers that are using those analytics specifically for that.

Starting point is 00:21:43 We've got one customer, they're trying to reduce their overall data footprint from over 20 petabytes down to about half that. This is formidable, this is serious money here we're talking about. Yeah, that's not a small chunk of change there in terms of just the costs associated with it. So I look at it as, you know, if you have better data management policies and the analytics can help you with that, that actually translates into overall better storage management and cloud management.

Starting point is 00:22:20 Because now you've got insight to understand, well, you know, do I need this over here? Do I need that over there? Maybe this should be tiered over into this area, or maybe this should be just removed entirely. So, you know, it gives you better control over those different aspects of your storage, of your cloud, of your living environment so that you can manage it better. It occurs to me that if Commvault is able to see where your data is located, it might also help to prevent some of the shadow IT from taking place. Somebody opens up a rogue cluster on GCP and Commvault is aware of it because it sees those data flows, it might be able to let audit understand that, hey, you've got this bill coming in from GCP for data storage and compute resources, et cetera.

Starting point is 00:23:22 There's a potential cost savings just by virtue of having that system in place as well, providing that visibility. Oh, definitely. Yeah. And, you know, kind of like I was saying before, you can break it down even by, like, departmental ownership, by individual ownerships, by groups. So you could actually start building a, you know, a cost model for who's consuming what and which resources are growing and at which pace or shrinking kind of thing. So yeah, you could very definitely use that to kind of help with the financial aspect of IT as well. So Matt, do you have any last questions for Matthew before we close? No, not at the moment, but I'm absolutely interested in learning more as time goes on. Okay.

Starting point is 00:24:11 Matthew, anything you'd like to say to our listening audience before we close? Sure, yeah. Well, first of all, thanks for tuning in, everyone. There are a number of real quick YouTube videos out there on the Commvault channel that walk through at a high level what file storage optimization, data governance, and e-discovery and compliance are capable of. So if you do want to learn a little bit more, you can go there or reach out to us on Twitter, LinkedIn, or right off the website and let's have a conversation. I'm always open to help educate people.

Starting point is 00:24:47 All right. Well, this has been great. Thank you very much, Matthew, for being on our show today. Thanks for having me, Ray. And thanks again to Commvault for sponsoring this podcast. That's it for now. Bye, Matt. And bye, Matthew. Take care. Until next time. Next time, we will talk to another system storage technology person.

Starting point is 00:25:06 Any questions you want us to ask, please let us know. And if you enjoy our podcast, tell your friends about it. Please review us on Apple Podcasts, Google Play, and Spotify, as this will help get the word out. Thank you.
