Grey Beards on Systems - 111: GreyBeards talk data analytics with Matthew Tyrer, Sr. Mgr. Solutions Mkt & Competitive Intelligence, Commvault
Episode Date: December 10, 2020
Sponsored by: Commvault
I've known Matthew Tyrer, Senior Manager Solutions Marketing and Competitive Intelligence, Commvault, for quite a while now, and he's always been knowledgeable about the problems the enterprise has in supporting and backing up large file data repositories. But lately he's been focused on Commvault Activate, their data analytics solution. We had a great talk …
Transcript
Hey everybody, Ray Lucchesi here with Matt Lieb.
Welcome to the next episode of the Greybeards on Storage podcast,
a show where we get Greybeards storage bloggers to talk with system vendors and other experts
to discuss upcoming products, technologies, and trends affecting the data center today. This Greybeards on Storage episode is brought to you today by Commvault and was recorded on December 1st, 2020.
We have with us here today Matthew Tyrer, Senior Manager, Solutions Marketing and Head of Competitive Intelligence at Commvault.
So Matthew, why don't you tell us a little bit about yourself and what's been going on with your Activate solution these days?
Hey Ray, thanks for having me on the podcast. It's great to be able to hop on here
and chat with you guys. So for me, I am what you would call a Commvault veteran, having been with the
company well over 12 years now. But prior to that, I had experience with a bunch of other tech companies, including Dell EMC, and I've worked around the world on different data center and cloud projects.
So that's me. As for Activate, it's a timely solution these days.
You know, given the current climate and all the remote work that we're going through and just data being much more dispersed, you know, at its core, Activate fundamentally helps users and customers to know more about the data that they have.
And I've always looked at it as: if you know more, you can do more.
So Activate is a component of the Commvault solution set, I guess, or platform, and is focused on understanding the data in the data center, across data centers as well, or within the environments?
Well, that's a great point to highlight: something that Commvault
has always prided itself on is that underneath all of the different functionality that we
offer is a single platform, a single code base.
So this is just an extension of the Commvault data management platform,
specifically to deliver features around data governance,
compliance, e-discovery, analytics,
and those types of data centric operations.
And to your point, what kind of data can I manage?
I can be looking across physical and virtual hosts,
either within a data center, across multiple data centers, in the cloud.
And even within the backup set. So what's nice is, you know, I know a lot of people know
Commvault from our position as a leader within the backup and recovery space,
but we don't have to be backing up the data to run the analytics on it. So you look at, you know, very heterogeneous
environments, and it's great that we can look at the live data that's out there and active in the
environment, but also the historical stuff that we've captured through our own backup and recovery
processes.
So, I mean, was this the sort of stuff that admins would do on their own by hand
through searching through catalogs
and directories and stuff like that?
In the storage space,
they always used Excel spreadsheets
to try to manage their storage.
But at some point, it got to the point
where 15,000 tabs was not going to work anymore.
Well, I mean, that's the big thing
with so much data out there and the variance. You know, I mean, you've got workloads over here, over there.
You mentioned the cloud, you mentioned data centers. I mean, it's everywhere, right?
Well, and now we add to the mix the proliferation of the remote data set. You and I and Matt, we're all sitting here spread around different geographies, and
the reality
of it is we've got business critical
data sitting on our remote
desktops
instead of what used to be
on a shared drive probably in the data center
just given the current climate. So there's all this
data spread out and if you have to
manually be going through and understanding, well
what is this? What is this? It makes that task a lot more of a challenge. It would take many years of effort
to go find all this data and try to understand what the status is of it and that sort of thing,
right? Well, that's why fundamentally we've tried to drive all this automation into it. So
running the analytics on the back end, making sure that we've got all of that metadata
and content-specific data. So not just looking at, you know, the headers and the who, what,
where, when kind of thing, but actually diving into the content itself to understand, well,
what's contained within there? Is there, you know, maybe data related to privacy or maybe this is a very sensitive data set.
Yeah, so it automates kind of the collection and processing of that data.
And, you know, of course, behind the scenes,
there's a lot of machine learning to drive some of that process.
Hmm, understanding of what the data is, how sensitive it is,
and that sort of stuff. That's interesting.
I was going to say, Matthew, you mentioned that it didn't necessarily
have to be housed within a Commvault backup.
So you're collecting metadata
against data that's not even being backed up.
Potentially. Yeah, that's right. So you look
at other products in that space. Some
features from other vendors, like other backup vendors, look only at the data that they're
protecting or only the data that they're archiving. Others that play purely on the
analytics side are only capable of looking at what's living currently in the environment, but they have no view into the historical data.
You know, what was there last year? What was there two days ago?
So that would live in some backup someplace, right?
Right. So what's really cool about the Commvault data services there is it does have that fairly unique ability to look at both
sides. So looking at data that might be live in the environment, that maybe isn't, yeah,
maybe it isn't even being backed up by Commvault for various reasons. Maybe it's being protected
by another product, or maybe it's...
Well, that's what I was going to say, and that's part of the feature set: to provide that risk analysis and profile, to look at it and say, hey, you know, you've got this data over here that's not being protected.
Do you want to do something about that?
Right. That's exactly what I was thinking. And I don't know that I've ever heard of a platform that covers unarchived data. Obviously, data is the lifeblood that feeds your company. We talk about it as the new currency. It's gold.
That and your personnel are the things that you absolutely need in order to make your company work.
But if you can do that and look
at, who knows, remote workstation data that's not
being backed up or what have you, and being able to flag the administrator
and say, hey, are you aware that this, what appears to be information you might want is
not being backed up, you know, that's a massive step towards the integrity of the DR scenario, but so much more.
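Matthew's point about spotting data that exists in the environment but isn't protected boils down to comparing two inventories. A minimal Python sketch, assuming you already have a list of paths from a live scan and a list exported from a backup catalog (both inputs, and the `find_unprotected` name, are hypothetical and not part of any Commvault API):

```python
from typing import Iterable, Set


def find_unprotected(discovered: Iterable[str], backed_up: Iterable[str]) -> Set[str]:
    """Return paths seen by a live scan that are absent from the backup catalog."""
    return set(discovered) - set(backed_up)


# Hypothetical inputs: a live scan of a remote laptop plus a file share,
# and the list of paths a backup catalog says it holds.
live_scan = ["/home/matt/contract.docx", "/home/matt/notes.txt", "/share/payroll.xlsx"]
catalog = ["/share/payroll.xlsx"]

for path in sorted(find_unprotected(live_scan, catalog)):
    print(f"not protected: {path}")
```

A set difference is enough for a sketch; a real tool would also have to reconcile path formats across hosts and decide which gaps actually matter.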
Would that be considered a risk mitigation kind of thing?
You mentioned risk.
I mean, you mentioned disaster recovery, Matt,
but also when you look at the obvious risks around malware, insider threats, ransomware, you look at any of
these risks, inclusive of disaster recovery, and I look at those risks as, you know, an extension of
anyone's disaster recovery plan. But I mean, you can't recover what you don't know about.
And you look at, you know, prioritization of recovery. So it's like, you know, how can you
recover your data when you don't even know what data you have? And so these analytics can help you prioritize the order of
operations. Hey, you know what, we really need to recover these things first. But it can also help
you identify other data sets that are at risk, you know, the proverbial file with the payroll on it that's open 777 for everyone to see. So it's not
just about seeing, okay, what data is being backed up and what data isn't. It can look at it and say, hey,
this data contains sensitive information, maybe we should quarantine it or move it somewhere,
you know, to the whitelisted servers, or maybe we should delete it entirely.
Maybe it's like a data spillage, data leak kind of thing,
or completely orphaned.
It's like, holy cow, we need to reassign ownership
and permissions for this file
so that it can be properly managed.
So there's a whole myriad of things that you can do
once you've kind of got that extra knowledge
around that data.
Visibility and stuff.
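The "open 777 for everyone to see" example can be illustrated with a short check of POSIX permission bits. This is a minimal sketch, assuming a POSIX filesystem; `world_accessible` is a hypothetical helper, not how Activate actually evaluates permissions:

```python
import os
import stat
import tempfile


def world_accessible(root: str) -> list[str]:
    """Return files under root whose mode grants read or write access to 'other'."""
    flagged = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            mode = os.stat(path).st_mode
            # S_IROTH / S_IWOTH are the "everyone else" read and write bits.
            if mode & (stat.S_IROTH | stat.S_IWOTH):
                flagged.append(path)
    return sorted(flagged)


# Demo on a throwaway directory (POSIX permissions assumed).
with tempfile.TemporaryDirectory() as d:
    for name, mode in [("payroll.xlsx", 0o777), ("notes.txt", 0o600)]:
        path = os.path.join(d, name)
        open(path, "w").close()
        os.chmod(path, mode)
    flagged_names = [os.path.basename(p) for p in world_accessible(d)]

print(flagged_names)  # ['payroll.xlsx']
```

Permission bits alone say nothing about content; the interesting cases are files that are both world-accessible and flagged as sensitive.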
The other challenge that has emerged over the last, oh, I don't know, decade or so is
that compliance regimes change depending on where you're at in the world, right?
I mean, if you're a customer that's got multi-site data centers located in Europe
and Asia and America, et cetera, et cetera, I mean,
the different compliance regimes would cause you some sort of heartache if you tried to do this
by yourself, right?
Oh, definitely. And that's really where the data governance component of
Commvault Activate comes in: being able to build out algorithms and intelligence that look specifically for data, you know, associated with these regulatory guidelines.
So looking for personally identifiable information or PII.
And again, you know, you can't properly manage it or be compliant to these different regulations if you don't know what data you've
got under your custodianship. I heard that California's got their own
flavor of GDPR. Yeah. Yeah. CCPA. I mean, there's, you know, it's a reality that all of these
regulations are going to continue to come in and be enforced, as people continue to put more and
more scrutiny on the handling of their data. I mean, just look at the consumer side of it where,
you know, people are like, wow, I don't want Facebook having that information about me or,
wow, I don't want these websites tracking my cookies. You start extending that, you know,
into the enterprise and into the corporate space where it's like, hey, I need to properly manage, you know, my customers' data so that I'm not the next headline about a data leak, you know.
Somewhere you mentioned privacy in this regard, too.
So what's data privacy mean in an Activate sense?
Well, it's more about helping to support the implementation of these data governance and data compliance initiatives within the company, because they can change.
I mean, you look at the standard data sets, you know, associated with HIPAA health care records, or social insurance numbers, or credit card numbers, those ones.
But, you know, maybe for
different businesses, different types of data have different sensitivities. Maybe an organization
needs to track specific contract numbers or trade secrets or IP and stuff like that.
Yeah, patents or something like that. So they want to, you know, hey, I want to always know
where all this information is. And if you find something somewhere that it shouldn't be, flag me, or, again, back to that automation side of it, flag it but also automatically lock it down so that we can deal with it.
And we've kind of mitigated the...
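Custom sensitive-data rules of the kind Matthew describes (standard PII plus organization-specific items like contract numbers) are often expressed as patterns plus a validation step. Here is a minimal Python sketch assuming a made-up contract-number format (`CTR-` followed by six digits) and using the Luhn checksum to filter credit-card candidates. It is an illustration only, not Activate's classifier:

```python
import re

# Hypothetical rules: a 13-16 digit card-number candidate (spaces or dashes allowed)
# and an organization-specific contract-number format invented for this example.
CARD_RE = re.compile(r"\b(?:\d[ -]?){13,16}\b")
CONTRACT_RE = re.compile(r"\bCTR-\d{6}\b")


def luhn_ok(number: str) -> bool:
    """Luhn checksum; weeds out random digit runs that aren't plausible card numbers."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def classify(text: str) -> set[str]:
    """Return the sensitivity tags that apply to a piece of text."""
    tags = set()
    if any(luhn_ok(m.group()) for m in CARD_RE.finditer(text)):
        tags.add("credit_card")
    if CONTRACT_RE.search(text):
        tags.add("contract_number")
    return tags


print(sorted(classify("Card 4111 1111 1111 1111 on file for CTR-004217")))
# ['contract_number', 'credit_card']
```

A flagged file would then feed whatever action the policy calls for, such as alerting or locking it down.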
Which brings up another question.
The auditing factor, right?
If you see somebody, a non-HR person, looking at credit card information or whatever PII might be sitting within the system, can it actually notify you, or look back and see who the nefarious person was who opened this and might actually have a copy of it?
Well, that's kind of where you're bridging into the insider threat detection side of things.
So it can certainly keep track of who has access to what and who has been accessing what. So to your example, maybe it's perfectly fine that that HR person can look at that data,
but it's not perfectly fine that that HR person has a copy of it in their home directory.
So you could kind of start going from that and again, you know, drive action against it.
So that's really one of the big differentiators
there: it's not just reporting and giving you pretty dashboards and pie charts and, you know,
dynamic graphics and stuff like that. It gives you the ability to actually drive action from
those reports. So you're getting this input, and it's like, hey,
wow, I need to actually deal with this right now. You can, right from those dashboards, initiate action to mitigate the threat. So maybe
it's, oh, you know, that file definitely needs to get dealt with, or maybe it's something more
mundane, where it's like, oh wow, I just found that we've got 100 copies of that one database sitting around
because people have been just cloning it for dev tests
or DBAs doing the dumps every time they change a table.
So being able to even just reduce some of that data sprawl by saying,
wow, I don't need 50 copies of that.
Let's get rid of all of
them except for the one that we need. And that can really help with everything from migrating to the
cloud. Do I need to move all the garbage too? No. Or data consolidations. Maybe I want to move within
the data center from all these other legacy servers into maybe an HCI platform or something.
So again, there's just so many different things you can do from a data management perspective
when you know more about that data.
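Finding the "100 copies of that one database" is a classic duplicate-detection problem. A minimal sketch, assuming plain files on a local filesystem: group by size first (cheap), then hash only the size collisions with SHA-256. The helper names are hypothetical, and this is not Commvault's deduplication method:

```python
import hashlib
import os
import tempfile
from collections import defaultdict


def file_digest(path: str, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file's contents, read in chunks so large dumps don't fill memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root: str) -> dict[str, list[str]]:
    """Group files under root by identical content: digest -> paths (groups of 2+ only)."""
    by_size: dict[int, list[str]] = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            by_size[os.path.getsize(path)].append(path)

    by_digest: dict[str, list[str]] = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a file with a unique size cannot have an exact duplicate
        for path in paths:
            by_digest[file_digest(path)].append(path)
    return {digest: sorted(p) for digest, p in by_digest.items() if len(p) > 1}


with tempfile.TemporaryDirectory() as tmp:
    for name, body in [("orders.sql", b"dump-v1"), ("orders_copy.sql", b"dump-v1"), ("users.sql", b"other")]:
        with open(os.path.join(tmp, name), "wb") as f:
            f.write(body)
    groups = [[os.path.basename(p) for p in paths] for paths in find_duplicates(tmp).values()]

print(groups)  # [['orders.sql', 'orders_copy.sql']]
```

Exact-content matching misses near-duplicates (for example, two dumps taken after one table change), which is where richer analytics come in.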
And Matthew, you mentioned earlier that the solution also cracks into the content.
Do you want to talk a little bit more about that?
I mean, it's actually a content index?
Yes, yes. So looking beyond that
top-level metadata of, you know, who owns it, how big is it, what file type it is,
those types of things, it can go into the contents itself and understand them. So I'll use an
email example first. You know, the typical ones to get to are the from, attachments, and subject, but I can start actually
going in and seeing, okay, well, the email itself has got a discussion between Ray and Matt, it has to do
with a contract, and there's an attachment, and in that attachment there's, you know, some personal
information or something. So I can start going into, you know, those different layers
so that I'm not just managing it up at this top layer.
You know, I've got that deeper insight into the data
and I can start looking at, you know, especially these days with emails,
you can say, you know what, I need you to find me all the emails,
you know, related to the conversations that Matt, Matt, and Ray had because I think they were doing something nefarious.
And so you can go in and see it and find it.
And the best part, and this is a conversation I've had with IT people for years, and it always perks their ears up, is the ability
to delegate and define role-based access with this, especially when you get into the legal
side of things.
I mean, I remember when I was a help desk person working in the data center, it'd be
like, hey, Matt, can you find all the tapes that have these files on it?
And it's like, ah, you know.
Yeah, are you kidding me?
It's a reality, though.
I mean, this kind of thing happened all the time, right?
Oh, exactly.
Oh, yeah.
But now I could actually set that up so that, hey, you know, the legal department or the CSEC team, you know, they've got their own access to look at that data.
And maybe I want them to look, but
they can't export anything. So it's like, you know, go find what you're looking for and then come to
IT, and I can present and export that data with the, you know, audit chain of custody
and all that.
This is brilliant. It could shift some of the burden off of IT for these tasks.
Yeah.
So I worked in e-discovery for a while and content management.
And, you know, take the idea of, say, a competitor digging through your data, or having a mole inside an organization trying to get competitive information, and a lawsuit gets brought up.
The e-discovery on that can be really, really daunting.
But you're really only digging through this database, and I have to imagine that the database for the content filtering and content management side is quite large.
But you're not digging through all the existing data.
You're digging through the metadata to discover that kind of information.
And if you then have to go to the data itself to dig through it, that same database tells you exactly where it is, because that's part of the metadata.
So that discovery function is going to be a far quicker process. And certainly when you bring in lawyers to do e-discovery,
if they don't have to dig through every ounce of data
that sits within your environment,
then the rapidity at which they can access that information
would reduce the overall e-discovery costs by a major factor.
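Matt's argument is that searching an index, which also records where each item lives, is far cheaper than scanning the raw data. A toy inverted index in Python shows the idea; the location strings (tape IDs and paths) and the `ContentIndex` class are hypothetical, and a real content index would add stemming, ranking, and access controls:

```python
import re
from collections import defaultdict


class ContentIndex:
    """Tiny inverted index: term -> set of locations where that term appears."""

    def __init__(self) -> None:
        self._postings: dict[str, set[str]] = defaultdict(set)

    def add(self, location: str, text: str) -> None:
        # Lowercase and split on anything that isn't a letter or digit.
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            self._postings[term].add(location)

    def search(self, *terms: str) -> set[str]:
        """Return locations containing every term (an AND query)."""
        sets = [self._postings.get(t.lower(), set()) for t in terms]
        return set.intersection(*sets) if sets else set()


idx = ContentIndex()
idx.add("tape0042:/mail/ray/msg-17.eml", "From Ray to Matt re: contract renewal, attachment has SSN")
idx.add("tape0042:/mail/ray/msg-18.eml", "Lunch on Friday?")
idx.add("nas01:/legal/contract.pdf", "Master services contract")

print(sorted(idx.search("contract", "matt")))  # ['tape0042:/mail/ray/msg-17.eml']
```

The query never touches the tapes; it returns the locations, and only those items need to be retrieved.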
Oh, definitely.
Especially if companies are going to outside counsel or other parties to leverage those services.
I mean, that can get real costly real fast.
So having a mechanism to really quickly go in, find what you need,
and then, if need be, even put it under a legal hold to make sure that, you know,
let's say it is backup data, you don't want the retention to expire on that. So I can actually
grab all the data specific to that search and put it into its own special bucket, if you will.
You know, gone are the days where it's like, you know, a legal hold would come in and it's like, okay, well, put all the tapes in a box and let's buy some new tapes.
I don't need to throw the baby out with the bathwater.
If I've got like five files on, you know, a multi-terabyte tape, yeah, I can just grab those files
and I've still got the full chain of custody
and say, okay, well, yeah,
so those are the original files
that came from here and here and here and here.
And here's just those files.
So it's pretty granular.
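The legal-hold behavior Matthew describes, holding only the handful of files a search returned while the rest of the backup ages out normally, can be sketched as a retention check that skips held items. The `BackupItem` fields, the hold ID, and the 365-day retention below are assumptions for illustration only:

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class BackupItem:
    path: str
    source: str  # original location, kept for chain of custody
    created: date
    holds: set[str] = field(default_factory=set)


def place_hold(items: list[BackupItem], matches: set[str], hold_id: str) -> None:
    """Tag only the items returned by the search, not the whole tape or job."""
    for item in items:
        if item.path in matches:
            item.holds.add(hold_id)


def expired(items: list[BackupItem], retention_days: int, today: date) -> list[BackupItem]:
    """Items past retention that are not under any legal hold."""
    cutoff = today - timedelta(days=retention_days)
    return [i for i in items if i.created < cutoff and not i.holds]


items = [
    BackupItem("/share/a.docx", "fileserver01", date(2019, 6, 1)),
    BackupItem("/share/contract.pdf", "fileserver01", date(2019, 7, 1)),
    BackupItem("/share/b.xlsx", "fileserver02", date(2020, 11, 1)),
]
place_hold(items, {"/share/contract.pdf"}, "hold-2020-017")
print([i.path for i in expired(items, retention_days=365, today=date(2020, 12, 1))])
# ['/share/a.docx']
```

The held contract survives past its retention date, while the unrelated old file expires as usual.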
So Matthew, many of the analytics solutions
that I'm aware of are focused on,
I'll call it storage savings and stuff like that, by
finding redundancy or finding files that no longer matter, or just making sure that you
have the files that do matter in the proper context and the proper backup sets.
And if you're migrating from one site to another, you only have to migrate some of that stuff.
Does Activate do those sorts of things as well?
Yeah, that's a part of what we call file storage optimization. So that's, you know, looking at data duplicates, and it can
even run some what-ifs around archival policy.
So it's like, okay, well, let's look at this data set.
You know, when's it last been accessed?
When has it last been touched?
Who owns it?
How old is it?
You know, how big is it?
But then also, to your point, you know, how many copies of it do I have?
And we've got a number of customers that are using those analytics specifically for that.
We've got one customer, they're trying to reduce
their overall data footprint from over 20 petabytes
down to about half that.
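The archival "what-ifs" Matthew mentions amount to asking how much data would move under a given idle-time policy. A minimal sketch, assuming you have per-file size and last-access metadata (the `FileRecord` type and sample values are hypothetical; note that some filesystems are mounted with access-time updates relaxed or disabled, which makes last-access data less reliable):

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FileRecord:
    path: str
    size_bytes: int
    last_access: datetime


def archive_what_if(records: list[FileRecord], idle_days: int, now: datetime) -> tuple[int, int]:
    """Count and total size of files not accessed within idle_days."""
    cutoff = now - timedelta(days=idle_days)
    candidates = [r for r in records if r.last_access < cutoff]
    return len(candidates), sum(r.size_bytes for r in candidates)


now = datetime(2020, 12, 1)
records = [
    FileRecord("/proj/old_render.mov", 40 * 2**30, now - timedelta(days=900)),
    FileRecord("/proj/q3_report.pptx", 2**30, now - timedelta(days=200)),
    FileRecord("/proj/live.db", 10 * 2**30, now - timedelta(days=1)),
]
for idle in (90, 365, 730):
    count, size = archive_what_if(records, idle, now)
    print(f"idle > {idle} days: {count} files, {size / 2**30:.0f} GiB")
```

Running several thresholds side by side is the point of a what-if: it shows how much capacity each candidate policy would free before anything is moved.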
This is formidable, this is serious money here
we're talking about.
Yeah, that's not a small chunk of change there
in terms of just the costs associated with it.
So I look at it as, you know, if you have better data management policies and the analytics can help you with that, that actually translates into overall better storage management and cloud management.
Because now you've got insight to understand, well, you know, do I need this
over here? Do I need that over there? Maybe this should be tiered into this area, or maybe
this should just be removed entirely. So, you know, it gives you better control over those different
aspects of your storage, your cloud, your living environment, so that you can manage it better.
It occurs to me that if Commvault is able to see where your data is located, it might
also help to prevent some of the shadow IT from taking place.
Somebody opens up a rogue cluster on GCP and Commvault is aware of it because it sees those data flows, it might be able to
let audit understand that, hey, you've got this bill coming in from GCP for data storage
and compute resources, et cetera.
There's a potential cost savings just by virtue of having that system in place as well, providing
that visibility.
Oh, definitely. Yeah. And, you know, kind of like I was saying before, you can
break it down even by like departmental ownership by individual ownerships by groups
so you could actually start building a you know a cost model for who's consuming what and which
resources are growing and at which pace or shrinking kind of thing. So yeah, you could
very definitely use that to kind of help with the financial aspect of IT as well.
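A showback or chargeback model like the one Matthew outlines can start as a simple group-by over ownership metadata. This sketch assumes (owner, size) pairs are already available and uses a made-up flat rate per GB-month; real cloud pricing varies by tier and region:

```python
from collections import defaultdict

# Hypothetical flat rate in USD per GB-month, for illustration only.
RATE_PER_GB_MONTH = 0.023


def cost_by_owner(records: list[tuple[str, int]]) -> dict[str, float]:
    """Sum bytes per owner and convert to a monthly cost (decimal GB)."""
    totals: dict[str, int] = defaultdict(int)
    for owner, size_bytes in records:
        totals[owner] += size_bytes
    return {owner: round(total / 1e9 * RATE_PER_GB_MONTH, 2) for owner, total in totals.items()}


usage = [
    ("finance", 400_000_000_000),
    ("finance", 100_000_000_000),
    ("dev", 2_000_000_000_000),
]
print(cost_by_owner(usage))  # {'finance': 11.5, 'dev': 46.0}
```

Tracking growth would just mean running the same group-by on successive snapshots and comparing the totals.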
So Matt, do you have any last questions for Matthew before we close?
No, not at the moment, but I'm absolutely interested in learning more as time goes on.
Okay.
Matthew, anything you'd like to say to our listening audience before we close?
Sure, yeah.
Well, first of all, thanks for tuning in, everyone.
There are a number of real quick YouTube videos out there on the Commvault channel that walk through
at a high level what file storage optimization, data governance, and e-discovery and compliance
are capable of. So if you do want to learn a little bit more, you can go there or reach out
to us on Twitter, LinkedIn, or right off the website and let's have a conversation. I'm always
open to help educate people.
All right. Well, this has been great. Thank you very much, Matthew, for being on our show today.
Thanks for having me, Ray.
And thanks again to Commvault for sponsoring this podcast.
That's it for now. Bye, Matt.
And bye, Matthew.
Take care.
Until next time.
Next time, we will talk to another system storage technology person.
Any questions you want us to ask, please let us know.
And if you enjoy our podcast, tell your friends about it.
Please review us on Apple Podcasts, Google Play, and Spotify, as this will help get the
word out. Thank you.