Grey Beards on Systems - 67: GreyBeards talk infrastructure monitoring with James Holden, Sr. Prod. Mgr. NetApp
Episode Date: July 26, 2018
Sponsored by:
Howard and I first talked with James Holden, NetApp Senior Product Manager for OnCommand Insight and Cloud Insights, last month, at Storage Field Day 16 (SFD16) in Waltham, MA. At the time, we thought it would be great to also have him on the show. James has been with the NetApp OnCommand Insight (OCI) …
Transcript
Hey everybody, Ray Lucchesi here with...
Welcome to another sponsored episode of the Greybeards on Storage podcast,
a show where we get Greybeard bloggers together with storage and system vendors
to discuss upcoming products, technologies, and trends affecting the data center today.
This Greybeards on Storage podcast is brought to you today by NetApp.
It was recorded on July 20th, 2018.
We have with us here today James Holden, Senior Manager of Product Management at NetApp.
So James, why don't you tell us a little bit about yourself
and what's new in OnCommand Insight and Cloud Insights?
Sure. Thanks for having me.
So yeah, my name is James Holden. I've been with NetApp the last five years as part of the product
management team for OnCommand Insight. And in the last 12 months or so, we've also been building out
a new product called Cloud Insights. It's a very exciting time for us. Cloud Insights is going to
be a SaaS-only offering for performance monitoring of cloud and on-premises infrastructure technologies.
It's based on a lot of the IP that we've evolved over the last few years on OnCommand Insight, but really built for modern infrastructures and modern architectures.
So, yeah, thank you for having me.
Can you tell us a little bit about OnCommand Insight?
I'm just going to call it OCI for now, if that works.
Sure, yeah.
Yeah, so OCI has been a very popular tool within the NetApp estates
and for a lot of very big organizations globally.
It's used for a variety of reasons, monitoring, troubleshooting, cost and performance optimization,
and for really tying all the kind of infrastructure, inventory and performance data into the wider business processes within organizations.
So the largest banks use the tool set, and nine out of the top 10 Fortune 500 companies use it really to kind of get an
understanding of how their storage estates are behaving, performing, and making sure
that they're running as optimally as they can be.
So is OCI what Ray and I back before our beards turned gray called SRM, Storage Resource
Management?
Yeah, we fit it into that kind of category of tool set with OCI.
Yes.
It's a little different from that. It's not management in the sense of creating volumes or LUNs out
there. It was the reporting and monitoring and troubleshooting that we really did or do.
And does it work with other storage vendor products as well as NetApp?
Absolutely.
So we've got coverage of all the major storage platforms. Dell EMC's,
we've got everything from Symmetrix to XtremIO, Isilon.
And we even go back to the older product sets, the DMXs, the CLARiiONs of old.
Oh, Howard, you'd have support.
It's a reason to fire up my CX500.
I can manage it now.
There you go.
Absolutely.
Yeah.
So all the big storage vendors, IBM, HP, we've got coverage of those.
And some of the smaller, more recent introductions to the market, the Pure Storage, the Infinidats,
we've got support for those guys as well.
And then it also looks after the hypervisors.
Too frequently, I hear storage vendors say,
we support everything.
We support NetApp and EMC.
Yeah, there you go.
There are a couple other vendors out there.
Yeah, and the way that we've built OnCommand Insight
is that adding a new collector is just a patch.
So we prioritize patches on a customer request basis.
Customers ask for support of Infinidat,
we give them support of Infinidat.
You haven't gotten so far as to allow vendors to create their
own connectors, have you?
We haven't gone that far, but we do work with them. So obviously,
some of them are more advanced in their APIs and their access to the guts of their equipment than others. And we work with them on a case-by-case basis
to maintain support, especially currency,
if they're proxies.
What about the server operating systems and stuff like that?
I assume you support things like vSphere
and things of that nature.
Yeah, so kind of the critical piece about the way
that OCI operates is that we see the end-to-end relationship.
End-to-end? Application to the storage? So hyperscale, so Amazon, I don't understand.
We're trying to understand.
So when you say support for hyperscales like Amazon,
you're going to support like EC2 instances and things of that nature?
Yep.
So we'll discover the EC2 instances.
We'll discover the EBS storage at the back end,
the S3 storage.
We'll map the relationship between those.
Even if it's not NetApp storage in that environment?
Yeah.
Oh, my God.
So do I have to install a collector someplace as an AMI?
So the way that we operate, we're completely agentless in OCI.
We have data sources that reach out over whatever APIs or CLIs they can use to
communicate with the end device that we're talking to. So we just need an IP address,
username, and password.
So AWS has APIs that will let me see what latency on an individual EBS instance is?
You can run the OCI server in your on-premises environment, or you could run an OCI server
on an EC2 instance.
All we need is network connectivity to it.
So in the situation where it's completely firewalled off, what we can also do is put what we call a remote
acquisition unit out there that will then allow an HTTPS connection between the two and allow us to
take pushes of that data into OCI.
I like this a lot.
Oh God, yeah. Especially when you start thinking about an EC2, excuse me, an EBS provisioned IOPS instance where you pay six cents an IOPS per month for asking them to provision that many IOPS for you. [...] application's actually using so that I can change that provisioned rate to be just a little bit more
than it asked for at the peak. And I'm not spending huge amounts of money to provide performance
that I'm not using. And it works like that with Azure as well?
It does. Yeah. And carrying on with that use case, it's a great one. We've got other ones that
really help organizations understand where they've got waste in their
infrastructure, because like you say, you're paying for this equipment, you want to get your money's
worth out of it.
Yeah, but even more so in the public cloud, where I haven't paid for it yet.
It's one thing in the data center where I wasted money because I bought too big an array,
but that money is already wasted, and you telling me that I'm wasting that money just
makes me feel bad.
But in AWS, when you're telling me you're spending too much on provisioned IOPS, next month I can save money.
Almost in real time.
Yeah.
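[Editor's note: the right-sizing idea discussed here can be sketched in a few lines. This is only an illustration, not how OCI computes anything. The function names, the 10% headroom, the sample values, and the 5,000 IOPS starting point are all assumptions; the six-cent rate is the figure quoted in the conversation.]

```python
def rightsize_provisioned_iops(observed_iops, headroom_pct=10):
    """Suggest a provisioned rate slightly above the observed peak.

    observed_iops: IOPS samples for one volume (hypothetical input).
    headroom_pct: safety margin above the peak, in percent (an assumption).
    """
    if not observed_iops:
        raise ValueError("need at least one IOPS sample")
    peak = max(observed_iops)
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-peak * (100 + headroom_pct) // 100)


def monthly_saving_cents(current_iops, suggested_iops, cents_per_iops=6):
    """Monthly saving in cents, using the roughly six-cent rate quoted above."""
    return max(0, current_iops - suggested_iops) * cents_per_iops


samples = [1200, 1850, 900, 2400, 1600]  # made-up measurements
suggested = rightsize_provisioned_iops(samples)
saving = monthly_saving_cents(5000, suggested)
print(suggested, saving / 100)
```

[Working in whole cents and integer percentages keeps the arithmetic exact; a real tool would also need to decide how long an observation window is long enough to trust the peak.]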
What we see is EC2 instances get spun up. The EBS volumes that get attached to them are created and they're incurring quite a substantial amount of cost.
Storage costs in AWS are high.
It's a bigger part of the AWS bill at the end of the day.
Especially if you want performance.
Yeah, those instances don't come cheap.
But when the VM then gets terminated,
any volumes other than the default EBS volume
get stranded, just get left out there.
So AWS doesn't tell you they're out there.
It just charges you for them.
We can see those
and eliminate those from the infrastructure.
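[Editor's note: a minimal sketch of what spotting stranded volumes could look like, assuming volume records shaped like the entries in an EC2 DescribeVolumes response (`VolumeId`, `Size`, `State`, `Attachments`). The volume IDs and sizes are made up, and this is not a description of how OCI itself detects them.]

```python
def find_stranded_volumes(volumes):
    """Return (volume_id, size_gib) for volumes not attached to any instance."""
    stranded = []
    for vol in volumes:
        # An unattached EBS volume reports state "available" and no attachments.
        if vol.get("State") == "available" and not vol.get("Attachments"):
            stranded.append((vol["VolumeId"], vol["Size"]))
    return stranded


# Hypothetical inventory; in practice this list would come from the cloud API.
inventory = [
    {"VolumeId": "vol-0aaa", "Size": 100, "State": "in-use",
     "Attachments": [{"InstanceId": "i-0123"}]},
    {"VolumeId": "vol-0bbb", "Size": 500, "State": "available",
     "Attachments": []},
]

for volume_id, size_gib in find_stranded_volumes(inventory):
    print(f"{volume_id}: {size_gib} GiB still billed, attached to nothing")
```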
And I know in my AWS
infrastructure, I rarely
delete VMs. I just shut them down
because I might need them later.
And the storage stays forever.
I understand that logic.
That's the nice thing about storage.
So how does OCI compare with Cloud Insights and what is Cloud Insights?
Yeah, so Cloud Insights addresses a slightly different market space than OnCommand Insight.
OnCommand Insight was built for some of the largest organizations.
It's still applicable to the smaller estates, but what we've
seen is that if you have got a smaller environment, sometimes you just don't want the effort of having
to maintain on-premises infrastructure to monitor your environment. A lot of organizations
are moving to the cloud with cloud-first policies, so consuming a SaaS offering is something that they
really want to move to, they can handle, it's a lot less pressure for them. So we've come up
with Cloud Insights to fill that space. And where we're also differentiating
between the two products is that Cloud Insights is built for monitoring
the real modern infrastructures
where it's not just a virtual machine out there
and backend storage.
It's a microservices architecture that's really highly changing.
Mesos, Kubernetes, all that stuff.
Exactly, exactly.
And there's different challenges that come in those environments.
These microservices come into existence and disappear very quickly.
Cloud instances come into existence and disappear very quickly.
And the performance metrics that you need to capture and gather
and all the interconnecting relationships,
because it's so transient and because there's now…
Yes, sampling every 15 minutes doesn't work
when the average life of a process is 15 seconds, does it?
Exactly, exactly.
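[Editor's note: the sampling point can be made concrete with a little arithmetic. Assuming a poller that samples at a fixed interval and a process with a fixed lifetime that starts at a random moment (both simplifications, since the conversation speaks of an average life), the chance the process is alive at any sample instant is the lifetime divided by the interval, capped at 1. The function name is illustrative.]

```python
def fraction_observed(lifetime_s, interval_s):
    """Chance that a fixed-lifetime process is alive at one or more sample instants.

    Assumes samples every interval_s seconds and a start time that is
    uniformly random relative to the sampling schedule.
    """
    if lifetime_s <= 0 or interval_s <= 0:
        raise ValueError("lifetime and interval must be positive")
    return min(1.0, lifetime_s / interval_s)


# 15-second processes against a 15-minute (900-second) sampling interval.
print(f"{fraction_observed(15, 900):.1%} of such processes are ever sampled")
```

[Under these assumptions roughly one process in sixty would be seen at all, which is why a polling design tuned for long-lived VMs does not carry over to short-lived containers.]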
So it's a new architecture in the back end of Cloud Insights
that's going to cope with that sort of situation.
But that runs in the cloud
where you're running these container environments
as well as on-prem?
I know it's a SaaS service, so I assume it runs in the cloud, but I don't know how you gather this information.
Yeah, so it is a SaaS service.
It's gathering from the on-premises and it gathers from the cloud environment as well.
It works in the same way that we have the remote acquisition unit for OnCommand Insight.
We have an acquisition unit for Cloud Insights.
That only makes sense.
A little bit of lightweight code, you put it on a virtual machine.
That then gives you the access and the control and security
to push only the data that customers want to push.
Yeah, because otherwise I'd have to open 4 million holes in my firewall
to let you see everything you need to see.
Yeah, and that's just not practical.
There's other monitoring tools out there that go down that route, and you find that you are just making Swiss cheese of your network and your security.
So with the simple acquisition unit where it's a controlled, secure connection, all the credentials are stored in the customer's own environment, it makes it far more palatable
for the security team to allow this.
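[Editor's note: a rough sketch of the "push only what the customer wants" idea. Everything here is hypothetical: the payload format, the field names, and the allow-list mechanism are illustrations and not the actual acquisition unit design. The point is that the unit collects locally with locally stored credentials, filters, and then makes a single outbound HTTPS push, so no inbound firewall holes are needed.]

```python
import json


def build_push_payload(samples, allowed_metrics):
    """Keep only allow-listed metrics and serialize them for one outbound push.

    samples: metric records gathered locally (hypothetical shape).
    allowed_metrics: metric names the customer has agreed to share.
    Device credentials never appear in the payload; they stay on-site.
    """
    filtered = [s for s in samples if s["metric"] in allowed_metrics]
    return json.dumps({"samples": filtered}, sort_keys=True)


samples = [
    {"device": "array-1", "metric": "latency_ms", "value": 1.8},
    {"device": "array-1", "metric": "volume_names", "value": ["payroll"]},
]

payload = build_push_payload(samples, allowed_metrics={"latency_ms"})
print(payload)  # this string would be the body of the outbound HTTPS POST
```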
And it's a SaaS offering.
So what's the minimum commitment?
I run a very small data center.
Yeah.
Besides the CLARiiON and a few other systems.
But we don't have a,
I suppose there's a minimum commitment.
A managed unit is the smallest number that we can go down to. A managed unit is a host or five terabytes of
storage, either one of those. So that's how we charge.
I think I got that much on my desktop here.
Not quite, but close.
So the VMware container services and stuff like that,
I mean, VMware has got a couple of different solutions for containers.
Do you support all of them?
I guess it's Docker, it's PKS, Kubernetes,
and there's the VM container services.
Yeah, so we're building out the support matrix
and our data collectors at the moment.
So again,
in a similar way to how we do it for OnCommand Insight,
we're looking at the market demand and looking at what customers are asking for,
and we build accordingly.
Right, right, right.
So at this point you have like the big guys, Kubernetes, Mesosphere, and those sorts of things.
We have the Kubernetes.
Okay.
Okay. So it's a piece of
Yeah, it's not out yet. We're in a preview fashion at the moment.
Okay.
We're actually publicly releasing in October this year.
Yeah, I can imagine interesting problems that we have to address and things
like that. Today, you talk about an application and it's a collection of VMs that access a
collection of data repositories. In a containerized world, how you know that this container that only
appeared for 30 seconds belongs to that application is an interesting problem.
Yeah. Yeah, so that's, well, as a product manager, the coolest thing about working on a SaaS offering
is how fast you can actually develop.
OCI, OnCommand Insight, we did three to four releases per year, and that was our pace.
With Cloud Insights, it's just a continual development cycle with new content drops happening weekly, daily, sometimes even hourly.
In the solution?
In the solution, yeah.
There's no upgrade anymore.
Well, in the SaaS offering, Ray.
Well, I understand, but it still is in my words, suicidal
from a perspective.
If you can do it, it's great,
but there's risk there.
I guess.
The world is changing.
Ah.
Yeah.
The real key for me is that a new feature arrives kind of seamlessly.
I don't see any downtime with that feature appearing.
I've not done any upgrades.
I run a Cloud Insights instance for some of the NetApp field folk to actually see and play around with.
And I'm continually surprised and pleased to see new pieces being dropped into the product set.
Sometimes it's just little simple things like the way you can operate maybe a widget, a visualization in the tool.
Other things are fundamental changes, the way that user management works or the way that new data collectors can be added to the system.
So obviously at the moment it's very, very fast because we're ramping up to the public release.
Right, right.
There's no reason that those new features can kind of slow down as we hit that.
It is a brave new world of new cool stuff every day. Is it a public preview?
Is that the right statement and right term? I'm trying to understand
the way this all works here. We call it
preview. The way that people can actually register
for a preview is visit cloud.netapp.com
and there's all the cloud services that NetApp offers out there,
from cloud volumes to SaaS backup.
And Cloud Insights is one of those services.
At the moment, it's a registration form that people can fill in.
And then someone from either the product team
or from our engineering group will be in contact
and we'll help people set their environment up
and get them running on Cloud Insights.
So Cloud Insights is more targeted to more modern application environments
and OCI is more of a classic traditional enterprise application
with Hypervisor.
Is that how you'd state the two different solutions there?
Yeah, and I'd just kind of add to
that as well that Cloud Insights
is still applicable for
all your on-premises environment and the cloud infrastructure.
It's aimed at the person or the
IT or the operations team
that doesn't want to have to maintain their own on-premises infrastructure
that are looking for something that isn't as capable as OCI.
There's a humongous amount of features that have gone into OCI.
It's 12 years in development.
So all the capacity reporting, chargeback,
all the kind of integrations
connected to service.
Now, maybe those aren't as applicable.
Obviously, there's a difference in price point.
Cloud Insights is where managing,
maintaining,
and it's not got all the features
that OnCommand Insight has got.
So it's a cheaper price point as well.
And if you've actually got NetApp storage,
and this is any NetApp storage,
there's going to be an edition that's purely for those
that is actually free.
Oh, excellent.
So a Cloud Insights solution, which is free?
For NetApp storage, yes.
So if you've got an ONTAP device, even if it's 7-Mode, or it's an E-Series or flash FAS, HCI solution, whatever it may be,
they will have the ability to feed that data into Cloud Insights and get seven days of performance.
I must admit that one of my big takeaways from the session at Storage Field Day
was that I need to learn more about ServiceNow
because you talked a lot about ServiceNow integration.
It's been a long time since I've installed a system like that.
Well, James was talking a lot about how OCI integrates into ServiceNow, and I keep hearing about ServiceNow, so it's just hit my have-to-think-about-it level.
And then we talked about chargeback, which I have always hated, just as a concept.
As a concept, I've always...
Well, I mean, there's certain places where you need to have something like chargeback, right?
If you're a service provider, you need it.
You are.
In the past, if you're an IT department, it's led to more resentment than actual usefulness, in my experience.
Advantage.
However, when we're now running IT departments
all competing with public cloud providers,
then we need to be able to do that.
We need to be able to,
even if the billing actually doesn't happen that way,
we need to be able to say to the CMO,
we provided this service
and you could have gotten that as SaaS and it would have cost you this much more.
Right, right.
As a comparison.
The impact of public cloud, which of course is billed back, has got me rethinking the whole process.
Yeah.
It used to be, and given its bad name, it was Shameback.
It was a Shameback report that people used to try and produce.
But it's now more of a case of a proof point.
This is my costs running on-premises.
This is how much it's costing in the cloud.
There's an often substantial difference.
Yes.
Yeah.
All right, gents.
So, Howard, any last questions for James?
No, I think we got it.
James, is there anything you'd like to say to our listening audience before we sign off?
Thank you for listening. And please do visit cloud.netapp.com and register for a preview of Cloud Insights. It'd be great to see you on board.
Is there a handle where our listeners can abuse you on Twitter or other social services?
Yeah.
I'll put that in the post, if you will. How's that?
If you can send it to me, James.
All right. Well, this has been great. Thank you very much,
James, for being on our show today.
Yeah, thank you for having me.
And thanks to NetApp for sponsoring this podcast. Next month, we'll talk to another system storage technology person. Any questions you want us to ask, please let us know.
And if you enjoy our podcast,
tell your friends about it.
Please review us on iTunes
as this will also help get the word out.
That's it for now.
Bye, Howard.
Bye, Ray.
Bye, James.
Bye, guys.
Until next time.