PurePerformance - Managing Performance on Pivotal Cloud Foundry with Jimmy Stewart of Kroger
Episode Date: January 30, 2019
Jimmy Stewart of Kroger, along with Michael Timmers and Kamala Dasika from Pivotal Cloud Foundry, discuss Kroger’s migration to PCF and how they tackle monitoring with the Dynatrace BOSH Agent...
Transcript
Coming to you from Dynatrace Perform in Las Vegas, it's Pure Performance!
Hello everybody and welcome back to Dynatrace Perform 2019.
This is Brian Wilson with Pure Performance,
and we have Mark and James of Perfbytes with us once again.
Pure Perfbytes Performance.
Absolutely.
Dynatrace Perform Performance.
Absolutely, and today we've got, well, today we've been talking to a lot of people today,
but right now we're talking with some people from Pivotal Cloud Foundry and Kroger.
Awesome.
So why don't you all go ahead and introduce yourselves for us, give a little short background, and we'll dive in.
My name is Mike Timmers. I'm a platform architect with Pivotal.
I am based in Cincinnati, Ohio, been with Pivotal for three years,
and my main role at this point is as a resident architect on-site at Kroger.
Awesome. Great.
I'm Kamala Dasika from Pivotal as well,
and I manage the product marketing
for our cloud platform, Pivotal Cloud Foundry,
as well as our technology integrations
with folks like yourselves, from Dynatrace.
Dynatrace, I've never heard of them.
They have a solution or something?
Yeah, maybe. That's what I heard.
That's me, we should talk to them.
They're great.
And I have to do this because me being Brian Wilson,
I have to introduce a special guest coming in from Hollywood, California.
You know it.
The one, the only Jimmy Stewart, ladies and gentlemen.
Maybe the only one left alive.
Who knows?
But my name is Jimmy Stewart.
I am the Enterprise Monitoring Manager.
I've been with Kroger for three years.
I've been in the performance management space for probably 10 and seen all the products all the way back through
Wily Introscope and love seeing all the evolutions. Good time to be in the game.
And just so people understand what Kroger is, right? Kroger owns a lot of, you might not know your
supermarket is part of the Kroger family, but you have a lot of different properties, right?
Oh, absolutely. So Kroger likes to keep that local grocery store feel.
So when we merge with other entities like Ralphs or Fred Meyer or entities like that.
Harris Teeter.
Harris Teeter, exactly.
So we like to keep that local grocery store feel, but like having the scale of Kroger.
So Kroger, even if you might not have a Kroger in your neighborhood,
you might have a Kroger family store in your neighborhood.
Right, right, right.
I grew up shopping at Kroger, actually.
Yeah, there you go.
Kroger proper.
It's amazing how many people tell us that their first job was with Kroger, bagging groceries.
I mean, it's...
I worked for a Pathmark.
Yeah.
Yeah, I was bagging at a Super Value.
Yeah.
Sorry.
But now we're all in technology.
Right, right.
We made it here.
Steps forward.
You did.
Good.
So you're working with PCF, or Pivotal Cloud Foundry, right?
Yes.
You've got a performance practice, a monitoring practice.
Can you tell us a little bit about it?
What have you been putting in?
Maybe why did you pick Pivotal?
Sure.
How did all that come about?
Well, we wanted to get into the 12-factor cloud app game, right?
Yep.
So the easiest way to do that on premise was to use a product like Pivotal
Cloud Foundry for our internal cloud. Once you get that elastic scale, you have apps spinning up,
spinning down, you want to give that self-service so people can use it just like they would an
external cloud process. So when you're a monitoring team, you might not necessarily be told when
applications go out.
So it was a bit of a challenge.
You give your app teams this flexibility, which is great.
But then you also, after the fact, get told, well, why weren't you monitoring my application?
Well, you're moving fast, and that was good, but you probably should have told somebody, too.
Right.
So one thing that Pivotal Cloud Foundry did for us is allow us to use one of the BOSH tiles to inject Dynatrace into each of the foundations.
Oh, so you did it through BOSH.
Yes, absolutely.
Awesome.
Absolutely.
So when apps would spin up, they would automatically pop into the Dynatrace product.
Nice.
So when we have more static proxies and applications pop up, you see the tracing through Dynatrace all the way down through the database.
You'd see any kind of Redis caching
or any other appliances or services that we had
without really having to do anything at all.
It was pretty great.
And then you also get the add-in.
So for anybody listening, the difference being,
if you think about it, there's a PaaS agent,
which you would actually have to put into every Diego cell, every kind of piece that's running, manually, right?
But if you use the BOSH agent add-on, it's automatically injected into every
cell, but additionally it's giving you probably some insight into your PCF
infrastructure.
Oh, absolutely. There is a plug-in into Pivotal Cloud Foundry which
will allow you to see the performance
of the Gorouter, the Doppler components, and all the things that make Pivotal run,
right?
So you'll find maybe on high volume days, your Doppler is capping out on CPU and like,
okay, maybe we need to add some processes to that.
Helps you get ahead of things before they're customer impacting.
And of course, when you're using Dynatrace user management, you also see how
things like that impact the user performance. So if something's, you know,
at 98% CPU, hey, you're not wasting any CPU, that's great, if it's not impacting
your user. But you might want to also take a look at your scaling settings to
make sure you're giving it enough horsepower.
Excellent, excellent.
And you had helped, I'm sorry.
So, Mike, you had helped.
Did you help with the implementation of this at Kroger,
or what was your role with the Kroger side of things here?
Yeah, so Kroger has been working with Pivotal for many years now,
so one of the earliest users at scale.
So what I've been doing recently is helping work with the platform team
to make sure components like the Dynatrace BOSH add-on
work well and to expand and grow their usage
as their teams, as they move adoption forward.
So today there are well over 7,000, 8,000 application instances running across 13 foundations,
which is a really sizable installation.
So then when you actually go into the Dynatrace dashboard and see that all of those foundations
are automatically lit up and every application deployed to each of those foundations is available
to go deep dive on, it's really impressive.
And does this include everything from sort of back office operations, supporting
all the stores as well as the web and basically every system in the place?
So it's sort of interesting.
I started with the digital program, which was probably the easiest because it was all
getting built from scratch.
You didn't have a whole lot of tech debt that you were having to do.
But before I came to Kroger, I didn't realize the breadth of Kroger.
You're talking about manufacturing
for the ice creams that you love so much.
You're talking about point of sale.
You're talking about pharmacy.
You're talking about a lot of different systems
that are sort of along that line of maturity,
a different space.
So there definitely is a push to move
to more of the cloud-ready apps.
So the 12 factor, and I think that's partially why
use of Pivotal has exploded like it has.
People are given the tools, they're getting seed projects,
and they can just immediately inject them.
So adoption, of course, is varied across the enterprise.
Some are better than others.
We have productivity tools that actually are
for store associates that are now in Pivotal.
And we're looking to continue that trend.
Great, great.
Well, actually, I have a question for you, Jim.
Oh, yeah.
So what did you guys used to do before?
So, yeah, I mean, it's interesting.
We have a new co-host.
How terrible was it?
No, it's a good question.
How much pain were you in?
Well, so you go through the migrations and things.
I mean, you're talking about bare metal in the beginning, right?
So the time it takes to actually get new hardware procured,
I mean, we're talking about months, right?
Because you've got to wire, you've got to rack and stack and power,
and then you move to, like, VMs, which, you know, in theory should be great.
And it was.
It was much, much faster.
But still, you're probably talking about a couple weeks to get a VM stood up.
You're getting better.
But then with Cloud Foundry, I mean, we have the ability if we're running a promotion,
if we're giving away, you know, a free bacon-wrapped beer and suddenly the website blows up, right?
You have the ability to scale on demand, right? Assuming that you have the back-end resources, and we make sure we do,
that we can handle that kind of load.
And it's really been good for customer experience.
I know you were talking about problems that we had at one time.
You talk about you have a podcast that speaks on that as well.
One time we made the mistake of sort of putting a teaser out there
that we were going to put a big load of detergent on the website
and we were giving it away for free.
And that would happen sometime before noon.
So people would go on the site.
Sometime before noon.
What could go wrong?
And people just hammered F5 until the website went down to its knees.
Wow.
We never did that again.
We got a little bit smarter with that.
On News of the Damned, we call that the pattern of free.
Well, you know, the wonderful thing about being at Kroger is
we do often give away a free coupon on Friday.
So we do get that.
We call it the free Friday deal.
And so we get that Black Friday type load every Friday.
So we get to test how our scaling is working, making sure, finding what our new bottleneck is,
and pushing that performance as fast as we can.
That's fascinating.
You know, I almost, I have to admit, for a half second I zoned out when you said bacon wrapped beer.
I'm glad I was able to pull myself back in.
Your name is Brian Wilson,
you're at Las Vegas, Dynatrace Perform 2019.
Oh, okay.
Just bring it.
Whoa, whoa.
Yeah, there are definitely certain items
that drive the website,
so we like to know the night before
what they're going to put out there,
so we know what DEFCON we need to set the scale at.
This actually is a really important part,
because a lot of people think, you know, DevOps, right?
And there could be arguments all over the place
about what's the proper name and all.
And now you have people saying BizDevOps
and people are like, oh, we don't like adding the name.
But that's a perfect example of whether or not
you're going to call out that name.
Right.
Including business into the entire ecosystem
of this information helps them know
what sort of capacity they can hold
so that they also start thinking,
hey, we're going to give away some bacon-wrapped beer.
We better alert this team.
That's going to be probably really popular.
And then you can come back and say, well, that's great.
We can handle 3X, the biggest one we've ever had, so we should be good.
So this is super interesting and a great point.
So we send out an email to people with the free Friday offer in the middle of the night.
Yeah.
And just the way that people interact with their phones and other things, when they get up, they check their email, they go to the site.
Yeah.
Well, we introduced SMS push messages.
Yeah, yeah.
So people get a text message.
They treat that differently than an email.
They see it.
They click it.
They're to the site immediately.
Yeah.
We would see traffic to the site immediately hockey stick, right?
I mean.
And they're like, okay, that's cool.
Hey, guys, let's send a push notification to the app.
That was even faster.
And just the amount of people that you send all at once versus the gradual load you see through an email,
it's just, one, it's interesting to see how people interact with their devices,
but also the power that marketing has to drive traffic to your website.
Can you describe the architecture for the elasticity, then, to handle these things? Or is it much more proactive on a Friday?
It's not proactive. We set a minimum and maximum for the number of instances that we can have.
So we're very much broken into many different microservices. So different actions on the website
expand and contract different things.
So we have static proxies on the front
and it goes into our Cloud Foundry,
Pivotal Cloud Foundry environment.
And it'll expand what we need
and then keep down what we don't.
The wonderful thing is, is sort of like
if you're thinking back to old AIX days,
you can flex up what you need
to meet that demand for certain things.
Say you need more capacity to handle ClickList orders
later in the day versus coupon
traffic, you expand and contract to meet that need.
Yeah.
Yeah, and in practice, it turns out Kroger is really popular with our product managers
because they are probably the largest user of the auto scaling capability of the platform.
Yeah, can you describe that a little bit more, how that actually works?
Is there any integration with Dynatrace around that?
That's not a specific Dynatrace integration.
Right.
It's definitely a Pivotal,
it's a Cloud Foundry function.
And what it does is when you do push an app to a foundation,
you can specify a minimum number of
instances and a maximum number of instances.
And so as traffic comes in
and you can select from a couple of different metrics.
So it can be by HTTP latency or memory usage, those types of things.
And then every one of the apps that Kroger is deploying today from their digital customer program is auto-wired with the autoscaler.
Okay, cool.
So out of the box, they know they're running in two data centers with all their applications autoscaling-capable and able to respond really well to this Friday traffic.
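[The min/max behavior described above can be illustrated with a small Python sketch. This is not the actual algorithm of the Cloud Foundry autoscaler; the `ScalingRule` class, the `desired_instances` function, and the threshold values are all hypothetical names and numbers chosen for illustration. It only shows the idea of picking a metric, here HTTP latency, and keeping the instance count between a minimum and a maximum.]

```python
from dataclasses import dataclass


@dataclass
class ScalingRule:
    """Hypothetical rule: instance bounds plus latency thresholds."""
    min_instances: int
    max_instances: int
    scale_up_latency_ms: float    # add an instance above this latency
    scale_down_latency_ms: float  # remove an instance below this latency


def desired_instances(current: int, latency_ms: float, rule: ScalingRule) -> int:
    """Return the next instance count, always clamped to [min, max]."""
    if latency_ms > rule.scale_up_latency_ms:
        target = current + 1
    elif latency_ms < rule.scale_down_latency_ms:
        target = current - 1
    else:
        target = current
    # Never go below the floor or above the ceiling set for the app.
    return max(rule.min_instances, min(rule.max_instances, target))


# Illustrative values only.
rule = ScalingRule(min_instances=2, max_instances=10,
                   scale_up_latency_ms=500.0, scale_down_latency_ms=100.0)
print(desired_instances(current=4, latency_ms=750.0, rule=rule))   # 5
print(desired_instances(current=10, latency_ms=750.0, rule=rule))  # 10
print(desired_instances(current=2, latency_ms=50.0, rule=rule))    # 2
```

[With these made-up numbers, a latency spike adds one instance at a time until the maximum is reached, and quiet periods shrink the app back toward the minimum.]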
Geographically as well?
So dispersed, or is this mainly in the?
We're not nearly as diverse with our locations,
but that is definitely something that we want to do,
is send people to the closest data center.
So that's more of an aspirational thing.
Yeah, sure, no, that makes sense.
And easier to do within a Pivotal PCF architecture.
Absolutely.
We talk about this a lot, treating your instances like cattle instead of pets, right?
I mean, I've heard you guys talk about it before.
And having the ability to put down a bad instance that's causing a bad customer experience and spin one back up.
Yeah.
That's what you do.
No matter how much it hurts you.
You love them all.
You love them all, but still.
But, you know, we use Ansible, too,
and you can trigger things from Dynatrace
that will make Ansible do things
that will help you auto-remediate.
And Pivotal's a great target for those sorts of actions.
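[As a rough illustration of the "cattle, not pets" idea, here is a minimal Python sketch of the selection step only: deciding which instances are unhealthy enough to replace. The function name, the instance ids, and the error-rate threshold are hypothetical. In a setup like the one described, the monitoring tool would detect the problem and an automation tool such as Ansible would carry out the restart; neither integration is shown here.]

```python
from typing import Dict, List


def instances_to_replace(error_rates: Dict[str, float], threshold: float) -> List[str]:
    """Return ids of instances whose error rate exceeds the threshold.

    The result is sorted so the output is stable between runs.
    """
    return sorted(name for name, rate in error_rates.items() if rate > threshold)


# Made-up observations: instance id -> fraction of failed requests.
observed = {"web/0": 0.01, "web/1": 0.42, "web/2": 0.02}
print(instances_to_replace(observed, threshold=0.25))  # ['web/1']
```

[Because the platform can spin up a fresh instance on demand, the remediation for anything on that list is simply to stop it and let a replacement start.]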
Yeah, that's kind of what I was thinking about as well.
To Dynatrace's credit, I think you guys were one of the first
few monitoring platforms
to really recognize these broader trends, like you were talking about, Jimmy, with microservices
and some of the continuous delivery and auto-scaling and these kinds of environments and architectures.
You guys completely replatformed from AppMon to Dynatrace in order to support these kinds
of architectures.
To your credit, you also deployed it as a BOSH add-on, which is actually the perfect way to integrate with Pivotal Cloud Foundry.
Yeah.
That add-on was in beta forever.
Yeah, well, I mean, you're getting into every piece of PCF. You don't want to get that wrong, right?
That's correct. But there were some bumps in the road.
But it's really, really awesome. And I think I saw there's some new, I don't know if they've been publicly put out,
but I know we have some additional components of the PCF Foundation
that we're going to be adding more insight into.
I've seen some whispers of some of that, so that's going to be really cool.
But to your point, though, one of the things we've been talking to,
so I was leading a class on the HOT Days on Monday for Dynatrace for AppMon users.
And one of the things we kept on hammering is like, yeah, we did have to replatform because we knew we could not take the product we built in 2005 and use it in this thing.
And that's similar to what most customers are seeing, too, with their own applications, because they have an application that might have been around for a long time. You might
do a lift and shift just to get your feet
in the cloud, but that's not going to
survive in the cloud, and they have to make those
same changes and transitions as well.
And then with all the openness,
you all have the APIs,
we were talking to Neotys, they have APIs,
we have all our APIs, there's this new
collaboration of data sharing
that all these tools are doing, which really
makes a lot of really cool stuff happen that maybe
none of us originally planned for.
So it's a really cool thing.
Great. I like that.
What's on the horizon for you? What are your next big
steps this year and next?
Well, digital's always moving at the speed of digital.
And we're continuing to disrupt things.
We have great partnerships with some of
our autonomous fulfillment centers, which will need a lot of IoT-type monitoring.
We have self-driving cars now, so we are living in the future.
Does Kroger have self-driving cars?
Well, it's a partnership with a company called Nuro.
So in Scottsdale, Arizona, they are doing self-driving grocery deliveries.
It's pretty amazing.
I also saw the new little handheld checkout items where they're experimenting in stores where you can scan and pay.
I think we're in 800 stores now.
That's called Scan, Bag, Go.
Yeah.
And you go around and you scan your items as you go and you hit the checkout at the end and pay your bill and you're on your way.
I've tried that.
It's very cool.
It's new things coming all the time.
And new things to monitor all the time.
So it's nice to have a flexible thing.
And, of course, we're trying to modernize some of the stuff that's a little further along in the maturity spectrum.
So we will have no lack for work, I'm sure of that.
So I can get you to commit to how many session replay units right now?
Listen, Brian Wilson.
In beta at the end of February.
Another long beta.
Long beta, yes.
That's right.
It's going to be worth the wait, though.
It's going to be great.
How has the conference been for you all?
I know you guys have been at the booth and all.
Oh, yeah.
The booth's been great.
And I think I particularly liked some of the sessions that I've been to.
And I think we've had, obviously, the session by Kroger.
And we also had another joint customer of ours,
Humana, speak just a few minutes ago, which was perfect as well.
That's great.
I'll say what I've taken away from it is the decision about the right tool for the job
has never been muddier.
So opening up the AI engine for custom metrics, things that you would have used a time series
database for, now you have the Dynatrace AI in front of for telemetry.
You're like, okay, well, maybe the right tool for the job
is actually Dynatrace and not something like Prometheus.
And the query language, right?
Oh, absolutely.
Which, if you use the API, you know that.
But now it's much easier to use
and make that a standardized way of interacting with the data.
Yeah, it makes you think.
Cool. Awesome.
All right, well, thank you all for coming by.
Thanks for having us.
It was a pleasure talking to you.
Thank you guys very much.
Thank you for having us.
Enjoy the rest of the show.