PurePerformance - 056 The State of Monitoring in a Kubernetes World with Brian Gracely
Episode Date: February 26, 2018
New to Kubernetes? Already a pro? In both cases, tune in to this episode, as we have something for both sides of the aisle. Kubernetes seems to have won the container orchestration game. Major cloud and PaaS vendors are supporting Kubernetes, and attendance at KubeCon in Dec 2017 skyrocketed. Today we chat with Brian Gracely ( https://twitter.com/bgracely ), Director of Strategy at Red Hat. Brian also co-hosts @PodCTL ( https://twitter.com/PodCTL ) – a podcast dedicated to containers, OpenShift, Kubernetes, and Cloud Native. In our chat we learn where and what Kubernetes is right now, where it's heading (e.g.: providing a better onboarding experience for developers, more APIs …), why we have to pay attention to Service Mesh ( http://philcalcado.com/2017/08/03/pattern_service_mesh.html ), and why it is important to have a good cross-technology monitoring strategy that supports both your brownfield legacy services as well as the greenfield cloud native ones. We also enlighten you about what the BWI (Brian Wilson Indicator) is!
Transcript
It's time for Pure Performance.
Get your stopwatches ready.
It's time for Pure Performance with Andy Grabner and Brian Wilson.
Hello, Andy Grabner. This is Brian Wilson. How are you doing today?
I'm good. I survived the first snowstorm of the season in Boston.
Well, good luck. I mean, you know what? I'm just going to say right now, I can't think very well. I'm not feeling good, everybody. So I'm going to make any kind of – I'm going to use that for any excuse that I can today, including my flubbed reaction to your snowstorm.
You know, we here in Denver haven't really had any snow yet.
People think it's really cold in Denver, but it's not.
Yeah.
Well, I also wouldn't really call it a storm.
Just because it snowed for a couple of hours doesn't make it a real storm.
The snow is almost gone.
Well, I'm sure people reacted like it was a storm. They ran to the grocery store to get their milk and bread and eggs and stuff, right?
I hope you got there. It was the number one topic on the news for two days, right? It gave them something to report on other than what they report on anyway all the time.
Yes, so that was good. So Brian, the good news is that we have another Brian today.
We do. It won't be confusing at all.
No, not at all. And I would like, if you're good with it, I just go ahead and introduce our guest.
I think that would be appropriate.
Are you ready?
Yes.
Awesome. So when I'm flying around, I typically try to download podcasts and listen to things and kind of get up to speed on things.
And one podcast I ran into is actually run by two guys, and we got one of them today, Brian Graceley.
And Brian runs – actually, before I – instead of me explaining it, Brian, are you there with us?
And maybe you want to introduce yourself and tell the world who you are?
Yeah. Hey, Andy. Hey, other Brian.
We'll figure out how to differentiate that.
You can just call me other Brian.
There you go.
So my name is Brian Gracely.
I'm director of product strategy at Red Hat.
And I assume the podcast that you were listening to
is one that we just started maybe about four or five months ago.
It's called PodCTL, P-O-D-C-T-L.
And it's mostly focused on this new technology called Kubernetes.
Well, I guess it's been around for a couple of years, but Kubernetes and also just kind of containers in general.
So the whole ecosystem around what's going on with containers and container scheduling and microservices.
Great.
Cool, yeah.
I actually listened to that podcast.
I listened to a couple of your episodes and then I thought,
you know what,
it would be really cool to get you guys on our podcast because we've been,
I mean,
we've,
we focus a lot on performance engineering on monitoring.
Obviously we talk a lot about some of the new things that come,
whether it is around the technology aspect of things that people build, the cloud native apps.
We also talk a lot about process change when it comes to everything that happens around the DevOps movement.
And I thought it would be great to get you guys on board.
And I know that your colleague, Tyler, couldn't make it.
But you guys were at a large conference last week, KubeCon, I believe, which is, I assume, the number one go-to conference if people want to learn about what's happening in the Kubernetes sphere.
Is that correct?
Yeah, definitely.
It was – so KubeCon – so I think officially it's called like Cloud Native Con and KubeCon.
So it's run by the CNCF, which is a Linux foundation group.
So last year they sort of started it
and made it formal. It was like a thousand people out in Seattle. And then this, this year, a year
later in Austin was like 4,500 people. So yeah, it's definitely kind of become the, um, you know,
aside from, you know, the big events like, like reinvent or, or something else, it's really kind
of become the big place to go talk about Kubernetes and all this container stuff.
You know what's funny?
I just realized that when you're saying KubeCon, you're not saying QCon, like Q, the letter Q, because there was another one.
I don't know if they're still around.
Is QCon with the Q still around?
It is still around.
Yeah, it is still around. I think it's more Java developer focused.
Yeah, I guess it's a little confusing.
Yeah, awesome.
Anyway.
So, Brian Gracely. Actually, by the way, I just looked at your Twitter page, and I assume the picture that is on there, it's on purpose that it doesn't say Brian Gracely but Brain Gracely?
Oh, yeah. I was giving a keynote at one of our events, one of the Red Hat events, but it was in Turkey.
And I assume somebody must have put my name in the prompter, and the prompter probably went, oh, that's a typo.
You spelled brain wrong.
So it's kind of a fun joke that you go somewhere, they spell your name wrong.
Yeah.
So coming back to KubeCon last week, what happened last week?
What are the big things that are happening in the Kubernetes sphere and the OpenShift sphere when it comes to people talking about CNCF?
Any highlights that you want to share, maybe?
Yeah. I mean, I think the big thing, so, you know, let's say a year ago, just to kind of put it in perspective, we were all talking about, you know, where's Kubernetes going and like, how do we make it stable? How do we make it grow? You know, how do we add new features at sort of the container scheduler level? So like, how do we do stateful applications or how do we do batch jobs? And so there was a lot of stuff going on a year ago about, you know, how do you make this container orchestrator stable and support more applications and so forth?
And there still was a lot of debate in the industry of, you know, is Kubernetes really kind of the best technology?
You know, Docker had their own version of a technology called Swarm.
They still do.
There was some technology called Mesos that had spun out of Twitter. And so there was still kind
of a lot of debate in the industry a year ago of, you know, will one of these sort of standards or
implementations kind of win? This year, that discussion's kind of gone away. Pretty much
every major vendor is now supporting Kubernetes. Every major cloud
provider is now supporting Kubernetes. And so the discussion really kind of shifted from,
you know, what can Kubernetes do at the lower level container scheduler to really a lot of
discussions about, you know, how do we make it easier to get applications onto Kubernetes?
Are there frameworks now that are going to be more kind of Kubernetes native
to help developers, you know, understand these constructs of availability and scale out and
stuff. And so there was a lot of discussions about service meshes, a lot of discussions about
kind of new developer frameworks to make it easier. So definitely a shift in the discussion
from lower level container stuff to much more kind of developer productivity,
developer tools and stuff like that.
And is the discussion around companies,
vendors are building stuff on top of Kubernetes
or is Kubernetes also getting
some built-in key functionality
like a better orchestration?
I don't know.
So you mentioned service mesh.
Is that getting into Kubernetes
or is this an option for vendors to provide services on top of maybe a better platform that Kubernetes
is going to support, provide? Yeah, I think the answer is a little bit of both. So, you know,
there's always a discussion about, you know, do you make something a native service in
Kubernetes? So like, like, for example, Kubernetes has, you know, native services to do, you know,
inbound and outbound routing, service discovery, things like that. And then there's, you know,
then there's discussion of like, okay, once you start getting into developers, how much of it
should be kind of abstracted. So to break it down, I guess, one of the things that
Kubernetes has done over the last couple of releases is it's always had this mechanism to
where, you know, if you want to do default scheduling of applications, that's sort of
built into Kubernetes. And then you started to have a lot of different use cases that would
come along. Some of them would be like vertical industry specific things or, you know, stuff that was kind of out of the mainstream type of job. So,
you know, batch jobs, long running jobs, short running jobs. So they've been working very hard
at allowing you to build sort of custom controllers. They have a concept called
custom resource definitions that allow you to say, okay, you know, I have some special, unique kind of workloads, and we may want to build those specific to, say, our industries. Maybe it's a healthcare
thing, or it's a, an oil and gas or financial services type of thing. So, so that's, that's
one of those things where it's like, it's kind of built into Kubernetes, but the actual implementation,
you know, would, would probably be vertical specific. So you see some
of those things happening. You see a lot of people that are, you know, you see a lot of projects that
are now emerging to say, okay, you know, in the past, you just kind of gave Kubernetes some
containers, but, you know, the development experience wasn't much more sophisticated than
that. Now we're beginning to see some frameworks, some new ways of doing packaging of applications. So there's a project called Helm. There's some higher level constructs of, you know, how do we essentially try and shape what the developer kind of desktop experience looks like. And so you see some projects like Draft; Microsoft has something called Draft. There's a few others. And then you start to get into some
stuff like people are kind of fascinated with this phenomenon of serverless, right? That, that AWS
started with Lambda and they're trying to figure out, okay, can we bring that same concept of,
you know, very short running functions, um, very, very, you know, scale down on what the developer
writes. And so there's a whole discussion happening about, you know, what should we do for
serverless? Should serverless be a feature of Kubernetes? Should people build frameworks,
say, on top of that custom resource definition? And so, you know, there's definitely still some
gray areas between whether something is a feature of Kubernetes or something built on top of it.
So if I take my services and package them up in a container, that means Kubernetes makes sure that at the scheduled time, there's enough container instances running of that service.
They also know how to talk to each other because I assume Kubernetes does the service registry and the service brokerage.
So that's there.
In terms of orchestration based on load, is that also already built in?
That means scale up in case of load, failover in case of certain services start producing
error results, then shift traffic over to other instances.
Is this already available or is this
something that you said is now coming? Yeah. So that's a, that's a really good question.
There's actually a lot of pieces in there. So, um, so from purely the perspective of
if I have an application and it needs to scale in some way, um, up till now, Kubernetes has
sort of allowed you to scale that based on like CPU thresholds and to a certain extent, like memory thresholds from a node, a worker node.
And you could set some thresholds based on, say, like percentage usage or something. But what Kubernetes 1.8 begins to have, and 1.9 and so forth, which are things that'll be available, you know, here
in the springtime, starts to get much more granular about the type of metrics that you could use,
or, you know, characteristics that you could use to scale it up. So right now, it's, it's pretty
simple, but it's, it's expected to get much more granular. So, so that'll help, that'll help both,
I guess, developers as well as operators deal with, you know, let's say scale of an application,
but also, you know, dealing with things like DDoS attacks or, you know, security threats.
So that's one area that's focused a lot.
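For listeners who want the mechanics: the proportional rule the Horizontal Pod Autoscaler documents for CPU (and, in newer releases, custom metrics) can be sketched in a few lines of Python. The function name and the minimum-of-one handling are our own simplification; the real controller also applies tolerances, cooldowns, and min/max replica bounds.

```python
import math

def desired_replicas(current_replicas: int, current_metric: float, target_metric: float) -> int:
    """Scale the replica count in proportion to how far the observed
    metric (e.g. average CPU utilization) is from its target."""
    return max(1, math.ceil(current_replicas * current_metric / target_metric))

# e.g. 4 pods averaging 90% CPU against a 60% target -> 6 pods
print(desired_replicas(4, 90.0, 60.0))
```

The same formula works for any metric that scales roughly linearly with replica count, which is why moving beyond CPU to custom metrics is mostly an API question, not an algorithm change.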
So, you know, the other thing, and you sort of touched on this early on, is, you know, a developer writes an application.
One of the big areas that the Kubernetes community is starting to kind of realize is that that sort of,
you know, you write your application as one thing. So you write something in Java or, you know,
Go or whatever you write it in. But then you kind of have to go through this hand crafting exercise
of writing a bunch of YAML, sort of a description file or a manifest file that says, you know,
here's what I want to happen to the application.
You know, make sure it always has three instances running, you know, put a load balancer in
front of it, you know, kind of descriptive stuff.
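To make that hand-crafting concrete, here's a sketch of the two descriptive pieces being talked about, rendered as Python dicts rather than the actual YAML. The field names follow the public Kubernetes Deployment/Service schema, but the service name and image are made up and the manifests are trimmed to the essentials.

```python
# "Make sure it always has three instances running" -> a Deployment
deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "my-service"},
    "spec": {
        "replicas": 3,  # the declarative instance count
        "template": {"spec": {"containers": [{"name": "app", "image": "my-service:1.0"}]}},
    },
}

# "Put a load balancer in front of it" -> a Service
service = {
    "apiVersion": "v1",
    "kind": "Service",
    "metadata": {"name": "my-service"},
    "spec": {"type": "LoadBalancer", "selector": {"app": "my-service"}},
}

print(deployment["spec"]["replicas"])
```

Hiding exactly this kind of boilerplate is what the developer-centric tooling discussed here is after.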
And there's a big focus right now in the Kubernetes community to say, yeah, you can do things
in YAML, but maybe that's not the best way for our community to adopt a lot of developers.
Maybe they don't want to deal with all that stuff. So we are beginning to see some, you know, developer-centric tools that say,
hey, let's hide a lot of that YAML stuff.
Let's simplify what the user experiences look like.
And so we're seeing some of that too.
And those things are really kind of outside of the Kubernetes scope
in terms of, you know, dealing specifically.
But, you know, again, getting to
that beyond the first early adopters into more mainstream, how do we make it simpler for them?
So a little bit of both of those things are happening. Yeah. And I wanted to, you know,
going back to the idea of the auto scaling and moving beyond metrics like CPU and memory,
I was initially thinking, well, hey, if we see the service
response time degrading, that might be a good one to feed in.
But then as I was thinking about it, as you were talking, I'm like, well, that can be
really dangerous too, because I'm bringing this up in terms of saying like feeding metrics
into saying when to scale can be a very complicated concept right because if you
have a service that's slowing down in response time you don't necessarily want to add more
instances of it because that service might be slowing down because of something else downstream
and adding more instances could exacerbate the problem so just a thought that i had that i wanted
to bring up, because the complexity of when to automatically scale based on data, I guess, gets kind of wonky the more you think about it.
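That caution can be made concrete with a tiny, hypothetical guard: before adding instances, check whether the latency is actually local to the service or inherited from a downstream dependency. The thresholds and names here are purely illustrative, not a real autoscaler policy.

```python
def should_scale_out(service_latency_ms: float, downstream_latency_ms: float,
                     latency_target_ms: float = 200.0) -> bool:
    """Naive guard: only add instances when the slowdown is local.
    If most of the latency is time spent waiting on a downstream
    dependency, adding replicas just multiplies pressure on the
    real bottleneck instead of relieving it."""
    if service_latency_ms <= latency_target_ms:
        return False  # within target, nothing to do
    local_latency = service_latency_ms - downstream_latency_ms
    return local_latency > downstream_latency_ms  # mostly our own time?

print(should_scale_out(500.0, 50.0))   # slow on its own
print(should_scale_out(500.0, 400.0))  # waiting on a downstream service
```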
Yeah, definitely. Yeah, definitely has some, some, you know, positive and negative
connotations. And I think you guys coming from the, you know, management monitoring perspective,
probably have a very different perspective than, say, I don't know,
a developer who just goes like, oh, it would be cool if I could do that.
Would you put that feature in, put that nerd knob in there for me?
Right.
Well, I think it's a great way to start, right?
I mean, it's a great way to start with I want to build an application
and I want to put it out there in the wild
and I want to make sure that in case some know some i don't know some media outlet puts
it up when i get some pr that it doesn't just break because i did i can't wrongly configure
the instances i need so i think that's it's a great start but what we see in more complex
applications and i think that's what the other brian wanted to get to if you have a kind of a
chain of services if you start scaling up the the front end service, but actually don't know that with some of the accounts we work with, they are
using the monitoring data to figure out, hey, who is impacted, right?
So the bottom line is, do we slow down our end user experience or do we make it worse?
So we are evaluating some SLAs on our services.
And then where is the real failing component that needs to be fixed or scaled up?
And then they use the monitoring data to trigger remediation actions or, you know,
now we actually start calling it self-healing. So from the monitoring data, we say we know that,
let's say, 50% of our users, which translates, let's say, to 10,000 users are currently impacted
that are using that particular feature. The root cause is not the front-end service layer,
but it's like a back-end database that cannot keep up because there's an index that's out of date.
So this could then trigger an automated remediation action that actually fixes the problem at the root
instead of going with the default action,
which may be adding more front-end web server instances to handle the incoming load.
So I think we're seeing it more right now from the other direction where the monitoring triggers an action.
And that action can be implemented in your orchestration tools, whatever you use.
And then actually triggering the correct actions using the APIs that like a Kubernetes provides and kind of triggering scale up and scale down from another tool.
I think that's what we are seeing.
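A toy sketch of that monitoring-driven remediation idea: once the monitoring system has localized the root cause, the action is dispatched against the failing component rather than defaulting to front-end scale-out. The component and action names are invented for illustration, not any real tool's API.

```python
def pick_remediation(root_cause_component: str) -> str:
    """Dispatch a remediation action against the component the
    monitoring data identified as the root cause, instead of
    blindly adding more front-end instances."""
    actions = {
        "frontend": "scale-out-frontend",
        "database": "rebuild-stale-index",       # e.g. the out-of-date index case
        "network": "shift-traffic-to-healthy-zone",
    }
    # Unknown root cause: fall back to a human
    return actions.get(root_cause_component, "page-on-call-engineer")

print(pick_remediation("database"))
```

The chosen action would then be executed through whatever orchestration APIs (Kubernetes or otherwise) the environment provides.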
Yeah, no, I definitely – I think that's a good summary because what it does is it really highlights that the Kubernetes community – and I've never really heard this discussion from them to say,
hey, we want to, you know, try and take all the advanced intelligence that platforms like
Dynatrace have and try and replicate those, you know, in this sort of lower level container
scheduler system. I think they're still very much saying, we want to give you some basic primitives
to be able to do things, but, you know, allow your advanced systems to say,
yeah, this is what the picture of these microservices really look like. This is where
your faults really exist. This is the correlation between these things. And so, yeah, I don't think
that that shift in sort of knowledge and awareness of sort of the complexity of all
these microservices is shifting down into the platform level at all. I think, you know, it's still very much going to be in these much more
high level intelligence systems. And it makes, it makes absolute sense. Cause like you said,
um, number one, we're still in really early days with, with understanding how all these
distributed systems work. But, but number two, um, like what, what looks like the cause could
very well not really be where the cause is. It's just like the symptom, right?
Yeah.
Exactly.
But I mean based on what you explained, I think it's the right way to go, right?
So Kubernetes is extending its basic, its core functionality.
I'm sure there's a lot of new APIs that come up.
It will build in some of the basic use cases to make onboarding easier
on that platform, right?
As you said, the kind of the first initial experience should be, hey, this is something
I can actually use and work with and build my first app.
Because what we see a lot with our enterprise customers, they try to figure out what is
the next platform that they're using to build the next application.
Is it going to be Kubernetes based or do I need to go to Cloud Foundry or
what do I go and build everything on top of what Microsoft provides to me? And I think if you have
this, if you're making this initial kind of prototype proof of concept first project experience
smooth and easy, then it's more likely that people will stick with that platform and then figure out,
how do we then really scale this to larger, more complex enterprise applications?
Yeah, I think that's what it's going to be. And I think, you know, one sort of trend we're seeing in the industry is that that first decision of, like, which platform do I choose, the industry is sort of moving towards Kubernetes. So I mean, you know, obviously, you know, folks like us at Red Hat have been doing Kubernetes for about three years in the OpenShift platform.
But, you know, we've seen Microsoft making huge bets around Kubernetes.
You know, they acquired a company called Deis and just expanded their Kubernetes offering in the Azure cloud.
AWS made an announcement two weeks ago that they're now formally supporting Kubernetes.
Google obviously has done it for a while. But even like the Cloud Foundry community is now,
you know, supporting Kubernetes sort of as a parallel platform to what they tend to do with,
you know, like Spring Boot and some of the 12-factor stuff in traditional Cloud Foundry.
But even they're supporting Kubernetes as well. So I think we're going to see that accelerate more and more because people can now look at the industry and
go, okay, they all seem to be agreeing on one standard in general. And I think the biggest
indicator that Kubernetes has kind of won is that I myself am trying to make a concerted effort to learn it. So if I'm doing it, then it must be. Yeah. So it's funny, actually,
after we spoke with Martin and we did the OpenShift one, that's when I was finally like,
you know, so Brian, my role, I'm like, I'm a sales engineer and we have to stay on top of a lot of
this technology, but it also helps for us to specialize in certain areas so that we can all
help each other out. So after we talked with Martin, one of our colleagues, he did a basics of OpenShift several episodes back. And I was just thinking, man,
you know what, I'm just going to go figure, you know, get my hands dirty with Docker,
move on to Kubernetes, and then move on to OpenShift and really just do that track.
And I was thinking, and you addressed this earlier, I was thinking, boy, I wonder what's happening
with the competition to Kubernetes.
And Andy and I were just discussing,
and we were like, yeah, we've got to make sure
to ask about that.
And it's funny that they really did become a dominant leader.
I think a lot of the other cloud technologies
are still a little fuzzy in which ones are coming out ahead.
But Kubernetes really is a clear, seems to be,
and I'm not going to try to wave the finish line flag or nothing,
but it really seems to be way out ahead of everyone else.
Yeah.
And I want to coin a new term.
I think we need to define the BWI, the Brian Wilson Indicator.
So if Brian Wilson is betting on that technology,
it's probably mainstream.
If I'm going to try to learn it in my spare time, then yeah,
it has to be the winner.
Exactly.
So now back to the other Brian.
Actually, yeah.
I'm the other Brian. He's the real Brian.
That is really the real Brian. So Brian,
on the
monitoring side, I mean, we talked about
some aspects of monitoring, but anything else that happened last week at KubeCon, anything where monitoring plays a role?
Any discussions about how monitoring has to change, or some of the requirements on monitoring, in a Kubernetes, microservice, service mesh world? Yeah, I think, I mean, there wasn't a lot of, at least at a community level,
there wasn't a lot of monitoring specific sort of net new announcements. You know, I think that
a couple of things to take away from that from a monitoring perspective, one of them is just the
number of production customers that are now running Kubernetes is, you know, up into the thousands
and so forth. So I think from a monitoring perspective, you know, we're going to quickly
see companies moving from, you know, things in POCs or, you know, small instances to very large
things very quickly if people haven't already seen this. So, you know, that is always a good sign,
right? You find out a lot of things once they go into production and you start to scale and so forth. So that was a big trend. You know, I know from our perspective, we've seen literally customers in every vertical and in every part of the world that are doing things with Kubernetes and starting to get to very, very big scales. I had a banking customer I was talking to just before
KubeCon and they had said, you know, we, we got up to about a million transactions a day. This was
like the day before KubeCon. And then I got a note from them this morning and they said, yeah,
I know we're now up to 2 million transactions a day on our system. And we, we feel very comfortable
in sort of where it's scaling. So that I think we're going to see more and more kind of across
the industry, bigger, bigger production environments.
The other one is this concept of service meshes.
And you mentioned it a little bit.
It's not a completely new concept.
But we are seeing this emergence of a couple of different projects or standards for service meshes, you know, sort of very targeted towards microservices. So there's a there's a project
called Istio, I-S-T-I-O, which, you know, was originally started by IBM and Google and some folks at Lyft, working on, you know, how to do east-west, north-south, kind of very granular routing of traffic, which obviously will play into where monitoring is.
Lyft had a similar or a kind of a companion project called Envoy, E-N-V-O-Y, which is their
proxy technology. But then we also saw some other technology. There's a project called Linkerd,
L-I-N-K-E-R-D, which is another project that had kind of spun out of Twitter at one point.
And there was some other ones around there.
But so there seems to be a little bit of competition in the service mesh space.
So, you know, a couple of different types of implementations that are happening.
But there's also, you know, kind of a feeling that these different learnings are going to come together in some sort of common way and that they're kind of being driven today by the web scale people.
But, you know, these sort of start at the web scale things, and then we see them kind of fall down into, you know, certain large scale, say, financial services applications or retail. So that'll probably be something to keep your eye on toward 2018
and how that impacts places to integrate monitoring, places to integrate tracing and stuff like
that.
So because the idea of service mesh, as far as I know, and correct me if I'm wrong, it's
kind of like a, I'm not sure if proxy is the right word, but it's like a proxy service where ideally all the traffic goes through it, and then the service mesh implements concepts such as circuit breakers and load balancing, making sure that in case, let's say, one instance behind the service is constantly returning errors, take it out of rotation. Maybe it'll even launch another one.
And if every single transaction actually goes through the service mesh, that's what you mentioned, I believe, to watch out for from the monitoring perspective, because service meshes are obviously another key component to monitor, I would believe, right?
Because they will see all the traffic that goes through.
They need to be operating just as fast
because otherwise they become a performance hotspot
and an availability issue.
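The take-it-out-of-rotation behavior described here can be sketched minimally: assume a proxy that counts consecutive errors per instance and ejects an instance once it crosses a threshold. The class and threshold below are our own illustration, not any particular mesh's API.

```python
class Rotation:
    """Minimal sketch of outlier ejection in a service mesh proxy:
    after too many consecutive errors, an instance leaves the pool."""

    def __init__(self, instances, max_errors=3):
        self.pool = {i: 0 for i in instances}  # instance -> consecutive errors
        self.max_errors = max_errors

    def report(self, instance, ok):
        if instance not in self.pool:
            return  # already ejected
        if ok:
            self.pool[instance] = 0  # a success resets the error streak
        else:
            self.pool[instance] += 1
            if self.pool[instance] >= self.max_errors:
                del self.pool[instance]  # take it out of rotation

    def healthy(self):
        return sorted(self.pool)

r = Rotation(["a", "b"])
for _ in range(3):
    r.report("b", ok=False)
print(r.healthy())  # "b" has been ejected, only "a" remains
```

Real meshes add re-admission after a cooldown, which is exactly why the proxy itself becomes a component worth monitoring.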
I think I've also seen some talk in blog posts about,
well, why not just only monitor service meshes
to do end-to-end tracing?
Because they're basically,
if every
transaction goes through a service mesh, you basically at least know who is talking to whom
and what's the response time. I think I saw that somewhere, but I'm not sure if that's the only
answer to monitoring, but I think it's an interesting initial approach. Yeah, there was
definitely a bunch of conversations about it. I had a good
conversation with a gentleman named Ben Sigelman, who had been at Google and started this concept called OpenTracing. And then he had launched a company called LightStep. They'd been around for a year or so, but they kind of came out of stealth. They're
very focused on tracing. And he was fairly bullish on the idea of, you know,
how do you integrate tracing then with the, with the service mesh? And again, like you said,
you know, having this very granular, you know, hop-by-hop, service-by-service kind of visibility. You know, one of the ways I've heard this sort of explained at a real simple level is, you know, a lot of the previous sort of service mesh frameworks, if you will.
So things like, you know, what Netflix, the Netflix OSS stack had done and was kind of language specific.
So, you know, you'd end up having teams that would build stuff language specific.
This is trying to be sort of language independent, if you will.
So it's a little more
of an infrastructure ops way of going about things. And then, you know, like you said,
you know, having a proxy or sort of a sidecar proxy deployed with every application,
on one hand sounds very interesting, because you get this very granular visibility.
And then the flip side becomes, okay, if we're routing everything through all these proxies, you know, what, what is that going to do to performance? How in the
world am I going to trace how many hops it takes to get from A to B and so forth. So there's
definitely a lot of, um, you know, big ideas, but, you know, maybe not as much operational
experience as to, you know, how ready this stuff is.
Well, in this case also, Brian, the other Brian, we should probably add links to OpenTracing and also the Dynatrace page on OpenTracing, because Dynatrace does support OpenTracing. So these are good things for our listeners to read up on.
Educate yourself around service meshes. As you say, in 2018 we may see that kind of concept, OpenTracing, being pushed more.
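The "who talks to whom, and how fast" idea mentioned above can be sketched in a few lines: if each sidecar proxy emits one record per hop, aggregating those records already yields the call graph and the slowest hop. The span layout below is a made-up simplification of what OpenTracing-style systems collect.

```python
# One record per proxied call: caller, callee, duration.
spans = [
    {"from": "frontend", "to": "cart",     "ms": 12.0},
    {"from": "cart",     "to": "pricing",  "ms": 30.0},
    {"from": "cart",     "to": "database", "ms": 85.0},
]

edges = {(s["from"], s["to"]) for s in spans}    # who talks to whom
slowest = max(spans, key=lambda s: s["ms"])      # slowest hop in the chain

print(sorted(edges))
print(slowest["to"])
```

This is why mesh-level data is a tempting starting point for tracing, even if, as discussed, it's unlikely to be the whole answer on its own.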
From a monitoring perspective, again, because we are obviously trying to figure out topics
that are relevant to the performance community, to the monitoring community.
Any other things that you have seen that monitoring tools can do or maybe can't do right now?
Maybe some input for us to say, hey, guys, you know, there's so many monitoring vendors out there.
But it seems there's still certain things that are just not done well or watch out for that because that's going to hit you in case you don't catch up.
Is there any – I know it's a tough question, but is there anything? Yeah, I think we're still very much seeing that there are very few kind of all-microservices greenfields.
So people are trying to figure out, if you sort of trace the path of an application that they're building, it's going to be some mix of Java EE applications, maybe a couple of microservices, say on the front end, or a mobile application. And then it, you know, may have to deal on the backend with,
you know, like a mainframe transaction. And so I think they're trying to figure out, like, one conversation I had was, you know, how do
I deal with an environment where say some of the application is running on containers on top of
Kubernetes, uh, but, but the, you know, other parts of it are, you know, a bare metal database or a third party API service,
say for like SSO or something else, like what's the way to think about that from a monitoring perspective?
And I think people are looking for just some visual concepts.
They're also looking for, you know, is this going to impact me organizationally? You know, when you have such a kind of a hybrid environment in terms of,
you know, brownfield applications and some new stuff or stuff that's on a platform and off a
platform. So that's, that always kind of becomes the real question, less so than like, do I need
this specific technology? It's, you know, do I only need one technology?
Do I need three or four tools to do that?
And, you know, people are looking for guidance in that a lot.
Well, I think that's, I mean, hopefully in that conversation that you had, you said, well, Dynatrace is the tool that can actually do that.
Because that's actually what we are, you know, what we obviously see with our enterprise customers, as you said, there's a lot of legacy applications out there, and they will become the backbone for some of these new apps that you're building.
And therefore, when we designed our current architecture, we made sure that we can trace transactions across different data centers, across cloud providers, across technologies.
And what you mentioned in the beginning, right, we still have a lot of customers that have a mainframe somewhere in the back end.
And connecting that with the distributed world or with the cloud native world
becomes a key requirement.
And that's what we at least solved with our OneAgent technology
and our PurePath technology.
So, um, well, that's good to know. Yeah.
Yeah. And I think once people get a sense of, okay, I have the right tools in place, the
next question always becomes, um, you know, what do you recommend, uh, in terms of my organization?
Like who should be doing what? Should I, you know, kind of retrain my current ops team? Do you have best practices around, um, maybe evolving the org structure? Because I
think people are beginning to realize that what your org structure looks like will
ultimately impact how you do ops, how effective you are. And, um, so I've seen a lot
of people that may have a mix of greenfield and brownfield applications that are now willing to say, okay, you know, just trying to adapt this to my old silos doesn't work.
Like, what do you recommend? So I'd love to hear from you guys, you know, beyond the technology part, like,
how does the organization evolve? Well, I think there are different answers to that, but I
mean, I can tell you an example that I believe worked pretty well, and that's actually our own internal transformation story.
Because we've been around for several years, and we've migrated and kind of transformed to a new model. We took our traditional enterprise AppMon product, which we deployed or shipped twice a year to our customers and they install on-premise, to what we have now, where we run both SaaS and on-premise and we ship feature updates every
other week. So every sprint gets deployed, and we also do constant production deployments on a
daily basis. What we learned, I believe, is that our development teams are responsible end-to-end for their applications and features.
That means they rely on a platform, which we call our pipeline and our orchestration layer,
that is maintained and run as a product by our DevOps team. So our DevOps team owns the pipeline
and owns the orchestration engine, and developers are basically using that product
that allows them to push code changes through the pipeline into different environments all
the way into production. And then developers are also responsible for what happens in production.
That means they obviously want to know in case something fails. They have full access to
the monitoring, and then they are obviously tasked to fix it in case something is wrong.
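The "pipeline as a product" idea Andy describes – a DevOps team owns the promotion logic, dev teams push their changes through it and stop at the first failing gate – can be sketched as a toy Python model. The stage names and gate checks here are invented for illustration, not Dynatrace's actual pipeline:

```python
# Toy sketch of a promotion pipeline owned by a DevOps team: a change moves
# through each environment only while its quality gate passes. Stage names
# and gate semantics are made up for illustration.

STAGES = ["dev", "staging", "production"]

def promote(change, gate_results):
    """Promote `change` through each stage while its quality gate passes.

    `gate_results` maps stage name -> True/False (did tests/monitoring pass
    there). Returns the furthest stage the change reached, or None.
    """
    reached = None
    for stage in STAGES:
        if not gate_results.get(stage, False):
            break  # gate failed: promotion stops, the owning dev team fixes it
        reached = stage
    return reached

# A change that passes dev and staging but fails its production gate
# stops at staging; the dev team owns the follow-up, not a separate ops team.
print(promote("feature-123", {"dev": True, "staging": True, "production": False}))
```

The point of the model is the ownership split: the pipeline (the `promote` logic) is run as a product, while the per-change gate results and fixes belong to the developers.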
And that transition wasn't easy.
And I don't want to go through all the details because I believe our listeners have listened to us talking about our transformation story.
But I believe we saw a big shift towards enabling individual developer teams, give them the right platforms so that they can focus on what's important
and that's creating value for the business
but also maintaining that value,
which is making sure that these apps and services
keep up and running
and working very close with business
to figure out what the next big things we need to build.
And we kind of went away from the traditional ops team.
So there's no, at least in our engineering team
that is running and operating the Dynatrace platform, there's no traditional operations team anymore.
Interesting.
Yeah, that's very similar.
So at Red Hat, the OpenShift platform comes in a couple of different flavors.
One is software that we will ship to customers and they can run it and operate it anywhere they want.
We also host a couple of managed services or SaaS services.
And we've kind of gone through that same transformation.
It was, you know, I think originally we had the model of like, well, we'll give that SaaS team the same software every quarter or so.
And we're now at, you know, about every three or four days giving them, you know, small partial releases. And they're going through the process of both, you know, like you said, doing continuous kind of integration to the SaaS
application, learning how that works, kind of blurring the line between the engineering team
and the ops team. And then, you know, we just got off a call this morning with them and they were
talking about, you know, trying to really be good at building tools, automated tools to allow low level
self-service, you know, very granular kind of spin-up of clusters of things. But yeah,
once you force your teams to do that, the feedback loop that they get in their learning
curve kind of takes off, you know, accelerates like crazy. There's some painful
times early on, but the feedback for us has been really similar. And it sounds like it's very similar to your story.
Yeah.
I got another question.
So obviously it is easy, or easier, for greenfield apps. But what about moving legacy apps over to a new concept like containers and Kubernetes?
Yeah, so it's a great question. You know, when we first kind of went down the path of containers and platforms and stuff,
our expectation was, you know, people wouldn't have any interest in moving existing stuff, because,
you know, the thought process was, well,
you'll have to adapt it to the systems, and the cost of doing that. And, you know,
are the existing developers still around, and all those things? What's been sort of
surprising to us is a lot of our customers,
you know, beyond just the, the, you know, new kind of microservices, they're building something to
change their mobile application or kind of update their kind of customer experience front end.
A lot of them, especially with containers, have said, hey, you know, in order for us to make the
dollars work, the ROI work of these platforms,
you know, can I move an existing application? So they'll, you know, they'll say, well, you know,
I currently have an application that today just it's a, you know, it's a Java EE application,
for example, maybe it runs in JBoss or WebSphere or something. Today runs on Linux fine. Could I
put that in a Linux container? And while you'd go, well, it's, you know,
it's kind of a big monolithic thing, does that really make sense for containers? Because people talk about them as being for microservices. We've actually seen a lot of people do that.
And the reason being, now you have this common language between the development team or the
application team who says, this is how we package it, this is how we test it. And the operations
team who says, okay, here's, here's an immutable way of packaging the application. We have immutable infrastructure.
We can kind of build a more modern operation around that. And then they, you know, in some
cases they'll, they'll kind of monitor it in somewhat similar ways that they did before,
you know, agent based and so forth. But we've been, we've been really kind of pleasantly surprised at how well containers
and Kubernetes are working, like for stateless applications and being able to sort of lift and
shift applications, you know, to a point where we have a lot of companies that are, you know,
doing that today. And, you know, the benefits aren't so much just like, oh, I mean, they get
a little benefit that seems sort of like operational efficiency, virtualization type of stuff.
But more so it becomes this forcing thing of, okay, my dev team now has this one common terminology and process around packaging and testing.
And then the dev, you know, the ops team has sort of similar and it's helping them with those, you know, sort of dev and ops transition.
And so that that's been really interesting to us.
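The lift-and-shift pattern Brian describes – packaging an existing Java EE monolith into a Linux container without changing the application – can look roughly like the hypothetical Dockerfile below. The base image and file paths are illustrative assumptions, not a specific supported configuration:

```dockerfile
# Hypothetical sketch of containerizing an existing Java EE monolith as-is.
# Base image, artifact name, and deployment path are placeholders.
FROM jboss/wildfly:latest

# Drop the existing WAR into the app server's deployment directory --
# no code changes to the monolith itself.
COPY target/legacy-app.war /opt/jboss/wildfly/standalone/deployments/

EXPOSE 8080
```

The value is exactly the "common language" point: the dev team defines how the app is packaged, and ops gets an immutable artifact it can run, schedule, and monitor the same way as any other container.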
Yeah, that's cool.
I like that.
So kind of exposing developers – I mean, we're exposing them to the ops world without having them completely transform their apps as well.
Right.
Yeah, that's interesting.
So I had a workshop last Friday
with one of the bigger consulting,
IT consulting companies.
And so all of what they are selling right now,
at least the team that I worked with,
it's all cloud transformation,
cloud native transformation.
And one of the things that they got very excited about was actually the
concept of not only lift and shift, but really kind of breaking the monolith. Um, and so we talked
about how can we break the monolith, how can we not only move over to container technology
but actually then leverage the capability of scaling up individual pieces
of the app. For that, we first have to break it apart.
And one concept that I introduced them to,
which is something that I think APM enables,
so monitoring and especially the way we built it,
you can install, let's say, Dynatrace on a monolithic app,
and then you can draw virtual boundaries
around certain interfaces or certain methods and classes.
And then you can observe your monolith and figure out
how are these kind of components within the monolith talking with each other?
What are the real dependencies within the monolith?
And that allows you to test your assumptions:
if I have this monolith and my developers tell me,
here's a component that they
believe they can extract, then with the monitoring tool that actually does tracing and sees into the
bytecode, and analyzes who is calling whom on a method-to-method and component-to-component level,
you can either, you know, completely destroy these assumptions that they had, or say, yes, that's
actually a good component, we can extract it. We can try to move this out into its own instance or entity
and then run it as a separate service
and then scale it up and down independently.
So kind of breaking the monolith,
even though I'm pretty sure it does not work
with all applications out there,
obviously not without a certain effort.
I think the first step towards figuring out,
can we migrate it
is actually taking the monolith and kind of shining the light on that monolith
while it's running, and then figuring out what are kind of the individual components
that you could potentially extract.
And I think that was a concept that they really liked
because you don't have to necessarily do any code modifications to run that kind of exercise prior to extracting the monolith into smaller pieces.
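The dependency analysis Andy describes – using observed component-to-component calls to judge how entangled an extraction candidate is – can be sketched in a few lines of Python. The component names and call data here are invented for illustration; they stand in for what a tracing tool would record, not for Dynatrace's actual output:

```python
# Toy version of "shining the light on the monolith": given observed
# component-to-component calls (as a tracing tool might record them),
# count the distinct external touchpoints of each extraction candidate.
# Component names and call pairs are invented for illustration.

calls = [
    ("web", "checkout"), ("web", "search"),
    ("checkout", "payment"), ("checkout", "inventory"),
    ("search", "inventory"),
    ("payment", "audit"),
]

def external_touchpoints(component, observed_calls):
    """Distinct components that call, or are called by, `component`."""
    touching = set()
    for caller, callee in observed_calls:
        if caller == component:
            touching.add(callee)
        elif callee == component:
            touching.add(caller)
    return touching

# A component with few touchpoints is a better extraction candidate than
# one that's entangled with everything else.
for c in ["payment", "inventory", "checkout"]:
    print(c, sorted(external_touchpoints(c, calls)))
```

Run against real tracing data instead of this toy list, the same query either confirms the developers' assumption ("payment only touches two things, extract it") or destroys it.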
Yeah, no, I like that.
A lot of the workshops that I've sat in for similar stuff, how to break up the monolith or how to strangle it, get into, you know, you have to have expertise in domain-driven
development. And then you have to sit down with your business leaders and prioritize,
okay, do we really need to be dependent on this? And, and at the end of the day, you, you know,
you get some whiteboards, or you get some discussion, but yeah, having, having a tool
that will actually give you a sense of, like, okay, where's your links? Where's your
dependencies? Where's your priorities? I love that. That's a great first step. And like you said, it literally costs
you almost nothing to kind of have real data around, you know, what you could do.
And then you can start to figure out, okay, what makes real sense. And, um, you know,
how do you put business metrics against saying, Hey, do we rewrite something or decouple it? So
that's awesome. I like that story. Yeah. And I think if you actually take it a step further,
maybe I need to suggest it to our development team
because we see so much information within the monoliths.
We could run our AI, our machine learning on top of it,
and then instead of us coming up with assumptions
and drawing the virtual boundaries,
the tool itself can say,
we figured out a certain part of your code
that kind of is independent and the
only kind of touch points with other parts of the code are through these two or three interface
methods. So that would even be cooler, if you think about it that way. Okay,
all right, well, that was my own little idea. Yeah, I know, that's great. Yeah. Um, Brian, anything
else that you want to mention about Kubernetes, OpenShift? I know we have a partnership with Red Hat.
Um, anything that you want to additionally mention that could be
interesting for our listeners for our community that are, as I said,
centered around monitoring Dynatrace.
Obviously, I think a lot of our listeners
are kind of familiar with Dynatrace.
Anything else that you want to tell?
Yeah, I'll throw out two quick plugs
because I think we've talked about a ton of things
and maybe overwhelmed some people.
So OpenShift is obviously our
implementation of Kubernetes. It's very enterprise centric. So a couple of things, and we can put
these in your show notes. I know we've done a couple of kind of webinar, demo webinars together.
We call them the OpenShift Commons community. So if people kind of want to see what that
interaction looks like between, you know, your system and what it looks like at a Kubernetes system, um, there's a couple of really nice demo videos that show that and we'll make sure those links get in there.
Um, but even, even at a simpler level, if we kind of go back to the Brian Wilson index of, you know, how do I learn this stuff?
So we, um, we work with this really awesome company called Katacoda, who's a training partner.
And if you go to katacoda.com, K-A-T-A-C-O-D-A.com, they've got really, really nice tutorials set up.
So you go in through the web.
They have environments already built for you, so you don't have to mess around with anything. And then they have a bunch of sort of pre-written tutorials, everything from Kubernetes basics, Docker basics, you know,
other things in there. And then we've built a kind of a custom one for OpenShift. So if you go to
learn.openshift.com, there's probably about 10 or 12 modules there that'll walk you through all
the basics of, you know, setting up applications, monitoring applications, scaling them and so forth. So if you're like other Brian and
you're wanting to learn this stuff, it's an awesome first resource and costs you nothing.
And you don't have to have any tools besides just your laptop and a browser.
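For a taste of what those Kubernetes-basics tutorials cover, a minimal Deployment manifest for running and scaling an application might look like the sketch below; the image name and labels are placeholders, not taken from a specific tutorial:

```yaml
# Minimal sketch of a Kubernetes Deployment: run N replicas of a container
# and let the cluster keep them running. Image name and labels are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-app
spec:
  replicas: 3            # scale by editing this (or via `kubectl scale`)
  selector:
    matchLabels:
      app: hello-app
  template:
    metadata:
      labels:
        app: hello-app
    spec:
      containers:
      - name: hello-app
        image: example/hello-app:1.0   # placeholder image
        ports:
        - containerPort: 8080
```

Applying a file like this with `kubectl apply -f` is typically one of the first exercises in the kind of hands-on tutorials Brian mentions.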
I'm bookmarking it right now. Yeah. Awesome. That's pretty cool. Thanks for that.
So, yeah, it's a fun space. It's growing really quickly.
You know, it's a really good, healthy community, so very accepting and, you know, lots of people willing to help and stuff.
So, you know, for people that might be interested in this stuff, you know, jump in.
The water's warm.
And you can definitely find people through, you know, Slack channels and other things to help you, whether you're a newbie or you've got kind of an advanced problem.
That's great.
Cool.
The other Brian, anything else from your side before we wrap it up?
No, I mean, to me, this was a lot. As you know, these are the things I'm going to be diving into, so listening to this was more learning for me than participating, as you could hear. Um, just for anybody who is
interested in following my path of learning: obviously, you know, Docker's
free, and you can go down the Kubernetes path. But in case there's
any curiosity, what I'm using for my model is our demo application,
easyTravel, which we have up on the Docker community. It's, you know, this hokey little app that's actually pretty advanced, you
know, in terms of the technologies it employs. We have a Dockerized version
of our demo application out there, which you can then run. And so that's what I'm going to
plan on running through the wringer: first just standalone Docker, setting it up, then setting it
up through, you know, using just straight-up Kubernetes, and then seeing what I can do with
all that in OpenShift. And I'm sure there are other applications. But the nice thing about
easyTravel is you have multiple tiers. So it gives you, instead of just doing something like Spring Music,
you have a multi-tiered application
that you can run and play with these things.
So that's about all I can contribute to this today
is to say if you do want to play with it
and you're looking for a multi-tiered application
to use up in,
if you search Dynatrace in Docker,
you'll find our demo application.
And it's got nothing to do with, you know,
buying our tool, using our tool.
The application runs on its own without anything else.
But of course, you can check out the tool as well.
That's all I have.
And I want to say thank you to Brian for joining.
But I think, Andy,
I think it's time to summon the Summaryator.
Well, I want to keep it short today.
I think what I learned is that the next big thing we will see on the Kubernetes front is making onboarding easier for developers, so that they have a good first experience when putting, let's
say, either a prototype or some simple apps on Kubernetes.
And then I'm sure people will stick with it.
We also learned that Kubernetes kind of surpassed all of its other competitors.
It seems it's the number one choice when it comes to orchestrating containers.
We also learned today that in case we want to predict
the future of technologies,
we have to summon the Brian Wilson indicator.
We have to look at the Brian Wilson indicator
to figure out what's going on.
And I was also very happy to have a little excursion
around monitoring, how to monitor. Obviously, this is a podcast that should not be too focused on tools, but it seems that at least the stuff we're doing on supporting OpenTracing, and the way our OneAgent works, giving us full end-to-end visibility, was a good fit for the way our developers integrated it and built our new Dynatrace platform.
And having that said, thanks, Brian, for being on that show.
And hopefully you'll have a lot of listeners in the future for your podcast.
We'll definitely make sure that you guys get mentions
and the link gets back to your podcast.
And if you ever want to come back to the show or even bring Tyler as well the next time, you know, you're always invited because it's a great way to educate the larger community out there.
Yeah. Thanks, guys. I've enjoyed the conversation.
And, yeah, we'll definitely have to get Tyler on next time.
We'll work out the scheduling better, but thanks for having me on and hopefully everybody has a
great holiday season and hopefully we'll get a chance to talk to you guys in 2018.
Right. Excellent. Cool. Thank you. Thank you.