PurePerformance - What is Data-Driven Continuous Delivery aka CDv2 with Tracy Ragan
Episode Date: January 11, 2021

When moving to microservice architectures it's time to re-think continuous delivery. Just as many software services rely on a core data analytics engine to make better automated decisions, we need to apply the same for continuous delivery. We can assess the risk of every microservice deployment based on data from production and the desired change of configuration. We can assess the potential blast radius and mitigate it through modern delivery options such as blue/green, canaries or feature flags.

Tracy Ragan, Creator & CEO of DeployHub, CDF board member and DevOps Institute Ambassador, shares her thoughts on why we need to move to smarter data-driven delivery pipelines. Tracy (@TracyRagan) gives us insights into why not every microservice is created equal and what approaches we can take to better control updates that contain multiple microservice updates.

Also make sure to check out their latest project Ortelius and take Tracy up on a virtual coffee chat as discussed in our podcast!

https://www.linkedin.com/in/tracy-ragan-oms/
https://twitter.com/TracyRagan
https://github.com/ortelius
https://go.oncehub.com/15-30MinuteVirtualCoffeeWithTracy
Transcript
It's time for Pure Performance!
Get your stopwatches ready, it's time for Pure Performance with Andy Grabner and Brian Wilson.
Hello everybody and welcome to another episode of Pure Performance.
As always, my name is Brian Wilson.
And as always, Andy Grabner's name is Andy Grabner.
Sometimes Andreas Grabner, but only for his mother.
Happy New Year, Andy.
Happy New Year.
I just wanted to say it's great that we're still doing the show even in 2021, hoping, well, knowing, that 2020 is behind us and 2021 can only get better.
I think so, right? There's a lot of potential, a lot of promise on the horizon, so let's keep our fingers crossed for everybody.
And I think this is a good point that came up a few episodes ago when we were talking about chaos: when talking about chaos testing, if someone says, oh, that'll never happen again, just kind of reference back to 2020.
Exactly.
Whatever won't happen, COVID, murder wasps, everything, you name it.
I think there were some vampires somewhere.
Who knows?
Anyway, we are in a new year, new show. As you mentioned, we've been doing this, I was going to say since 2015, but it can't be. I think 2016.
2016, yeah.
And thanks to everyone who's been listening. If anyone is on here who's listened since the first episode, and everyone who has perfect attendance, thank you so very much. We love you all, and we hope we keep entertaining you with our
short banter. You know, really quickly, Andy, I've got to mention this. I know we try to keep it short. We used to have it a lot longer in the beginning, right? My wife and I were driving home from my mother-in-law's, and she threw on this podcast, and we're listening, and talk about banter: 15 minutes in, they hadn't gotten to the topic yet. All they were talking about was things like their audio quality, funnily enough, and just all this other stuff. I'm like, when does this show start? We're 15 minutes in. So, to the guest we're about to introduce, the reason I bring this up, and it's relevant, is that way back in the early days we used to go quite a bit longer before we got into it, though not 15 minutes. Now we're pretty good at getting right to the chase. So speaking of getting to the chase, Andy.
Perfect. Yeah.
I'm very honored to have for the inaugural episode in 2021,
for the New Year's episode, Tracy Ragan.
And hopefully I pronounced the name correctly,
because I noticed there are multiple ways
you could probably pronounce Ragan.
It is Ragan.
And it is quite an honor to be here
for the first 2021 podcast.
It couldn't be a better way to start the year.
Awesome.
Hey, Tracy, it's amazing.
I have your LinkedIn profile open.
And let me just read this out.
Creator and CEO of DeployHub,
helping DevOps teams simplify microservices at scale.
CDF board and DevOps
Institute ambassador. That sounds like you are really busy. I am really busy. I am really busy, but I've been in a small company for the last 20-plus years. So busy is what you learn.
Yeah, now very cool. And we got to know each other through the CDF, the Continuous Delivery Foundation.
And I actually think, and I remember when we had a chat two weeks ago, a couple of weeks
ago, and the invite popped up in my calendar.
And then I thought, how did I end up getting a 15-minute coffee call with her?
And then I actually remembered and I looked back into my emails
and you were basically sending out,
was it on Twitter or email or on Slack?
If you want to chat with me,
just book a time for a coffee chat
and then let's have a conversation,
which I thought was really great
because this started the whole communication
or conversation we now have.
And so thank you so much
for being so open and available for the
community. Absolutely. It was my goal in 2020 when I realized we were all going to be seeing the world
through the Zoom window to really reach out to folks and start talking about what we're facing,
not just in terms of being in quarantine or pandemic, but what we're facing in terms of new technology,
everybody is starting to talk about cloud native and K8s.
And there is certainly a tsunami on the way.
And there was no better time to do exactly what I did in 2020,
to really start talking to people.
And I have probably spoken to,
I'd probably say about 500 people over the last year,
all in different levels of their journey in Kubernetes.
And what better way to really define requirements for an open source project, right?
Yeah, yeah.
So that was my goal.
That is really cool. So, help me understand, because maybe other people want to follow that same model.
You pick a certain time range in a week or like a certain time slot or time slots, and
you put it on a calendar and people can book the time or.
Yep.
I block out my calendar from nine in the morning until one in the afternoon, my time, mountain
time.
And that way I have the afternoons to get work done.
And I open up the calendar and I, you know, let everybody know. I reach out to people I see on LinkedIn. If somebody wants to, you know, follow me on LinkedIn, I immediately send them an invite to say, let's not just follow each other. Let's talk.
I've never heard of someone doing that. That's really, really awesome.
I think that's the first I've heard of it. Yeah, that's really amazing. Cool.
Hey now, Tracy, I know we both, and I'm sure Brian included, have as one of our favorite topics thinking and contemplating about the future of this discipline that has been around for a while, but it seems it's been stuck a little bit in the old way we did things. And I'm talking about continuous delivery.
I think this is actually the first time I'd heard the term. You said it's time for a continuous delivery V2.
Yes.
Absolutely.
Can you enlighten us on what that means for you? What do you want to achieve?
Well, you know, you guys started it when you mentioned chaos, right?
Earlier in your bantering.
And we are entering a phase of chaos engineering.
Like it or not, we are.
When you think about it, the real benefit of Kubernetes
is the auto-scaling and the fault tolerance.
And how do you really get that?
You get that by decomposing your monolithic applications into functions, microservices.
The minute you do that, you create chaos, because instead of one big monolith, where we sorted out all of the link issues at the earliest stage of the development lifecycle, which is at the compile and link step, we're leaving that to runtime, whether it be in development or QA or production.
That link step is being done at runtime.
That, in essence, is chaos.
And while we have solved the problem of sort of encapsulating from the operating system, once we've broken apart an application and we don't link it, we are exposing ourselves to problems with different versions of different microservices that make up different versions of the applications across many different clusters. It is a huge chaos problem, and one I find fascinating.
And that's what we have to think about.
We have to really understand what we did when we decided to decompose an application version into independently deployable microservices.
What did that mean to the continuous delivery pipeline? And how do we now need to morph from the CD perspective to be able to still have a North Star,
still understand what an application version is,
and still be able to say,
we want to put these new features in version 5.1 of our new application.
How do we do that now?
Let me ask you a question because I want to test my knowledge.
I know Andy's probably got a million questions.
I can see it.
Now that we can see you, Andy, I can see your gears running.
But I do a lot of theoretical.
I work in pre-sales.
I'm not doing implementations.
I read about stuff.
We talk about stuff.
We discuss things with customers.
So the one thing that's always tripped me up a little bit
has been the idea of service meshes, Istio, for example.
And the way I described it recently to somebody,
thinking I understand it, was similar to what you're talking about.
How in traditional applications, you would define your endpoint, you would have a monolith talking to a monolith, and you had an endpoint and it was done.
And if you're using something like Kubernetes, you have to define those endpoints in every single pod, every single container running in there.
The idea of a service mesh is you just say, you know, connect to the login service, and Istio or your service mesh knows what the endpoints of the login service are and manages all that.
Only because you're bringing this up, I just wanted to check: am I getting my understanding of service mesh right? Is that the general concept?
It is.
Just to summarize what service mesh is, it's request routing.
Yeah.
Okay.
It's pure.
But it, I mean, don't even get me started on this conversation.
That could be an entire different podcast.
Exactly, no, no.
Maybe we'll take you up on that one, yeah.
But in terms of the continuous delivery pipeline, in the future, my prediction is, and when I first started saying this, back when we were able to see people in person, they would look at me like I'm crazy, or they thought I was green or something. We have the ability to get rid of dev, test, and prod, because service mesh can do the routing. Now, if we understand that we have one big, massive cluster, right, and all of our microservices are immutable, so they're all running in there, what's the point of having a different cluster? Why can't we just get the service mesh to route to the correct persona, the right version of the application?
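As a rough illustration of the routing Tracy describes (one cluster, with the mesh sending each persona to the right version), here is a minimal Python sketch. The service names, versions, and rules are invented for illustration; this is not any real Istio or service-mesh API.

```python
# Hypothetical sketch of the routing decision a service mesh makes:
# pick a service version based on who the caller is, so one cluster
# can serve what used to be separate dev, test, and prod environments.

ROUTING_RULES = {
    "login-service": {
        "dev": "v5.2-beta",   # developers see the newest build
        "test": "v5.1-rc1",   # testers see the release candidate
        "prod": "v5.0",       # everyone else gets the stable version
    },
}

def route(service: str, persona: str) -> str:
    """Return the service version a request should be routed to."""
    versions = ROUTING_RULES.get(service, {})
    # Unknown personas fall back to the stable production version.
    return versions.get(persona, versions.get("prod", "v-unknown"))

print(route("login-service", "dev"))    # v5.2-beta
print(route("login-service", "alice"))  # falls back to v5.0
```

In a real mesh this decision would come from declarative routing rules rather than application code, but the effect is the same: the environment becomes a routing attribute instead of a separate cluster.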
The only caveat to that is how do you manage multiple, you know, dev, test, and prod databases?
Which brings me to another really fun conversation, which is mono versus poly databases.
So, yeah, most companies haven't started looking at service mesh yet, but they will.
I was at a Spinnaker presentation and they, you know, they did this beautiful presentation.
Then they said, now I'm going to tell you we did something and don't think we're crazy because it really solved a lot of problems.
He said, we combined our dev and test into one cluster.
And I was like, yes, I knew that would happen.
So it's beginning. It is beginning. And finally, we will get rid of waterfall. We talk about waterfall like we've done it forever, and we talk about Agile like we've gotten rid of waterfall, but we still do waterfall.
In Agile, you do a small change to code. You check it out. You compile the whole beast, and you release the whole beast. Microservices are the last mile of Agile. Now, we can really start thinking about how to get rid of dev, test, and prod. That's how it ties into CD.
I think it might be interesting, actually, Andy,
to have some follow-up conversations as well,
because I think there's a lot of really cool topics
we can get into.
I just wanted to check that service mesh thing,
if that sounded like a little bit what you were talking about,
which it sounds like somewhat.
But yeah, let's go back to the idea of CD2.
Yeah, the CD2.
I still want to, there are two comments that I have on your statement. The first is, Anita Ingle, the head of our DevOps, or what we now call the ACE team, made a statement about two or three years ago where she said that the maturity of an organization, for her, is inversely proportional to the number of environments that you have. So that means the more environments you have, the less mature you are. If, in the end, you only have one environment, and that's prod, then you've obviously reached the highest level of maturity.
On the other side, I've got to say, and we talked with Kelsey Hightower and others, right, if you think about Kubernetes, there are a lot of things changing in these platforms. And the question is, how do we test the new platform versions, the new versions of the things we're depending on? Because if everything runs in prod and you are updating your prod cluster to the latest version of Kubernetes without having the ability to test this somewhere, then you may run into the problem that you're upgrading your Kubernetes cluster and all of a sudden everything falls apart.
I mean, and maybe I get this wrong.
Maybe there's a better option for doing this,
but at least this is one of the few reasons
I can also think of not trying to achieve prod only.
Well, we'll see how the industry goes.
Yeah, yeah.
We will.
And, you know, there is a part of that statement that you just described, the maturity level based on environments, that has to do with your ability, your team's ability to do true configuration management.
There should never be a guess about it. Now, certainly if you're making some kind of big update to a cluster,
something at the low level, I would say maybe you want to test that in a different cluster, right?
But for the majority of, you know, it's the old 80-20%, right? 80% of the code that we have
running in a cluster really doesn't change that often. It's 20%. So how do we make the 20% as efficient as possible? And how do we support business agility
by allowing code to get out to end users as quickly as possible? How do we deliver innovation
all the time? And that is the essence of a microservice, is the ability to do that.
So while that 20% is critical
and 20% may have to have its own cluster to be tested,
I do predict that there will be a time in the future
that for the majority of the changes,
they will bounce right to production.
Yeah.
No, your word in God's ears or whatever,
that's at least a saying in German.
I'm not sure if that translates well into English.
Yeah, no, I just actually came across that. I forget what it was, on some show. Oh no, no, it wasn't, sorry. It was some political tirade on Twitter.
Let's not go there.
Yeah, exactly. I'm not even going to mention the names.
But yeah. All right. So let's go back to CD version two. Now, Tracy, we have talked about
in our coffee call that we had, and we talked about event-driven continuous delivery. I explained to you what we are doing with Keptn, kind of the same story about what you were saying.
We were breaking up monolithic applications into services
and then connecting them through events,
but we haven't done that, let's say,
evolutionary step in continuous delivery yet.
Is it about time?
And actually, does it solve the problem?
And so I would like to get your thoughts
on what CD version 2 really looks like,
and how event-driven plays into it. I mean, what is CD version 2 for you?
So if we think about microservices, let's just keep it in the context of a microservice,
because that is where it really requires the biggest shift in thinking.
Not all microservices are equal.
We will have microservices that impact lots of applications.
We will have microservices that are front-end that impacts only one application.
We'll have microservices that are security-related, login routines, database access routines.
And not all of them are equal. So why should we continue with a very authoritative workflow process that forces every microservice to go through the same
workflow? Now, when I first looked at Keptn, and this has been some time ago and I've read through it, there is a concept of strategy that came up.
And I kept thinking about that.
And really what we need is not a CD workflow, but we need CD strategies based on the microservice.
And the best way to do that is through something that's more templated,
something that's more event-driven. And we really should be able to create a workflow on the fly
or a strategy on the fly. So if we stop thinking about workflows, because workflows really put us into this very strict kind of, you know, you do something at dev, you do something at test, you do something at prod. But if you think about a proper strategy for a particular microservice that has a particular risk level, then you start thinking in terms of, well, what do I need to really do to get this out? What is the proper strategy for this particular microservice? We should be able to create, on the fly, a strategy that pushes it through the pipeline that's appropriate for that microservice.
Now, whether we do it based on events or some kind of a templating engine, I'm a big fan of Steven Terrana and his Jenkins Templating Engine.
I think that that will save a lot of work for a lot of people who are using Jenkins.
I'm a big fan of Keptn and how you have a kind of control plane listener that says these are the events that could be executed.
And even Tekton.
This is the shift in where we're going.
Jenkins X just announced their beta 3.0,
completely based on Tekton and the events catalog.
So now that we have events, we have this idea,
how do we best put them to work?
And I feel like shifting from this concept of workflows and starting to think about the proper strategy for the item that we're managing is where we should be.
That is, in my mind, that's the essence of CD version two.
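The strategy-over-workflow idea can be sketched in a few lines of Python. This is a hypothetical illustration, not Keptn's or any tool's actual API; the stage names and risk levels are made up.

```python
# Illustrative sketch of building a delivery "strategy" on the fly
# from a microservice's risk level, instead of forcing every service
# through the same fixed dev -> test -> prod workflow.

def build_strategy(service: str, risk: str) -> list[str]:
    """Assemble an ordered list of pipeline stages for one service."""
    if risk == "low":
        # e.g. a front-end dropdown change: developer-tested, straight out
        return ["unit-tests", "canary-deploy-prod"]
    if risk == "medium":
        return ["unit-tests", "integration-tests", "blue-green-deploy-prod"]
    # high risk, e.g. a login or security routine
    return ["unit-tests", "integration-tests", "security-scan",
            "staging-deploy", "approval-gate", "blue-green-deploy-prod"]

print(build_strategy("ui-dropdown", "low"))
print(build_strategy("login-service", "high"))
```

In an event-driven setup, each stage name would correspond to an event that tooling subscribes to, so the "workflow" is just the sequence of events emitted for that particular service.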
That is fascinating. And thanks for that. I took a lot of notes.
But so it starts then with the assessment of the risk of a microservice. You need to put them into different buckets and say, hey, this is a, I don't know, very low-risk microservice, it can go to production easily, you can do a canary deployment, and then we have a certain model of how we turn on the canary load. But then there might be, hey, this is the login service, and if this one fails, then obviously we have a big problem, so we need to go through a different process, or, yeah, a different strategy. So how do we assess the risk, and how do we automate that?
There is, you know, something now that we can really start leveraging, and it's called machine learning. We have all of this information that we should be pulling back from the production environments
to start defining risk level. And it has to do with configuration too. If we go back to that
discussion around maturity and configuration, that's what we are focused on at DeployHub.
And that's why we're excited. This is sort of an auspicious day for us. This is our first full day of having Ortelius as part of the CD Foundation. Ortelius is a microservice management tool: it catalogs microservices, it tracks their deployment metadata, and it can track whether a service failed when it went out. And based on those criteria, we're going to start understanding the risk level of a microservice.
And that is the essence of chaos engineering, right? Because we're going to let the data tell
us that, not a human. We need the data to return that information and we need to act upon it.
And that acting upon it should start with assessing a risk level, or the blast radius.
What's the blast radius of this microservice?
Maybe it should go to a test environment before it goes right to prod because I can promise
you in some of those, if we think about a strategy for a front end where it's just a
drop-down list that's being changed, that strategy might be let the developers test
it and push it out to production right away.
But if it's a security routine, we probably would want a strategy that might take it through several different steps of testing before it goes out the door.
So we have to start allowing the data to drive the CD pipeline.
We have to have smart CD pipelines. It shouldn't be a human deciding, in a very imperative way, that it has to be pushed through this kind of workflow, because we're not monolithic anymore.
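A minimal, hypothetical sketch of what "letting the data drive" might look like: a risk classification computed from production signals such as failure rate and the number of consuming applications. The weighting and thresholds are invented for illustration.

```python
# Hedged sketch: deriving a risk level from production data rather
# than a human decision. The inputs mirror the signals discussed in
# the episode (failure history, blast radius); the scoring is made up.

def risk_level(failure_rate: float, consuming_apps: int) -> str:
    """Classify a microservice deployment by its data-driven risk."""
    score = failure_rate * 10 + consuming_apps * 0.5  # illustrative weighting
    if score < 2:
        return "low"
    if score < 6:
        return "medium"
    return "high"

# A rarely-failing service used by one app vs. a flaky one used by 15.
print(risk_level(0.01, 1))    # low
print(risk_level(0.30, 15))   # high
```

The output of a classifier like this is what would select the delivery strategy, so the pipeline adapts per service instead of applying one workflow to everything.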
I think the idea of the data driving these decisions is really, really important, because as you were describing this, and it's the first time I'm hearing some of these concepts you're bringing up, my brain was already throwing up barriers and trying to think through them. I'm like, obviously I can't just say no, so let me think about what's making me react this way. And it really came back to, you know, I haven't done this work in a while. I've been a sales engineer since 2011, so the last time I did performance testing was 2009, 2010-ish.
Waterfall, no automated deployments,
very immature models, right?
And I think that's the key here.
Very, very immature models.
And I remember one time having an argument
with the product management team
because I wanted to do a performance test
on a release.
And the developer was like,
this is just such a minor thing,
it doesn't need testing, we're pushing it,
it's got to go out. You're not testing it.
I'm like, you know, I was always like, we should test everything.
As a good performance tester would be, you know, try to make a fight for it.
Of course, predictably, went to prod and crashed everything.
Right?
This minor little thing.
It was some stupid mistake they made, right?
And that's what got me thinking, like, oh, how do you say this is a non-important or a low-risk microservice?
And I think your answer specifically to the data point being, let the data drive that, not the humans.
But also, I think this all relies on the common theme we've been discussing so far about there having to be a maturity model in place before you're doing these things.
The reason why I was resisting the idea initially is because we were doing an old-fashioned deployment. There was very low maturity. There were countless times you would deploy from dev to QA to prod with, you know, logging turned on full, or debug, stupid things like that, because that wasn't being treated as code; it was all these manual switches. So the long point I'm making is, if you look at this, and if you drop your guard and don't resist, like I started to, and think, okay, if you have a proper maturity model in place, and you have your guardrails, and you have as many things automated as possible, like your deployments, and I'm assuming that's a lot of what DeployHub helps you with, right, to automate all these pieces and make sure all the configs are properly set, then you remove the, let's say, stupid risk, the stupid human risk, from it, and you're left to just using the data you collect to capture the real risk from the technical point of view, which can help you do this. So yeah, in short, I like the idea. I had some reservations as we were going, but once I thought it through for a minute, I wanted to share that, because I figure a lot of people hearing this might be like, oh, come on. But again, you have to be at a certain level. This is not like, I know how to drive, so I'm going to get in a 747 and try to fly it.
And remember, in your example, and by the way,
there is such a thing as a prod to test to dev, that's called an emergency release.
Exactly.
And there's a lot of those done on a very regular basis.
But in your example, you're thinking in terms of monolithic too.
Exactly.
And monolithic could potentially have a bigger impact because it's monolithic.
When you're moving smaller functions out, your risk level actually comes down for that particular deployment.
And that is the whole idea of Agile.
Exactly.
So that's why I keep saying we have really achieved Agile's last mile when we think about microservices.
And microservices will be deployed all day long. This is not a, we're going to get into a room
and have a meeting about a deployment of a single function
and discuss it and then schedule it
and have people stamp it.
They have no idea what it does anyway,
which used to make me crazy
in those kind of deployment kind of approval meetings.
It's like, you don't know anything about this anyways.
Why are you approving it?
Trust your developers.
And if your developers break it, they need to fix it. And that's the importance of configuration
management, having a difference report, understanding what you just did and being
able to back it out really quickly or shift from blue to green. We have the skills. We have the
tools to be able to make this shift. And it's required. We don't have a way to go back.
Microservices has pushed us to a place that we have to rethink and reimagine everything about
our CD pipeline and start making it smart and start making it fast so businesses can really
achieve the agility that they've always driven themselves to achieve.
They want to be the first one on the market with their new feature.
Banking, insurance, all of these heavily regulated industries, even the securities area.
They want those features out today.
They don't want to wait.
They want to get that stuff out now.
I want the vaccine yesterday.
That's who we are now as consumers. We want it now.
Tracy, I got a quick question for you then on this. I understand the happy world scenario
where we all have microservices, we all can deploy them independently.
And that's where we want to get to, obviously, with different maybe processes depending on the
risk. But in what i've seen
also with organizations that are now moving to microservices and they they want to push something
new out they always say well if we want to if we want to have this feature we need to push these
five microservices out in this version because in the end they all encapsulate a value stream or
like a value increment for me the challenge here is now how do we do you have any any thoughts on how you actually
organize this and how you are controlling the rollout of services that should be independent
but really they are not because they are depending on individual versions how do you do this this is
through feature flags that you just deploy them and then you turn them on at some point
or how does this work andy that was you're very kind to ask that question,
to be quite honest.
That is the essence of what we're doing with Ortelius.
Think about Ortelius as not a deployment solution.
Ortelius is a configuration management solution.
So in the Ortelius world,
let's just break it down to really basic.
A microservice is a component.
Applications are a collection of components. And why I use the word component, because it may be something other than
a microservice. It could be a Lambda function. I don't really consider that a microservice.
So it's a collection of components that can be independently deployed.
What we have to be able to do is every time a microservice is updated, you know, it's registered to Quay, it's in Docker, new versions in Docker Hub.
We have to be able to grab the details about that and version it.
Once that's done, we know that anything that consumes it also has a new version.
Now, you brought up another interesting idea, which we're talking about for 2021, and that's what I like to call component sets.
So while microservices are supposed to be loosely coupled, they're not always loosely coupled.
In fact, we know now that people are writing microservices, they're not an application.
They're not the teller application for the bank.
They're just a set of microservices that have to be deployed together. That's what we're calling a component set. So what we do is
we take that information and we pass that on to tools like Spinnaker or Argo or Helm to actually
go off and do the deployment. And we pull back that information and we check that deployment
file back into our logs so that it's hermetic and can be redeployed at any point in time.
But what you end up with is a central database
that shows the differences between two releases
at a component level or at an application level
or at a cluster level.
It can show the blast radius of a microservice.
Even before you deploy it, you can say, I'm a microservice developer.
I'm going to update this.
How many people are actually consuming it?
Oh, wow, 15 applications are using this.
Maybe I should be a little more careful and notify everybody that is coming across.
Or maybe our CD pipeline should be smart enough to say, this microservice has been updated. Go look at the configuration data and then re-execute the workflows for all of the testing for all of those applications before it goes out
the door. So everybody's had an opportunity to look at it. So it goes back to that statement
that you read about the maturity level. It has to do with being able to understand how applications
are put together,
what their differences are as they get pushed across, and the versions that consume them, and what their blast radius is. So it's back to being able to see the puzzle, the top of the box
of the puzzle. What are you building? What does this puzzle really look like? Even though it's
logical, we are still building applications.
And we still have to be able to see it that way.
And that's what the Ortelius open source community, that's the problem set that we're solving.
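The catalog-and-blast-radius idea could be sketched like this in Python. It is a toy data model invented for illustration, not Ortelius's actual implementation or schema.

```python
# Minimal sketch of the configuration-management idea described above:
# catalog which applications consume which components, so you can ask
# for a component's "blast radius" before deploying a new version.

catalog: dict[str, set[str]] = {}  # component -> applications consuming it

def register(app: str, components: list[str]) -> None:
    """Record that an application version consumes these components."""
    for comp in components:
        catalog.setdefault(comp, set()).add(app)

def blast_radius(component: str) -> set[str]:
    """Which applications are impacted if this component changes?"""
    return catalog.get(component, set())

register("teller-app-5.1", ["login-svc", "db-access-svc"])
register("loan-app-2.0", ["login-svc"])

print(sorted(blast_radius("login-svc")))  # both applications consume it
```

A smart pipeline would query something like `blast_radius` before deploying, then re-run the test workflows for every impacted application, as Tracy describes.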
So, by the way, when I asked the question, I had no idea that you were actually releasing Ortelius today. So this is not me doing you a favor here.
No, it's not a setup, really.
But basically, to the listeners, we're now telling them that we recorded this not in the new year, but maybe in the old year, if you look up the release date. So, damn it.
Oh, they probably already forgot what we said.
Yeah, so, you know, go on. No, this is interesting.
So, as you know, with what we do with Keptn, we obviously have a very tight integration with monitoring tools, whether it's Prometheus or obviously also Dynatrace. That's where most of us work. And we have a lot of this data, right? Dependency data. We have version information. And that was also our thinking: what can we do with this data, or which other tools can leverage the data that we have? By doing distributed tracing across your microservices, we know exactly how many users are currently using a particular service that is, like, three levels down, and what the blast radius is if it falls. Also, how often has this component failed in the last month when it was talking to another service in a certain version range, right? We have all this data. So I think we should also, besides this podcast, try to figure out how we can get our data into your tooling.
And it's that kind of combined data that's going to start giving us those risk assessments.
Exactly.
Yeah.
And that's what we then can push back into that CD version 2, right?
To define the strategy for any particular microservice based on the data that says this is the risk of it.
Yeah.
That would be cool.
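As an illustration of the kind of risk assessment being discussed here, a toy score could blend failure rate, blast radius (dependent services), and current usage. Everything in this sketch — the field names, the weights, the caps — is invented for illustration; a real system would derive them from monitoring and deployment data.

```python
from dataclasses import dataclass

@dataclass
class ServiceStats:
    """Illustrative per-service numbers a monitoring tool might expose.

    All field names here are assumptions made for this sketch.
    """
    failures_last_month: int   # observed failures in the last month
    calls_last_month: int      # total observed calls in the same window
    dependent_services: int    # services impacted if this one fails (blast radius)
    active_users: int          # users currently exercising the service

def risk_score(stats: ServiceStats) -> float:
    """Blend failure rate, blast radius, and usage into a 0..1 risk score.

    The weights (0.5 / 0.3 / 0.2) and saturation caps are arbitrary
    placeholders; a real system would learn them from history.
    """
    failure_rate = stats.failures_last_month / max(stats.calls_last_month, 1)
    blast = min(stats.dependent_services / 10, 1.0)   # saturate at 10 dependents
    usage = min(stats.active_users / 1000, 1.0)       # saturate at 1,000 users
    return round(0.5 * failure_rate + 0.3 * blast + 0.2 * usage, 3)

# A quiet leaf service versus a busy, widely-depended-on core service.
quiet_leaf = ServiceStats(0, 5000, 1, 20)
busy_core = ServiceStats(50, 5000, 12, 4000)
```

Here `risk_score(quiet_leaf)` comes out near zero while `risk_score(busy_core)` lands above 0.5 — the kind of signal a delivery pipeline could use to pick a strategy per microservice.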
Yeah, it would be cool. And there's another component that I just learned about last week because I was invited to a hackathon that we had internally.
And one of the guys, he was creating a tool.
He analyzed our – so in Dynatrace, we detect problems and also root causes when we detect a problem.
And he was basically looking at the problem history of the last month.
And he figured out, are there any particular points during the day where more problems occurred
than other times during the day?
And what are they related to?
Is it infrastructure problems?
Because let's say every day
at two o'clock in the afternoon,
some team is doing infrastructure updates.
I don't know, right?
And then he was looking at it on a daily basis,
on an hourly basis, on a weekly basis.
And it's very interesting to also then put this
into consideration because if you know that there's an 80% chance, if you want to deploy now,
that it fails based on historical data, not because of that service, but maybe because
something else you don't have under control, then you can say, you know what, let's move this
deployment window a little further out. Exactly. So if this is the point in time that it's
auto-scaling, maybe you don't want to deploy it at that point in time.
Exactly.
Yeah.
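The deployment-window idea Andy describes — bucketing historical outcomes by hour and shifting the window away from the risky hours — could be sketched like this. The input shape and the 20% risk cutoff are assumptions for illustration; real data would come from a problem or deployment history API.

```python
from collections import Counter

def deployment_risk_by_hour(history):
    """history: iterable of (hour_of_day, succeeded) pairs from past
    deployments (an assumed input shape). Returns {hour: failure rate}.
    """
    totals, fails = Counter(), Counter()
    for hour, succeeded in history:
        totals[hour] += 1
        if not succeeded:
            fails[hour] += 1
    return {hour: fails[hour] / totals[hour] for hour in totals}

def pick_window(history, max_risk=0.2):
    """Pick the hour with the lowest historical failure rate, skipping any
    hour above max_risk (say, the 2 p.m. infrastructure-update slot).
    Returns None if no hour qualifies.
    """
    risk = deployment_risk_by_hour(history)
    safe = {hour: r for hour, r in risk.items() if r <= max_risk}
    return min(safe, key=safe.get) if safe else None

# 2 p.m. fails 80% of the time (e.g. a daily infrastructure update);
# 10 a.m. fails only 10% of the time, so the window moves there.
history = [(14, False)] * 8 + [(14, True)] * 2 + [(10, True)] * 9 + [(10, False)]
```

With this data, `pick_window(history)` avoids the 80%-failure 2 p.m. slot and lands on 10 a.m. — not because of the service itself, but because of what else is happening at that hour.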
So there's a lot of cool data that we can then use
to influence our automated deployment decisions.
Yes.
I think this is the big area of,
I don't want to say needs improvement.
I think the ideas are there,
but the area that needs implementation.
And we talk about this with a lot of our guests
who have awesome tools, right, about data sharing
and getting data from one tool to another
because there's a lot of tools out there now
that can leverage each other's data.
And I think there really needs to be
a focus on integrating these tools.
All of the APIs, all of the ingests, they can all process and use them.
And there's a ton of potential floating out there
to get to those even deeper maturity models.
And I think that's the biggest challenge facing everyone
is actually getting the time and the ability to get those set up.
Because just think about when we can get all these things hooked up.
It's really, really cool conceptually. I just... yeah, I mean, it's just a struggle of time, right?
Exactly. I mean, I think in the end every tool vendor wants to get as much data as possible,
because the more data you have, obviously, the more magic you can do with it. But still, I think
every tool vendor has their specialty field where their AI, their ML, whatever it is, their algorithms,
just based on their history, they can do certain things. Like,
Tracy, you can do probably great things in your tools with the information about deployments and
metadata on these deployments. We can do a lot of great things on the Dynatrace side with
distributed traces and root cause analysis of problems. But you're right. I mean, in the end,
we need to figure out a better way
to integrate these data streams to give the right data to the right tools so
that these tools can then make the right decision at the right moment in
time.
Yeah. There needs to be a data stream framework.
Right. There's your next open source project. You know, the other comment I wanted to make on
this is that I love all these ideas, but when I interact with customers in the real world,
at least in the areas that I'm focusing on, we know that there are no quote-unquote unicorns,
right? What used to be the unicorn is just someone getting there first
and people following. But I do find that there are a lot of people who, there are a lot of companies
who then let's say the unicorn turns into a horse and a lot of people start getting horses. I think
there are a lot of companies and organizations out there who buy a large dog and put a saddle on it
and call it a horse, right? And I think that's the biggest challenge because what I run into quite often
is a company where their heart's in the right spot, but they half-ass it.
Because maybe that's all the resources they have.
Maybe they have a hard time getting enough talent in there.
So they can't actually execute, for whatever reasons.
They get some of it done,
and then, when you try to get it to that next level, it all starts to fall apart, because
they have a really shaky foundation, or not a good foundation at all. And to me, that's like, how do...
This is, I guess, going more on a philosophical level: how do we overcome that? Because there's
a lot of great ideas. There's a lot of really awesome things people can do when they have that
maturity model.
But I think a lot of people start their journey with a really shaky foundation.
And then from there, everything gets exponentially harder to build up.
So how do you go back and... Yeah, it's cultural.
These are cultural problems.
You know, I'm part of the DevOps Institute.
And Jayne Groll always talks about the people of DevOps.
And there is a cultural shift that we're facing.
And one of them, one of the bigger, I think, cultural shifts is upper management allowing teams to fail.
Failure has always had negative connotations to it. But really, if you
fail, you've learned how not to do something. And failing, and failing fast, is the best way
to move from a dog to a horse. Because I think a lot of times
we're timid and we don't want to completely buy into a process. And so we only just try certain aspects of it.
And that's what keeps that saddle on a dog.
So when having an upper management who says, yes, we're going to have your back when you fail.
I'm a director and I'm going to make sure that you're protected because you tried something new.
You tried something innovative.
And the next time we're going to get it right
and it's going to make our lives easier,
maybe in two months from now, not today,
that is the cultural shift that has to happen.
Upper management has to have the backs of people
who are trying new technologies.
Because you're right, you can't have, you know,
you don't want to put a saddle on a dog.
I just made that up. I don't know if that's a real thing.
I don't know either, but it works.
It works really well.
But how do you get that upper management?
I'm sure you've run into this all the time as well with places you're going to.
How do you get that upper management to buy in?
And I don't mean dollars with products.
I just mean to say, yes, we are going to finally commit to this.
Because that, I think, is always the hardest part. A lot of times we talk to the people on the ground doing
this stuff, and they get it, right? But it's just the limitations. So how do you break through that
barrier? How do you get people to take a vaccination?
True. I think it's not ignorance. It's being ill-informed. Yeah. And I think people are scared
to take the risk because they're ill-informed. If you have the information...
Which is why we need to start leveraging all of the data that we have because there's nothing that upper management loves more than reports.
Right?
If we can show them from reports that we can achieve greater things with newer technology if we inform them.
You know, somebody saying no is just a request for more information.
Interesting.
So how do we constantly provide upper management the information that they need so they can
make the right decision?
Now, they may be super risk averse, and they're never going to want to move to a microservice
environment.
But guess what?
The developers will do it anyway.
Yeah.
They might need another cluster.
Yes.
They'll have their own cluster
and then we have all this really cool stuff running.
And then one of the directors will say,
we need to get this to production and then it's born.
So we have to inform them.
We have to understand that they're busy with their day-to-day work,
and they're not down in the weeds.
So how do we inform them?
Great.
Yeah.
It's a challenge.
Hey, so kind of trying to wrap this thing up here, and I actually wonder, right? Initially I thought the title of this episode was going to be CDv2. Now, looking back at my notes, yes, we talked about,
obviously, it continues to live,
but we talked about a much broader topic
with a problem we really want to solve.
It's basically smart.
I mean, I think you actually called it earlier somehow,
like smarter delivery strategies
for modern microservice architectures
or something like that.
Tracy, I want to actually give it back to you.
What would you call this episode?
What was the main topic?
Well, we have covered many topics, but I do think that we are talking about how to make this continuous delivery smarter.
How do we leverage data? How do we bring all of this information together
to stop the human factor of deciding a workflow
and instead using the data to create a strategy?
You could also, I mean, I just, you know,
because we had some political things earlier,
you could say, how to make continuous delivery great again.
No, don't do that!
But it's funny. Yeah, it is, because you said "how to make continuous delivery smarter," and I said,
okay. Now, I don't think we got into it as deep as we wanted to. But in summary, though,
that's the crux of CD 2.0 that you're saying,
right, is the idea of it being the strategy per microservice as opposed to a workflow.
As opposed, yes, a predefined imperative kind of workflow.
Everything goes through this flow.
We can't do that anymore.
Yeah.
And in the end, it is smart automated decisions based on data. So it's data-driven decisions. And what we've been trying to do with Keptn is to put SLIs and SLOs at the center of everything we do. So every time we execute an action, we validate it against the data. And I like the idea of also using data upfront to really put a marker, a tag, on a service and say,
you are risk level two, you go here.
You're risk level five, you go here.
Exactly.
Before it ever goes out the door,
gathering that information
so that we could do some smart processing on it.
We need to be able to apply that ML to that data.
And between the monitoring data and the configuration data, we have a majority of it.
We really do.
We have quite a bit of it.
And to borrow on Andy's political thing to maybe go back before Andy's awareness of U.S. politics, we'll go back maybe 15 to 20 years ago and say we'll do some data-driven strategery.
Data-driven strategery.
For anybody who remembers the strategery one.
I do, but I honestly don't recognize US politics today.
Cool. Hey, Tracy, you mentioned a couple of projects today, Ortelius and others.
Ortelius, what was the name of it?
Abraham Ortelius was the first mapmaker.
And I often remind people that not only was he the first mapmaker, he created the first atlas, the world atlas.
And how did he do that?
He went around to all these cartographers and said, please give me your material.
And he assembled one big map.
He was the first open source community.
Wow. Literally. I would have thought his last name would have been map.
It's Abraham Ortelius. And so we figured it was a really befitting name because we're basically
mapping a Death Star. If you think about a cluster and all the points of light, we are creating,
you know, we're mapping that and we're mapping that before it ever goes out to that cluster.
We're saying, if you do this,
this is what your cluster is going to look like today.
Sounds like our Smartscape.
We also map everything in our Smartscape,
but that's for another discussion.
Cool.
We will definitely make sure, Tracy,
to get the links out there to the folks.
Is there any, knowing that this airs in 2021, early 2021, are there any big events that are coming up that people should be aware of in, let's say, the first quarter of 2021?
Well, in April, I am leading a track for the DevOps Online Summit, which is really cool.
He does it through Slack. So certainly if anybody out there
is listening and they would like to submit a talk on any of these topics, I would love to have
their feedback. tracy@deployhub.com is where you can reach me. So if you want to speak on a Slack-driven DevOps show, it's quite fun.
I did it last year.
There's a lot of discussion because what he does is he just runs the episode in Slack,
and then everybody's talking about it afterwards.
I think it's a really great platform for doing that.
Please reach out.
And also make sure that if you have a chance, sign up for one of your coffee chats.
Yes, yes.
That's really good.
Just send me an email and I'll send you my calendar link and we can, you know, chat away.
Because I learned a lot.
I have learned so much from everybody.
And I really have to thank everybody who's taken me up on those coffee chats in 2020.
Because I have been able to really pull together a pretty clean roadmap for the Ortelius
project. Awesome. Well, hopefully we can get you back on. I know there were a couple of topics we
touched upon early on. Maybe we can get you back to just dive into those more. And if there's
anything more, if we want to go in deeper on CD 2.0 or anything there, I think it'd be great to
have you back on. This was great.
Service mesh.
Yeah, service mesh, all that. Yeah, I think there's,
I can see you becoming a very recurring guest,
but we'll try to reach that.
I know you have other work to do as well.
Still a little bit, right?
Just a little.
Well, thank you.
Thank you so much, both of you.
This has been a pleasure.
Awesome. Andy, any last final words, or should I wrap it up?
Just, uh, let's make sure that 2021 is going to be an awesome year. We have it all in our hands. And wear a mask.
Yeah, amen to that. All right, thanks everyone for
listening. If you have any questions or comments for Andy or I, you can reach us at pure underscore DT on Twitter,
or you can send us an old-fashioned email at pureperformance@dynatrace.com.
And we will have all of Tracy's links in the show notes.
So please make sure to check those out.
Thanks everyone for listening and happy new year.
Bye-bye.
Bye-bye.