The Changelog: Software Development, Open Source - Chasing the 9s (Interview)
Episode Date: March 9, 2023
This week Adam talks with Marcin Kurc about chasing the 9s. Marcin is the Co-founder and CEO of Nobl9 where they build tools for managing service level objectives, aka SLOs. We also talk about service level agreements (SLAs), service level indicators (SLIs), error budgets, and monitoring, and how it all comes together to help teams align on goals, improve customer satisfaction, manage risks, increase transparency, and of course, a favorite around here... continuous improvement. Kaizen! This is an awesome deep dive into the world of chasing those 9s, and how teams are leveraging SLOs to earn the trust of their customers as well as showcase transparency.
Transcript
This week on The Changelog, I'm talking to Marcin Kurc about chasing the nines.
Marcin is the co-founder and CEO of Nobl9, where they build tools for managing service level objectives, also known as SLOs. We also talk about service level agreements, SLAs, service level indicators, SLIs, error budgets, monitoring, and how it all comes together to help teams align on goals, improve customer satisfaction, manage risks, increase transparency, and of course, a favorite around here, continuous improvement, Kaizen.
Today's show is an awesome deep dive into the world of chasing those nines.
I hope you enjoy it. A massive thank you to our friends and our partners at Fastly and Fly. Our pods are fast to download globally because Fastly, they are fast globally. Check them out at fastly.com. And our friends at Fly help us put our app and our database close to users all over the world with no ops.
Learn more at fly.io.
This episode is brought to you by our friends at Square. Develop on the platform that sellers trust. Here's what you can do with Square. You can bridge more experiences. You can build online,
mobile, and in-person commerce experiences that connect
more customers and sellers. You can build custom booking solutions. You can create and track orders.
You can accept payments. You can manage and curate inventory. You can organize customers. You can
manage employees. You can extend Square gift cards to your app. You can use Afterpay. And all this
is powered by the world-class Square APIs and SDKs that enable you to build full-featured business apps for yourself or millions of Square sellers.
So much is available as a Square Solutions partner.
Learn more and get started at changelog.com.
Again, changelog.com.
So, Marcin, you're the head of a very cool acronym that is becoming more and more hot. I think SLOs are important, but I'm not really sure everybody understands what an SLO is. How often do you find yourself just simply starting a conversation describing that acronym and how that pertains to Nobl9?
Yeah, that's a really good question.
I would say when we started this company in 2019,
there were very few people understanding that acronym. And those were usually the SREs coming out of Google, Facebook,
a few other companies, right?
I would say probably within the past year and a half or so,
it feels like it's becoming more of a mainstream.
So I would say 50% of the time,
maybe more people do understand what SLOs are.
And surprisingly, a lot of those people
also understand the application, the benefits,
and all the good things coming out of SLOs.
So the market is definitely maturing, expanding,
and the conversations we're having
are definitely at the level that
we can have a conversation.
We can come in without educating people
and trying to push something on them, basically.
So what is an SLO?
How do you describe it?
What is an SLO?
An SLO is a service level objective. So for us and for most of our customers and prospects, this is a concept that helps them understand and build infrastructure or applications to the level that allows them to operate in a way that customers are happy.
So you got two different extremes.
You got the extreme of, you know, building application or infrastructure that's 100% available.
And I don't want to say it's impossible.
I'm sure some people will come out and say, of course, we do that.
I don't think I want to go in that direction.
And then you have the other extreme,
which is things are constantly breaking
and customers are not happy
and leaving your application or your company
and looking for other alternatives, right?
So SLO is really about finding this sweet spot
between those two extremes
where customers are not impacted,
they're happy,
they're not looking for different options, and you're not spending tons of money on, you know, trying to achieve the 100% availability.
And I think chasing the nines is what we call it around here. Chasing the nines, right? I mean, we all want as many nines as possible, but I think they get infinitely more expensive, and also potentially impossible to some degree, to chase, like the six or the seven nines.
It's just really, you know, five nines tend to be what most can adequately achieve.
Would you say? What nine do you chase?
Yeah, that is pretty expensive at that point, right?
Five nines is expensive too? Okay.
Oh, yeah, it's expensive. I think, you know, three and a half, 99.95, right? Or four nines, 99.99.
It's getting to that point where it's really, really hard, right? When you start calculating how many minutes it can be down per year, then you finally realize, like, oh yeah, there's no way.
There's no way. Right.
However, right now, most people are thinking about the nines in terms of SLAs, right?
And SLAs are a legal construct.
Right.
Agreement is in the word, right?
Or in the acronym. It's the last letter of the acronym, at least.
And there's five pages of, you know, what we're excluding from calculation of the nines
and so forth, right?
SLOs, on the other hand, are not that.
It's true, real-time, very visible and transparent information
to both you, internal customers, external customers, right?
So it's definitely a different concept.
And achieving those without any exclusions or definitions
around the legal calculation
is definitely a much different concept.
You can translate SLOs into SLAs.
You can make your SLOs SLAs,
but I would question how many people out there
are already ready for that type of approach.
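The "how many minutes it can be down per year" arithmetic mentioned above can be sketched in a few lines of Python. This is only an illustration, assuming a 365-day year and availability measured purely as uptime; the function name `allowed_downtime_minutes` is made up for this example and does not come from any tool discussed in the episode.

```python
# Minimal sketch: downtime a given availability target leaves room for.
# Assumption: a 365-day year; real SLAs often carve out exclusions.

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def allowed_downtime_minutes(availability_percent: float,
                             period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Return the minutes of downtime the target allows within the period."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability_percent must be between 0 and 100")
    return period_minutes * (1.0 - availability_percent / 100.0)


if __name__ == "__main__":
    # Two nines through five nines, including "three and a half" (99.95).
    for target in (99.0, 99.9, 99.95, 99.99, 99.999):
        minutes = allowed_downtime_minutes(target)
        print(f"{target}% -> {minutes:.2f} minutes of downtime per year")
```

Under these assumptions, four nines leaves about 52.56 minutes a year and five nines about 5.26 minutes, which is why each extra nine gets so much harder to hold.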
So measuring performance of a service, of an entire stack, whatever it might be, becomes infinitely more important as you begin to make the agreement rigid through an SLA, but SLOs allow teams to have that flexibility. I kind of think of it like an analogy of, like, maybe a stick of bubble gum: before you chew it is kind of the SLA, where it's sort of rigid, right? It will eventually become sort of mungible, so to speak, or flexible.
And maybe the SLOs are, you know,
the chewed up bubble gum.
It's kind of like mushy
and you can kind of move it around
and it allows for imperfections.
It's not that original thing, right?
It gives you a chance to sort of have bugs
because it's going to happen, right?
Or have downtime or, you know,
times in the day even when you've got more traffic
and maybe those SLOs or maybe, I don't know, you need to measure things essentially to give that
flexibility to the system, especially to the level that software has become more and more complex.
Very large systems, large monoliths, whatever you might have, entire services, microservices,
APIs, all these things are moving parts.
Latency alone, and the frequent offender, DNS, right?
I mean, things just happen in systems that are complex.
There you go.
Yeah, this is a very important point, right?
It's not necessarily about something going down.
In many cases, things are not going down, right? You've got a slowdown in delivery of services.
Something else might happen.
Latency is a fairly simple concept,
but understanding how that latency is impacting your customers
in your application, it's becoming complex, right?
For example, another part of the SLO is the error budget, right?
You have this difference of how much of the error budget you can burn before it becomes an issue
and you're violating your SLO, right?
The question is, like, how fast are you burning down that budget?
If it's, you know, burning slowly,
the impact on the customer is probably not very big, right?
But when you start seeing things going down quite quickly,
then you have a problem, right?
That's when you start thinking about,
are you waking people up in the middle of the night?
Are you failing over from region to region
or infrastructure to infrastructure?
Every single one of those operations is very, very costly.
So it really helps you also understand
how you should be acting
and helps you really make those decisions in real time.
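The error budget and burn-down idea described above can be sketched roughly as follows. This is an illustrative calculation, not how Nobl9 computes it: it assumes an event-based SLO (good versus bad requests in a window), and the function names are hypothetical.

```python
# Minimal sketch of an event-based error budget and its burn rate.
# Assumption: the SLO target is a ratio such as 0.999 (99.9% good events).


def _check_target(slo_target: float) -> None:
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")


def error_budget(slo_target: float, total_events: int) -> float:
    """Number of bad events the SLO tolerates for this many total events."""
    _check_target(slo_target)
    return (1.0 - slo_target) * total_events


def budget_remaining_fraction(slo_target: float, total_events: int,
                              bad_events: int) -> float:
    """Fraction of the budget left; a negative value means the SLO is violated."""
    budget = error_budget(slo_target, total_events)
    if budget == 0:
        return 1.0  # no traffic yet, so nothing has been burned
    return 1.0 - bad_events / budget


def burn_rate(slo_target: float, total_events: int, bad_events: int) -> float:
    """Observed error rate divided by the error rate the SLO allows.

    1.0 means the budget is being spent exactly as fast as the SLO permits;
    higher values mean it will run out before the window ends.
    """
    _check_target(slo_target)
    if total_events == 0:
        return 0.0
    observed_error_rate = bad_events / total_events
    allowed_error_rate = 1.0 - slo_target
    return observed_error_rate / allowed_error_rate
```

For instance, with a 99.9% target, 10,000 requests and 100 failures, `burn_rate` comes out at roughly 10, meaning the budget is burning about ten times faster than the SLO allows. A slow burn is the "probably not very big" impact case; a fast burn is the one that may justify waking people up.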
So, I mean, with all the observability that has been around the last five years, I want to say, I got to imagine that it's kind of easy, or it should be easy, to measure these things, but it's not. So at Nobl9, this is kind of what you do, right?
That's your mission, to make measuring these things easier. How did you, you know, find this gap in the marketplace, so to speak, to form Nobl9?
And what hole did you fill?
Yeah, so my co-founder and I, we started a company before.
It was around marketplaces and billing, from the old days when AWS showed up and disrupted software vendors with this crazy consumption billing and things like that.
And vendors had been struggling.
How do I address that need for my customer?
And how do I align with AWS and other cloud providers, for that matter?
You know, then an exit to Google.
We find ourselves at Google,
and day one, we start rewriting this application
to handle Google levels of traffic and consumption.
And that's how we really learned how Google operates, how Google sets goals, how Google operates on a daily basis, how they release software.
And of course, all the concepts around SRE were very, very interesting to us. But SLOs in particular, you know, we came to this conclusion that it's really, really hard
to go into microservices,
Kubernetes, and, you know,
interconnected systems,
not having SLOs to understand
all the dependencies and impact
of one service on another on the application.
And then, you know,
with this constant push within
the IT towards,
you know, more of a business-oriented,
business-driven decisions on the IT side,
to us, the SLOs are really a very simple thing
to correlate IT to business and vice versa.
And for us, that was one of the biggest things
that we figured, if we go into that world of Kubernetes
and microservices, that's going to be it.
People will realize that they need SLOs to operate efficiently.
It seems like a good negotiating tactic, too.
Like, if you've got the rigidity of the SLA, which is like, okay, it's either black or white.
It's a one or a zero, right?
It's very binary in terms of, like, did you or did you not? No? Okay, you're in breach. In terms of just simple contract terms, whether it's an internal team's contract or a customer contract, at some point you agree on how things will work.
But an SLO kind of gives you that, okay, well, how flexible can the system be?
How flexible can we be to still achieve your goals, customer and or internal teams or whatever it might be?
That's a point of negotiation, right?
It gives you that flexibility.
Yeah, well, it gives you a point of negotiation and flexibility,
but also gives you a better communication across the teams, right?
You wouldn't believe how many times we come to a customer or prospect,
sit down, and they keep telling us how much they love SLOs.
They've been using SLOs for a while.
And after a year or two years, they find out that their definitions of SLOs within different teams are much different.
So the four nines for one team don't necessarily mean four nines for another.
So with the complexity of today's systems, distributed systems, it's really, really hard to even define how we're looking at certain things, right?
What is the degradation in service for me versus what is the degradation in service for
you?
And of course, there are levels that are just still amazing to me, although I'm not shocked,
where people are finding out that, you know, there's this one service they take a dependency
on and, you know, it's really running on a server under somebody's desk.
I wouldn't imagine that still happens,
but it does.
So getting those people to talk to each other
and define those SLOs
so everybody in the chain understands
how they're getting affected
is just amazing, right?
And I think that the best conclusion
out of most of those conversations
is, looking at the legal contract in the SLA, a lot of people realize, like, well,
there's really no way for us to offer those five nines because we have a
piece that's, you know, two nines somewhere in the chain, right?
So the collaboration and understanding across organizations,
across teams is very, very important. And that's really our focus.
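The point about a two-nines piece somewhere in the chain can be illustrated with a small sketch. It assumes the simplest possible model: every dependency is a hard, serial dependency and failures are independent, so the availabilities multiply. Real systems with redundancy or correlated failures will behave differently, and the service names below are invented.

```python
# Minimal sketch: best-case availability of a chain of hard dependencies.
# Assumptions: serial dependencies, independent failures, ratios in [0, 1].

from math import prod
from typing import Iterable


def serial_availability(availabilities: Iterable[float]) -> float:
    """Multiply the availabilities of every component in the chain."""
    values = list(availabilities)
    for value in values:
        if not 0.0 <= value <= 1.0:
            raise ValueError("each availability must be between 0 and 1")
    return prod(values)


if __name__ == "__main__":
    # Hypothetical chain: two well-run services plus one two-nines box.
    chain = {
        "frontend": 0.99999,
        "auth": 0.9999,
        "server-under-the-desk": 0.99,
    }
    combined = serial_availability(chain.values())
    print(f"combined availability: {combined * 100:.3f}%")
```

Under this model the chain can never do better than its weakest link, so a five-nines promise on top of a two-nines dependency cannot hold.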
Okay. So we kind of know what SLOs
are. We kind of know what they are used for. We kind of understand how they help teams
effectively build and manage software and communicate and also
communicate and provide assurances to customers. How do they manifest?
Like, is it a Google Doc? Since we're talking about Google.
I guess pre-Nobl9, there was one way.
And maybe now, you know, with the inception of your company and how you help organize these SLOs and, you know, pay attention to the observability of, or the data from different services.
And how do you establish an SLO?
How does it look in the world that's not Nobl9?
And then how does Nobl9 sort of like make that a better feature for teams to sort of like aggregate them together and all that good stuff?
How does it play out?
Yeah, so you're right.
People have been doing SLOs in different ways.
You know, spreadsheets.
We still see a lot of people doing spreadsheets.
And it kind of works, right?
You know, at the end of the month, you process your data.
And the application of that type of approach is fairly limited.
But a lot of people use SLOs for planning.
So if you get this data, process that data on a monthly basis,
then that's enough for you, right?
You have a really good understanding of what happened,
what maybe you should adjust,
you know, the teams.
What we do is we process information near real time, right?
I want to say real time,
but that's kind of hard as well.
And give you insight into what's happening,
you know, when things are happening.
So we give you really the knowledge to use those SLOs for planning,
even if you process that, you know, monthly or weekly,
but also give you the ability to act in certain situations
in almost real time, right?
So like I mentioned, if you have to fail over,
if you have to file a ticket or have an understanding
if there is a huge impact happening right now to your customers.
You know, if you get a signal that something is down, does it really mean that your customers
are getting affected, right?
It's, hey, the disk is down, right?
Or is not responding.
What does it really mean?
Are your customers impacted or not?
So our focus from that perspective
is really giving teams the ability to understand
if there's something that they have to do right now,
if there's something that's really affecting their customers
and they have to wake up teams across the globe
or fail over the application or roll back the code
that was just pushed into production yesterday.
So for us, that is key.
And for most of our customers,
they might start with simple things like using SLOs for planning, but they really quickly ramp up to use SLOs on a daily, hourly basis.
This is their goal to take a look and understand how their customers are being impacted and how they should be responding to any given situation.
How much does this overlap with incident management or just incidents at large? Like, SLOs sort of are an indicator, but they're not necessarily an incident.
So there's, like, a lot of players in that market, and they have fairly similar
but also different ways
that they allow you to manage those incidents.
And that's all about bringing people together
and start looking at things
and maybe deploying templates or things.
For us, it's really determining
if there is an incident
or there should be an incident declared, right?
So it all has to do with the error budget,
how much you're burning.
In many situations, things happen,
but we allow you to, for example,
open a ticket in Jira,
so somebody can take a look at it at some point,
you know, with a different level of severity.
It doesn't have to be an incident.
And if,
based on the SLO configuration
and the burn-down of your error budget, we determine that there's an incident, we integrate with incident response systems, right?
We'll open the incident and let you deal with that incident within that particular system.
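The ticket-versus-incident distinction described above can be sketched as a simple routing decision on how fast the error budget is burning. The thresholds and names here are hypothetical, chosen only for illustration; they are not Nobl9's configuration or defaults, and a real setup would take them from the SLO's alerting policy.

```python
# Minimal sketch: route a budget burn to "nothing", "ticket", or "incident".
# Assumption: burn_rate is observed error rate / allowed error rate, so 1.0
# means the budget is being spent exactly as fast as the SLO permits.

from enum import Enum


class Action(Enum):
    NONE = "none"          # within budget, nothing to do
    TICKET = "ticket"      # someone should look at it at some point
    INCIDENT = "incident"  # declare an incident and page people


# Hypothetical thresholds, purely for illustration.
TICKET_BURN_RATE = 1.0
INCIDENT_BURN_RATE = 10.0


def choose_action(burn_rate: float) -> Action:
    """Pick a response based on how fast the error budget is burning."""
    if burn_rate < 0:
        raise ValueError("burn_rate cannot be negative")
    if burn_rate >= INCIDENT_BURN_RATE:
        return Action.INCIDENT
    if burn_rate > TICKET_BURN_RATE:
        return Action.TICKET
    return Action.NONE
```

A slow overspend opens a ticket with some severity for later, while only a fast burn is treated as an incident that goes to a paging or incident response system.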
I asked that question because I was, like, you know, looking at your integrations.
Okay, well, if Nobl9 lets me, you know, pay attention to and define my SLOs, and this
is, like, the agreement, basically, to the team, this is where we define it. There's a flow to define,
as you said, your error budget, you know, kind of figure out where you're pulling your data
from, what your data sources are. I gotta imagine at some point, like, the next step might be an incident,
but none of your integrations is an incident manager by any means.
It's data sources.
You've got events and alerts, which may trigger.
I suppose maybe you're throwing data into, say, Discord or Slack,
and that triggers something else.
But I didn't see the integration for the incident management part of it.
And then you've got data exports, which is like,
hey, how can we take this data with us and take it into a meeting or analyze it differently
or munge it somewhere else?
So we do integrate with incident management systems.
PagerDuty is one of them.
ServiceNow is one of them.
We've also done work with webhooks
and push data to other systems out there,
FireHydrant and a few others.
But we try not to be in that space.
It is a completely different space. You deal with those
incidents in a very specific
way. We don't want to play in that space.
For us, it's really focused on
determining and
understanding, based on
the configuration of the SLO, of course,
when we should declare that incident, right?
And that's our input into incident management systems or paging systems out there, for example.
Gotcha.
I got to imagine that if you're using FireHydrant or Incident or somebody else that's out in
that space, and I'm familiar with those two because we've worked with them before, that
PagerDuty might just trigger something in the incident management flows.
Like say something happened, you know, this may trigger it.
So I was just kind of curious because, I mean, like it's one thing to define
and sort of track, but then something's got to happen, right?
And maybe it's not an incident.
Like you said, maybe it's just, you know, outside of our normal range of our error budget.
You know, it's just a percent or two beyond where we want it to be.
And somebody just needs to put some eyeballs on it
and it's not really an incident.
But then in some cases, it might literally be downtime
or way beyond the threshold
and it's a more actionable thing,
which if you really mince the incident management
or the incident word,
some folks in that world will say,
well, most things, if not all things, are incidents
and we should track them because you've got to organize around it. And so it really becomes
an orchestration of who should be involved in checking this out.
Was it resolved? Not a catastrophic incident. Like small incidents
are still incidents, basically.
Tracking things, yes, of course. But we also integrate with Jira, ServiceNow,
as I mentioned.
So opening a ticket
for someone to look at it at some point
with specific severity is
one thing, but declaring an
incident is, to us,
it's a completely different concept.
Somebody needs to declare that incident
because something happened at a certain level
with a certain severity, and
our customers are impacted beyond the point
that we believe is what they should experience, right?
So it's like calling, you know, fire department.
I have an issue, but they might respond over the phone
and tell you, deal with this in that way, whatever it might be,
and they get tons of calls like that, right?
You got an extinguisher. Take care of yourself.
Exactly.
Or, you know, that might be somewhere on this fine line.
Or should we go for that, right?
But people call fire department with all kinds of crazy things.
And a lot of those things are being handled on the phone, right?
And it's some kind of advice.
But when they fire up the engine,
that's where the incident is declared, right?
And they operate within a completely different concept
and framework and, you know, show up and work on that.
So to us, those are the things where, like, yes,
you know, you want to call it incident,
but you're not responding immediately, then that's fine.
It's still being tracked in JIRA or ServiceNow or whatever it might be,
and there's record of that, that people look into it.
And this kind of operates within that whole SRE concept.
SREs are there to make systems better, right?
So you know something happened,
you found out that there's ability or opportunity for optimization.
And then you go and figure out when you can prioritize those things,
when you're going to do one thing versus the other, because there's always not one thing, right? There are multiple different things that you have to address. So that's kind of how we deal with this. And there are a lot of those opportunities for optimizations, changes, fixes, but they're not necessarily ready to be done right now and getting the entire team just to shift their direction to work on that.
Take me a little further into this world before Nobl9.
It seems like if you, I mean, how were people doing this beforehand?
You mentioned spreadsheets.
Was it just that simple in most cases?
Were there any other systems built around this? Like, do you have customers, you know,
I know that you had an acquisition by Google and you sort of learned and did these things as part of that.
But, you know, what was the world like
before you sort of organized it better?
It's a really good question.
I don't really, from my experience with prospects or customers,
I don't go before spreadsheets.
My question is, was there anything in there, really?
Right.
Something happened, especially if you have a monolithic application.
Well, now we know something is not working,
and we have to go and figure out how we're going to manage this.
Detecting issues like that within a monolithic application
is much easier, right?
A lot of large enterprise customers
just began their journey to the cloud,
right? So they had full control over their
systems and, you know, everything is
running on this big, you know,
Sun Fire system or whatever it
might be. You know, you
approach those things in a different way.
It's kind of like, you know, when
VMware showed up many, many, many years ago, right? They changed how enterprises operated.
They changed how enterprises, you know,
accounted for systems, managed the systems,
alerted on systems.
And I think right now with microservices,
Kubernetes, and all the little pieces coming into play,
I think that's, you know, exponentially bigger issue
than what we saw with VMware, right, coming into play.
So we're just at the beginning of the evolution
going from what we know,
how we manage our systems, into something
completely different. And I think one of the biggest
elements of this play is
taking dependencies
on completely external systems
where we have absolutely zero understanding
how they operate, right?
It's, you know, a lot
of organizations out there are using Okta,
for example, right? A lot of organizations
are using similar systems like that, maybe
databases. They have
no way to see how things operate.
So we actually have a
lot of customers or
prospects coming
to us telling us that they need to
implement something because their customers don't really trust them,
how they define the SLAs.
They are asking questions like, okay, great,
but how have you architected your application?
So that gives me a little bit of assurance
that you built it the right way,
and I can expect that your system is going to operate.
Because maybe your SLA is, you know, five nines. That's great.
Then I'm going to spend a year integrating, doing things.
And then it, you know, kind of starts going down every week.
That's a big issue.
Your customers think that's your problem,
and you have a dependency on this outside system
that you can't really influence
and you don't know how it's operating.
Just kind of think about it
in very similar terms
as what happened to security many years
ago, right?
10, 20 years ago.
We used to go on websites and buy things because it had this little logo that says, you know,
trust me, I'm super secure.
Just do it, right?
Well, there were many different things that we had and people used to do that, right?
And now you cannot, nobody's going to do business with you unless
you adhere to certain frameworks
and certain certifications and so forth.
And, you know, from a reliability
perspective, we're really getting
close to a very similar approach.
Tell me how you architected your systems,
how I can trust you that you did the right
thing. You build this on AWS, that's great,
but, like, is it multi-region?
Is it, you know,
I need to see some data that really gives me a good idea or comfort that I can make a big investment, because, you know, an enterprise is not going to go there and say, oh, three months later we can just switch all of our systems to something different.
That doesn't happen, right?
So it's a big, big investment.
So with the introduction
of SLOs, or just maybe better orchestration and formation of them and monitoring them, how does that world change then?
So you can take on, say, maybe a loose cannon, so to speak, or just something that's less reliable and you have just better thresholds on that?
You have better observability of the actual performance of that for you within certain ranges?
It's all about transparency.
So we have a few customers that use SLOs and they expose those SLOs to their customers to sell them a higher availability system or higher assurance reliability system, right? If you pay
X, you're getting this shared
system that everybody's using and, you know,
it's been great, but we give you the accesses from
that perspective. You're getting your SLAs.
However,
they have a higher level
service that costs more,
but they provide a very transparent
SLO so the customer can actually see
if they're performing to the SLOs that they define.
And some of them even go to the point
where they will do SLOs per customer.
As you can imagine, that's a more expensive thing, of course.
But they will custom tailor that system
to provide the performance that the customer is asking for and they very transparently provide you with the data to back it up.
Yeah. This is interesting what you're talking about because this is a
sales tactic essentially. It's a value add in this case.
Having two different tiers. Here's the one that
has better objectives, maybe better assurances, etc.
Or just something that we're paying attention to more and therefore it costs more.
But here's the one that's sort of the on-ramp for, you know, the lower level customers are still amazing customers.
It's just, this is the one we, you know, we give fewer nines to, we give fewer assurances to.
And it's cheaper because it gets you in the door, it gets you using a product or whatever it might be. And then you determine if it's viable and if you actually need high
assurances, high availability, et cetera, well,
then you naturally graduate.
And of course you pay more because that's great assurances to have.
I love that.
Yeah.
How many people know about this?
I mean, are people doing that a lot with, with different plans?
Like, can you go to X, Y, Z service provider?
And you're seeing that more and more, people communicating these,
these SLOs?
We've got a few customers. I would probably say somewhere between 10 and 20
percent of our customers are either there, they implemented that type of offering, or they're
working on it. So it's starting.
Can you share any names or speak behind?
Unfortunately, I can't.
No customer's names?
Sorry.
Well, you know, it's a new concept.
And yeah, we're working with them to help them build that out.
But I would say that those were their concepts, their ideas.
They got inquiries from their customers to provide that type of service.
Well, I'll tell you one name and you don't have to say anything.
I'll say it because it's on your website.
I'm so glad they're your customer.
If this is true, it's Ticketmaster because I can't get my T-Swift tickets.
I can't get my other tickets.
I need to get these tickets, Ticketmaster.
Come on, SLO.
Anyways, I can imagine that's got to be somewhere in there.
Well, I've heard that the ticket sale went much better than some other ticket sales.
Yeah.
Oh, is that right?
Well, maybe that's a good thing.
I didn't hear any news about this, so maybe it went better, but yeah.
Right.
Gosh, the world would be on fire if you couldn't get your Beyonce tickets.
Oh, yes.
Oh, yeah.
I just bought some Jerry Seinfeld tickets here in Austin via Ticketmaster.
Had no problem, thankfully.
Jerry Seinfeld is a little less popular than, say, Taylor Swift or Beyonce, but still cool.
Still cool.
I agree.
I missed his performance in Santa Barbara a month ago or so.
You know, when I did some initial research on this, I like to go to a couple of different sources.
One that sort of is an easy button, but not everybody goes there for their first search. And it's YouTube.
And the reason why I go there is because I'm a premium YouTube user.
I cannot stand advertisements on YouTube.
They're just terrible.
I don't mind good ads.
I hate bad ads.
Yeah.
But I go to YouTube and I search SLOs and I start to get educated on SLOs and who's
using them, who's talking about them and whatnot.
And it's mostly Google and then you.
Right?
Okay.
So like the results were Google, Google, Google, and then Nobl9.
And I think it was a 90-second video.
It was like SLOs in 90 seconds.
So one, I would optimize more for maybe improving that video
or doing a follow-up that's better because the audio quality wasn't super amazing.
But you did commit to your objective. There you go, which was 90 seconds. So congratulations
on that. But I mean, it feels like this is an enterprise problem coming down to everyday
applications. Would you agree with that? Like where's the maturity with SLOs? They're becoming
more known. You're about a year or so into this more well-known space. But what's the maturity level of teams truly leveraging SLOs to their advantage?
So first of all, interesting, I got to go do the YouTube search because that's definitely not something that we see in real life.
I think the situation is that, you know, Google definitely has been pushing the concepts for a long time and they have teams that just focus on that 100%.
But within engagements that we have outside,
there are a couple other companies that focus on SLOs,
but every single monitoring company
or observability company out there
has got some kind of solution
or something to say about SLOs.
And that's really like the real life situation for us.
Dynatrace, Datadog, New Relic.
I mean, everybody else, right?
Just name them.
So the real life, I guess, it's a little different than YouTube.
And then maturity, where is it?
You know, it's our point of view.
Like we haven't really done like a huge market, you know, research.
And we've had conversations with a number of analysts, and they of course agree that the market is maturing.
People understand how SLOs help them run their business.
Based on our customers, and you mentioned one or two, their SLOs are becoming the core of the operation, I would say.
One of our customers called it tier zero of observability
that helps them really bring it all together,
allows them to see different teams and different operations
at the same level, right?
It's the same reference point, I would say.
So you don't have this issue where you have four nines that are completely
differently defined versus three nines and so forth.
And you really get a good idea of, you know, how things are performing,
where you take dependencies, what they can offer.
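To make the "nines" reference point concrete, here is a minimal sketch that converts a number of nines into the downtime it allows. The 30-day window and the function names are assumptions chosen for illustration; they are not taken from the conversation or from any particular product.

```python
def availability_for_nines(nines: int) -> float:
    """Return the availability target as a fraction, e.g. 3 -> 0.999."""
    return 1 - 10 ** (-nines)


def allowed_downtime_minutes(nines: int, window_days: int = 30) -> float:
    """Minutes of full downtime permitted over the window at this target."""
    window_minutes = window_days * 24 * 60
    # The unavailable fraction for N nines is 10^-N.
    return window_minutes * 10 ** (-nines)


for n in (2, 3, 4):
    target = availability_for_nines(n)
    minutes = allowed_downtime_minutes(n)
    print(f"{n} nines ({target:.4%}) over 30 days: {minutes:.1f} minutes of downtime")
```

Two teams that both claim "three nines" only mean the same thing if they agree on the window and on what counts as downtime, which is the shared reference point being described.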
And then finally, a lot of customers, I would say probably every single one
of our customers is using SLOs for planning.
And sometimes it's as simple as if somebody shows up and says, I need another $5 million to spend
on AWS. The question is like, why? Well, we're running out of capacity.
And that's usually where the conversation ends, right? Now, SLOs really
enable you to provide a better insight into
what needs to happen.
Do we have an issue with capacity on the cloud provider?
Do we have an issue with our application hitting limits?
Do we have an issue of this monolith that cannot scale anymore?
And we have to figure out how we really transition to something different.
It really helps people to understand how the teams are performing too.
You're sometimes pushing out features
because everybody gets promoted on features,
not on maintenance.
And you start seeing degradation of your service,
degradation of your customer experience.
So you need to start thinking about
how we pull back, when do we pull back,
how much do we pull back.
We want to stay competitive,
but we don't want to get our system to break every hour, right?
So a lot of those concepts, like the more people are using SLOs,
the more mature they get with it very, very quickly.
Is this kind of where your service health dashboard comes into play, where you can sort of see at a glance
what you have sort of tracked, I suppose, within
Nobl9, but you have them sort of organized and they're color coded.
Well, this one's green and this one's red.
I'm assuming maybe there's a yellow or potentially,
but it's something like where this is like sort of in a degraded state
and it's not quite red, but it's getting close to red.
Like, is that where something like this comes into play
where you can sort of see at a glance where things are playing?
Yeah, for the organizations that are looking across, definitely.
It's one of those things
that gives them a very quick
idea of what's happening and they can drill down.
And sometimes for
teams, if they operate multiple
different services or they monitor
multiple different
inputs into their SLAs,
that becomes also very
interesting and very needed.
But like any dashboard of this type, you know,
it's a quick view of what's going on and how we can quickly get to the root
of the problem, for example.
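As a rough illustration of the green, yellow, and red view discussed here, the sketch below maps each service's remaining error budget to a color. The 25% warning threshold, the service names, and the numbers are invented for the example; they do not describe how the Nobl9 dashboard actually assigns colors.

```python
def budget_status(budget_remaining: float, warn_below: float = 0.25) -> str:
    """Map remaining error budget (1.0 = untouched, <= 0 = exhausted) to a color."""
    if budget_remaining <= 0:
        return "red"
    if budget_remaining < warn_below:
        return "yellow"
    return "green"


# Hypothetical services with the fraction of error budget each has left.
services = {"checkout": 0.62, "search": 0.10, "login": -0.05}

for name, remaining in sorted(services.items()):
    print(f"{name:10s} {remaining:6.0%} remaining -> {budget_status(remaining)}")
```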
Interesting.
Okay, so reactionary, of course,
because you've got integrations to PagerDuty so you can fire off incidents.
But then planning, I've got to imagine, is a big one.
Like you had said before, if you want to expand your spend with AWS or GCP
or what have you, and you don't have any data besides, you know,
we just need it.
Like this sort of fills that gap of like, okay, why do you need it?
More data is always good.
What is your plan then with Nobl9?
What is the big dream, so to speak?
It seems like you're in the early innings, and this is,
I don't want to say what you build is not amazing,
but it seems pretty simple, right?
Track some objectives, establish some communication with your team,
give yourself a dashboard and then integrate with, you know,
the necessary players in the field,
whether it's Datadog or PagerDuty or, you know,
the different data warehouses and whatnot.
What's next?
What's the next big thing for you all?
First of all, I would say that, yeah, most good software is simple, right?
That's the whole point.
For sure.
It's solving a complex problem.
That's what I was trying to caveat with.
This is not a negative simple.
It seems pretty straightforward.
This is, you know, you kind of got into the easy button for the most part.
That's what I'm trying to say.
No, of course, of course.
So that was a huge focus for us because dealing with those problems, it's not easy.
You know, finding a reference point for multiple different data sources, right?
Everybody's doing things in a different way.
And then customers store a lot of data in databases, right? Just pulling all that information together and allowing people to have it in a simple view is super complex, right?
And a lot of people have already tried.
A lot of people failed, and a lot of them are on version 2, 3, and maybe 4.
So for us, you know, yes, this is the beginning.
I feel like we built a very strong base platform, and now we have at least two years of roadmap to build features that help you consume information easier, help you share information easier, collaborate on the platform.
We mostly focus on that.
I think the big dream is, you know, pushing it in the direction of business data, right?
The whole concept of IT operates against business goals.
How do we start bringing those information together and, you know, helping people on
both sides understand the inputs and outputs much better, right?
So you have the business people like,
all right, why did we just lose our margins, right?
Because we're spending $20 million more on infrastructure.
That just happened because we needed capacity, right?
And on the IT side, of course, you know,
what are our goals in terms of, you know,
customer growth, customer satisfaction, migration?
You know, that's a big thing for us.
People migrating from on-prem to cloud, as I mentioned,
they have a full understanding of what they have
versus a very small part of what they can understand
and change and configure.
So migrating with this reference point of where you are today,
it's a big issue.
You've probably heard a lot of stories of like,
oh, we migrated to cloud,
we're saving no money. As a matter of fact, we're spending more money. Our applications are not performing
better. We have more issues, blah, blah, blah. You know, that's the standard list of things, right?
So now you have a better understanding of where you are, how you're going to measure those things,
because maybe sometimes you just don't see the benefit right or maybe sometimes somebody did
things in the wrong way,
configured it incorrectly, and now
you feel like all your
two years of work of migrating applications
went nowhere. You're in the worst
situation. So there's a lot of that
happening for us as well.
So let's paint a picture then. So imagine somebody's
listening to this and they're like, you know what, okay.
We've done SLOs in the
spreadsheet way. We've tracked them to some degree behind the scenes. We've been,
you know, willy-nilly about it. We've done some things, but not to the level that this
would do. What does it take to get started? Like, what is the initial
conversation? Is it a conversation with the team? Okay, these are the services we have. This is the data we
want to track. This is how we want to measure things. And how does that manifest into
actually having SLOs in place?
What's the time frame from
I want to do it to you've got it in production
to actually have an objective?
So this is a great situation for us.
No question. You are doing SLOs
in whatever way; you're already sold
on the concept. Your teams
are in some way bought into
this thing or maybe forced to do this.
You never know, right?
So you already are looking at certain inputs.
You have those defined.
We can very, very easily,
probably within a day or two,
configure you to be at the same point
where you are with your spreadsheets.
And then we have a number of tools
that help you build, configure SLOs in a very quick way.
So at AWS re:Invent, we introduced Replay
that allows you to bring data from all your systems
for the past 90 days, 100 days, or a year,
and then look at that data so you can start to understand what SLOs would make sense.
And now we just released this thing called Analyzer
that can use that data and suggest SLOs to you.
Interesting.
So you can also, with the combination of Replay and Analyzer,
you can set this SLO and with Replay,
you can go back to your events,
like you had an outage three months ago.
You can look how your SLO would be affected
and how your error budget would be burned
so it gives you a good idea of how you should be acting.
And of course, you can keep tuning those SLOs,
but we give you a number of tools, like I said,
that are going to allow you to get operational
within a week, I would say.
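The idea of replaying past data against a candidate SLO can be sketched in a few lines. This is not Nobl9's Replay or Analyzer; it is a minimal illustration under stated assumptions: the `DayCount` structure, the 30 days of request counts, and the two candidate targets are all made up for the example.

```python
from dataclasses import dataclass


@dataclass
class DayCount:
    good: int   # requests that met the SLI threshold that day
    total: int  # all requests that day


def error_budget_report(history: list[DayCount], target: float) -> dict:
    """Replay historical counts against a target and report error budget usage."""
    total = sum(d.total for d in history)
    bad = sum(d.total - d.good for d in history)
    allowed_bad = total * (1 - target)  # the error budget, in requests
    burned = bad / allowed_bad if allowed_bad else float("inf")
    return {"bad": bad, "allowed_bad": allowed_bad, "budget_burned": burned}


# Invented history: 29 healthy days plus one day with an outage.
history = [DayCount(good=99_990, total=100_000) for _ in range(29)]
history.append(DayCount(good=97_000, total=100_000))

for target in (0.999, 0.995):
    report = error_budget_report(history, target)
    print(f"target {target:.1%}: {report['budget_burned']:.0%} of error budget burned")
```

Running the same outage against several targets shows which objective the service could realistically have held, which is the kind of tuning loop being described.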
But I think the biggest part that we bring to the table
that's been very successful for most of our customers
is SLOs as code.
A lot of people are struggling with bringing in another thing,
another concept.
With SLOs as code, you basically can get your teams
or your developers to only deploy code with SLOs defined, right?
So you don't have an SLO on this specific thing.
The code is not getting checked in.
We're not pushing it out.
And that really helps all the organizations
to have some kind of standard of like,
okay, at least we have SLOs.
And then, you know, the tools I mentioned,
you can play with them, you can tune them up,
get to the point where, you know,
it really benefits all the organizations.
And I think that, you know, with a few teams, 90 days,
it's most likely enough time to get it really tuned up
and set up for the organizations.
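The "no SLO, no check-in" rule described above could be enforced with a small check in a build pipeline. The sketch below is a hypothetical illustration only: the `SERVICES` inventory, its field names, and the validation rule are assumptions, and it does not use Nobl9's actual SLO definition format or tooling.

```python
# Hypothetical inventory: service name -> SLO definitions checked in alongside it.
SERVICES = {
    "checkout": [{"name": "availability", "target": 0.999}],
    "search": [{"name": "latency-p95", "target": 0.99}],
    "recommendations": [],  # no SLO defined yet
}


def services_missing_slos(services: dict[str, list[dict]]) -> list[str]:
    """Return services with no SLO whose target lies strictly between 0 and 1."""
    missing = []
    for name, slos in services.items():
        valid = [s for s in slos if 0 < s.get("target", 0) < 1]
        if not valid:
            missing.append(name)
    return sorted(missing)


def main() -> int:
    """Print each blocked service and return a CI-style exit code (0 = pass)."""
    missing = services_missing_slos(SERVICES)
    for name in missing:
        print(f"BLOCKED: {name} has no SLO defined")
    return 1 if missing else 0


exit_code = main()
print("exit code:", exit_code)
```

In a real pipeline the return value of `main()` would be passed to `sys.exit` so that a missing SLO fails the build.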
So in some cases, it may be, or most cases,
if you don't really have an idea of how to implement SLOs or where you might go,
essentially use past data to predict to some degree with your analyzer and whatnot.
Yeah.
Is there a scenario where, I'm sure you have great content out there,
and you've obviously got that 90-second YouTube that I mentioned, which is phenomenal as an on-ramp.
You should definitely revisit that. Are you finding that while you also have a service, you also
have to educate, have a consultant, so to speak? Do you have
sales folks? I know there's some things where it's like,
you know what? I would use it if you demystified how I use it.
Do you find that? What's the uphill battle here for
SLOs? Should they make sense?
But, like, getting people to buy into it, like, what is the selling point here?
Yeah, so quite often we are in those situations.
Not as much now.
As I said, the market is more mature, but we run into those things.
But, you know, there's this lead that's been hired into the organization,
either to create an SRE organization or implement SLOs or, in general, work on a strategy for observability.
And they fully understand the benefits.
But, of course, they have a number of teams that always have an excuse
and different arguments.
What we're doing is great, and there's no need.
We've all been there, right?
So for those situations, we do bootcamps.
And those bootcamps could be anywhere from four hours to three days or even five days.
We can go through full training exercises.
Like if we do the three-day, I believe, at the end of the whole bootcamp,
you're coming out with your SLAs and, I mean, your SLIs and SLOs defined,
implemented in a system, and you can start rolling from that perspective.
If you need more with organizational adjustment changes, whatnot,
we have a number of consulting partners,
anywhere from boutique organizations to Accenture, Cognizant,
that we've been working with.
So we can tailor an approach for an organization from anywhere.
Hands-on, we send our SREs there.
They help you out.
They figure it out for you.
All the way to, you know, full organization
onboarding and personnel adjustments as well.
You mentioned a new acronym there, SLI.
What does that mean?
How does that play into SLOs?
That's the input.
You need your SLIs.
Service level indicators, right?
You pick those first.
So those are the things that you want to use
as signals for your SLOs.
Okay.
It could be, you know,
latency a couple of times, that's easy.
You could be looking at, you know,
number of logins or failed logins
or things like that,
that, you know, then you input into creating your SLO and
you build your SLO based on the inputs.
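To show how an SLI feeds an SLO, here is a minimal sketch using the two signals mentioned: latency and failed logins. The sample latencies, the 300 ms threshold, the login counts, and the targets are all invented for illustration.

```python
def good_ratio(good_events: int, total_events: int) -> float:
    """SLI expressed as the fraction of events that were 'good'."""
    if total_events == 0:
        return 1.0  # assumption: no traffic counts as meeting the objective
    return good_events / total_events


def latency_sli(latencies_ms: list[float], threshold_ms: float) -> float:
    """Fraction of requests served at or under the latency threshold."""
    fast = sum(1 for value in latencies_ms if value <= threshold_ms)
    return good_ratio(fast, len(latencies_ms))


def login_sli(attempts: int, failed: int) -> float:
    """Fraction of login attempts that succeeded."""
    return good_ratio(attempts - failed, attempts)


def meets_slo(sli: float, target: float) -> bool:
    """An SLO is simply a target that the SLI must reach or exceed."""
    return sli >= target


latencies = [120, 95, 310, 101, 88, 97, 450, 105, 99, 92]
lat = latency_sli(latencies, threshold_ms=300)
log = login_sli(attempts=2_000, failed=12)

print(f"latency SLI {lat:.1%}, meets 95% target: {meets_slo(lat, 0.95)}")
print(f"login SLI   {log:.1%}, meets 99% target: {meets_slo(log, 0.99)}")
```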
Gotcha.
How would you rate where you're at today in terms of, you know, market and product and
things like that?
Like, what are some things that you've done well and some things that you may have not
done so well?
Like, how would you rate yourself?
Like, if you were a scale of zero to 10,
zero being absolutely terrible,
go home, stop, to 10,
you're knocking it out of the park, keep going,
more funding, go, go, go.
Well, I think from a product perspective,
with our first company, we made a lot of mistakes.
A lot of them.
I think my rating would be definitely under five
in the first place.
Good honesty. I like it.
I like the honesty.
Yeah, so we had issues.
And of course, that was also part of the reason why we started rewriting the system at Google the day we showed up, right?
We knew it.
We told them they knew it.
It was a whole concept.
But we learned a lot from that, right?
We also learned a lot from working within Google product organizations.
So I think, you know, from a building perspective,
from an architecture perspective, performance,
I think the product is somewhere around seven.
When it comes to market, you know, this market's been changing a lot.
And quite frankly, I know everybody experienced this.
You know, we started in 2019.
Then, of course, we had a pandemic a few months later.
Then other things happened, right?
So I would say we had a good idea.
We had a good idea, and we hope the market's going to develop in a certain way.
But, of course, we made some missteps in terms of, you know, who we market to, how we message things.
But that's kind of standard when it comes to a small organization.
So we're constantly evolving there.
We're somewhere around six,
I would say, on message.
We just had this conversation yesterday,
so we're adjusting the message,
getting better in who we market to.
But overall, like I said,
I feel very confident with the product itself.
I really focused on, you know, another thing that we didn't do
when we worked with the previous company.
We had remote teams,
we had teams in different countries,
and, you know, there was,
I don't think I put enough focus on culture,
which is very, very important to me.
And, you know, this time,
that was a huge, huge thing to focus on from day one.
And I think on culture,
we're actually probably the highest.
I would rate us an eight on culture. So given all
those components, I think we're in a really good position to drive
to be one of the top players in this space.
Well, that's good because I think messaging
is probably the one where everyone is always improving
for sure. I think if you have culture in place or at least a good intention
for culture, you've got a good foundation and therefore not so much easy, but it's easier with good culture
and good team and good morale, et cetera, to build the right product. And then, you know,
messaging is always sort of trailing, right? Like if the product is moving and especially being,
you know, like a new category, so to speak, in terms of SLOs, you know, I think it makes sense
why your messaging is a little off
because you're probably still learning
who specifically is your customer.
Because SLOs affect everybody, but not everybody buys them.
Yeah, and our customers invent ways to use SLOs too.
So that's interesting.
There are a lot of very interesting use cases
that really come from our customers.
So that plays into how we message as well.
That's right.
When you piqued my interest with, you know,
leveraging SLOs as a product thing, you know,
like how do you, you know, have product tiers?
And that's really a chief revenue officer's opportunity potential.
I mean, so how do you market to a CRO, for example, like with SLOs?
Well, hey, adopt SLOs and, you know,
maybe you have healthier teams or
a healthier product if you have tiers, one that's more expensive and more premium, and you can
quantify the sell, for lack of better terms, with an SLO, right? I mean, that to me simplifies
things. So your customer there is like product owners, chief revenue officers, potentially
marketers, you know? So you're not really, you know,
you're not really marketing to say,
director of engineering in that case.
He probably cares a lot about SLOs.
You know, that's where we started, of course, right?
And that they cared, as you said,
but definitely things are expanding beyond that.
And that was our hope from the beginning.
Like I said, a lot of those things that happened
in the world in the past three years,
reshaped many things in this business.
So we're trying to adjust as quickly as we can.
What is it that keeps you up at night?
Do you get good sleep?
What are some of your healthy practices, you know, in terms of just like life?
Do you let things like your, you know, in quotes, your day job, your baby, your company keep you up at night?
Are there things that do keep you up at night?
And if so, what are they?
Do I look like I sleep well?
I don't know.
Maybe, maybe you do, maybe you don't.
I don't know.
Well, there's always something, right?
I think the one thing I learned with talking,
it's been the fact that there are certain things
you can affect and certain things you cannot affect, right?
So if I wake up in the middle of the night,
it's usually with some idea to
think through. I just had this revelation, and I try to solve this problem. Like, yes, I don't
think that fear plays a role at all. There's always this, oh, let me take a step back and think
about it, because I don't know if we're going in the right direction. And there's always a little
bit of fear from that perspective, but I think it's more of a healthy fear, right?
Check yourself if you're doing the right thing.
But part of the reason I like being a startup
is the fact that, I mean,
there's no shortage of issues
that you have to solve on a daily basis.
And that's what excites me.
I like that.
And you have a great team
that thinks in a very, very similar way.
So yeah, we love doing those things.
We love building a company.
That's where the fun is.
Even if sometimes we have a bad day and you have to check yourself and take a break, go for a walk, whatever you might do. But like I said, in general, the team is really, really good
and supporting each other, liking the same things,
driving the same direction.
That's the most important thing.
I know I can fall back on certain people in the organization.
Good. That's good for you.
How much can you share about the horizon?
You know, what's just over the horizon or right at it?
Like maybe something that not many people know about
around Nobl9 or SLOs
or the next big thing.
What can you share about the future?
You know, I mentioned a couple of things
for us pushing and focusing more
on the business aspects,
relationship between business and IT,
making SLOs easier to use.
I know we push a number of tools
to help customers do that,
but that's one of the biggest themes for us.
And yeah, you know, a few partnerships out there
that I think are going to be very impactful.
I'm super excited about those.
Huge investment, of course.
But those are the next 12 months for sure.
Gotcha.
All right, anything else left unsaid?
What did I not ask you that you're like,
man, how did we miss this?
Is there things that I just totally gapped?
I don't know.
I really like the questions.
You know,
you did amazing research.
I am really surprised.
Great.
Nice.
Yeah, really like that.
Like the questions.
I was just going through it.
You know,
we talked about
where we are.
How market gets impacted
by SLOs,
how people are using them.
I can't really think of anything else that we missed.
Well, it's been fun having you here.
Thank you so much for your time today.
Appreciate the wild adventure into SLOs and all the ways they can be used.
It's so cool.
Big fan of the impact to teams and organizations, leveraging them the right ways.
And good to see you and Nobl9 really doing it right.
So appreciate the time.
Thank you very much.
It's an honor to be here.
I appreciate the conversations.
They're very, very good.
Awesome.
Thank you again.
Thank you.
Okay, SLOs, is your team using them?
Are you using them?
If so, how are you using them?
What benefits do you get from using them? We want to
hear from you. Give us a shout in the comments. The link is in the show notes. Again, a massive
thank you to our friends at Fastly, Fly, and also Typesense. And of course, the banging beats master
himself, Breakmaster Cylinder. Those beats, they're banging. And of course, to you, thank you for
listening. No bonus today, but still, we encourage you to become a Plus Plus subscriber. That's where you get the extended episodes, the bonus content, the deeper dives, the closer to the metal, the skip the ads section of the Changelog podcast universe. Check it out at changelog.com slash plus plus. But hey, that's it. The show is done. Thank you. Game on.