Screaming in the Cloud - OpsLevel and The Need for a Developer Portal with Kenneth Rose

Episode Date: June 15, 2023

Kenneth Rose, CTO at OpsLevel, joins Corey on Screaming in the Cloud to discuss how OpsLevel is helping developer teams to scale effectively. Kenneth reveals what a developer portal is, how he thinks about the functionality of a developer portal, and the problems a developer portal solves for large developer teams. Corey and Kenneth discuss how to drive adoption of a developer portal, and Kenneth explains why it’s so necessary to have executive buy-in throughout that process. Kenneth also discusses how using their own portal internally along with seeking out customer feedback has allowed OpsLevel to make impactful innovations.

About Ken

Kenneth (Ken) Rose is the CTO and Co-Founder of OpsLevel. Ken has spent over 15 years scaling engineering teams as an early engineer at PagerDuty and Shopify. Having in-the-trenches experience has allowed Ken a unique perspective on how some of the best teams are built and scaled and lends this viewpoint to building products for OpsLevel, a service ownership platform built to turn chaos into consistency for engineering leaders.

Links Referenced:
OpsLevel: https://www.opslevel.com/
LinkedIn: https://www.linkedin.com/company/opslevel/
Twitter: https://twitter.com/OpsLevelHQ

Transcript
Starting point is 00:00:00 Hello, and welcome to Screaming in the Cloud, with your host, Chief Cloud Economist at the Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud. Welcome to Screaming in the Cloud. I'm Corey Quinn.
Starting point is 00:00:34 About, oh, I don't know, two years ago and change, I wound up writing a blog post titled Developer Portals Are an Anti-Pattern, and I haven't really spent a lot of time thinking about them since. This promoted guest episode is brought to us by our friends at OpsLevel, and they have sent their CTO and co-founder, Ken Rose,
Starting point is 00:00:56 presumably in an attempt to change my perspective on these things. Let's find out. Ken, thank you for agreeing to, well, run the gauntlet, for lack of a better term. Hey, Corey, thanks again for having me. And I've heard, you know, heard and listened to your show a bunch and really excited to be here today. Let's begin with defining our terms. I'm curious to know what a developer portal is. What would you say a developer portal means to you, like it's a college entrance essay? Right, definitely. So really,
Starting point is 00:01:26 a developer portal is this consolidated place for developers to come to, especially in large organizations, to be able to get their jobs done more easily, right? A large challenge that developers have in large organizations is that there's just a lot to do and a lot to take care of. So a developer portal is a place for developers to be able to better own, manage, and run the services that they're responsible for that run in production. And they can do that through easy access to self-service tooling.
Starting point is 00:01:52 I guess on some level, this turns into one of those alignment charts of what is a database and how prescriptive you want to be. It's like, well, is a senior engineer a database? Because you can query them and they have information. Would you consider, for example, would Kubernetes be a developer platform, and/or would the AWS console? Yeah, that's actually an interesting question, right? So I think there's
Starting point is 00:02:14 actually two, we're going to get really niggly here. There's developer platform and developer portal, right? And the word portal for me is something that sits above a developer platform. I don't know if you remember like the late nineties, early 2000s, like, portals were all the rage. Yahoo and AltaVista were like search portals that were trying to, at the time, consolidate all this information on a much smaller internet to make it easy to access. A developer portal is sort of the same thing, but custom built for developers and trying to consolidate a lot of the tooling that exists. Now, in terms of the AWS console, yeah, maybe. It has a suite of tools and a suite of offerings. It doesn't do a lot on the, well, how do I quickly find out what's running in production and who is responsible for it?
Starting point is 00:02:50 I don't know, maybe AWS shipped, like the, you know, 300th new offering in the last week that I haven't, you know, kept on top of. But, you know, there's definitely some spectrum in terms of what goes into a developer portal. For me, there's kind of three main things you need. You do need some kind of a catalog, like what's out there, who owns it. You need some kind of a way to measure like how good are those services, how well built are they? And then you need some access to self-service tooling. And that last part is where like the Kubernetes or AWS could be, you know, sort of a dev portal as well. My experience with developer portals, there was a time when I loved it. RightScale was what I used at some depth back in, I want to say, 2010, 2011, because the EC2 console was clearly not built or designed for anyone who had not built EC2
Starting point is 00:03:33 themselves with their bare hands and sweat of their brow. And in time, the EC2 console got better, where it wasn't written in hieroglyphics, as best we could tell, and it became click-button-to-launch-instance. And RightScale really didn't have a second act, and they wound up getting acquired by our friends over at Flexera years later, and I haven't seen their developer portal in at least eight years as a direct result of this. So the problem, at least when I was viewing it purely in the context of AWS services, it feels like you are competing against AWS iterating forward on developer experience,
Starting point is 00:04:11 which they iterate slowly sometimes and unevenly across their breadth of services. But it does feel like, at some level, by building an internal portal, you are, first, trying to out-innovate AWS in some ways, and two, you are inherently making the trade-off of not using recent features and enhancements that have not themselves been incorporated into the portal. That's where, I guess, the start, the genesis of my opposition to the developer portal approach comes from. Is that philosophy valid these days? Not as much, because I can see an argument for it shifting. Yeah, I think it's slightly different. I think of a developer portal as, again, it's something that sort of sits on top of AWS or Google Cloud or whatever cloud provider you use, right?
Starting point is 00:04:54 You give the example, for example, with RightScale and EC2. So provisioning instances is one part of the activity you have to do as a developer. Now, in most modern organizations, you have like your product developers that ship features. They don't actually care about provisioning instances themselves. There's another group called the platform engineers or platform group that are responsible for building automation and tooling to help spin up instances and create CI/CD pipelines and get everything you need set up. And they might use AWS under the covers to do that. But the automation built on top and making that accessible to developers, that's really what a developer portal can provide. In addition, it also provides links to operational tooling that you need, technical documentation.
Starting point is 00:05:31 It's everything you need as a developer to do your job in one place. And though AWS bills itself as that, I think of them as more, they have a lot of platform offerings, right? They have a lot of infra offerings, but they still haven't been able to, I think, customize that. Unless you're an organization that has kind of gone all in on AWS and doesn't build any of your own tooling, that's where a developer portal helps. It really helps by consolidating all that information in one place, by making that information discoverable for every developer. So they have less cognitive load, right? We've asked developers to kind of do too much. We've asked them to shift left and well, how do we make that information more accessible? Regarding the point of, you know, AWS adds new features or new capabilities all the time. And like, well, you have this dev portal,
Starting point is 00:06:14 that's sort of your interface for how to get things done. Like, how do you use those? Dev portal doesn't stop you from doing that, right? So my mental model is if I'm a developer and I want to spin up a new service, I can just press a button inside of my dev portal in my company and do that. And I have a service that is built according to the latest standards. It has a CI/CD pipeline. It already has, you know, it's registered in PagerDuty, it's registered in Datadog, it has all the various bits. And if then there's something else that I want to do that isn't really on the golden path, because maybe this is some new service or some experiment, nothing stops us from doing that. Like you still can use all those tools from AWS,
Starting point is 00:06:45 you know, kind of raw. And if those prove to be valuable for the rest of the organization, great. They can make their way into the dev portal. They can actually become a source of leverage. But if they're not, then they can also just sit there on the vine. Like not everything that AWS ever produces
Starting point is 00:06:56 will be used by every company. Many years ago, I got a pair of Cisco certifications because a recession was hitting and I needed to do better at networking. And taking those certifications in those days before Cisco became the sad corporate dragon with no friends that we all know today,
Starting point is 00:07:13 they were highly germane and relevant. But I distinctly remember even now, 15 years later, that there was this entire philosophy of pretend that the entire world is Cisco only, which in networking is absolutely never true. It feels like a lot of the AWS designs and patterns tend to assume, oh, you're going to use AWS services for everything. I have never yet found that to be true other than when I'm just trying to be obstinate. And hell is interoperability
Starting point is 00:07:41 between a bunch of different things. Yes, I may want to spin up an EC2 instance and an AWS load balancer and some S3 storage or whatnot, but I'm also going to want to monitor it with PagerDuty. I'm going to want to have a CDN that isn't CloudFront because most CDNs these days don't hate you in quite the same economic ways and are simpler to work with, et cetera, et cetera, et cetera. So there's definitely a story wherein I've found that the interoperability of tying these things together is helpful. How do you avoid falling down the trap of, oh, everyone should be multi-cloud, single pane of glass, et cetera, et cetera, in practice that always seems to turn to custard? Yeah, I think multi-cloud and single pane of glass are actually two different things. So multi-cloud, like I agree with you to some sense, like pick a cloud and go with it. Like, unless you have really good business reasons to go for multi-cloud and sometimes you do like years ago, I worked at PagerDuty, they were multi-cloud for a reliability reason that, hey,
Starting point is 00:08:37 if one cloud provider goes down, you don't want to- They were an example I used all the time for that story. Specifically, the thing that woke you up was homed in a bunch of different places. Whereas the marketing site, the onboarding flow, the periphery stuff around it was not because it didn't need to be. The core business need of wake you up was very much multi-cloud because once upon a time it wasn't, and it went down with the rest of US East 1, and people weren't woken up to be told their site was on fire. 100%. And on the application side, even then, pick a cloud and go with it, unless there's a really compelling business reason for your business to go multi-cloud. Maybe there's something, credits or compliance or availability, right? There might be reasons, but you have to be articulate about whether they're right for you. Now, single pane of glass, I think that's different,
Starting point is 00:09:22 right? I do think that's something that ultimately is a net boon for developers. There, in any large organization, there is a myriad of internal tools that have been built. And it's like, well, how do I provision a new topic in the Kafka cluster? How do I actually get access
Starting point is 00:09:36 to the AWS console? How do I spin up a new service? How do I kind of do these things? And if I'm a developer, I just want to ship features. Like that's what I'm incented to do. That's what I'm optimizing for. And all this other stuff, I have to do as part of my job,
Starting point is 00:09:49 but I don't want to have to become like a Kubernetes guru to be able to do it, right? So what a developer portal is trying to do is be that single pane of glass, bringing this common set of tools and responsibilities that you have as a developer into one place. They're easy to search for, they're easy to find,
Starting point is 00:10:04 they're easy to query, they're easy to use. I should probably have asked this earlier on, but let's disambiguate for a little bit here. Because when I'm setting up to use a new service or product and kick the tires on it, no two explorations really look the same. Whereas at most responsible mature companies that are building products, that are services that are going to production use, they've standardized around a number of different approaches.
Starting point is 00:10:30 What does your target customer look like? Is there a certain point of scale, a certain level of complexity, a certain maturity of process? Absolutely. So a tool like OpsLevel or a developer portal really only makes sense when you hit some critical mass
Starting point is 00:10:44 in terms of the number of services you have running in production or the number of developers that you have. So when you hit 20, 30, 50 developers or 20, 30, 50 services, an important part of a developer portal is this catalog of what's out there. Once you kind of hit the Dunbar number of services, like when you have more than you can keep in your head, that's when you start to need tooling like this. If you look at our customer base, they're all, you know, kind of medium to large size companies. If you're a startup with like 10 people, OpsLevel is probably not right for you. We use OpsLevel internally at OpsLevel.
Starting point is 00:11:12 And you know, like we're still a small company. It's like, we make it work for us because we know how to get the most out of it. But like, it's not the perfect fit because it's not really meant for, you know, smaller companies. Oh, I hear you. I think I'm probably,
Starting point is 00:11:23 I have a better AWS bill analytics system running internally here at the Duckbill Group than some banks do. So I hear you on that front. But it also implies to me that there's no OpsLevel prospect or customer deployment that has ever been greenfield. It's always, you're building on existing things. There's already infrastructure in place. Vendors have been selected across the board. No one is starting a company on day one and going, all right, time to spin up our AWS account, and we're also going to wind up signing up for OpsLevel, from the sound of it. Accurate? Inaccurate? I think that's actually accurate. Like a lot of the problems we solve are the problems that come as you start to scale both your product and your engineering team.
Starting point is 00:12:02 And it's the problem. What do those painful problems look like? In other words, what is someone sitting at home right now listening to this or driving to work, debating whether they want to ram a bridge abutment or go into the office, depending on their mental state today. What painful problem do they have that OpsLevel is designed to fix? Yeah, for sure.
Starting point is 00:12:20 So let's help people self-select. So here's my mental model for any eng org. There are product developers, platform developers, and engineering leaders. Product developers, if you're asking questions like, I just got paged for this service. I don't know what this does. Or it's upstream from here. Where do I find the technical documentation? Or I think I have to do something with the payment service.
Starting point is 00:12:36 Where do I find the API for that? When you get to that scale, a developer portal can help you. If you're a platform engineer and you have questions like, okay, we got to migrate. We're migrating, I don't know, from Datadog to Honeycomb, right? We got to get these 50 or a hundred or thousands of services and all these different owners to like switch to some new tool or, Hey, we've done all this work to ship the golden path. Like how do we actually measure the adoption of all this work that we're doing? And if it's actually valuable, right? Like we want everybody to be on a certain set of CI tooling or a certain minimum version of some library or framework. How do we do that? How do we measure that? OpsLevel is for you, right? We have a whole bunch of stuff around maturity.
Starting point is 00:13:12 And if you're an engineering leader, ultimately the questions you care about are like, how fast are my developers working? I have this massive team. We've made this massive investment in hiring all these humans to write software and bring value for our customers. How can we be more efficient as a business in terms of that value delivery? And that's where OpsLevel can help as well. Guardrails, whether they be economic, regulatory, or otherwise, have to make it easier than doing things incorrectly. Because one of the miracle aspects of cloud also turns into a bit of a problem, which is shadow IT is only ever a corporate credit card away. Make it too difficult to comply with corporate policies, and people won't.
Starting point is 00:13:48 And they're good actors. They're trying to get work done. They're not trying to make people's lives harder, but they don't want to spend six weeks provisioning an EC2 cluster. So there's always that weird trade-off. Now, it feels, and please correct me if I'm wrong, once someone has rolled out OpsLevel at
Starting point is 00:14:05 their organization, where it really shines is spinning up a new service where, okay, great, you're going to spin up the automatic observability portion of it. You're going to spin up the underlying infrastructure in certain ways that comply with our policies. It's going to build CI/CD pipelines around it. You're going to wind up having the various cost instrumentation rolled out to it. But for services that are already extant within the environment, is there an OpsLevel story for them? Oh, absolutely.
Starting point is 00:14:34 So I look at it as like, the first problem OpsLevel helps solve is the cataloging problem. What's out there and who owns it? So not even getting developers to spin up new services that are kind of on the golden path, but just understanding the taxonomy of what are the services we have? How do those services compose into higher level things like systems or domains? What's the whole set of infrastructure
Starting point is 00:14:52 we have? Like I have 58 AWS accounts, maybe I have a handful of GCP ones also, some Azure. I have all this infrastructure that like, how do I start to get a handle on like what's out there in prod and who's responsible for it? And that helps you get in front of compliance risks, security risks. That's really the starting point for OpsLevel, is building that catalog. And we have a bunch of integrations that kind of slurp all this data to automatically assemble that catalog or YAML as well, if that's your thing. But that's the starting point is building that catalog and figuring out this assignment of like, okay, this service and this human or this certain team, they're paired together. A number of offerings in this space, which honestly, my exposure to it is bounded simultaneously to things that are 10 years old and no one uses anymore, or a bunch of things I found on GitHub.
Starting point is 00:15:37 And the challenge that both of those products tend to have is that they assume certain things to be true about a given environment, that they're using Terraform to manage everything, or they're always going to be using CloudFormation, or everyone there knows Python, or something else like that. What are the prerequisites to get started with OpsLevel? Yeah, so we work pretty hard
Starting point is 00:16:02 to build just a ton of integrations. I would say integrations is our just continuing thing we have going on in the background. Like when we started, like we only supported GitHub. Now we support all the Gits, you know, like GitHub, GitLab, Bitbucket, Azure DevOps. I think we're building Gitea. There's just a whole like long tail of integrations. The same with APM tooling, the same with vulnerability management tooling, right? And the reason we do that is because there's just this huge vendor footprint and people want OpsLevel to work for them.
Starting point is 00:16:31 Now, the other thing we try to do is we also build APIs. So anything we have as a core integration, we also have an underlying API for. So that no matter what, you have an escape hatch. If you're using some tool that we don't support or you have some homegrown thing, there's always a way to try to be able to integrate that into OpsLevel. When people think about developer portals, the most common one that pops to mind is Backstage, which Spotify wound up building internally, championing, open sourcing. And I believe on some level turning into a product because if there's one thing people want, it's to have their podcast music company become a SaaS vendor, which is weird to me. But the criticisms that I've seen about it across the board have all rung relatively true, including from people internal at Spotify who have used the thing, which is the first is underestimating the amount of effort that is necessary to maintain Backstage itself. That the build versus buy discussion is always harder.
Starting point is 00:17:26 Engineers love to build, but they shouldn't be building things outside of their core competency half the time. And the other is driving adoption within the org, where you can have the most amazing developer portal in the known universe, but if people don't use it, it may as well not exist.
Starting point is 00:17:40 And doing the carrot and stick approach often doesn't work. I think you have a pretty good answer that I need to not even ask you to elaborate on. Well, how do we avoid having to maintain this ourselves since you have a company that does this? But how do you find companies are driving adoption successfully once they have deployed OpsLevel?
Starting point is 00:17:57 Yeah, that's a great question. So absolutely, like, I think the biggest thing you need first is kind of cultural buy-in, that this is a tool that we want to invest in, right? I think one of the reasons Spotify was successful with Backstage, and I think it was System Z before that, was that they had this kind of flywheel of like, they saw that their developers were getting, you know, better, faster, working, happier by using this type of tooling, by reducing the cognitive load. The way that we approach it is sort of similar, right? We want to make sure that there is executive buy-in that like everybody agrees, this is like a problem that's worth solving. The first step we do is trying to build out that
Starting point is 00:18:31 catalog again and helping assign ownership. And that helps people understand like, Hey, these are the services I'm responsible for. Oh, look, and now here's this other context that I didn't have before. And then helping organizations, you know, what, it depends on the problem you're trying to solve, but whether that's rolling out self-serve automation to help developers reduce what was before a ton of cognitive load, or if it's helping platform teams define what good looks like so they can start to level up the overall health of what's running in production. We kind of work on different problems, but it's picking one problem and then kind of working with the customers and driving it forward. On some level, I think that this is going to be looked down upon inherently just by automatic reflex of folks with infrastructure engineering backgrounds. It's taken me some time to learn to overcome my own negative reaction to it because it's, I'm here to build things and I want to build things out in such a way that it's
Starting point is 00:19:26 portable and reusable without having to be tied to a particular vendor and move on. And it took me a long time to realize that what that instinct was whispering in my ear was in fact, no, you should be your own cloud provider. If that's really what I want to do, I probably should just brush up on, you know, computer science trivia from 20 years ago and then go see if I can pass Google's SRE interview. I'm not here to build the things that just provision infrastructure from scratch every company I wind up landing at. It feels like there's more important, impactful work that I can do. And let's be clear, people are never going to follow guardrails themselves when they have to do a bunch of manual steps.
Starting point is 00:20:05 It has to be something that is done for them. And I don't know how you necessarily get there without having some form of blueprint or something like that provided for them with something that is self-service because otherwise it's not going to work. I 100% agree, by the way, Corey, like the take that like automation
Starting point is 00:20:22 is the only way to drive a lot of this forward is true, right? If for every single thing you're trying, like we have a concept called a rubric and it's basically how you measure the service health and you can, it's very customizable. You have different dimensions, but if for any check that's on your rubric, it requires manual effort from all your developers. That is going to be harder than something you can just automate away. So vulnerability management is a great example. If you tell developers, hey, you have to go upgrade this library, okay, some percent of them will.
Starting point is 00:20:48 If you give developers, here's a pull request that's already been done and has a test passing and now you just need to merge it, you're going to have a much better adoption rate with that. Similarly with like applying templates and being able to up-level, you know, kind of apply the latest version of a template to an existing service. Those types of capabilities,
Starting point is 00:21:00 anything where you can automate what the fixes are, absolutely you're going to get better adoption. As you take a look at your existing reference customers, which is something I always look for on vendor websites, because like, oh, we have many customers who will absolutely not admit to being customers. It's like, that sounds like something that's easy to say. You have actual names tied to these things,
Starting point is 00:21:20 not just companies, but also individuals. If you were to sit down and ask your existing customer base, so why did you wind up implementing OpsLevel? And what has the value that's delivered to you been since that implementation? What do they say? Definitely. I actually had to check our website because we land new customers and put new logos on it. I was like, oh, I wonder what the current set is. I have the exact same challenge. Oh, we have some mutual customers. And it's okay. I don't know if I can mention them by name because I haven't checked our own list
Starting point is 00:21:48 of testimonial rights lately because say the wrong thing and that's how you wind up being sued and not having a company anymore. Yeah, so I definitely want to stay on side on that part. But in terms of like kind of sample reference customer, a lot of the folks that we initially work with are the platform teams, right? They're the teams that care about what's out there and they need to know who's responsible
Starting point is 00:22:09 for it because they're trying to drive some kind of cross-cutting change across the entire, you know, production footprint. And so the first thing that generally people will say is, and I love this quote, this came, I won't name them, but like it's in one of our case studies. It was like, I had like 50 different attempts at making a spreadsheet and they're all like in the graveyard, like to be able to capture what's out there and who's responsible for it. And just OpsLevel helping automate that has been one of the biggest values that they've gotten. The second point then is now being able to drive maturity and be able to measure how well those services are being built. And again, it's sort of this interesting thing where we start with the platform teams and then sometime later security teams find out about OpsLevel and they're like, oh, this is a tool I can use to get developers to do stuff.
Starting point is 00:22:46 I've been trying to get developers to do stuff for the longest time. And I filed JIRA tickets and they just sit there and nothing gets done. But when it becomes part of this overall health score that you're trying to increase across the board, yeah, it's just a way to kind of drive action. I think that there's a dichotomy of companies that emerge. And I tend to see the world through a lens of AWS bills. So let's go down that path. I feel like there are some companies, presumably like OpsLevel,
Starting point is 00:23:11 whereas if I, assuming you're running on top of AWS, if I were to pull your AWS bill, I would see upwards of 80% of your spend is going to be on this application called OpsLevel, the service that you provide to people, as opposed to the other side of the world, which is large enterprises where they're spending hundreds of millions of dollars a year,
Starting point is 00:23:32 but the largest application they have is a million and a half a year in spend. It's just that they have thousands of these things scattered everywhere. That latter case is where I tend to see more platform teams where I start to see a lot of managing a whole bunch of relatively small workloads. And developer platforms really seem to be where a lot of the solutions lead. Whereas 80% of our workload is one application, we don't feel the need for that as much. Is that accurate? Am I misunderstanding some aspect of it? No, 100%. You hit the nail on the head. Like, okay, think about the typical, like, microservices adoption journey. Like, you started with, you know, some small company like us. You started with a monolith. Then you read on Hacker News and realize, oh, if we want to hire people, we've got to be doing what all the cool kids are up to.
Starting point is 00:24:16 Right. We've got to microservice all the things. But that's actually, you know, microservices should come later, right? As a response to, you need to scale your org and scale your... As someone who started building some applications with microservices, I could not agree more. 100%. So it's as you're starting to take those steps to having just more moving parts in your production infrastructure, right? If you have one moving part, unless it's like a really large moving part that you can internally break down, like kind of this majestic monolith where you do have kind of like individual domains that are owned by different teams. But really the problem we're trying to solve, it's
Starting point is 00:24:44 more about like who owns what. Now, if that's a single atomic unit, great, but can you decompose that? But if you just have like one small application and kind of like the whole team is owning everything, again, a developer portal is probably not the right tool for you. It really is a tool that you need as you start to scale your eng org. And as you start to scale the number of moving parts in your production infrastructure. I used to tend to think of that in terms of boring companies versus innovative ones, and I don't think that's accurate. I think it is the question of maturity and where companies lead to. On some level, if OpsLevel starts growing and becomes larger and larger in different ways and starts doing
Starting point is 00:25:19 acquisitions and launching into other areas. At some point, you don't have just one product offering. You have a multitude of them, at which point having something like that is going to be critical. But I have to ask, given that you are sort of not exactly your target customer profile,
Starting point is 00:25:38 what have the sharp edges been on using it for your use case? Yeah, so we actually have an internal Slack channel we call OpsLevel on OpsLevel. And finding those sharp edges actually has been really useful for us. You know, all the good stuff, dogfooding, and it makes your own product better. Okay. So we have our main app. We also do have a bunch of smaller things that it's like, oh yeah, we need, you know, we have like, I don't know, various hack day things that go on. It's important. We kind of wind those down for, you know, compliance. We have our marketing site. We have like a Terraform. So there's like
Starting point is 00:26:03 stuff. It's not like hundreds or thousands of things, but there's more than just the main app. The second though, it's really on the maturity piece that we really try to get a lot of value out of our own product, right? Helping, we have our own platform team. They're also trying to drive certain initiatives with our product developers. There is that usual tension of our product, like our own product developers are like, I want to ship features. What's the security thing I have to go take care of right now? But OpsLevel itself helps reflect that. We had an operational review today and it was like, oh, this one service is actually now, we have platinum as a level.
Starting point is 00:26:33 It's in gold instead of platinum. It's like, why? Oh, there's this thing that came up. We got to go fix that. Great. Let's actually go fix that. So we're back into platinum. Do you find that there's often a choice you have to make internally where you could make the product more effective for your specific use case, but that also diverges from where your typical customer needs or wants the product to go?
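Editor's aside: the operational review Ken describes, where one service slips from platinum to gold because of a single failing check, can be sketched in a few lines. This is only an illustration under stated assumptions and not OpsLevel's actual data model or API. The `Check` class, the `service_level` function, the check names, and the bronze and silver levels are hypothetical; only gold and platinum are named in the conversation.

```python
from dataclasses import dataclass

# Hypothetical level ordering, lowest to highest. Only "gold" and
# "platinum" come from the episode; the other two are assumed.
LEVELS = ["bronze", "silver", "gold", "platinum"]


@dataclass(frozen=True)
class Check:
    name: str     # hypothetical check name
    level: str    # the level this check is required for
    passed: bool  # result of the most recent evaluation


def service_level(checks):
    """Return the highest level whose checks, and all lower levels' checks, pass."""
    achieved = "none"
    for level in LEVELS:
        required = [c for c in checks if c.level == level]
        # A level with no checks assigned is treated as trivially satisfied.
        if not all(c.passed for c in required):
            break
        achieved = level
    return achieved


# One failing platinum check is enough to hold the service at gold.
example_checks = [
    Check("has-owner", "bronze", True),
    Check("has-runbook", "gold", True),
    Check("dependencies-patched", "platinum", False),
]
current_level = service_level(example_checks)
```

The design choice worth noting is that levels are cumulative: fixing the one failing check is all it takes to get back to platinum, which matches the "let's go fix that" flow in the conversation.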
Starting point is 00:26:59 No, I think a lot of the things we find for our use case are like, they're more small paper cuts, right? That just as we're using, it's like, hey, like, as I'm using this, I want to see the report for this particular check. Why do I have to click six times to get, you know, like, wouldn't it be great if we had a button, right? And so it's those type of like small innovations that kind of come up and those ultimately lead to, you know, a better product for our customers. We also work really closely with our customers and, you know, developers are not shy about telling you
Starting point is 00:27:18 what they don't like about your product. And I say this with love, like a lot of our customers give us phenomenal feedback just on how our product can be made better. And we try to internalize that and roll that feedback into the product. You have a number of integrations of different SaaS providers, infrastructure providers, etc. that you wind up working with. I imagine that given your scale and scope and whatnot, those offerings are dictated by what customers say,
Starting point is 00:27:43 Hey, we're using this thing. Are you going to support that or are you not going to maintain our business? Which is a great way to wind up financing a lot of product development and figuring out what matters to people. My question for you is, if you look across the totality of your user base, what are the most popularly used integrations, if you can say? Yeah, for sure. I think right now I can actually dive in to pull the numbers. GitHub and GitLab are,
Starting point is 00:28:08 I think GitHub has slightly more adoption across our customer base. At least with our customers, almost nobody uses Bitbucket. I mean, we have a small number, but it's, I think, single digit percentage. A lot of people use PagerDuty, which, hey, I'm an ex-PagerDuty person,
Starting point is 00:28:21 ex-Dutonian, I'm glad to see that. I have a free-tier PagerDuty account that will automatically page me from my home automation stuff, specifically if, you know, the fire alarm goes off. Like, yeah, okay, there are certain things I want to be woken up for, but it's a very short list. Yeah, it's funny, the running default message
Starting point is 00:28:39 when we used to test PagerDuty was "the server's on fire," but in your case, it'd be like, "the house is on fire." Like, you know, go get that taken care of. There's one other tool also that's used a lot. Datadog actually is used a ton across our entire customer base. Despite its... we're also a Datadog partner, we're a Datadog customer, you know, it's not cheap, but it's a good product for, you know, monitoring logs in their opinion. No, other than cloud infrastructure providers, the number one most common source of inquiries I get is Datadog optimization. It has now risen to a board-level concern in many cases because observability is expensive. That's a sign of success on some level. Meanwhile, I'm sitting here like, date a dog? Oh
Starting point is 00:29:13 my god, that's disgusting. It's like Tinder for pets, which it turns out is not at all what they do. Nice. Yeah. As far as infrastructure providers, is that something that people wrap around on day one, or does that tend to be a later-in-time approach? Are they first in production. You know, if you have multiple AWS accounts, multiple Kubernetes clusters, dozens or even hundreds of teams, God help you if you're going to try to build a list manually to consolidate all that information.
Starting point is 00:29:54 That's really the first part: integrate Kubernetes, integrate your CI/CD pipelines, integrate Git, integrate your cloud account. Like we'll integrate with everything and we'll try to build that map of like, here's everything that's out there
Starting point is 00:30:02 and start to try to assign it. And here's people that we think might be responsible in terms of owning the software. That's generally the starting point. Which makes an awesome amount of sense. I think that going at it from the infrastructure first perspective is where I've seen most developer platforms founder. And to be fair, the job is easier now than it was years ago, because it used to be that you're being out-innovated by AWS constantly. Innovation has slowed down there, and you know that because of how much they say the pace of innovation has only sped up. And whenever AWS says something in a marketing context,
Starting point is 00:30:33 they're insecure about it. I've learned this through the fullness of time observing that company. And these days, most customers do not use the majority of features available for any given service. They have solidified to a point where you can responsibly build on top of these things. Now it seems that the problem is all the "yes, and" stuff that gets built on top. Yeah. Do you have an example, actually, like one of the "yes, and" tools that you're thinking about? Oh, absolutely. We have a bunch of AWS environment stuff, so we should configure CloudWatch to look at all these things from an observability perspective. No, you should not. You should set up Datadog.
Starting point is 00:31:12 And the first time someone does that by hand, they enable all of the observability and the rest and suddenly get charged approximately the GDP of Guam. And okay, maybe we shouldn't do that because then you have the downstream impact of that on your CloudWatch bill. So, okay, how do we optimize this for the observability piece directly tied to that? How do we make sure that we get woken up when the site is down or preferably before that, but not every time basically an EBS volume starts to get a little bit toasty? You have to start dialing this stuff in. And once you've found a lot of those aspects, being able to templatize that and roll that out on an ongoing basis
Starting point is 00:31:40 and having the integrations all work together feels like it's the right problem to be solving. Yeah, absolutely. And the group that I think is responsible for that kind of, because it's a set of problems you described, is really like platform teams. Sometimes service owners are like,
Starting point is 00:31:53 how should we get paged? But really what you're describing are these kind of cross-cutting engineering concerns that platform teams are uniquely poised to help solve in an engineering organization, right? I was thinking about what you said earlier. Nobody just wants to rebuild the same infra over and over, but it's sort of like,
Starting point is 00:32:05 it's not just building the infra. It's kind of like solving this. How do we ship? How do we actually run stuff in prod? And not just run it, but get observability and ensure that we're woken up for it. And like, what's that total end-to-end look like
Starting point is 00:32:14 from like developers writing code to running software in production that's serving traffic and solving all the problems that come with it. That's what I think of as platform engineering. So my last question before we wind up wrapping this episode comes down to, I am very adept
Starting point is 00:32:30 at two different programming languages, and those are brute force and enthusiasm. What implementation language is most of what you find yourself working with, and why is it invariably going to be YAML? Yeah, that's a great question. So I think there's, in terms of implementing OpsLevel and implementing a service catalog, we support YAML.
Starting point is 00:32:50 Like, you know, there's this very common workflow. You just drop a YAML spec basically in your repo if you're a service owner, and we can support that. I don't think that's a great take though. We have other integrations. Again, if the problem you're trying to solve is I want to build a catalog of everything that's out there, asking each of your developers, hey, can you please all write YAML files that describe the services you own and drop them into this repo? You've inverted this database that essentially you're trying to build of what's out there and stored it in Git, potentially across several hundred or thousands of repos. You put a lot of toil now on individual product developers to go write and maintain these files. And if you ever have to like make a blanket update to these files, there's no atomic way to kind of do that.
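Editor's aside: the alternative Ken points to, building the catalog from integrations rather than from hand-written files in every repo, might look roughly like the sketch below. It is an assumption-laden illustration, not how OpsLevel works internally. The `infer_catalog` function, the record shape, and the sample service and team names are all hypothetical, and real integrations would return far richer data.

```python
from collections import defaultdict


def infer_catalog(sources):
    """Merge records from several integrations into one entry per service name.

    `sources` maps a hypothetical integration name to a list of records,
    each shaped like {"service": str, "owner": str or None}.
    """
    catalog = defaultdict(lambda: {"seen_in": [], "owner_candidates": set()})
    for source_name, records in sources.items():
        for record in records:
            entry = catalog[record["service"]]
            if source_name not in entry["seen_in"]:
                entry["seen_in"].append(source_name)
            # Ownership is only a suggestion here; a human still confirms it.
            if record.get("owner"):
                entry["owner_candidates"].add(record["owner"])
    return dict(catalog)


# Made-up sample data standing in for what integrations might report.
sample_sources = {
    "github": [
        {"service": "billing", "owner": "payments-team"},
        {"service": "marketing-site", "owner": None},
    ],
    "kubernetes": [
        {"service": "billing", "owner": "payments-team"},
        {"service": "hackday-bot"},
    ],
}
catalog = infer_catalog(sample_sources)

# Services nobody has claimed yet are the ones worth chasing first.
unowned = sorted(name for name, entry in catalog.items() if not entry["owner_candidates"])
```

Because the merged catalog lives in one place, a blanket change is a single update to one data set rather than a commit to hundreds of repositories, which is the atomicity point raised above.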
Starting point is 00:33:29 Right. So I look at YAML as like, I get it. You know, like we use YAML for all the things in DevOps. So why not our service catalog as well? But I think it's toil. Like there are easier ways to build a catalog by kind of just integrating. Like hook up AWS, hook up GitHub, hook up Kubernetes, hook up your CI/CD pipeline, hook up all these different sources that have information about what's running in prod, and let the software, let the tool automatically infer what's actually running, as opposed to
Starting point is 00:33:53 requiring humans to manually enter data. I find that there are remarkably few technical holy wars that I cannot unify both sides on by nominating something far worse, like the vi versus Emacs stuff, the tabs versus spaces, and of course the JSON versus YAML folks. My JSON versus YAML answer is XML, God's language. I find that as soon as you suggest that, people care a hell of a lot less about the differences between JSON and YAML because their job is to now kill the apostate, which is me. Right. Yeah. I remember XML like, oh man, 2002 SOAP. I remember SOAP as a protocol. That was the thing. Some of the earliest S3 API calls were done in SOAP. And I think they finally
Starting point is 00:34:36 just used it to wash their mouths out when all was said and done. Nice. Yeah. I really want to thank you for taking the time to do your level best to attempt to convert me. And I would argue in many respects, you have succeeded. I'm thinking about this differently than I did half an hour ago. If people want to learn more, where's the best place for them to find you? Absolutely. So you can always check out our website, OpsLevel.com. We're also fairly active on LinkedIn.
Starting point is 00:35:00 If Twitter hasn't imploded by the time this episode launches, then you can also check us out at twitter.com/OpsLevelHQ. We're always posting just different content on how to be successful with service maturity, DevOps, developer productivity, so that ultimately you can ship value to your customers faster. And we will, of course, put links to that in the show notes. Thank you so much for taking the time not just to speak with me, but also for sponsoring this episode. It is appreciated. Cheers. Ken Rose, CTO and co-founder at OpsLevel. I'm cloud economist Corey Quinn, and this has been a promoted guest episode of Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice. Whereas
Starting point is 00:35:40 if you've hated this podcast, please leave a five-star review on your podcast platform of choice, along with an angry comment, which upon further reflection, you could have posted to all of the podcast platforms if only you had the right developer platform to pull it off. If your AWS bill keeps rising and your blood pressure is doing the same, then you need the Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business, and we get to the point. Visit duckbillgroup.com to get started.
