Screaming in the Cloud - Episode 51: Size of Cloud Bill: Not About Number of Customers, but Number of Engineers You've Hired

Episode Date: March 6, 2019

Years ago, if you wanted to launch an Internet company or Web application, you had to own the necessary hardware. Now, the economics have changed drastically with the ease of Cloud computing. It’s still a new industry that people are trying to figure out, especially when it comes to cost and optimization. Today, we’re talking to Dann Berg, a Cloud ops analyst at Datadog. He helps others understand and lower the cost of Cloud operations. Dann is a detective who is dedicated to figuring out why a company’s Cloud bill is so high.

Some of the highlights of the show include:

Companies struggle with the field of Cloud economics; it can be overwhelming because there’s so much to learn about products and implementation
Companies use the Cloud to grow quickly, which makes their Cloud costs grow quickly and more than expected
The only way to get a full list of every resource being used is the Cloud bill; there’s no comprehensive inventory service available
Companies need to offer visibility into the Cloud bill; not everyone has access to understand how their actions impact the bill
Cost of the Cloud bill is dependent on different factors, including new features, new users, and cost of goods sold (COGS)
Scale and manage the bill by using a platform app or hiring a consultant/team
Understand pricing of AWS and learn best practices for cost controls early on
Don’t leave money on the table by focusing on engineering time - not best use of resources; focus on the smallest things that have the biggest impact
Cost is important, but don’t slow down those developing in the Cloud; open lines of communication to create a culture that understands cost and values what’s measured

Links: Dann Berg on Twitter, Datadog, re:Invent, AWS Cost Explorer, CloudHealth, CloudCheckr, Cloudability, Lambda, EC2, GCP, Azure, CHAOSSEARCH

Transcript
Starting point is 00:00:00 Hello, and welcome to Screaming in the Cloud, with your host, cloud economist Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud. This episode of Screaming in the Cloud has been sponsored by ChaosSearch. ChaosSearch is a cloud-native SaaS offering that extends the power of Elasticsearch's API
Starting point is 00:00:34 on top of your data that already lives in Amazon's S3. ChaosSearch essentially turns your data in S3 into a warm Elasticsearch cluster, which finally gives you the ability to search, query, and visualize months or years' worth of log and event data without the onerous cost of running a hot ELK cluster for legacy data retention. Don't move your data out of S3, just connect the ChaosSearch platform to your S3 buckets, and in minutes, the data is indexed into a highly compressed data format and written back into your S3 buckets, so it keeps the data under your control. You can then use tools like Kibana on top of that to search and visualize your data all on S3, querying across terabytes of data within seconds. Reduce the size of your hot ELK clusters and waterfall your data to ChaosSearch
Starting point is 00:01:26 to get access to an unlimited amount of log and event data. Access more data, run fewer servers, spend less money. ChaosSearch. To learn more, visit ChaosSearch.io and sign up for a trial. Thanks to ChaosSearch for their support of this episode. Welcome to Screaming in the Cloud, I'm Corey Quinn. I'm joined today by Dan Berg, who's a cloud ops analyst at Datadog, and his job is generally to liaise between engineering and finance to help understand and lower cloud operations costs, which is a subject that's of course near and dear to my heart. More importantly than that, he has one of the most impressive mustaches I've ever seen. And even now when we're recording this a week
Starting point is 00:02:11 after re:Invent, I'm still in awe over it. Dan, welcome to the show. Thank you so much. I really think it's the mustache that helps me save costs. It's done great things for my career. I'm sure it has to, because even if you're telling me something that's objectively wrong, no one is going to argue with that mustache. You know, mustaches are coming back. I shaved it last year for Movember, and then after November was over, I went back to my usual beard, and everybody missed it.
Starting point is 00:02:39 My wife missed it, I kind of missed it, and it's been here since then, so a little over a year now. I wish I could grow one, but that's a whole separate argument for another time and a sad over-drinks-at-a-bar conversation. Let's talk a little bit about cloud economics, starting with what is it? Yeah, fantastic. I mean, it's interesting because previously when somebody would want to launch an internet company or a web application or something, you'd have to actually own the physical hardware. Which, I mean, this is all stuff that your audience knows.
Starting point is 00:03:12 But the economics have just changed so drastically with the ease of cloud computing and AWS and everybody else. And since it is still such a new industry, we're still trying to figure things out in terms of cost and optimization and whether something is more cost effective to have it on-prem or have it in the cloud. And really being able to run all those numbers, being able to work with finance to understand the bill, first of all, be able to do cost projections and really understand your application, both how it works technically and how adding a certain number of users will impact your bill. I would put all of that into cloud economics. It winds up being a radical shift in how companies view these things. When you used to have to do capital expense planning that would span years, it was hard to accidentally order $6 million worth
Starting point is 00:04:11 of hardware and not get fired for that or be accused of embezzlement. But it's easy to wind up causing tremendous waste when everything is on demand and there are five organizational levels of separation between the person who can cause changes to the bill and the person who gets the bill. I wound up calling myself a cloud economist because those are two words that almost no one understands, so no one was going to argue with me about it.
Starting point is 00:04:39 But it turns out this is a field that a lot of companies are continuing to struggle with. Yeah, it's interesting, because I came from a background of the actual physical hardware and dealing with CapEx and projections and things like that. And at my previous company, I kind of started getting more and more into the cloud, which is how I eventually ended up where I am. And it was interesting because it presented an entirely new set of challenges, not just from the money perspective, but I think anybody approaching it from the finance side of things, there's just so much to learn that it can be so overwhelming. Because it's not just the sheer number of products that Amazon has, it's the millions of different ways that they can bill each individual project or product. So
Starting point is 00:05:26 wrapping your head around it and really understanding it is, it's a continuing journey. I don't know if it'll ever end. Just before we dive into this any further, I want to give a quick conflict of interest statement here. I am not partnered with any vendor in this space. Datadog is not sponsoring this episode of the podcast. And as of the time of this recording, Datadog is not one of my customers because the hard sell is next week. So with that in mind, I also want to call out as well that you are speaking in general terms and not specific aspects of the bills that you see in any particular company as am I. Is that, I think, a fair way of calling this
Starting point is 00:06:05 out? Yeah, I mean, that's exactly true. I mean, I think that the things that I do, we can talk about in terms that are actually going to be useful to anybody listening. And I don't think that the actual numbers are relevant to that, because I think it's really about methods. And I think it's the ways of thinking about things that are really important here. So yeah, that is correct. Perfect. And for those who only know me as the stupider voice on this podcast, what I do day to day is I go into companies that have large infrastructures and I help optimize the AWS bill, which is why I care so much about having this conversation today.
Starting point is 00:06:40 So let's start at the very beginning. Dan, in your experience, what do companies care about with respect to cloud bills? Because everyone starts with the conversation of, oh, the number's too high, we need to make it lower. How do you see that manifesting? Yeah, I mean, that's exactly how it starts. And I view my job a lot as detective work, especially starting at a new company, or for you, I'm sure, starting at a company that you don't necessarily know well. The way that you get to know a company or what's going on is starting with the bill and working your way backwards. And hopefully you're working with a
Starting point is 00:07:16 company that has a good tagging strategy in place. Hopefully you have access to the people that can provide you answers. But I mean, it starts with companies that are going to grow fast and to grow fast, they just throw something up in the cloud. And then the company does well, and they're just growing bigger and bigger and bigger. And then pretty soon, their cloud costs are growing faster than expected. And somebody at the company, whether they have a finance department or somebody that's looking at the bills, takes a moment to be like, wait, what are we doing? And that's really where somebody who's dedicated to working on the bill starts. And that's really where the detective work starts from my experience. Organizational politics are always fascinating. You and I also both bias for companies that are quote-unquote born in the cloud or have a primary presence here, and even for startups. There's a whole other world out there of companies that actually have a business model and a history that isn't measured with a stopwatch so much as it is calendars. And their approach to this is often
Starting point is 00:08:22 very different. They'll wind up wanting enterprise agreements in place with a cloud vendor before they ever put anything in place. That's a bit of a different origin story, but I still find that in the work that I've done, it arrives at the same place. So regardless of how people get to the cloud, eventually all problems start tending to normalize around certain particular things.
Starting point is 00:08:47 In my experience, when I have conversations with clients and prospects, very often the person who's noticing the problem and is brought in to solve the problem and generally reaches out to me, sometimes they don't even have access to the entire bill for their entire company. They're only limited to a particular division, or they're getting an extract through some other tool. And shining the light on what actually is happening in a cloud environment is often sort of order number one. It's somewhat embarrassing that in 2018, the only way to get a full list of every resource you're using in AWS across regions and throughout your account is the bill. There's no inventory service that is comprehensive today.
Starting point is 00:09:31 Yeah. And it's interesting to me too, when you have developers that don't have access to the bill, just because having that view of your underlying costs, and if you make a change, like you switch from i3.xlarge to i3.2xlarge, how that impacts the bill, even though you're just doing three quarters of the original number of nodes, like really being able to see those changes as they're happening. I think it's super important for companies
Starting point is 00:10:00 to give visibility to people, whether that's providing access to Cost Explorer, which can be kind of daunting because there's no way to just grant access to the UI of Cost Explorer without basically revealing or giving access to your entire billing center, or finding some external tools and then granting access that way. The visibility, as you said, is the very first super important key. One thing I find as I talk to customers as well is, despite their initial approach of the bill's too high, make it lower, there's also an unspoken desire: people want to be able to accurately predict it. If you spent $5 million last month,
Starting point is 00:10:45 and this month you spend $7 million, the CFO is going to have words, and those words are interpreted as you're spending too much money. But if you'd instead spent $3 million this month when $5 million had been predicted, you'd be having many of those same words, just because these things start to matter as far as trendline goes. What does this mean for cost of goods sold? What does this mean for our unit economic model? How do we wind up predicting accurately 18 to 36 months out when we can't even predict one month ahead what the bill is likely to look like? And that's often misunderstood as you're spending too much money. Yeah, I mean, that's exactly what I've seen too. And depending on the size of the organization, the cost of your bill can be dependent on so many different factors.
Starting point is 00:11:26 Because one, it's obviously whatever it takes to run your application for paying customers. You have the COGS cost. And then there's also people that are working on new features or people that are testing out new features. Maybe you have trial customers or you're doing all of these different things that can have an impact on the bill that will pop up. So really being able to understand what's going on, being able to attribute that back to actual usage and cost data, being able to communicate that to finance to understand, okay, I'm aware of this initiative, here is what it looks like on the bill, and really having all of those pieces of the puzzle.
Starting point is 00:12:08 So not only understanding how adding users to your platform impacts bills, but all the different initiatives that might be going on inside a medium to large-sized company that will impact the bill that you might not generally think of immediately. I have a client who, half tongue-in-cheek, once pointed out that the size of the bill is less a function of how many customers you have and more a function of how many engineers you hired. You're right. When you talk to companies, when I give talks to companies at the enterprise scale, I'll often ask the room, okay, raise your hand if you can spin up resources in AWS. And most people raise their hands. Cool. Keep them up if you're not
Starting point is 00:12:51 allowed to see the bill. And a surprising number of hands remain in the air. That is default behavior, but it's broken in some ways. It's hard to hold people to account for the resources they're spinning up if they don't know what makes sense. Yeah. I mean, as an example, just go open up one of Amazon's free tier accounts yourself, just personally, and start playing around with things and you forget to turn something off. And suddenly you have a bill that's hundreds of dollars. And I mean, imagine that at a company with tons of engineers who all have the ability to spin up servers and nobody is coming directly to them. Like if there's a spike in the bill, the chances that, especially at a medium or large company, the chances that myself or finance or somebody is going to come specifically to that single engineer and say, hey, we noticed you did this and it impacted our costs in this huge way is practically nothing. And so there needs to be, like I said, there needs to be visibility so that people have access to the bills on a regular cadence so they can see how their
Starting point is 00:13:57 actions are impacting the bill or some other way to really just have that awareness and that understanding. The challenge that you'll also see is, you're right, there is no way of directly attributing things back without some rudimentary tooling. Things like cost allocation tags are great, but they're not retroactive. So you have to start approaching this after a few whoopsie mistakes. I mean, my bill is nothing to speak of. I just got November's bill at the time of this recording, and it was $16 and change. Last month, I was doing some work and accidentally left a few VPC endpoints running, and surprise, it was a bit over 50 bucks. I mean, that is an over 2x surprise that I got on my bill,
Starting point is 00:14:39 and the dollar figure doesn't matter, but add a few zeros to the end of this, and you start to see how things start to be very confusing. The fact that it was bounded just to me means it was pretty easy for me to figure out what had happened. But even for a 20-person development team, that becomes a big question mark. If you have 2,000 engineers and not a lot of instrumentation or visibility into it, it becomes almost impossible. And it more or less becomes the cost of doing business. To that end, let's talk a little bit about scale. I have my own opinions on this, but I'm curious at what point you wind up seeing in a company that it's time to start using a platform app to tell you where your bill's going, hire a consultant, hello, or hire a dedicated team of people like you to wind up managing this for a company. Yeah, I mean, that's such an interesting question. You know, I think that in terms of saving money in the cloud, a lot of people, you just mentioned three options.
Starting point is 00:15:30 And I think those options themselves might be something that people don't quite realize exists. Because when you're starting on Amazon and you're starting on there, you get your bill and you're like, shoot, I need to figure out how to lower this. So you're going in there, you're trying to learn best practices, you're trying to understand what this API call in Cost Explorer actually means and attribute it to something specific. And then you grow to a certain scale. And there are all these services. And I was going to bring this up too. It's crazy to me how you need external services like CloudHealth or CloudCheckr
Starting point is 00:16:06 in order to really understand your bill and see things like, oops, you left something on. And that's not built into something like Cost Explorer. Cost Explorer is getting better, but Amazon is really lacking there. So the options you have are figure out the money stuff yourself, which is a whole job in and of itself. Use one of these third-party tools that ingests your bill. It provides recommendations, security recommendations. It lets you parse your usage a lot better than Cost Explorer and in more detail. And then you have the option of hiring a consultant such as yourself, who can just come in and fix things, hopefully.
Starting point is 00:16:46 And then when you're of a certain scale, you might want to just have a dedicated person, depending on how fast you're moving, what sort of pieces or conversations you're having with Amazon on an ongoing basis to really manage and do that. And it's hard to give exact numbers for when that is the case because it's definitely a company-by-company basis. But when it comes down to it, as somebody who operates in the cloud, those are your options. And if you're not doing one of those,
Starting point is 00:17:17 you're going to have an outrageous bill at some point very soon. Just for those who may not spend their lives diving into the intricacies of AWS billing, Cost Explorer is a native tool that is either free or costs one penny per API call, depending on how you're interacting with it, that gives you a decent degree of visibility into your bill. There are companies such as CloudHealth, CloudCheckr, Cloudability, Cloud Bandsaw, which I just made up, and a bunch of other companies that have similar sounding names to the tune of about a dozen of them now, where they all wind up doing this as a service.
Starting point is 00:17:54 And their model traditionally falls into the pay a percentage of your bill for those platform offerings. At the risk of alienating people who work for those companies, the honest assessment I can say from what I've seen after a few years of doing nothing but this, there is no single platform tool for sale out there that is so far better than the other folks in this space that if you're on one, I would suggest moving to another. They're all very decent at solving this problem. They take different approaches and come at it from different angles, but they're all more or less equivalent. Now, I'm sure my email is going to blow up at that with angry notes, but that's my position on it at this time. If you believe that your product has a key differentiator, please let me know.
Starting point is 00:18:40 I am thrilled to modify that statement in a future episode if you can convince me of it. Again, I partner with no one in this space. I haven't had extensive experience in any of those tools. I mean, a little bit, but I would have to say that I agree with that for the most part. And the crazy thing for me is, as of right now, all of these tools that I've seen really operate a pricing model on percentage of your bill. And if your bill is still at a manageable level, using a tool like that that's a percentage of your bill compared to the cost savings that you're getting is definitely reasonable. You're going out of the startup category and into the small business category. Depending on your usage, your cloud bill might get to the point where it just doesn't make sense. And that's really the point where you start exploring other options, whether that's a consultant or that is getting a dedicated person on board. Absolutely. I've never yet spoken to a customer who heard any form of pricing model that involved a percentage of their bill or percentage of their savings and was happy to
Starting point is 00:19:52 hear it. It seems that it works mathematically, but there's something broken psychologically about charging percentages. I understand why people do it. To that end, I just did an end run around the entire thing. My pricing model is I charge a fixed fee. If I don't find at least 10 times that fixed fee in first year savings, I give people their money back, which I've never had to do because surprise, I know what I'm doing. That said, there is a floor below which I can't do a whole lot for a company. In most cases, and there are rare exceptions to this, it starts at about a million dollars a year of bill spend. If you're spending $40,000 a month or so, I'm thrilled to have a conversation, but there
Starting point is 00:20:35 really isn't likely to be an engagement that makes sense from a pure cost reduction story. That's unfortunate because, frankly, you shouldn't need to be wasting X dollars before bringing someone in to help you with that makes fiscal sense. But that's the world we live in. Yeah, exactly. And if your bill is like 40k a year, you have a great opportunity to really, one, use these tools that charge a percentage of your bill, because it's probably worth it in that particular instance. And two, if you're listening to this podcast, you're already thinking about this stuff. Really start diving into the pricing of AWS. Learn the best practices when it comes to cost controls and get those things in place early while you're still at that scale. Because as you grow and you get to that $1 million a year level,
Starting point is 00:21:28 you're going to be glad that you have some of that experience under your belt. And you're going to be in a much better situation than 90% of companies. Maybe that's exaggerating, but yeah. The cost stuff is something that everybody can need dedicated attention paid to. Yeah, and what you're describing is sort of the edge case of where I can add value historically.
Starting point is 00:21:51 You're small now, you know you're growing, what do you need to instrument today so that a year from now you have data that's actionable, that points to business metrics that make sense. I guess the third stage of this, beyond using consultants and beyond using platform-as-a-service offerings, when is it time to hire a you or a team of people like you to build out a cloud-costing organization? I mean, that totally depends, one, on what your relationship is like with different cloud providers,
Starting point is 00:22:24 whether you're working with them on a regular basis on different parts of your bill, whether it makes sense to have somebody on board in a full-time position to be able to work with your different engineers to try to identify big cost spikes and get them down, where you need the dedicated person to be able to jump into those finance meetings to help people understand the bill, to take control of that. I mean, in terms of my full-time job, there's a lot of different people and teams that I interact with that takes a full-time job, for sure. I mean, right now at Datadog, it is just me working on this. That might not always be the case.
Starting point is 00:23:14 But when you have a consultant such as yourself, sometimes they have other clients, it might be a limited period of time where they're focusing on your company and maybe you have a retainer. There's a bunch of different deals that you can have, and I'm sure you can speak to this a little bit better than myself. But when you get to the scale where you need to be having these regular meetings, you need somebody that intimately understands your application and how it runs and how it interacts with different cloud providers,
Starting point is 00:23:39 then it might be time to start considering bringing on somebody full-time. What's fascinating to me is that I did this internally at companies in years past, and that gave rise to my current consultancy. I was convinced when I started my company over two years ago that I was already up to speed on everything that I needed to know for this. And what I've learned is that I left so much money on the table back then just because it was never the only thing that I got to focus on. That's something that caused a bit of a revelation and awakening for me. There's always another level. And I see that to some extent with customers I've had in the past.
Starting point is 00:24:17 A majority of customers implement some or most of what I recommend, all of which in my first pass is low or no engineering effort. And a couple of them implement everything and then go significantly beyond what I've identified to the tune of re-architecting applications, to the tune of devoting a team of engineers for six months to build things. And I'll talk to them and they'll be incredibly excited about that when I do my follow-ups. Great. Okay. You saved $200,000 on your annual bill. How'd you do it? Oh, we just had our team of six engineers working on this for the last six months. And unless there's a growth story or something else tied to that, you spent more in engineering time and loss of focus than you're ever going to recoup in the near future. So at some point,
Starting point is 00:25:06 cost no longer becomes the driving concern. In other words, you're never going to optimize your way to your next business milestone. It's, well, we were about to go out of business, but then we cut our cloud bill and then we raised a Series C is usually not a story that you hear in the real world. Yeah. I mean, it's interesting because as you said, the first thing that you present is kind of these low-hanging fruit. And if you're a large organization and you haven't spent a lot of focus on this,
Starting point is 00:25:36 there are quite a few that I'm sure that I could identify, I'm sure you could identify that it's like, okay, we'll do this, this, and this, and you'll be at a much better place. And having those engineers dedicating that time and full-salaried engineers working on saving the $200K a year isn't really the best use of resources. Right. With the caveat that there are, of course, exceptions, strategic objectives, and constraints that I'm not necessarily privy to. This is a third-party speaking in the general case perspective. This is not, oh, if you're doing this, you're easily doing things wrong. Context matters. And it's never immediately clear from the outside what that necessarily looks like internally to a company.
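The trade-off discussed above can be sketched as a back-of-the-envelope payback calculation. The $200,000 annual saving and the six engineers working for six months come from the conversation; the fully loaded cost per engineer is a hypothetical placeholder, not a figure from the episode, so substitute your own number.

```python
def payback_years(annual_savings, engineers, months, loaded_cost_per_engineer_year):
    """Return how many years of savings it takes to repay the engineering effort."""
    if annual_savings <= 0:
        raise ValueError("annual_savings must be positive")
    # Cost of the effort: headcount x fraction of a year x yearly cost per person.
    engineering_cost = engineers * (months / 12) * loaded_cost_per_engineer_year
    return engineering_cost / annual_savings


# Figures from the conversation: $200,000 saved per year, six engineers, six months.
# ASSUMPTION: $200,000 per engineer per year is an illustrative loaded cost only.
years = payback_years(200_000, engineers=6, months=6, loaded_cost_per_engineer_year=200_000)
print(f"Payback period: {years:.1f} years")
```

Under that assumed salary figure, the effort takes about three years of savings to pay for itself, before counting the loss of focus mentioned above. As noted in the conversation, a growth story or strategic constraint can still justify the work.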
Starting point is 00:26:19 Yeah. And I don't want to discourage people from re-architecting their app in order to run better and more efficiently, because obviously that's important. But I think, I mean, the most important thing is to focus on the smallest things that have the biggest impact. Like the 70-30 rule or whatever it is where the 30% will give 70% of the savings or whatever, and really being able to identify those things by looking at your bill and seeing where your biggest opportunities are and really nailing those is going to give you much, much higher returns, both on your bill and otherwise. Absolutely. A common story I'll hear is when I'm presenting my findings where an engineer will chime in and say, hey, you didn't mention those unattached elastic IPs, at which point I can often hand them a quarter and say, here you go, you've now turned a profit on this hour of meeting. Now the next bullet point says $800,000 a month. Let's go back to that. It's almost an urge to go alphabetically rather than starting with the big numbers and working your way down. And globally, we've seen, and there have been reports published on this by vendors,
Starting point is 00:27:30 but EC2, for example, is ballpark of 60% of global AWS spend. If you add in S3, RDS, data transfer, Elastic Block Store, you're up to 85%. And then there's a very long tail. No one has ever hired me to optimize their Amazon Chime bill. That doesn't tend to happen. Something else that I think that people are still surprised by is I've never seen a significant Lambda bill. Anytime a company is spending thousands on Lambda, they're spending hundreds of thousands or millions on EC2. It's sure that you wind up with some spend in other places, but focusing on things that are easy, things that make more sense in the short term, and getting the quick wins in before focusing on the bigger stuff is something that people tend to gloss over. They think everything has to be a hard engineering problem,
Starting point is 00:28:21 and it's really not. Turn off stuff you're not using. Delete data you don't need anymore. Make sure that your applications are built in such a way that they're not speaking through a managed NAT gateway all the time. There are basic block and tackle stuff that you can look into before you wind up going down the road of building custom bots to spin up and down your developer environments and people leave the office because you've hooked them into a geo-tracking system. Yeah, I mean, one of the things that I've noticed from my experience that often surprises people who work in the cloud is data transfer. That's one that it might not be the biggest opportunity for savings, but it is often the biggest surprise. Just because you think of data transfer in terms of it's free coming in, you pay going out. But I mean, there are so many
Starting point is 00:29:07 different ways that they get you with data transfer, whether it's across AZs, across regions, doing different load balancers, different everything. They're all different pricing models. And I found almost always a surprise. Absolutely. Moving one gigabyte of data in AWS from one place to another is anywhere from free to 25 cents per gigabyte or more, depending upon exactly what you're doing. Sorry, 24 cents. I should be precise on that. There really isn't a rhyme or reason to a lot of this, but understanding exactly how your application is built for things like that is important.
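To make the spread in per-gigabyte pricing concrete, here is a minimal sketch that totals a set of data flows. Only the two endpoints (free and 24 cents per gigabyte) come from the conversation; the volumes and the middle rate are made-up placeholders rather than AWS prices, and the function name is hypothetical.

```python
MAX_RATE_PER_GB = 0.24  # upper figure quoted in the conversation, in USD


def transfer_cost(flows):
    """Sum the cost of data flows given as (label, gigabytes, usd_per_gb) tuples."""
    total = 0.0
    for label, gigabytes, usd_per_gb in flows:
        if gigabytes < 0 or usd_per_gb < 0:
            raise ValueError(f"negative value in flow {label!r}")
        total += gigabytes * usd_per_gb
    # Round to whole cents so floating-point noise does not leak into the result.
    return round(total, 2)


# ASSUMPTION: the volumes and the 0.02 rate are placeholders for illustration.
flows = [
    ("free path", 5_000, 0.00),
    ("hypothetical mid-priced path", 2_000, 0.02),
    ("most expensive path quoted", 1_000, MAX_RATE_PER_GB),
]
print(f"Estimated transfer cost: ${transfer_cost(flows):.2f}")
```

The point of the sketch is that the same volume of data can cost nothing or a meaningful amount depending on the path it takes, so the per-path rate matters more than the raw gigabyte count.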
Starting point is 00:29:47 And one thing I'm careful to do is to highlight this, but telling people, okay, time to stop and redo your entire software architecture so you can save data transfer money is usually not realistic either. There are things to consider in the next generation of your app, but virtually no one rebuilds their application from the ground up solely to save money. There are cost optimizations to consider when you're doing that, but it's never going to be the refactor decision point. Yeah, hopefully you're not in a place where that is your best option, because then you might not have built your application for the cloud, which is possible if you're a large organization that comes from a background of not being cloud native. But it's rare. It's rare, I would say. So my rule of thumb, to answer our earlier question then, and your numbers may vary, but use one of the applications when you're below a million bucks in annual spend. When you're between one to, let's say, $30 million a year, it's, hi, let's talk. And too far beyond
Starting point is 00:30:47 that, you start to rapidly hit a point where your needs become specialized enough that having at least one or two people internally, either on a fractional or full-time basis, focusing on cost optimization is valuable. Those are my tiers. Anything horribly objectionable in what I just said to you? No, and I think that's about right. I mean, the only thing that I would say is it's hard to add exact numbers to that. The range is good, but it really has to do with your business's needs and how you're operating in the cloud and what your relationship with those cloud providers is like. Yes, put a big ish on that, and I think that probably makes everyone more comfortable.
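The tiers Corey describes can be written down as a small Python sketch. The thresholds ($1 million and $30 million in annual spend) are the ones stated in the conversation; the function name and return labels are hypothetical, and, as both speakers say, the boundaries deserve a big "ish".

```python
def cost_management_tier(annual_spend_usd: float) -> str:
    """Map annual cloud spend to the rough tier from the conversation.

    Thresholds are approximate by the speakers' own admission; the labels
    are illustrative names, not terms used in the episode.
    """
    if annual_spend_usd < 1_000_000:
        return "platform app"    # use one of the cost-management applications
    if annual_spend_usd <= 30_000_000:
        return "consultant"      # "hi, let's talk"
    return "in-house staff"      # one or two people, fractional or full-time


for spend in (250_000, 12_000_000, 90_000_000):
    print(f"${spend:,} per year -> {cost_management_tier(spend)}")
```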
Starting point is 00:31:24 Exactly. But whether you're using Amazon exclusively or whether you're also using GCP, Azure, the other providers, there's so many different factors to take into account with whether you need somebody full-time. The question then becomes, and I want to make sure that we address it without assuming aspects of it as well, to beg the question. So how do you convince developers that costs are important without slowing them down? But to get back there first, is that something, it's really tricky because you really, the last thing you want is to slow down people who are developing in the cloud, and I definitely see my role specifically not as a gatekeeper. And that's always how I communicate it. I'm like, you do not need to come to me for permission to do anything. Operate as usual, grow as the business needs dictate. But really, my goal is to open up those lines of communication where if there's temporary capacity coming up versus just general auto scaling growth, or whether you're going to be adding a certain amount of capacity that, oh, we should actually buy reserved instances for this. Being a part of that conversation and adding that additional voice to it is really valuable. And I hope that when working with developers,
Starting point is 00:32:53 my goal is to be able to express the value of just having additional eyes on what's going on and being able to share what you're working on with additional people to just build the amount of knowledge that's going towards a particular project or happening. The role can either become that of a gatekeeper where you keep people from provisioning things until certain criteria are met, or it can be a trailing function that cleans things up. I find the latter tends to be the much better approach. I mean, back in data center days, we dealt with
Starting point is 00:33:25 six-week provisioning cycles if we're fast to get a new set of servers spun up. So if you cut that in half to a three-week provisioning process to get EC2 resources spun up, you know what most line managers have is a company credit card, and suddenly you have shadow IT on the rise. The advent of cloud was in no small part due to the fact that, sure, it was more expensive, there were security and compliance concerns, data residency issues, but on balance, you didn't have to deal with those smug jerks in central IT. And now if you wind up recreating those patterns, you'll see the same type of thing start to emerge again. So it's working collaboratively with people and not yelling at them when they get it wrong. If you have a developer who spins up a testing environment
Starting point is 00:34:08 that happens to comprise a $20,000 a week cluster, great. It's time for a conversation, but that conversation doesn't need to start with screaming as you crash through the door to the building that they're in. It tends to wind up being something that has much more nuance to it. Sometimes it's intentional. Sometimes it's people don't know. And let's not kid ourselves. This stuff is not simple. Looking at the names of various instance sizes, there's nothing intuitive about the fact that between a T3 and a P3, one of them will cost you half a cent an hour. The other one will cost you, in some cases, upwards of $40 an hour. And there's no way to tell that by glancing at it. Building controls in, building things that report on strange usage patterns, that makes sense.
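As one illustration of "things that report on strange usage patterns", here is a minimal Python sketch that flags days whose spend is well above the recent norm. This is not a method described in the episode: the trailing-median approach, the function name, the window and factor defaults, and the sample figures are all assumptions chosen for illustration.

```python
from statistics import median


def flag_unusual_days(daily_spend, window=7, factor=2.0):
    """Return indexes of days whose spend exceeds `factor` times the
    median of the preceding `window` days.

    A simple trailing-median check, assumed here for illustration only.
    """
    flagged = []
    for i in range(window, len(daily_spend)):
        baseline = median(daily_spend[i - window:i])
        if baseline > 0 and daily_spend[i] > factor * baseline:
            flagged.append(i)
    return flagged


# Hypothetical daily spend in USD; day 8 is a forgotten test cluster.
spend = [100, 100, 100, 100, 100, 100, 100, 105, 400, 110]
print(flag_unusual_days(spend))
```

A report like this starts the conversation with the developer; it does not block them from provisioning anything.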
Starting point is 00:34:57 Yelling at humans is usually the most counterproductive thing you can do in this space. Yeah. I mean, you mentioned the two approaches. One is becoming a gatekeeper, and the two is kind of trailing afterwards. And I had mentioned that a lot of my work is detective work. So looking at the bill, trying to attribute the cost increases or anything else to different initiatives. And I think that's really where you can add the most value when you're trying to cut costs. But opening up those lines of communication so that you're minimizing your detective work, so that as people are doing new things, the company, or when I say the company, like finance,
Starting point is 00:35:37 the people that are not actually engineering that are aware of it and understand how that's going to affect cost. Being able to model that if possible is really where the value comes in. Absolutely. And it all comes down, in my experience anyway, to getting the people in finance and the people in engineering sitting down and talking to one another. That's something that doesn't necessarily come naturally to many of these groups because historically they don't talk. They don't need to talk. The world of cloud is changing that and being able to tie engineering more closely into business decision-making and strategic prioritization is important. I still talk to
Starting point is 00:36:17 clients and things I discover are fascinating where I come back with an assessment and I point out that the company does not value cost-cutting measures. And invariably, I'm told that I'm wrong, and that's not true at all. And then I point out that an engineer did a project that saved $8 million a year, and when they brought that up during their performance reviews, well, that wasn't tied to any of your KPIs, so we're going to ding you for not getting another feature shipped during that time period instead. And it comes down to a story of valuing what you measure. I think that there tends to be a sometimes fundamental misunderstanding.
Starting point is 00:36:53 When you start seeing engineers get bonuses for finding creative ways to save money, bounded of course, then you start to drive a culture of cost optimization. I'm not saying every company should or even most companies should, but if you are passionate as a company about saving money, you need to incentivize the behaviors that you want to see. Yeah, I think it's interesting that you said it's a culture of cost optimization because that kind of stuff is totally a culture thing. And that's really what makes it so tricky. When you have a lot of these larger, older companies that have a culture that might not fully understand and appreciate cloud usage at a higher executive type level, it can be really tricky just because the understanding isn't there. And there also has to be the ability to
Starting point is 00:37:46 want to learn from those higher up levels in order to appreciate it. And a lot of these changes are difficult. A lot of them, yeah, there's no easy answer there. Absolutely. I know how to fix the billing issues. I don't know how to make people care about it. I've spoken to companies spending nine figures a year, and they're just fine with that because it isn't in the strategic roadmap to worry about optimizing those things. Good for them. I'm not saying they're wrong. I am saying that when I see something like that and no one is empowered to care or do anything about the bill, from a pure business perspective, I have no market opportunity in those environments. And that's fine. I'm not going to be able to compel people to reprioritize things. And in many cases, I strongly suspect they're right. There's an upside potential to a lot of these businesses that goes far beyond
Starting point is 00:38:35 what you can do by optimizing costs. I can save you a theoretical maximum of 100% on your cloud bill, but you can triple that by launching the right feature to the right market at the right time. I can't tell you that it's time to optimize your bill. That has to come from you. Yeah. And that's so interesting too, because it's all business decisions and it's so much larger than just people focusing just on AWS and cloud spending because you launch a new feature, you gain X number of new clients. I think a lot of engineers might not necessarily appreciate sales as much as, I mean, this is an overgeneralization, but sales and engineering are often two very separate companies.
Starting point is 00:39:18 And a lot of time, the culture of a company doesn't necessarily mix them. And because like if you have a salesperson that brings in some large company that's bringing in millions of dollars a year, let's say, then saving the $200K on X thing doesn't really compare. But if you're just in the engineering world and not paying attention
Starting point is 00:39:37 to any of the sales side of things, you might say, okay, well, this optimization is super top priority. So you definitely need, one, people being able to view the business from the high level and make those calls. And two, just the culture of communication to be able to share the business as a whole with everybody on the team so that everybody is facing the same direction. Absolutely. Most of my clients are not engineering side, they're finance side. And talking to engineers about this, when I occasionally get outreach
Starting point is 00:40:10 from someone, the good citizen effort, as I tend to think about it, I will often find myself in conversations with engineers where they're incensed because their monthly bill is $80,000 and they think it should be $40,000. And they may be right, but I start asking questions. Okay, what does your boss say? Oh, I can't get her to pay attention to me. Okay. How many engineers are working on this? 50. Okay. What's the purpose of your group? Oh, we're chasing a market opportunity that might be worth $4 billion a year. And in six months, we'll know or not. And at that point, the answer largely becomes a, I've got news for you. Your team is embezzling more in office supplies than you're wasting on cloud costs.
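The proportions in that exchange can be checked with a few lines of Python. The figures are the ones quoted in the conversation ($80,000 actual versus $40,000 desired monthly bill, and a market opportunity that might be worth $4 billion a year); the variable names are hypothetical.

```python
# Figures quoted in the conversation.
monthly_bill_usd = 80_000
target_monthly_bill_usd = 40_000
market_opportunity_usd_per_year = 4_000_000_000

# Best-case annual savings if the engineers are entirely right.
annual_savings_usd = (monthly_bill_usd - target_monthly_bill_usd) * 12

# Savings expressed as a share of the opportunity the team is chasing.
share_of_opportunity = annual_savings_usd / market_opportunity_usd_per_year

print(f"Annual savings: ${annual_savings_usd:,}")
print(f"Share of opportunity: {share_of_opportunity:.3%}")
```

Even in the best case the savings amount to roughly a hundredth of a percent of the opportunity, which is why the boss is focused elsewhere.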
Starting point is 00:40:48 And it's time bounded. And there's a bigger picture here. I appreciate what you're saying, but that's not valuable to the company strategically at this time. And that's why your boss doesn't care. That's why she's focusing on other things. I applaud the good citizen effort to save money, but that doesn't add value to what your company is working on right now. And understanding that distinction is in many ways part of the
Starting point is 00:41:10 educational process I wind up having to put some of my clients through. And it's fun. I enjoy having these conversations. I enjoy seeing how different organizations view the world. And for better or worse, you'd think that working on cloud costing would be an incredibly boring job. But I'm learning new stuff constantly. Every day I see something that I didn't know existed. It's really a privileged position. Yeah, I mean, if you look at anybody who actually focuses on costs and thinks that they know everything, that I think is a misstatement just because there are so many new things that come about every day and so many little intricacies. And especially in your role working at different companies and getting all
Starting point is 00:41:50 those different experiences, you can see how different people approach it, see one company might be using one service that another company isn't, and really get that wider perspective. And there's just so much there. So much there. There really is. So if people want to talk to you more, see what you have to say, or simply marvel at your majestic mustache, where can they find you? You can find me on Twitter. It is @DannBerg, and Dann has two N's. So D-A-N-N-B-E-R-G. Perfect. And I'll throw a link to that in the show notes. Dann, thank you so much for taking the time to speak with me today. Yeah, thanks so much for having me. It was a pleasure.
Starting point is 00:42:28 Likewise. Dann Berg, CloudOps Analyst at Datadog. I'm Corey Quinn, and this is Screaming in the Cloud. This has been this week's episode of Screaming in the Cloud. You can also find more Corey at screaminginthecloud.com or wherever fine snark is sold.
