Software at Scale - Software at Scale 56 - SaaS cost with Roi Rav-Hon

Episode Date: April 17, 2023

Roi Rav-Hon is the co-founder and CEO of Finout, a SaaS cost management platform. In this episode, we review the challenge of maintaining reasonable SaaS costs for tech companies. Usage-based pricing models for infrastructure lead to a gradual ramp-up of costs, and they have always sneakily come up as a priority in my career as an infrastructure/platform engineer. So I’m particularly interested in how engineering teams can better understand, track, and “shift left” infrastructure cost tracking and prevent regressions. We specifically go over Kubernetes cost management, and why costs need to be attributable to the most specific teams for cost management to be self-governing in an organization. This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit www.softwareatscale.dev

Transcript
Starting point is 00:00:00 Welcome to Software at Scale, a podcast where we discuss the technical stories behind large software applications. I'm your host, Utsav Shah, and thank you for listening. Hey, welcome to another episode of the Software at Scale podcast. Joining me today is Roi Rav-Hon, the founder of Finout, the cloud cost management platform. Previously, he was the director of engineering at Logz.io, an observability platform. Thank you for being on the show. Thank you so much for having me. Yeah, so I'd love to find out about your story, right? Cloud cost is something that has been bugging me for a very long time. Sooner or later it ends up on the
Starting point is 00:00:46 roadmap. It's like, oh my god, why are we spending millions of dollars on load balancers or something silly like that? What got you interested in the problem and energized about solving it by creating your own company? In my previous position at Logz.io, as you mentioned, I was an engineering director responsible for the entire infrastructure. So part of my responsibilities was to balance between SLA and cloud financial management. And those two goals conflict in most cases, right? As the one in charge of SLA, you want to have extra servers lying
Starting point is 00:01:23 around, ready to be used when needed, and you're quicker on the trigger with scale-up mechanisms. You want to make sure that the infrastructure has enough capacity when you need it. But when you put on your cloud financial management hat, suddenly everything is the opposite, right? You need to make sure that you're as efficient as possible and you're only scaling when you need it
Starting point is 00:01:45 and everything is right-sized and everything is working super, super efficiently. So finding that balance: what's acceptable, what's not acceptable, where can we be better? How do we even educate the engineers
Starting point is 00:01:58 to really understand the implication of each of their decisions? That was something that bugged me for a very long time at Logz.io. And honestly, we used some of the tooling available in the market and we felt like nothing really matched what we needed and nothing really helped us in getting better at
Starting point is 00:02:18 that balance, getting better at implementing a FinOps culture in the organization, even before it was called FinOps. This is what started to push us out of Logz.io, to really understand that we needed to build the tool that we wanted to use. And there's a major gap in the market in terms of what is currently offered
Starting point is 00:02:37 versus what a modern company really wants to use. That's the story behind Finout. A lot of these cloud providers like AWS and GCP have their own cloud billing dashboards. What makes them not enough in your view? So two main factors. One is that the incentives are never aligned. For the cloud vendor to build the best cost management solution, one that is designed to hurt their revenue, is not always something that is mutually aligned
Starting point is 00:03:12 with what the company wants to do. It is aligned to some extent, right, because the cloud vendor wants to keep you incentivized and making good use of their solution, and not overpaying where you shouldn't, because then you will churn. But ultimately, the incentives are never the same. And the second biggest problem is that as far as the cloud provider is concerned, they live alone. So AWS will never support something related to Google or Snowflake or something like that. They always want to encourage you to keep using their own solutions.
Starting point is 00:03:45 And they will never help you with a cloud cost management tool to analyze costs for different providers. But the reality is just different, right? Most companies are multi-cloud and multi-service. We use more and more technologies in our stack, and they're all usage-based priced. So in order to get just a simple overview of how much money we spend across our entire infrastructure,
Starting point is 00:04:09 we need to log into five different systems just to get the overview. And then every allocation that we have, every budget, every forecast, everything, we need to implement again and again in each of those systems. So it becomes very, very cumbersome to manage costs and even to get an accurate picture of what's really happening. And sorry, this is somewhat
Starting point is 00:04:30 of a philosophical question. But the root of all of these hard-to-manage or hard-to-control cost problems seems to be this idea of usage-based pricing, right? It sounds very simple: you use one EC2 instance and you get billed a certain amount per hour. But there are so many different configurations and so many different services that we end up using that it all becomes a nightmare. What are your philosophical thoughts on this idea of usage-based pricing?
Starting point is 00:04:56 Is it actually helpful to customers? Is it too confusing in its current state? Should everything just be a monthly subscription? Is that even feasible? What do you think? I think ultimately usage-based pricing is a huge catalyst in modern software buying processes, right? When you want to sell something to an engineer, or an engineer wants to buy something, being able to start using a solution
Starting point is 00:05:19 that they picked, and just pay based on what they use, is really the most natural thing to do. It's like going to a restaurant: you pay for what you eat, you don't negotiate a price a month in advance for food consumption. This is the way that we buy in every single
Starting point is 00:05:37 aspect of our life except for software. So it only makes sense that engineers feel more comfortable and more empowered to make decisions. But on the other hand, when looking at the company's cost governance, it starts to get a lot more complicated, because we can't even figure out what our budget is going to be, and how do we make sure that we are tight and predictable into the future? It's a major new task that the finance team has to deal with that they never had to deal with before.
Starting point is 00:06:09 So I think it's kind of a double-edged sword, because it's an enabler for the organization to run super efficient, super fast, and to only pay for what they're actually using. But on the other hand, it's also a big headache for cost governance. It's a concept that's very unnatural for finance teams, or even the finance tools, to grasp; ERPs are not designed for usage-based pricing. So I think that we need to have that kind of balance between what's good and what's an enabler for the company versus just not doing anything because we're afraid. Software buying should be better, should be easier, but finance should not be left in the dark and left unanswered.
Starting point is 00:06:52 So I think usage-based pricing is a very good trend, but you also need to have some kind of financial responsibility for what you're purchasing. And this is part of the reason we built Finout. And initially, at least a few years ago, I think cost dashboards were primarily focused on cloud providers. But now, as you said, there are more and more tools that are usage-based priced, like Snowflake for your data management, but also your observability tools like Datadog. You can kind of see from the stock price and market cap of Datadog that they have a pretty heavy margin, right? So it seems like the number of tools that are usage-based priced and need to be monitored is increasing over time.
Starting point is 00:07:33 Is that what you're seeing as well? Yeah, so 100%, the market is changing and shifting. And it's all happening within our budgets and within our financial allocation. If 10 years ago you had told a company that most of their expenses were going to be OpEx and not CapEx, and that they would stop buying servers, they would have looked at you like you were insane. So the way that we manage our finances is moving very strongly toward OpEx. And one of the major catalysts is usage-based pricing. And I think that we're picking more and more different aspects of our company
Starting point is 00:08:07 and moving them toward usage-based pricing. So now we see usage-based pricing across everything, right? From our sales tools to our marketing tools to our infrastructure tools to basically everything that we're buying. And I really believe that this is the pricing model of the future. It's going to be 100% usage-based, like picking
Starting point is 00:08:27 and choosing, and negotiated contracts are going to be a thing of the past, or at least a combination of the two will prevail. So for a tool or a platform like Finout to succeed, you also need to get enough information
Starting point is 00:08:40 from the actual vendor itself, right? So, for example, my guess is you need to call a Datadog API to know how many metrics or logs a particular customer is using. Do these systems provide enough introspection ability to give that rich amount of data? How does that work behind the scenes? How are you able to measure or accurately estimate how much I'm spending on Snowflake or Datadog or some other tool? So it really depends on the maturity of the vendor. AWS, for example, is very mature when it comes to showing their bills. AWS has a format called
Starting point is 00:09:16 Cost and Usage Report, where you can just get all the accurate billing information directly into your account. It's a complicated format, but the dollar sign is present. But there are other solutions that are not as advanced. Solutions like Datadog and Snowflake, which you mentioned, do not have a billing API. They just charge you at the end of the month, and you need to extrapolate what's going to happen mid-month. What you can do for a solution like that is essentially reverse engineer their billing structure.
Starting point is 00:09:43 For Snowflake, it's easier because it's just credits and storage. For Datadog, it's a lot more complicated because it's priced by numerous different factors.
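A minimal sketch of the reverse-engineering idea described here, assuming a vendor that bills only on compute credits and stored terabytes. The rates, the `DailyUsage` record, and both function names are hypothetical and are not taken from any vendor's API or from Finout's implementation.

```python
from dataclasses import dataclass

# Hypothetical list prices, for illustration only. Real rates depend on
# edition, region, and negotiated discounts.
PRICE_PER_CREDIT_USD = 3.00
PRICE_PER_TB_MONTH_USD = 23.00


@dataclass
class DailyUsage:
    credits_used: float  # compute credits consumed that day
    storage_tb: float    # average terabytes stored that day


def estimate_month_to_date(usage: list[DailyUsage], days_in_month: int) -> float:
    """Rebuild an approximate month-to-date bill from daily usage metrics."""
    compute = sum(day.credits_used for day in usage) * PRICE_PER_CREDIT_USD
    # Storage is priced per TB-month, so each day contributes 1/days_in_month.
    storage = sum(day.storage_tb for day in usage) * PRICE_PER_TB_MONTH_USD / days_in_month
    return round(compute + storage, 2)


def project_full_month(usage: list[DailyUsage], days_in_month: int) -> float:
    """Naive linear extrapolation of the month-to-date estimate."""
    if not usage:
        return 0.0
    to_date = estimate_month_to_date(usage, days_in_month)
    return round(to_date / len(usage) * days_in_month, 2)
```

A vendor priced on many factors would need one term per billable metric, which is why the harder case takes more work than the two-term case sketched here.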
Starting point is 00:09:51 But ultimately, we can just take the usage metrics, which are always available in usage-based priced software, because you can't just charge based on a metric you don't
Starting point is 00:10:00 provide. So it's a very, very common thing to show your usage. And then we can reconstruct the invoice based on our reverse engineering of how they construct their bills. So it really depends on the vendor. Datadog is a bit more complicated. We added support for Databricks a few months ago, and Databricks took just a few hours, because
Starting point is 00:10:18 again, it's a very mature vendor. They just have a billing format that you can ingest to get the right data into Finout. So it really, really depends on them. Yeah, and I'm sure you can also keep track of how the industry's pricing is changing. So I'm sure you can keep track of, oh, it looks like Datadog is increasing its price for X over time. There are all sorts of interesting trends and benchmarks you can probably come up with. So I'm sure there are people who would love to pay for that feature.
Starting point is 00:10:42 It's like, oh, how much am I paying for Datadog versus a similar SaaS company of my size? Yeah, so that's a future roadmap item. Once we start doing that, we're going to start fighting with everyone. So we're still trying to be good sports. Yeah, makes sense. So it's a really broad problem, plugging in with so many different systems so that you can create a unified cost report for my company. It would be really cool to have one bill, right? Some of your material online also talks about this idea of a MegaBill, which is that you can
Starting point is 00:11:15 have one infrastructure bill. How valuable is that for customers? Is that one of the main selling points, that I can keep track of my infrastructure costs in one place? How useful does that end up being? Yeah, so 100%. I mean, our core technology is that MegaBill that you mentioned. Essentially, the MegaBill is a data model
Starting point is 00:11:37 that we have behind the scenes. The way that we structure that data, the way that we store it, make it available for search, and query it, is our huge IP and differentiator in the market. Essentially, what this means is that we're not building features for AWS or for Google or for Datadog or Databricks. We're building features for our MegaBill. So once we integrate another solution into the MegaBill, every feature that we have in Finout supports it natively. We have budgeting and
Starting point is 00:12:06 forecasting and anomaly detection and those kinds of things that we just released. They're integrated with the MegaBill as a whole. So every new solution that we add automatically gets all of that. So what we see with companies is
Starting point is 00:12:22 supporting their migrations between different providers. Even moving a data warehouse from AWS to Snowflake is a super common thing to do. And once they migrate, they start to lose visibility into what actually happened. So using a solution like Finout can help them maintain that visibility. And creating showbacks and chargebacks within an organization is a very, very big problem that organizations are facing nowadays. Usually it's solved using Excel. They dump a bunch of invoices
Starting point is 00:12:53 into Excel sheets and then they start to extrapolate: what share of that bill belongs to each of my teams, or each of my features, or each of my customers? And doing that across different cloud providers means that they need to redevelop the entire system
Starting point is 00:13:06 over and over and over again. So they start to be afraid of purchasing new solutions, or they just give up on accuracy and on the features that they can really attribute. So this is indeed one of the major selling points of Finout: our MegaBill and our ability to deal with those kinds of things. And the flip side that I thought was pretty interesting was that you try not to charge based on percentage of cost savings. Why
Starting point is 00:13:36 is that an important thing to call out, or why is it something that you've seen resonate with the market? So the market is usually priced either based on savings or based on a ridiculous percentage of the total spend that you're analyzing. If it's a fixed price or a percentage of the savings, very often it starts to eat up a significant part of what you could save, so you need to generate an exponential amount of value in order to make this viable for the long run. And taking a percentage of the entire spend, especially a high percentage, is also kind of a greedy thing to do because, for example, if
Starting point is 00:14:13 I decide to double a specific instance that I'm using, so I'm sizing one level up, it automatically means that I'm paying double to the cloud provider, but I'm also paying double for that instance to the cost management solution, and I'm not getting double the value. I'm seeing the same amount of value per resource.
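The arithmetic behind this argument can be sketched as follows. The 2% rate, the 24,000 USD annual fee, and the spend figures are made-up numbers for illustration, not actual prices from Finout or any other vendor.

```python
def percent_of_spend_fee(monthly_cloud_spend_usd: float, rate: float) -> float:
    """Tool fee that scales with whatever the customer spends on cloud."""
    return monthly_cloud_spend_usd * rate


def flat_monthly_fee(annual_fee_usd: float) -> float:
    """Tool fee fixed for the year, independent of cloud spend."""
    return annual_fee_usd / 12


# Hypothetical numbers: a 2% fee versus a 24,000 USD annual contract.
for spend in (100_000, 200_000):  # second value: the instance fleet was doubled
    pct = percent_of_spend_fee(spend, rate=0.02)
    flat = flat_monthly_fee(24_000)
    print(f"spend={spend:>7} percent-of-spend fee={pct:,.0f} flat fee={flat:,.0f}")
```

Under the percentage model the tool fee doubles along with the cloud bill, while the flat fee stays where it was, which is the predictability being argued for.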
Starting point is 00:14:32 We really believe that the cost management solution is a commodity, so we need to do it significantly cheaper than what the market is currently offering, be very transparent and open with our pricing, and really solve the problem instead of just becoming another one of those costs. We can price based on resources, and we can price a flat fee for the year, so companies don't need to worry about Finout's cost fluctuating when their cloud costs increase. We really believe that pricing models should align the vendor's incentives with the customer's and not against them. And then going deep into cloud providers and cloud systems, right?
Starting point is 00:15:09 I've had a couple of conversations about Kubernetes, which seems to be all the rage and only increasing regardless of the macroeconomic conditions. So you're building a system specifically to help customers understand their Kubernetes usage? Why is it important to measure Kubernetes directly rather than just using a high-level metric like AWS instances or some EKS API? Why is it important to look at Kubernetes directly?
Starting point is 00:15:40 So think of it this way: when running Kubernetes, you're billed in different units than what you're consuming, right? AWS charges you by the instance and by the disk and those kinds of low-level resources, but you're actually running pods. So now you want to understand how much the application costs, and you just have no idea. You need to start guessing, right? And the more Kubernetes usage you have, the less aware you are of what's happening. So for example, at Finout, we built our service to be 100% Kubernetes. So if we don't have the ability to understand how much we're spending on each pod or deployment
Starting point is 00:16:18 or namespace or whatever, we are just completely blind. We just know how much money we spend on AWS and that's it. So it became one of the biggest modern problems. And for companies that are public, Kubernetes is really a financial problem, not only a technological one, because producing metrics on what's part of our gross margin and what's not, and how we allocate costs across the organization,
Starting point is 00:16:41 starts to be a very important and business-heavy question that we don't have any way to answer based on the tooling that we get for free from the cloud vendors. So yeah, Kubernetes cost is a major problem when adopting Kubernetes. Okay. And I'm guessing it also gives you
Starting point is 00:16:58 richer information about specific deployments and specific groups, which you can probably tie back to specific organizational units or teams and see exactly what each department is spending. Does it kind of help you with deeper introspection like that? So think of it this way: Kubernetes is essentially
Starting point is 00:17:16 another level of abstraction on top of our infrastructure. We solved that problem for cloud; as an industry, it's solved. We could take a server and, based on its tags or name or account or whatever, allocate it somewhere. So we know that all the instances whose names start with the word prod are part of the production environment. And everything that is tagged with a specific team is part of that team.
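The name-and-tag allocation described here could look roughly like the sketch below. The instance records, the `team` tag key, and the `allocate_by_tags` helper are assumptions made for illustration; they do not mirror a real cloud billing export.

```python
from collections import defaultdict


def allocate_by_tags(instances):
    """Sum monthly cost per (environment, team) from instance names and tags."""
    totals = defaultdict(float)
    for inst in instances:
        # Environment comes from the naming convention, team from a tag.
        env = "production" if inst["name"].startswith("prod") else "non-production"
        team = inst.get("tags", {}).get("team", "unallocated")
        totals[(env, team)] += inst["monthly_cost_usd"]
    return dict(totals)


# Hypothetical inventory of three servers.
instances = [
    {"name": "prod-api-1", "tags": {"team": "payments"}, "monthly_cost_usd": 300.0},
    {"name": "prod-api-2", "tags": {"team": "payments"}, "monthly_cost_usd": 300.0},
    {"name": "staging-db", "tags": {}, "monthly_cost_usd": 120.0},
]
print(allocate_by_tags(instances))
```

Untagged resources fall into an explicit `unallocated` bucket rather than disappearing, so the totals still add up to the bill.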
Starting point is 00:17:39 So great, now it's solved. But now that we're installing Kubernetes, we're just creating another level of abstraction. We're still asking ourselves the same questions, but we don't have the data because we're not billed by the units that we consume. So we just offloaded that problem to another level on top of it. And from time to time, we even have a third or fourth level of abstraction. Say we install Elasticsearch, for example, on top of those Kubernetes pods, and then we want to allocate the specific Elasticsearch indices on top of that.
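One common way to bridge the gap between billed units (nodes) and consumed units (pods) is to split each node's price by pod resource requests. The sketch below assumes that approach with an arbitrary 50/50 CPU-to-memory weighting; it is an illustration, not a description of how Finout or any specific tool allocates cost, and all field names are hypothetical.

```python
def allocate_node_cost(node_hourly_usd, node_cpu, node_mem_gib, pods, cpu_weight=0.5):
    """Split one node's hourly price across pods by their resource requests.

    Each pod's share is a weighted blend of its CPU and memory request
    fractions; whatever is not requested is reported as idle cost.
    """
    costs = {}
    for pod in pods:
        cpu_share = pod["cpu_request"] / node_cpu
        mem_share = pod["mem_request_gib"] / node_mem_gib
        share = cpu_weight * cpu_share + (1 - cpu_weight) * mem_share
        costs[(pod["namespace"], pod["name"])] = node_hourly_usd * share
    # The remainder is capacity nobody requested; it still has to be paid for.
    costs[("_idle", "_idle")] = node_hourly_usd - sum(costs.values())
    return costs


# Hypothetical node: 0.40 USD/hour, 4 vCPU, 16 GiB, running two pods.
pods = [
    {"namespace": "reports", "name": "worker", "cpu_request": 1.0, "mem_request_gib": 4.0},
    {"namespace": "api", "name": "gateway", "cpu_request": 2.0, "mem_request_gib": 4.0},
]
costs = allocate_node_cost(0.40, node_cpu=4, node_mem_gib=16, pods=pods)
```

With these numbers the worker pod carries about 0.10 USD per hour, the gateway about 0.15, and about 0.15 is idle. The same pattern repeats one level up, for example splitting a pod's cost across Elasticsearch indices by their size.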
Starting point is 00:18:06 So we can always jump another level deeper in terms of abstraction, and cost management solutions need to support that. It doesn't matter how far the thing you care about is from the actual unit that you're priced on. You still need to allocate it to something. You need to allocate it to a customer. You need to allocate it to someone. This is a major problem in cloud cost management, allocating shared costs, and Kubernetes is just one special case of shared costs, but it's still one of the biggest problems that we have. And going back from that, how do I, as an engineering leader,
Starting point is 00:18:38 prevent a cost regression early, right? That's one of those questions that comes to my mind: is there a way to maybe even prevent a regression at PR time, before the commit is merged? Is that something you can guard against? What do you think?
Starting point is 00:18:56 So really understanding the cost implication of every commit based on code changes is something that I think will be possible in the future, but it's a very, very, very difficult problem. There are solutions that can help you understand Terraform changes and how they're going to impact your environment.
Starting point is 00:19:11 But that's an easier problem to solve. Where we start to really mess up is usually with a code or configuration change that changes the entire efficiency. It's enough that you go from processing every report in 100 milliseconds to 200 milliseconds, and now you need to pay double for the service and you didn't even know. So this is where it gets harder and harder. And when it comes to FinOps and adopting FinOps, I think there are two very important steps
Starting point is 00:19:39 that engineering leaders can and should take. The first is to visualize what's happening. Be responsible for your own costs. If you're unaware of how much money you're spending and you have no way of tracking that over time, you won't even know that something happened. And by the time you realize that something is not working as it should, you're going to have a couple of months' worth of work that you now need to reverse.
Starting point is 00:20:07 And then try to understand which commit really changed our financial model. Why did we do that? It's really, really hurting us. That's really, really hard to do. So just measure, visualize, and even better, create unit economics
Starting point is 00:20:19 directly out of your environment. So if you're in charge of a service that, getting back to the reports example, is measured by the number of reports that it processes, then measure your price per report. As long as the price per report stays the same, it's okay to spend more money as long as you need
Starting point is 00:20:34 to process more, but you need to make sure that the ratio remains the same. The second tip is to revisit decisions that you've made and constantly optimize your environment. A common case is that you develop a new service, you have a new Kubernetes deployment, you're about to ship it to production, and then you get to the point where you need to configure your requests, right?
Starting point is 00:20:53 So how much CPU and memory do you need? And let's be honest, you don't know, right? No one knows. So you just guess something, deploy it to production, and tell yourself a story that you will revisit that decision. You're going to measure how your service behaves in production over the upcoming month.
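The "revisit the guess" step could be sketched as deriving a request from observed usage. The nearest-rank 95th percentile and the 20% headroom below are assumed defaults chosen for illustration, and `recommend_request` is a hypothetical helper, not part of Kubernetes or any vendor tool.

```python
import math


def recommend_request(samples, percentile=95, headroom=1.2):
    """Suggest a resource request: a high percentile of observed usage plus headroom."""
    if not samples:
        raise ValueError("need at least one usage sample")
    ordered = sorted(samples)
    # Nearest-rank percentile: the smallest sample with at least
    # `percentile` percent of all samples at or below it.
    rank = max(1, math.ceil(percentile * len(ordered) / 100))
    return ordered[rank - 1] * headroom


# Hypothetical month of CPU usage samples in millicores: mostly ~100m, rare spikes.
cpu_samples = [100] * 95 + [900] * 5
guessed_request = 1000  # what was deployed on day one
suggested = recommend_request(cpu_samples)
```

Here the suggestion comes out near 120 millicores against a guessed 1000, which is the kind of gap that goes unnoticed when nobody revisits the original number. Sizing to a percentile instead of the peak is a trade-off: the rare spikes would have to be absorbed by limits or autoscaling.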
Starting point is 00:21:10 And then you're going to right-size based on what's really happening. That's a great thing to say, but in reality it just never happens, because you move on to your next Jira ticket and you have a new service that you need to build. So if engineers are not going to be responsible for the services that they deploy, to right-size them, to terminate them when they're not used, to pick the right technologies behind the scenes that can support them, it's going to be very, very, very hard
Starting point is 00:21:33 to create that financial governance. So I think as an engineering leader, it's very important for the team to constantly optimize their own services and to measure and be aware of everything that they're doing. And then, combining the two, you can really contribute to the organization's financial governance as a whole and take on your specific part.
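The price-per-report idea from the first tip can be sketched as a small unit-economics check. The function names, the 10% tolerance, and the monthly figures are hypothetical, chosen only to show the comparison.

```python
def unit_cost(total_cost_usd, units_processed):
    """Cost per business unit, for example USD per processed report."""
    if units_processed <= 0:
        raise ValueError("units_processed must be positive")
    return total_cost_usd / units_processed


def is_unit_cost_regression(baseline, current, tolerance=0.10):
    """Flag when cost per unit grows by more than `tolerance` over the baseline."""
    return current > baseline * (1 + tolerance)


# Hypothetical months: spend doubled in both cases, but only one is a problem.
baseline = unit_cost(1_000, 1_000_000)        # last month
healthy_growth = unit_cost(2_000, 2_000_000)  # twice the spend, twice the reports
regression = unit_cost(2_000, 1_000_000)      # twice the spend, same reports

print(is_unit_cost_regression(baseline, healthy_growth))  # spend grew with volume
print(is_unit_cost_regression(baseline, regression))      # cost per report doubled
```

Tracking the ratio rather than the raw bill is what separates growth from a regression like the 100 to 200 millisecond slowdown mentioned earlier.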
Starting point is 00:21:55 And if everyone does that, cost is just going to decrease. You're kind of talking about even decentralizing that decision making, right? Often what probably happens in an organization is you have four or five people who have access to the cloud cost monitoring dashboard or the cloud cost management platform. They're grumbling about the problem, and the engineers behind the scenes or in some other part of the organization
Starting point is 00:22:19 have no idea that cloud cost is such a big deal, or how to measure it, or which services are the most expensive, or how they should be thinking about it. But even exposing that information consistently and in an easy-to-use way can kind of drive the cultural change that you need for this.
Starting point is 00:22:36 Yeah, so this is the FinOps endgame, right? When we're talking about implementing FinOps, getting all the way to engineers being able to understand and be responsible for their spend is our goal. So think of it exactly like DevOps, right?
Starting point is 00:22:52 A few years ago, no one even thought that engineers were going to be responsible for their deployments and their SLA metrics, right? And now it's a super common thing. The same thing is happening with cloud financial management as well.
Starting point is 00:23:04 Just as you don't throw code that you wrote over to a centralized operations team and expect them to get it all the way to production, you don't just neglect your cloud financial management and expect a centralized team to do everything for you. You start to get more and more responsible for what you're doing, and engineers are starting to get measured
Starting point is 00:23:20 based on the price of their service, the same way they were measured on SLA. I think that industry trend certainly makes sense. You need to bring more information closer to the person who is working on the system. That whole idea of shifting left is something I've heard across security, definitely across operations, and developer experience.
Starting point is 00:23:43 You kind of want engineers to know what the impact of their work is on other engineers. And it's similar with cost, right? So that trend makes sense. With the current macro environment, are you seeing a shift in how people are thinking about cost management platforms? My guess from the complete outside is that there's more demand for these platforms. But what are you seeing?
Starting point is 00:24:06 Yeah, so really 100%. In 2021, money was infinite and no one really thought about creating a company that is financially viable and spends only as much money as it needs to. And now the game has changed completely. CFOs are getting stronger in the organization and demanding better answers. And suddenly gross margin is one of the top priorities. So we can reduce headcount in order to reduce burn right now, to be more responsible with the investors' money. But for the endgame here, to really create a company that is better and more stable, we need to be able to
Starting point is 00:24:45 sell our service at higher margins, contribute to better lifetime values, and reduce our customer acquisition costs. So every team in the company starts to have more finance-oriented KPIs, and the cost of our service is one of the main ones. Companies can no longer just push that to a later date. We need to start dealing with reality. We need to start measuring and adapting toward success and make sure that we're on the right path. So yeah, 2023 is the year of cloud financial management.
Starting point is 00:25:21 Just take a look at Google Trends for the word FinOps to get a better understanding of that. I think I'm going to try to embed a screenshot of that trend in our episode show notes. Yeah, that makes sense to me. I think the idea of giving deeper insights into tools that people directly use, like Kubernetes, makes sense.
Starting point is 00:25:42 I certainly have this problem at work, where Datadog is just way too expensive for me to use. And I have one question on that note: are you seeing customers move to more on-prem kinds of systems so that they're not paying these usage-based prices? I can think of moving off Datadog to something like an OpenTelemetry, Prometheus, and Grafana setup because it's too expensive. I'm curious if you have any insight into whether that's actually happening across the industry. So most CFOs are not going to require you to churn from a service that gives you value, that you enjoy, and that really makes you better at what you're doing just because of budgets.
Starting point is 00:26:29 They just want to make sure that you're utilizing that service to the fullest. So it's okay to spend money on Datadog. Datadog is probably the best monitoring service out there, right? You can spend money on Datadog or you can start to build your own using Prometheus and OpenTelemetry, but you're going to spend so many hours just making sure that this solution is working.
Starting point is 00:26:49 And even when you're done spending so many hours making it work and managing it, you still won't get the same outcome as you would with Datadog. So it's okay to spend money on Datadog, but you need to make sure that you're spending the right amount. Are you utilizing all services properly? Maybe you're paying for Datadog products that you're not using. Maybe you have a huge part of your environment
Starting point is 00:27:08 that you never look at. So you can just drop all of its logs and drop all of its metrics because you don't care about them. It's very important to make sure that we can justify our expenses and that we're optimizing them to where they should be.
Starting point is 00:27:20 And it's the same with AWS, the same with Snowflake, the same with every service. It's important not to waste money. It's important to use money wisely. Yeah, I think that resonates, because I don't see a cost-driven migration ever get executed or prioritized unless the cost has radically gone up, like 10x in a single year, especially if you have so many levers to control cost.
Starting point is 00:27:46 That's always going to be the first step. And then you're going to think about, OK, long term, it isn't sustainable for us to stay on this platform, so we have to move. But yeah, that's right. You kind of have to try to clean up your own home before you say, I need to move and do something else. Yep. Well, Roi, thank you so much for being on the show.
Starting point is 00:28:03 I think it's been really informative, at least for me to understand how to think about cost management. Thanks so much. Thank you so much for having me. I had a blast.
