Software at Scale - Software at Scale 56 - SaaS cost with Roi Rav-Hon
Episode Date: April 17, 2023
Roi Rav-Hon is the co-founder and CEO of Finout, a SaaS cost management platform.
Apple Podcasts | Spotify | Google Podcasts
In this episode, we review the challenge of maintaining reasonable SaaS costs for tech companies. Usage-based pricing models of infrastructure costs lead to a gradual ramp-up of costs and always have sneakily come up as a priority in my career as an infrastructure/platform engineer. So I’m particularly interested in how engineering teams can better understand, track, and “shift left” infrastructure cost tracking and prevent regressions.
We specifically go over Kubernetes cost management, and why cost management needs to be attributable to the most specific teams in order to be self-governing in an organization.
This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit www.softwareatscale.dev
Transcript
Welcome to Software at Scale, a podcast where we discuss the technical stories behind large software applications.
I'm your host, Utsav Shah, and thank you for listening.
Hey, welcome to another episode of the Software at Scale podcast.
Joining me today is Roi Rav-Hon, the co-founder and CEO of Finout, a cloud cost management
platform. Previously, he was the director of engineering at Logz.io, an observability platform.
Thank you for being on the show.
Thank you so much for having me.
Yeah, so I'd love to find out about your story, right? Like, cloud cost is something that has been
bugging me for a very long time. Like, sooner or later it ends up on the
roadmap. It's like, oh my god, why are we spending millions of dollars on load balancers or something
silly like that? What got you interested in the problem and energized about solving it by creating your
own company?
In my previous position at Logz.io, as you mentioned, I was an engineering director
responsible for the entire infrastructure.
So, you know, part of my responsibilities was to balance between SLA and cloud financial
management.
And, you know, those two goals contradict each other in most cases, right?
You know, as the one in charge of SLA, you want to have extra, you know, servers lying
around, ready to be used when needed,
and you're quicker on the trigger with scale-up mechanisms.
And you want to make sure that the infrastructure
has enough capacity when we need it.
But when you're putting on your cloud financial management hat,
suddenly everything is the opposite, right?
You need to make sure that you're as efficient as possible
and you're only scaling when you need it
and everything is right-sized
and everything is working,
you know, super, super efficiently.
So finding that balance
and what's acceptable,
what's not acceptable,
where can we be better?
How do we even educate the engineers
to really understand the implication
of each of their decisions
was something that, you know,
bugged me for a very long time at Logz.io.
And honestly, like, we used some of the tooling
available in the market and we felt like, you know,
nothing really matched what we needed
and nothing really helped us in getting better at,
you know, that balance, getting better at implementing
a FinOps culture in the organization,
even before it was called FinOps.
You know, this is what started to push us out of Logz.io,
to really understand that, you know,
we need to build the tool that we wanted to use.
And there's a major gap in the market
in terms of what is currently offered
versus what a modern company really wants to use.
That's the story behind Finout.
A lot of these cloud providers like AWS, GCP,
they have their own like cloud billing dashboards.
What makes them not enough in your view?
So two main factors.
One is that the incentives are never aligned. For the cloud vendor to build the best cost management solution
that is designed to, you know, hurt their revenue is not always something that is aligned
with what the company wants to do. So it is to some extent, right, because the cloud vendor wants
to keep you incentivized and making good use of their solution and not overpaying where you shouldn't,
because then you will churn. But eventually, the incentives are never the same.
And the second biggest problem is that as far as the cloud provider is concerned,
they live alone.
So AWS will never support something related to Google or Snowflake
or something like that.
So they always want to encourage you to continue and use their own solutions.
And they will never help you with a cloud cost management tool to analyze costs for
different providers.
But the reality is just different, right?
So most companies are using multi-cloud and multi-service.
And we start to use more and more technologies in our stack.
And they're all usage-based priced.
So in order to get, you know, just a simple overview of how much money do we spend across,
you know, our entire infrastructure,
we need to log into five different systems in order just to get the
overview. And then every allocation that we have, every budget,
every forecast, every like everything,
we need to start to like implement those solutions again and again and
again and again.
So it starts to be, like, very,
very cumbersome to manage costs
and even get, like, an accurate picture of what's really happening.
And sorry, this is somewhat
of a philosophical question. But the root of all of these, like, hard-to-manage or hard-to-control
cost problems seems to be this idea of, like, usage-based pricing, right? It sounds very simple,
where, like, you use, you know, one EC2 instance, and you get billed a certain amount per hour,
but there's so many different configurations
and so many different services that we end up using
that it all becomes a nightmare.
What are your philosophical thoughts
on this idea of usage-based pricing?
Is it actually helpful to customers?
Is it too confusing in its current state?
Should everything just be like a monthly subscription?
Is that even feasible?
What do you think?
I think eventually usage-based pricing is a huge catalyst in modern software
buying processes, right? So when talking to an engineer and you want to sell something
to an engineer, or an engineer wants to buy something, to start using, you know, a solution
that they picked and just pay based on what they used is really, you know, the
most natural thing to do. It's like you're
going to a restaurant, you pay for
what you eat, you don't negotiate
a price a month in advance for
food consumption.
This is the way that we're used to
buying in every single
aspect of our life except for software.
It only makes sense for
engineers to feel more comfortable,
engineers more empowered to take decisions.
But on the other hand, when looking at the company cost governance kind of situation,
it starts to get a lot more complicated because we don't even figure out what our budget is going to be
and how do we make sure that we are tight and we're predictable into the future.
It's a new major task that the finance team has to deal with that they never had to deal with before.
So I think it's kind of a double-edged sword because it's an enabler for the organization to run super efficient, super fast, and to only pay for what they're actually using.
But on the other hand, it's also a big headache for cost governance. And so it's a concept that's not very natural for finance teams or even the finance tools
to grasp. Like, ERPs are not designed for usage-based pricing.
So I think that we need to have that kind of balance between what's good and what's
an enabler for the company versus just not doing anything because we're afraid.
So software buying should be better, should be easier,
but also like finance should not be, you know,
left in the dark and left unanswered.
So I think, you know, usage-based pricing is a very good trend,
but you also need to have some kind of financial responsibility for what you're purchasing.
And this is, like, part of the reason we built Finout.
And initially, at least a few years ago,
I think cloud dashboards were primarily focused on, or like cost dashboards were primarily focused on cloud providers.
But now, as you said, there are more and more tools that are usage-based priced, like Snowflake, your data management, but also your observability tools like Datadog.
You can kind of see from the stock price and market cap of Datadog that they have a pretty heavy margin, right?
So it seems like it's increasing over time, the amount of tools that are usage-based priced and need to be monitored.
Is that what you're seeing as well?
Yeah, so, 100%, the market is changing and shifting.
And it's all happening within our budgets and within our financial allocation.
So 10 years ago, if you would say to a company that most of their expense is going to be OPEX and not CAPEX, and that they would stop buying servers,
they would look at you like you're insane.
So this kind of change in the way that we manage our finances is moving very strongly towards the OPEX kind of process.
And one of the major catalysts is usage-based pricing.
And I think that, you know, we're picking more and more different aspects of our company
and moving them toward usage-based pricing.
So now we see usage-based across everything, right?
From our sales tools to our marketing tools to our infrastructure tools to, you know,
basically everything that we're buying.
And I really believe that, you know, this is the, you know, the pricing model of the
future. It's going to be 100% usage-based.
Like, picking a plan
and negotiating contracts
is going to be a thing of the past,
or at least a combination of them
is going to prevail.
So for a tool or, like,
for a platform like Finout to succeed,
you also need to get enough information
from the actual vendor itself, right?
So, for example,
you need to call an API of Datadog
to know how much metrics or logs a particular customer is using is my guess. Do these systems
provide enough introspection ability to give that rich amount of data? Like how does that work
behind the scenes? How are you able to measure or accurately estimate how much I'm spending in
Snowflake or Datadog or some other tool?
So it really depends on the maturity of the vendor.
AWS, for example, is very mature when it comes to showing their bills. So AWS has a format called the
Cost and Usage Report, where you can just get all the accurate billing information directly into
your account. It's a complicated format, but still, like, the dollar sign is present.
But there are other solutions, you know, that are not as advanced.
So solutions like Datadog and Snowflake, which you mentioned, do not have, like, a billing API.
They just charge you at the end of the month and you need to extrapolate what's going to
happen mid-month.
What you can do for a solution like that is essentially reverse engineering their billing
structure.
And, you know, for Snowflake, it's easier because it's just credits and storage.
For Datadog, it's a lot more complicated because it's priced by, you know, numerous different factors.
But eventually, we can just take the usage metrics, which are always available in usage-based priced software, because you can't just charge based on a metric you don't provide.
So it's a very, very common thing to do to show, like, your usage.
And then we can reconstruct the invoice based on, you know, our reverse engineering of how they construct their
bills. So it really depends on the vendor. Datadog is a bit more complicated. We just
added support for Databricks a few months ago, and Databricks was, like, a few hours of work, because,
again, it's a very mature vendor. They just have the billing format, you can just ingest it
and get the right thing into Finout. So it really, really depends on them.
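As a rough illustration of reconstructing a bill from usage metrics for a vendor priced on credits and storage, here is a minimal Python sketch. The rates, the `UsageToDate` fields, and the linear mid-month extrapolation are all assumptions made for the example; they are not any vendor's real prices and not Finout's actual method.

```python
from dataclasses import dataclass

# Hypothetical rates for illustration only; real rates depend on the contract.
PRICE_PER_CREDIT_USD = 3.00
PRICE_PER_TB_MONTH_USD = 23.00


@dataclass
class UsageToDate:
    credits_used: float    # compute credits consumed so far this month
    avg_storage_tb: float  # average terabytes stored so far this month
    days_elapsed: int
    days_in_month: int


def cost_to_date(usage: UsageToDate) -> float:
    """Rebuild the bill so far from usage metrics and the assumed rates."""
    compute = usage.credits_used * PRICE_PER_CREDIT_USD
    # Storage is priced per month, so prorate it by the days elapsed.
    storage = (usage.avg_storage_tb * PRICE_PER_TB_MONTH_USD
               * usage.days_elapsed / usage.days_in_month)
    return compute + storage


def projected_month_end(usage: UsageToDate) -> float:
    """Linear extrapolation: assume the rest of the month looks like the start."""
    if usage.days_elapsed <= 0:
        raise ValueError("days_elapsed must be positive")
    return cost_to_date(usage) * usage.days_in_month / usage.days_elapsed


usage = UsageToDate(credits_used=1000, avg_storage_tb=10,
                    days_elapsed=15, days_in_month=30)
print(f"cost to date: ${cost_to_date(usage):,.2f}")
print(f"projected month end: ${projected_month_end(usage):,.2f}")
```

A vendor priced on many factors would need one term per billable metric, which is where the reverse engineering gets harder.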
Yeah, and sure, like you can also keep track of how the industry's pricing is changing.
So I'm sure you can keep track of, oh, it looks like Datadog's increasing price for
X and over time.
And like there's all sorts of interesting trends and benchmarks you can probably come
up with.
So I'm sure there are people who would love to pay for that feature.
It's like, oh, how much am I paying for Datadog versus a similar SaaS company of my size?
Yeah, so that's a future roadmap item.
Once we start doing that, we're going to start fighting with everyone.
So we're still trying to be good sports.
Yeah, makes sense.
So it's a really broad aspect of how you can plug in with so many different systems
so that you can kind of create a unified cost report for my company. It would be really cool to have one bill, right?
Some of your material online also talks about this idea of a MegaBill, which is, you can
have, like, one infrastructure bill. How valuable is that for customers? Like, what do you see? Is that,
like, one of the main selling points? Like, I can keep track of my
infrastructure costs in one place.
How useful does that end up being?
Yeah, so 100%.
I mean, our core technology
is that MegaBill that you mentioned.
Essentially, the MegaBill is a data model
that we have behind the scenes.
The way that we structure that data,
the way that we save it and make it available for search
and how we can query it,
this is our huge IP and differentiator in the market. Essentially, what this means is that
we're not building features for AWS or for Google or for Datadog or Databricks. We're building
features for our MegaBill. So once we've integrated another solution into that MegaBill, every feature
that we have in Finout is natively supported. We have budgeting and
forecasting and anomaly detection and those kinds
of stuff that we just released.
These are integrated with the
MegaBill as a whole. So now
every new solution that we're adding is
just automatically getting added into
that. So
what we see with companies is
supporting their migrations
between different providers, and, like, even moving a data warehouse from AWS to Snowflake is a super common thing to do.
So once migrating that, they're starting to lose their visibility into what actually happened.
So using a solution like Finout can help them maintain that visibility.
And creating showbacks and chargebacks within an organization is a very, very big problem
that organizations are facing nowadays.
And usually it's solved using Excel.
You know, they're dumping a bunch of invoices
into Excel sheets
and then they start to extrapolate,
like, what was the size out of that bill
for each of my teams
or for each of my features
or for each of my customers?
And doing that across different cloud providers
meant that they need to redevelop the entire system
over and over and over again.
So they started to be afraid of purchasing new solutions
or they're just giving up on their accuracy
and their features that they can really attribute.
So this is indeed one of the major selling points
of Finout: our MegaBill and our ability
to deal with those kinds of stuff.
And the flip side that I thought was pretty interesting was that you try not to charge
based on a percentage of cost savings. Like, why is that an important thing to call out, or, like, why
is it something that you've seen resonate with the market?
So the market is usually either priced
based on savings currently, or based on, you know, a ridiculous percentage of the total spend that you're analyzing.
So if it's a fixed price or percentage of the savings, very often it starts to eat up a significant part of what you could save.
So you need to generate an exponential amount of value in order to make this viable for the long run. And taking a percentage
out of the entire
spend, especially in high percentages,
is also kind of a greedy thing
to do because, for example, if
I decided to double
a specific instance that I'm using,
so I'm sizing one level up,
it automatically means that I'm paying double
to the cloud provider, but also
I'm paying double for that instance to the cost management solution,
but I'm not getting double the value.
I'm seeing the same amount of value per resource.
We really believe that the cost management solution is a commodity,
so we need to do it significantly cheaper than what the market is currently offering
and also be very transparent and open with our pricing
and really solve another problem instead of just becoming one of those.
We can price based on resources and we can price a flat fee for the year.
So companies don't need to worry about the Finout cost fluctuating when their cloud costs are increasing.
So we really believe that pricing models should be incentivized together with the customers and not against them.
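The arithmetic behind the "double the instance, double the fee" point can be sketched in a few lines of Python. The 3% rate and the annual flat fee below are made-up numbers for illustration, not any vendor's actual pricing.

```python
def percent_of_spend_fee(monthly_cloud_spend: float, rate_percent: float) -> float:
    """Tool fee when the vendor charges a percentage of analyzed spend."""
    return monthly_cloud_spend * rate_percent / 100


def flat_monthly_fee(annual_fee: float) -> float:
    """Tool fee under a flat annual contract, spread over twelve months."""
    return annual_fee / 12


RATE_PERCENT = 3          # hypothetical percentage-of-spend rate
ANNUAL_FLAT_FEE = 36_000  # hypothetical flat annual fee

# Doubling the cloud bill doubles the percentage-based fee,
# while the flat fee stays where it was.
for spend in (100_000, 200_000):
    pct_fee = percent_of_spend_fee(spend, RATE_PERCENT)
    flat_fee = flat_monthly_fee(ANNUAL_FLAT_FEE)
    print(f"spend={spend:>7,}  percent-based fee={pct_fee:>8,.2f}  flat fee={flat_fee:>8,.2f}")
```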
And then going deep into cloud providers and cloud systems, right?
I've had a couple of conversations about Kubernetes, which seems to be all the rage and only
increasing regardless of the macroeconomic conditions.
So you're building a system to specifically help customers
understand their Kubernetes usage?
Like why is it important to measure Kubernetes directly
rather than, you know, just using a high level metric
like AWS instances or like some EKS API?
Why is it important to look at Kubernetes directly?
So think of it as, eventually, you're billed in different units
than what you're consuming
when running Kubernetes, right? So AWS charges you by the instance and by, you know, the disk and
those kinds of, you know, more low-level resources, but you're actually running pods.
So now you want to understand, like, how much the application costs and you just have no idea. You
need to start guessing, right? And the more usage of Kubernetes that you have, the less aware you get of what's happening.
So for example, in Finout, we built our service to be 100% Kubernetes.
So if we don't have the ability to understand how much we're spending for each pod or deployment
or namespace or whatever, we are just completely blind.
We just know how much money we spend on AWS and that's it. So it became one of the most modern problems.
And even for companies that are public,
it's a financial problem, really.
Kubernetes cost is not only a technological one,
because when you just start producing metrics,
what's part of our gross margin, what's not,
and how we allocate costs across the organization
start to be very important and, like, business-heavy questions
that we don't have any way to just answer
based on the current tooling
that we get for free from the cloud vendors.
Yeah, Kubernetes cost is a major problem
when adopting Kubernetes.
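One common way to bridge the gap between what the cloud bills (instances) and what teams run (pods) is to split each node's cost across its pods in proportion to their resource requests. The Python sketch below assumes a single node, an even 50/50 weighting between CPU and memory, and hypothetical `Node` and `Pod` records; it is one possible allocation scheme, not a description of how any particular product does it.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Node:
    hourly_cost: float  # what the cloud provider bills for the instance
    cpu_cores: float
    memory_gib: float


@dataclass
class Pod:
    namespace: str
    cpu_request: float         # requested cores
    memory_request_gib: float  # requested memory


def pod_hourly_cost(pod: Pod, node: Node, cpu_weight: float = 0.5) -> float:
    """Charge the pod for the share of the node it reserved."""
    cpu_share = pod.cpu_request / node.cpu_cores
    mem_share = pod.memory_request_gib / node.memory_gib
    blended_share = cpu_weight * cpu_share + (1 - cpu_weight) * mem_share
    return node.hourly_cost * blended_share


def cost_by_namespace(pods: list[Pod], node: Node) -> dict[str, float]:
    """Roll pod costs up to the namespace (often a proxy for a team)."""
    totals: dict[str, float] = defaultdict(float)
    for pod in pods:
        totals[pod.namespace] += pod_hourly_cost(pod, node)
    return dict(totals)


node = Node(hourly_cost=0.40, cpu_cores=8, memory_gib=32)
pods = [Pod("api", 2, 8), Pod("batch", 4, 8)]
allocated = cost_by_namespace(pods, node)
# Whatever no pod reserved is idle capacity that still has to be paid for.
idle = node.hourly_cost - sum(allocated.values())
print(allocated, f"idle={idle:.2f}")
```

How to distribute the idle remainder (evenly, proportionally, or to a platform team) is itself a shared-cost decision.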
Okay.
And I'm guessing it also gives you
like richer information about,
you know, specific deployments,
specific groups,
which you can probably tie back to specific organizational units
or teams or groups,
and see exactly which department is spending.
Does it kind of help you with deeper introspection like that?
So think of it that Kubernetes is essentially
another level of abstraction on top of our infrastructure.
We solved that problem for cloud;
like, as an industry, it's well solved.
We could take a server and based on its tags or name or account or whatever, we can
allocate it somewhere.
So we know that all the instances that start with the letter prod, with the word prod,
are part of the production environment.
And everything that is tagged with a specific team is part of that team.
So great, now it's solved.
But now that we're installing Kubernetes, we're just creating another level of abstraction.
So we're still asking ourselves the same questions, but we don't have the data because we're not
billed by the units that we start to consume.
So we just offloaded that problem into another level on top of it.
And from time to time, we even have like a third or fourth level of abstraction.
So if we install Elasticsearch, for example, on top of those Kubernetes pods, and then
we want to allocate the specific Elasticsearch indices on top of that.
So we always can jump another level deeper in terms of abstraction and cost management solutions need to support that.
So it doesn't matter where the truth is in terms of how far it is from the actual unit that you price.
You still need to allocate it to something.
You still need to allocate it to a customer.
You need to allocate it to someone.
This is a major problem in cloud cost management, allocating shared costs,
and Kubernetes is just one of the special cases of shared costs, but it's still, you know, one of
the biggest problems that we have.
And going back from that, how do I, as an engineering leader,
like, prevent a cost regression early, right? Like, one of those questions that comes to my mind is,
is there a way to, you know,
maybe even prevent like a regression
at like PR time
or before the commit is merged?
Like, is that something
you can guard against?
What do you think?
So to really understand
the cost implication of every commit
based on code changes
is something that I think
will be possible in the future,
but it's a very, very, very difficult problem.
So there are solutions that can help you understand, you know,
Terraform changes and how they're going to impact your environment.
But that's, you know, an easier problem to solve.
Where, you know, we start to really mess up is usually with a code
or configuration change that, you know, just changes the entire efficiency.
You know, it's enough that you change the processing of
every report from 100 milliseconds to 200 milliseconds,
and now, you know, you need to pay double for the service and you didn't even know.
So this is where it starts to get, like, harder and harder.
And when it comes to FinOps and adopting FinOps, I think there are two very important steps
that engineering leaders can and should, you know, take.
The first is visualize what's happening.
Be responsible for your own costs.
If you're going to be unaware of how much money you're spending and you have just no
way of tracking that over time, you won't even know that something happened.
And by the time you realize that something is not working as it should, you're going to have
a couple of months' worth of work that you now need to,
you know, reverse.
And go understand,
like, what commit
really changed our financial model?
Like, why did we do that?
That is really, really hurting us.
So that's really, really hard to do.
So just, you know, measure, visualize,
or even better, create unit economics
directly out of your environment.
So if you're, you know,
in charge of a service that,
again, getting back to the reports example,
is measured by the number of reports that you're getting,
then measure your price per report.
And as long as the price per report is the same, it's okay
to spend more money as long as you need
to process more, but you need to make sure that
that kind of alignment remains the same.
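The price-per-report idea can be sketched as a small unit-cost check in Python. The dollar figures, the report counts, and the 10% tolerance are invented for the example; a real check would pull cost and volume from billing and service metrics.

```python
def unit_cost(total_cost: float, units: int) -> float:
    """Cost per unit of work, e.g. dollars per processed report."""
    if units <= 0:
        raise ValueError("units must be positive")
    return total_cost / units


def is_regression(baseline: float, current: float, tolerance: float = 0.10) -> bool:
    """Flag a regression when unit cost rises more than `tolerance` over baseline."""
    return current > baseline * (1 + tolerance)


baseline = unit_cost(10_000, 1_000_000)     # last month: $10k for 1M reports

# Spend grew 80%, but volume doubled, so each report actually got cheaper.
grew_with_load = unit_cost(18_000, 2_000_000)
# Spend grew 50% at the same volume, so each report got more expensive.
grew_without_load = unit_cost(15_000, 1_000_000)

print(is_regression(baseline, grew_with_load))     # False
print(is_regression(baseline, grew_without_load))  # True
```

Tracking the ratio rather than the raw bill is what lets spend grow with load without raising an alarm.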
The second tip is revisit
decisions that you have and constantly
optimize your environment.
So a common use case is you develop a new service, you have a new Kubernetes deployment,
you spin it up in production, and then you get to the point that you need to configure
your requests, right?
So how much CPU and memory do you need?
And let's be honest, you don't know, right?
So no one knows.
So you just need to guess something, deploy it to production, and you're telling yourself
a story that you will revisit
that decision.
And you're going to measure how your service is behaving in production in the upcoming
month.
And then you're going to right-size based on what's really happening.
In reality, that's a great thing to say and do, but it just never happens because you
move on to your next Jira ticket and you have a new service that you need to do that.
So if engineers are not going to be responsible for the services that they deploy to right-size them,
to terminate them when they're not used,
to pick the right technologies behind the scenes
that can support it,
it's going to be very, very, very hard
to create that financial governance.
So I think as an engineering leader,
it's very important for, you know,
the team to constantly optimize their own services
and to measure and be aware of everything
that you're doing.
And then, you know, combining both, you can, like, really contribute towards the
organizational, you know, financial governance as a whole and take your specific part.
And if everyone does that, like cost is just going to decrease.
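Revisiting a guessed CPU request once real usage data exists can be sketched as follows in Python. The nearest-rank percentile, the 95th-percentile target, the 20% headroom, and the sample values are assumptions chosen for illustration; the right policy depends on the workload.

```python
def percentile(samples: list[float], pct: int) -> float:
    """Nearest-rank percentile of the observed samples (pct is 1 to 100)."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100, clamped to at least the first sample.
    rank = max(1, -(-pct * len(ordered) // 100))
    return ordered[rank - 1]


def recommended_request(samples: list[float], pct: int = 95,
                        headroom: float = 1.2) -> float:
    """Suggest a request: a high percentile of observed usage plus headroom."""
    return percentile(samples, pct) * headroom


# Hypothetical month of CPU usage samples in millicores: mostly idle, one spike.
observed_millicores = [100] * 19 + [500]
guessed_request = 1000  # what was configured on day one

suggestion = recommended_request(observed_millicores)
print(f"configured={guessed_request}m suggested={suggestion:.0f}m")
```

Note that this particular policy ignores the single spike; whether that is acceptable is exactly the SLA versus cost trade-off discussed earlier.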
You're kind of talking about even decentralizing that decision making, right?
Like often what happens probably in an organization is you have four or five people
who have access to the cloud cost monitoring dashboard
or the cloud cost management platform.
And they're grumbling about the problem
and the engineers behind the scenes
or like in some other part of the organization
have no idea that cloud cost is such a big deal
and like how to measure it
or like which services are the most expensive
or how they should be thinking about it.
But even exposing that information consistently
and in an easy to use way
can kind of drive the cultural change
that you need for this.
Yeah, so, you know,
this is the FinOps end game, right?
So when we're talking about implementing FinOps,
getting all the way to engineers' ability
to understand and be responsible
for their spend is like,
this is our goal.
So think of it like exactly like DevOps, right?
A few years ago,
like no one even thought
that engineers are going to be responsible
for their deployment
and their SLA metrics, right?
And now it's a super common thing to do.
So like the same thing is happening
with cloud financial management as well.
So same as you don't throw code
that you wrote into the centralized operation team
and expect them to get it all the way to production,
you don't just neglect your cloud financial management
and expect the centralized team to do everything for you.
You start to get more and more responsible
for what you're doing
and engineers are starting to get measured
based on the price of their service,
same as they were measured on SLA.
I think that industry trend certainly makes sense.
You need to bring more information closer
to the person who is working on the system.
I think that whole idea of shifting left is like,
I've heard it across security,
definitely across operations, developer experience.
You kind of want engineers to know
what the impact of their work is on other engineers.
And it's similar on cost, right?
So that trend makes sense.
With the new, the current macro environment,
are you seeing a shift in how people are thinking about cost management platforms?
My guess from the complete outside is that there's more demand for these platforms.
But what are you seeing?
Yeah, so really 100%.
In 2021, money was infinite and no one really thought about creating, you know, a company that is financially viable and spends as much money as we need it to earn anything.
And now we're going to turn it into a free game and everything changed. CFOs are getting stronger in the organization and demanding better answers.
And suddenly the gross margin is one of the top priorities.
So we can reduce headcount in order to reduce burn right now, to be more responsible with
the investor money.
But for the end game here, to really create a company that is better and more stable, we
need to be able to
sell our service at higher margins and to contribute to better lifetime values and to
reduce our customer acquisition costs. And so every team in the company starts to have more
finance-oriented KPIs. And the cost of our service is one of the main ones. So companies
can no longer just push that to a later date.
We need to start to deal with reality.
We need to start to measure and adapt towards success
and make sure that we're running on the right path and direction.
So yeah, 2023 is the year of cloud financial management.
Just take a look at the Google Trends for the word FinOps
to get a better understanding of that.
I think I'm going to try to embed a screenshot
of that trend in our episode show notes.
Yeah, but I think that makes sense to me.
I think the idea of giving deeper insights
with tools that people directly use,
like Kubernetes makes sense.
I certainly have this problem at work,
but it's just, Datadog is way too expensive for me to use. And I have one question actually
on that note. Like, are you seeing customers move to more, like, on-prem kind of systems so that they're
not paying under these, like, usage-based pricing models? Like, I can think of, you know, moving off Datadog
and moving to, like, something like an OpenTelemetry, Prometheus, Grafana kind of setup because it's too expensive.
I'm curious if you have any insight into that actually happening across the industry or not.
So most CFOs are not going to require you to churn from a service that gives you value and that you enjoy and really makes you better at what you're doing
just because of budgets.
We just want to make sure that we're utilizing that service
to the fullest.
So it's okay to spend money on Datadog.
Datadog is the best monitoring service out there probably, right?
So you can spend money on Datadog
or you can start to build your own
using Prometheus and OpenTelemetry,
but you're going to spend so many hours just making sure that this solution is working.
And even when you're done spending so many hours making it work and managing it, you
still won't get the same outcome as you would with Datadog.
So it's okay to spend money on Datadog, but you need to make sure that you're spending
the right amount.
Are you utilizing all services properly?
Maybe you're paying for Datadog products
that you're not using.
Maybe you have a huge part of your environment
that you never look at.
So you can just drop all their logs
and drop all their metrics
because you don't care about it.
It's very important to make sure
that we can justify our expense
and that we're optimizing it
to where it should be.
And it's same with AWS,
same with Snowflake,
same with every service,
but it's important not to waste
money. It's important to use money wisely.
Yeah, I think that resonates, because I don't see a cost-driven
migration ever, like, get executed or get prioritized unless, you know, the cost has
radically gone up, like 10x in a single year, especially if you have so many levers to control cost.
That's always going to be the first step.
And then you're going to think about, OK, long term, this isn't sustainable for us to stay on this platform.
So we have to move.
But yeah, that's right.
You kind of have to try to clean up your own home before you say, I need to move and do
something else.
Yep.
Well, Roi, thank you so much for being on the show.
I think it's been really informative,
at least for me to understand how to think about cost management. Thanks so much.
Thank you so much for having me. I had a blast.