The Good Tech Companies - The Economics of Public Cloud Repatriation and Why It Is Cost-prohibitive at Scale
Episode Date: September 16, 2024. This story was originally published on HackerNoon at: https://hackernoon.com/the-economics-of-public-cloud-repatriation-and-why-it-is-cost-prohibitive-at-scale. The public cloud doesn't deliver cost savings at scale. It delivers productivity gains, to a point, but it will not reduce your costs. Check more stories related to cloud at: https://hackernoon.com/c/cloud. You can also check exclusive content about #minio, #minio-blog, #public-cloud, #public-cloud-repatriation, #cloud-computing, #modern-datalake, #kubernetes, #good-company, and more. This story was written by: @minio. Learn more about this writer by checking @minio's about page, and for more stories, please visit hackernoon.com.
Transcript
This audio is presented by Hacker Noon, where anyone can learn anything about any technology.
The Economics of Public Cloud Repatriation and Why It Is Cost-Prohibitive at Scale, by MinIO.
What has become clear over the past couple of years is that the public cloud,
for all of its benefits, doesn't deliver cost savings at scale.
It delivers productivity gains, to a point, but it will not reduce your costs.
There is real goodness in the public cloud: it offers an incredibly powerful value proposition, with infrastructure available immediately, at exactly
the scale needed by the business, driving efficiencies in both operations and economics.
The cloud also helps cultivate innovation as company resources are freed up to focus on new
products and growth. However, the mere act of interacting with your
data generates egress costs, which have been shown to be egregiously predatory.
This is particularly true when the applications and workloads are persistent, consistent,
and data-intensive (high volume, velocity, and variety of read and write calls) or involve
high-performance analytics. They simply are not sustainable in the public cloud as they grow.
As industry experience with the cloud matures, and we see a more complete picture of cloud
lifecycle on a company's economics, it's becoming evident that while cloud clearly
delivers on its promise early on in a company's journey, the pressure it puts on margins can
start to outweigh the benefits, as a company scales and growth slows.
Sarah Wang and Martin Casado, Andreessen Horowitz, 2021.
That take, while incredibly prescient, was from 2021. By 2024, data had grown an average of
approximately 20% per year, according to a 2022 IDC study. The workloads have gotten bigger,
and scale has become the problem. Not the technology of scaling, but the cost, specifically,
of scaling in the public cloud. According to David Linthicum, there are three main reasons
the public cloud is being kicked to the curb:
1. Cost. For certain workloads, it's just too expensive to run them in the cloud. Commodity hardware prices have fallen so far in the last few years that hardware isn't the huge capex it used to be.
2. Failed migrations. Workloads that have not been refactored optimally or adjusted to be cloud-native have ended up costing approximately 2.5x what they were originally projected to cost. Apps that were inefficient on-premises turned out to be inefficient in the cloud, and making them more efficient is costing too much to be worth it.
3. Diminishing need. Applications that originally needed to be spun up quickly and efficiently, as well as able to scale, have scaled in the cloud but are now just machines of repetitive tasks and data storage. These apps no longer benefit from the fast scalability the cloud can provide and are now just utilizing a lot of expensive storage. The need is no longer there for a flexible, quickly scalable model, and the commoditization of hardware has presented a new, cost-efficient way to run these workloads.
According to a recent Barclays CIO poll,
many CIOs agree. From that same A16Z article, in 2017, Dropbox detailed in its S1 a whopping
$75 million in cumulative savings over the two
years prior to IPO due to their infrastructure optimization overhaul, the majority of which
entailed repatriating workloads from public cloud. When your cloud costs start to hover around 50%
or more of your cost of revenue, as with Asana, Datadog, Prerender.io, and others, it's time
to start looking at what your workloads are
doing in the public cloud. Organizational and business leadership need to be aware of this
so they can pivot. Certain workloads, such as running a data analytics cube, in-memory database,
or a data analytics cluster are better fits for on-prem infrastructure. But these are just a few
examples. To focus on a particular trend that will be impacted by
this scale problem, let's look at AI, ML, and specifically LLMs (large language models).
If your current AI initiative has you building your own LLM or foundation model,
consider the cons of doing it in the public cloud:
1. High costs at scale. Training and running LLMs at scale is expensive, and as the LLM gets bigger, so do the costs of public cloud.
2. Loss of control. You have less control and visibility over implementation, infrastructure, and performance.
3. Vendor lock-in. If you have trained LLMs on one cloud platform, it will be difficult to port them to a different platform. Furthermore, depending solely on a single cloud provider entails inherent risks, particularly concerning policy and price fluctuations.
4. Data privacy and security. I would also mention data sovereignty here. The bottom line is that you are trusting your data to a provider with servers spread across worldwide regions.
If your enterprise
is dealing with petabytes or
trending to that kind of scale, the economics favor the private cloud. Yes, that means building
out the infrastructure or leasing it from someone like Equinix (real estate, hardware, power,
cooling), but the economics are still highly favorable. The public cloud is an amazing place
to learn the cloud-native way and to get access to a
portfolio of cloud-native applications, but it is not an amazing place to scale.
An Example of the Economics
So, what are the economics?
For illustration, let's take a 10-petabyte modern data lake that uses Kubernetes to manage
Apache Spark and Dremio for persistent and consistent analytics workloads.
These types of workloads require
frequent data reads and writes from object storage for analysis, updating and refreshing,
and presentation. From a cost structure perspective, we will use some assumptions
for the main cost drivers:
Egress. These data lakes and workloads have limited utility if we can't use the data. The data provides insights, serves other applications, and may need to be processed outside of the storage environment, which requires the data to be transferred out of storage. If we assume 500 terabytes per month being accessed, that represents only 5% of the data being accessed per month.
Object requests. For object requests (PUTs, GETs, HEADs, etc.), we have worked with customers with similarly consistent and persistent workloads that see over 10 billion object requests per month, so we can use 10 billion as a conservative assumption for this type of workload.
Encryption requests. Similarly, those same customers see around the same number of encryption requests for those objects, so again we use 10 billion as a conservative assumption for our example.
With those assumptions, the cost of public cloud could look something like this.
Annual public cloud costs for 10 petabytes come to roughly $7.3 million, or $0.061 per gigabyte per month.
The assumptions above are just that, and the fact that there are so many tells you how variable the
costs can be depending on the particular usage and workload factors. This creates significant challenges in trying to budget.
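To make the cost structure concrete, here is a minimal back-of-the-envelope sketch of how such an estimate is assembled from the drivers above (storage, egress, object requests, encryption requests). All unit prices in this sketch are illustrative placeholders, not quotes from any provider, so the total will not match the $7.3 million figure; substitute your provider's actual rates to reproduce your own number.

```python
# Back-of-the-envelope public cloud cost model for the 10 PB scenario above.
# All unit prices are illustrative assumptions, not actual provider quotes.

STORAGE_GB = 10_000_000             # 10 PB (decimal) of object storage
STORAGE_PRICE_GB_MO = 0.021         # $/GB-month, hypothetical hot-tier rate
EGRESS_GB_MO = 500_000              # 500 TB transferred out per month
EGRESS_PRICE_GB = 0.05              # $/GB egress, hypothetical blended rate
REQUESTS_MO = 10_000_000_000        # 10 billion object requests per month
REQUEST_PRICE_PER_1K = 0.0004       # $/1,000 requests, hypothetical
ENCRYPTION_REQS_MO = 10_000_000_000 # 10 billion encryption requests per month
ENCRYPTION_PRICE_PER_10K = 0.03     # $/10,000 key requests, hypothetical

def annual_cost() -> float:
    """Sum the four main monthly cost drivers and annualize."""
    storage = STORAGE_GB * STORAGE_PRICE_GB_MO
    egress = EGRESS_GB_MO * EGRESS_PRICE_GB
    requests = REQUESTS_MO / 1_000 * REQUEST_PRICE_PER_1K
    encryption = ENCRYPTION_REQS_MO / 10_000 * ENCRYPTION_PRICE_PER_10K
    return (storage + egress + requests + encryption) * 12

total = annual_cost()
per_gb_month = total / 12 / STORAGE_GB
print(f"annual: ${total/1e6:.1f}M, effective rate: ${per_gb_month:.3f}/GB-month")
```

Note that even with these placeholder rates, storage dominates, but egress and per-request charges add a meaningful and highly usage-dependent layer on top, which is exactly what makes budgeting difficult.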
In addition, having no tiering or any data lifecycle activity is also somewhat rare,
as organizations usually move data to colder tiers if the data becomes less
active. But all of that just adds to the cost, as different tiers have different prices per
gigabyte per month, as well as a cost for automatically moving objects into those tiers.
MinIO allows you to scale on the private cloud, in a colo or a data center, using the same technologies
that are used in the public cloud: S3 API-compatible object storage, dense compute,
high-speed networking, Kubernetes, containers and microservices.
One major difference is that there are no costs for object requests (GETs, PUTs, etc.), nor
are there any limits on the number of requests, as long as the infrastructure supports it.
In addition, encryption is included with both the MinIO Enterprise and Community versions, and
there are no limits on the number of encrypted object requests. This optionality offers the ideal mix of operational costs, flexibility, and control.
It is true that you will take on capex for hardware, but by starting small and taking
advantage of key cloud lessons (elasticity, scaling by component, decoupling compute from
storage), enterprises can minimize the initial outlay and maximize the operational savings.
When paired with commodity hardware and operating in a colo or proprietary data center,
MinIO can reduce those public cloud costs, as well as the costs associated with managing those
cloud costs, by anywhere between 50% and 70%, and in some cases higher.
Annual colo MinIO costs for 10 petabytes come to roughly $1.7 million per year, or $0.014 per gigabyte per month. That equates to an approximately 77% reduction in storage costs
for 10 petabytes of storage compared to public cloud. Even for smaller storage capacity needs,
200 terabytes to 2 petabytes, the savings are worth exploring. Not to mention you
get the industry's best storage performance, a built-in firewall for bucket-level security,
observability that is specifically designed for object storage, and many other value-added
features that would cost you extra in a public cloud.
The Resource Factor
One additional element that is worth a quick analysis is resources, of the human kind.
We have heard from our customers that the number of resources required to manage public cloud infrastructures can range from 5 to 10 FTEs depending on the size of the cloud infrastructure.
That includes cloud engineers, cloud team leads, DevOps engineers, and cloud PMs.
Using salary ranges and medians from Glassdoor, those FTE costs can range from $700,000
to $1.5 million per year, fully loaded. We also hear from our customers, 76% of them
in a recent survey, that one of MinIO's key value drivers is its ease of use and manageability.
That same survey found that 60% of them cited MinIO's ability to deliver improved operational
efficiency.
"MinIO has reduced the cost of support and maintenance for us." (a professional services company)
"MinIO as a product is a very good storage solution. It has reduced cost of resources by more than 50%." (a leading technology solution provider specializing in end-to-end DevOps offerings)
Internally, we use MinIO for lots of different workloads, storage needs, testing,
etc. And our estimates are that MinIO can be managed by 1 to 3 FTEs for petabyte-plus infrastructures.
That allows for massive infrastructure at scale with minimal resources.
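As a rough sketch, the staffing economics compare like this. The per-FTE fully loaded cost range below is an assumption derived from the $700,000 to $1.5 million figure for a 5-10 FTE team cited above (roughly $140K to $150K per FTE); it is illustrative, not a salary benchmark.

```python
# Rough comparison of infrastructure staffing costs using the FTE counts
# cited above. The per-FTE fully loaded cost is an assumption derived from
# the $700K-$1.5M range quoted for a 5-10 FTE public cloud team.

PER_FTE_LOW = 140_000   # $/year, fully loaded (assumption)
PER_FTE_HIGH = 150_000  # $/year, fully loaded (assumption)

def team_cost_range(min_ftes: int, max_ftes: int) -> tuple[int, int]:
    """Return the (low, high) annual cost for a team of min_ftes..max_ftes."""
    return (min_ftes * PER_FTE_LOW, max_ftes * PER_FTE_HIGH)

public_cloud_team = team_cost_range(5, 10)  # managing public cloud infra
minio_team = team_cost_range(1, 3)          # managing PB+ MinIO infra

print(f"public cloud team: ${public_cloud_team[0]:,} - ${public_cloud_team[1]:,}/yr")
print(f"MinIO team:        ${minio_team[0]:,} - ${minio_team[1]:,}/yr")
```

Even at the top of the MinIO range, staffing costs run at well under a third of the public cloud team's cost under these assumptions.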
Getting Started
Now that you've seen how and why the economics work for the private cloud, I am sure you are wondering what the steps are to
begin down this path. My colleagues have written about this here and here, and I suggest your cloud
teams and DevOps teams look at these blogs for the details on migrating away from the public cloud.
We have seen dozens of our customers repatriate their data using commodity hardware and either
their own data centers or a colo, and realize some real savings and benefits from MinIO's high-performing,
simple object storage solution. As the above analysis demonstrates, businesses can realize
significant cost savings above 50% of their existing implied annual public cloud S3 bill
by repatriating data to their own hardware in a data center or a co-location service.
In the above scenario, with only 10 petabytes, your business could save about $6.5 million over
the next five years. The truth of the matter is that the public cloud is cost-prohibitive at scale.
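The headline figures in this scenario can be sanity-checked directly from the two per-gigabyte-per-month rates quoted earlier:

```python
# Reproduce the 10 PB storage cost comparison from the quoted
# per-gigabyte-per-month rates.

PUBLIC_CLOUD_RATE = 0.061  # $/GB-month, public cloud effective rate
COLO_MINIO_RATE = 0.014    # $/GB-month, colo + MinIO rate
GB_IN_10_PB = 10_000_000   # 10 PB in decimal gigabytes

annual_public = PUBLIC_CLOUD_RATE * GB_IN_10_PB * 12  # about $7.3M/yr
annual_colo = COLO_MINIO_RATE * GB_IN_10_PB * 12      # about $1.7M/yr
reduction = 1 - COLO_MINIO_RATE / PUBLIC_CLOUD_RATE   # about 77%

print(f"public cloud: ${annual_public/1e6:.2f}M/yr")
print(f"colo + MinIO: ${annual_colo/1e6:.2f}M/yr")
print(f"storage cost reduction: {reduction:.0%}")
```

The roughly 77% reduction falls directly out of the two rates, independent of the total capacity, which is why the savings also hold at smaller scales.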
The inherently elastic nature of the public cloud makes scaling there appear attractive,
but it is almost always the wrong choice from an economic perspective. This is particularly true for data-intensive tasks like AI
and ML, where the costs and loss of control in the public cloud can be substantial.
As data scales, private cloud solutions with MinIO become economically superior,
offering equivalent, arguably better, technologies at reduced costs.
By leveraging commodity hardware and private cloud infrastructure,
companies can achieve significant cost savings and performance benefits compared to the public
cloud, sometimes as much as 70%. We suggest exploring migration away from the public cloud
for your workloads and using MinIO to modernize and scale your critical business applications.
If you want to learn more
and take advantage of our value engineering function to run your own models, please reach
out to us at hello@min.io and we can start the conversation. Thank you for listening to this
Hackernoon story, read by Artificial Intelligence. Visit hackernoon.com to read, write, learn and
publish.