PurePerformance - How performance engineering saves the euro cup, holidays and keeps cloud costs low with Almudena Vivanco
Episode Date: June 3, 2024
Requesting more CPU for your database used to take 6 months of planning 20 years ago. Now it takes the execution of a Terraform script. What has stayed the same all those years is Almudena Vivanco's passion for performance engineering to keep systems optimized, ensuring that systems are available, scalable and resilient even during spike events such as the upcoming Euro Cup or any holiday specials.
Tune in and hear from Almudena, who is currently working for SCRM Lidl, on how moving to the cloud gave new justification to performance engineering. She explains the importance of connecting business with service level objectives and gives insights on how Lidl makes sure to sell 50000 pieces of pork without breaking the cloud bank.
Here are the additional links we discussed:
Slides from Barcelona Meetup: https://docs.google.com/presentation/d/1h83V4gUyqAmIWeAAtKb4BcRvuJV-XirLk-9Xq077nbw
Video from TestCon: https://www.youtube.com/watch?v=rIP_G-YBy04
LinkedIn: https://www.linkedin.com/in/almudenavivanco/
Transcript
It's time for Pure Performance!
Get your stopwatches ready, it's time for Pure Performance with Andy Grabner and Brian Wilson.
Hello everybody and welcome to another episode of Pure Performance. That's okay, no, that's okay. And people may wonder why we speak Spanish today.
They might, they may wonder. Maybe not. Maybe we're just practicing because we're going to go take a
lovely trip to Barcelona or wherever our guest might be. I don't want to choose locations.
Well, maybe somebody has been to Barcelona recently and found out that his
Spanish is not good enough,
even though he's been practicing for many years with Duolingo.
But I think without further ado, the reason why we speak Spanish, or at least try to use
our most limited version of Spanish that we have, is because of our guest today.
Hola, Almudena, como estas?
It's great to have you here. We saw each other just a couple of weeks ago in Barcelona at the
Cloud Native Meetup we both presented. And then you actually reminded me that we go way, way back
in history. And actually this is now your moment.
Maybe you can quickly give our audience an introduction of who you are,
why performance engineering, how long you've been in the field,
what keeps you motivated.
And then we dive into some of the topics that you've presented at the meetup
because this was really a great presentation,
despite my lack of Spanish skills. I was really fascinated. So, but now over
to you. Who are you?
I'm really happy to be here with you. As you said, our history goes back almost 20
years. I started in performance engineering about 18 years ago; I was a developer first.
My first contact was with Silk Performer.
At that point, it was from Borland.
I was in pre-sales,
so I was selling Silk Performer to public administrations, basically.
That was the main goal at that point.
It was 2004, 2005.
There was this huge shift in the technology world
when everyone wanted to migrate their administration
to electronic administration.
That was my duty in the public administration. It was great.
So that was my first contact with performance and testing: Silk Performer, from Borland.
Why performance engineering?
My background is mathematical. I have a degree in computing and
computability, as we say here, and applied maths.
So basically, I love numbers.
I love building mathematical models and simulating stuff, and that was actually what I had to do as a performance engineer.
It is the field I'm most comfortable with, and I love it. Of course, at the start
it was MVC, client-server, that kind of stuff. I remember,
maybe too many years ago, 15 or 16 years ago, with an Oracle database you had to think six months ahead about how much CPU you would need.
You had to predict how much CPU you could have in your Exadata and your Oracle database.
And now it's all scalable.
Everything has changed.
Now you just have to move the slider and it goes.
You have to pay, of course; you have to have your credit card.
But it's way easier.
There was a point, and I think it was the point of no return for us, for performance engineering.
We thought that performance engineering had no meaning
in the cloud world.
And then we realized that the costs of the cloud,
of the cloud providers,
were giving us a reason to be still,
to have a meaning in the world of IT.
And maybe we changed the name to SRE,
to whatever they want to call us now,
but we are still doing the same.
Just trying to guess the costs,
the scalability, the resilience of our systems,
of our solutions.
So that's what I do, basically.
Hey, Andy, before we dive in, I just wanted to
you gave me a flashback there about
you had to know how much CPU you needed.
I remember the days where we're maxing out our server,
we have to go order a new one, wait for it to come in,
get it in the rack, get everything installed, test, make sure that's running.
And then we can start pushing things over to it.
It's like I forget about that.
We forget what we have now.
It's crazy.
Holy crap.
Anyway, I forgot.
Going to the bunker, to the data center, with a cable in your hand.
Okay.
So, yeah, it was a very different world.
Anyway, I just wanted to point that out because I'm sure a lot of our listeners
are unfamiliar with that world.
But anyhow, I'm being an old man here
reminiscing about the good old days, quote unquote.
Andy, I know you have a lot you want to...
Yeah, it also reminds me,
and this before going down into current topics,
a little bit more memory lane.
18 years ago, you said you started with Silk Performer.
We just recently had Ernst Ambichl on the podcast,
the chief software architect and creator of Silk Performer.
And it was also just phenomenal to see
how we all love, obviously, that product.
But whether it's Silk Performer or LoadRunner, there are so many great tools out there that have really
revolutionized the field and inspired so many performance engineers, and it's just nice to remember
all this. And then, I'm looking at your LinkedIn profile, right? So back then you worked at Aventia, right?
Yeah.
Aventia, yeah. And then you
I also like the fact
you said you started as a software
engineer, is this right? Yeah.
As a developer in C Sharp.
As a developer in C Sharp.
For me it was similar.
I also started as a developer but when I
joined Segue back then,
before they got acquired by Borland,
we had to go as an engineer through,
I think it was three months of QA.
So we had to start in QA and quality assurance.
And I was, I think I mentioned this
in the podcast with Ernst,
I was testing Silk Performer with Silk Performer,
which was a great way to learn the product,
a great way to learn the strengths and weaknesses,
and then a great way to then become an advocate
for performance engineering.
And I learned a lot of these things on how to do performance testing
from my colleagues back then.
Whether it's Ernst or Didi Strasser, there are so many great people
that I had the luxury to work with.
One thing that you said, and I think this was an interesting sentence you said, you said you thought that performance engineering does no longer have a place in the cloud.
But then you realized that the costs are obviously very important to keep track of.
While it may be obvious for us, but could you explain quickly
how does performance engineering help you with the cloud costs?
So I remember before joining Lidl, I was in Telefonica R&D.
And I was at ExpoQA, which starts in a couple of days here in Madrid.
It is a huge event about quality in Spain.
And there was this roundtable where we were talking about
how the cloud improves the general feeling about performance
or removes these boundaries.
Of course, you just pay more and you have more CPU,
more memory, more everything,
and you can mitigate whatever bottleneck your software might have. I was like, yeah, but you
are still wasting your money. If you do that, you waste your money, and I think that's what performance engineering is for. You just want to optimize everything.
Not only for
costs and money,
but for the carbon footprint as well.
I think that's a really important
subject right now;
we have to talk more about the carbon footprint
of our solutions.
And that was like
six years ago. It was like, of course, you have
to optimize your costs, you have to optimize your solution. Because if you are not investing
in optimizing, you're just wasting your money and your time. You wouldn't need to have
the best developers; you would just have monkeys coding. That's not what we are supposed to do.
And of course, you need a performance engineer
just to test everything,
like the scalability of the solution.
Maybe, okay, you have Kubernetes
and you have a thousand nodes,
but are your probes properly set up?
Are your objectives clear?
Do you have whatever you need in order to scale properly and efficiently? So I think that was the goal for us as performance engineers. You just have
to rebrand a bit. You are not only testing; you have more to deal with or to cope with,
like the scalability. But I think that's the way to go.
It's just a new step in our careers.
I think, Andy, it's interesting.
I'm going to butcher your name.
Let me see if I can get it right.
Amudena?
You can say Almu.
Almu, okay.
Almu is, yeah.
You would think, going back to even this idea,
you had to get a new server and rack it, right,
especially if it was a CPU, right?
Memory, you could often add memory into it,
but you'd still have to get someone in there.
But you would think that code optimization
would have been a hotter topic back in that day
because it was so much harder to get more CPU power.
It was much harder to get a bigger server.
I think back then, though,
the tooling didn't allow so much
for looking at optimization of code.
Once we can start looking at traces
and once we can start looking at architecture,
service flows and things like that,
we started having the ability
to look at the optimization.
But it is still striking that back then
it was only like, well, we need more.
And then you move to VMware
and you could assign more CPUs on VMware
or any virtualization platform,
but that was the big one.
But then you'd run out of space in your cluster
and you'd have to get another thing for the cluster.
And then as we transitioned to cloud,
there was still the habit of,
because it was so
easy to just add new components in the cloud to add that. I guess the thought that's going through
my mind is what made people, and I don't know if there's an answer, but what made people finally
look and say, instead of throwing hardware at this, maybe we should look at the code we're
writing. Because at some time there was a shift, right. At least for some people. And I don't know if it's hand in hand
with the tooling that allowed that to happen, or was it
just since people could
so easily change the configuration in the cloud
regions, when the finance team started getting the bill, did they
suddenly realize, oh my
gosh, all these things are coming in?
Well, about the tooling, I think that it has always been there.
I mean, I remember being in a talk with Brendan Gregg about Linux tools.
strace has been around since the 80s.
You just have to know what to look at.
So
the tooling, or at least
well, some of the tooling has
been there. Maybe it was not human-visible
or not easy to read, but
it has always been there. I remember
just fighting with, not CPU,
but I was working on a proxy
at some point in my life,
and we had problems with the connections,
with the whole list of open files.
And it was hard to learn how to read a Wireshark capture, a tcpdump.
It was not easy, but you had the tooling there. Maybe adding observability to this layer makes everything easier.
It was harder in 2013 to scale based on some human-readable data.
Like I had to scale based on the list of open files, for example,
in the proxy context.
And yeah, I think the tooling was there,
but it was hard to understand,
or hard to report to someone else
who was not involved in performance engineering or scalability. It was hard to explain: okay, we have these limitations
in our hardware. So it was hardware based. And we moved on from VMware. I remember when
I was in Movistar TV, we had to allocate the different machines across the hosts, like,
this one is eating all the resources from the other virtual
machines, so we have to put it on another host. That kind of stuff. We were moving
all the virtual machines around just to make them work. It was a streaming platform and
it was on Windows because we had Mediaroom.
It was like, okay, it was hell.
I'm not going to talk about that.
I don't want to remember that one.
But yeah, I think that nowadays we have observability that is more reachable,
so you can understand data easily,
and you can just give data to someone else
and report it. It makes everything easier.
You have the cloud, and usually you have CloudWatch,
or you have Insights, or you have something else
that helps you give that kind of report,
the scalability and performance report,
to the POs, to the PMs, to product people.
That just fills the gap between
the two worlds, that is business
and systems. I think the performance
engineers are always in the middle.
We have to be aware of the business
and
of the systems,
of the monitoring systems that we have.
I think that now it's easier.
A couple of thoughts quickly,
because you mentioned performance engineering evolved
from performance testing, just running load
and then basically analyzing the results
and then maybe giving suggestions,
to now really being site reliability engineers,
call them SRE, whatever you call them in your organization. But you need to know much more,
and you also, I think, give more guidance and mentorship to application teams to right-size,
to configure correctly, to do everything right from the start, because you, with the background of
performance engineering, know much more about how the systems really interact,
especially as we're moving into this complex world:
how to properly configure your resource limits,
your request limits, how to properly configure your queues,
how to do everything to make sure
that your system is properly sized.
So I really like what you said: performance engineering,
with the emergence of the cloud and also now with Kubernetes,
has really
shifted from just being maybe performance testers to really true engineers, and not just for
performance; it's really about resiliency and availability. I guess security is also a topic,
even though I'm not sure how often you touch on security.
Now it's not that much. When I was in Telefónica R&D, I was in the cybersecurity department.
I was going to say uptown.
Sorry, in the department of security. So it was a proxy.
So it was security, everything. It was like a huge topic.
Right now here, I have under my responsibility pen tests and that kind of stuff but not
service meshes, the normal stuff
but not security as a product
security as a
system but not as a product
itself
and then the other thing you said
observability has changed
over the years for the good because
you know 15 years ago
20 years ago when we started in that
space, observability was
something that you turned on
when you had to and then it was
really hard because I remember the early days
of Dynatrace. You had to install
your Java agent, your .NET agent.
You had to enable it. It was
impacting the startup time. People were not
comfortable with it. You could kill
applications if you made configuration mistakes.
But now, 2024, observability, as we always say, is no longer optional.
It's mandatory and it's baked in.
Observability is baked in into our cloud vendors.
You're getting all the metrics.
You're getting logs.
You're getting traces.
It's just there.
And then additionally, with frameworks like OpenTelemetry,
we give people the chance to enrich that telemetry data
with what they think is important, but using a standard,
which also then makes it easier to make these tools better
because we all work on the standard.
So I like that a lot.
Now to your current job, because I think you're working for Lidl.
And for those people that don't know Lidl, maybe you can give a little bit of context what Lidl is doing.
So, actually, I work for Schwarz.
That is the group where Lidl belongs to.
So, I work for Kaufland and Lidl and Monsieur Cuisine, everything like that.
I'm the performance engineer of the company, basically.
So mentoring, as you said, is very important for me.
I don't have any team.
I just mentor people in the squads to be aware of performance.
It's not a task.
It's like a culture.
It's like the DevOps culture. So I try to implement the performance culture in the teams and in the company.
But I still run tests.
I do some jam all the time.
But I work along with the product teams.
And Lidl, Schwarz, it's a retailer.
We have Lidl online, and we have a loyalty program that is Lidl Plus.
And we have a lot of users all over Europe, 32 countries.
And soon we will be in the States as well.
Well, the loyalty program is not like a retailer itself.
Usually the conversion rate, the engagement, is way higher than in a retailer.
So it involves a lot of scalability and a lot of campaigns
and a lot of research.
So it's a performance challenge in itself.
And Lidl Plus started like six years ago.
And I started like five and a half years ago.
So we were in the pilot.
And it was cloud native from the start.
That was cool.
That is pretty nice.
And we are part of a corporation, a huge corporation.
But SCRM, the Lidl Plus project, is like, I don't know how to say it,
innovation, in that we are allowed to go cloud native.
We are allowed to go OpenTelemetry,
Kubernetes, all the stuff that is state of the art, right? So it is a pretty cool project.
And it's pretty close to the users. That's something I... Coming from a proxy, where you
don't see a user in your life,
here you have these people that go shopping on Saturday and they have to have their coupons and discounts ready in the stores.
So you are pretty close to the users.
It was not only about monitoring,
having performance monitors
or KPIs or whatever.
We needed to see the user experience.
That was when we implemented Dynatrace,
for example,
which allows us to see,
for the two applications that we have,
the two mobile applications,
the user experience:
how they are using the applications
and how good or bad their experience is
in the application itself, in the solution.
So that's cool for me.
It's a good project and it's still growing.
So there's a lot of stuff to do.
We had the COVID period, when we worked a lot
on contactless, on everything that improved
the experience of the users during the pandemic.
Yeah, it's a good project.
When you gave the presentation... first of all, thanks for giving us some background also on the organizational structure, that you basically have kind of an innovation hub within Schwarz IT where you can play around and use the latest technology.
I have the slides open from your presentation, which, by the way, if you're okay, we will also share with the listeners. I see things here like Kubernetes, Argo, KEDA for event-driven scaling,
all really important components for a site reliability engineer.
Prometheus is on here.
Really, really cool that you could explore these new technologies for that.
You mentioned that campaigns
are super important, right? And not only in retail, but everywhere campaigns are important.
I remember, and also if you look at the slides, classical campaigns where all of a sudden
a lot of traffic comes in and kind of houses burn down or, in this case,
maybe servers go crazy.
You have a lot of houses in like little stores in your presentation to visualize when things
go wrong.
But can you fill us in a little bit on campaigns, right?
If you run campaigns, if your organization run campaigns,
and if you as a performance engineer,
a site reliability engineer,
are actually the link between engineering
and the business,
what are some of the lessons learned
that you had?
Because I'm pretty sure many of our listeners
are in a similar spot.
So you have always,
as a performance engineer or SRE,
you have to always be close to business, business analysts.
Sometimes you are not aware when the campaigns are going to be out in the jungle.
Because you have 32 countries and maybe the campaign in Cyprus is not that important.
Sorry, Cyprus. But of course
you have to be aware when there's a
huge campaign like the Easter campaign
or the Christmas campaign or
Black Friday, that kind of stuff
where usually
you have
a precise
date in your calendar.
But for example,
right now we are in the
championships, European
FIFA,
whatever it's called in English,
the football championships.
And sometimes
you are not aware that there is,
I don't know,
a raffle to take the
kids to the fields, to the
match.
And as a performance engineer, you're not aware.
Maybe not even the business units are aware because that depends on the countries.
But usually I try to talk all the time with them.
And I have a calendar shared with the business units where they tell me, okay, at this time of the year, we will have the start of this campaign for this country.
We will have a TV
advertisement
at this point.
Just try to
scale or whatever
if you want to, if you have to.
For example, on Black Friday
or at Christmas, we scale up
preventively,
so,
beforehand.
In the football ones
we are not scaling that much,
but countries come and
go. It's like every day
you suddenly have a
peak in one of the countries.
Why is this?
Why is this coming from Croatia?
Maybe they are selling a cheese
or something. Or a PlayStation
was sold on a Black Friday
in Holland and it lasted like
10 minutes, something like that,
and
we were not aware of that.
A PlayStation going for 200 euros.
It was insane.
These kinds of campaigns.
But I think that the thing that you have to learn is this one:
if you're going to fail, you have to fail.
That's for sure.
Even in a campaign where
the revenue will be huge,
maybe the costs in your infrastructure
will be higher than the revenue of the campaign.
So you have to be aware sometimes that there's a balance.
Maybe it's the brand: for example, you cannot fail during Christmas.
But maybe during the UEFA, the football, you can fail a bit, maybe 10%.
You don't have the revenue that you expect.
I don't know. It's not as important
branding-wise as Christmas or Black Friday.
So you have to create a balance between that and always talk to the
business units. And you have to measure
how long it takes for you to create
a new region, to get up, to absorb the load, whatever, to be resilient.
How long does it take for me to be up again?
And to train not only your software, your solution, but your teams as well.
To have a procedure for how we replicate a region, and you have to do it
beforehand. Those are things that we try to do in order to avoid downtime during the campaigns.
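The balance between infrastructure cost, expected revenue, and brand importance can be sketched in a few lines of Python. This is only an illustration: the `Campaign` fields, the decision rule, and every figure below are assumptions made up for the example, not Lidl's actual process or numbers.

```python
from dataclasses import dataclass


@dataclass
class Campaign:
    """Hypothetical description of a campaign, as agreed with the business units."""
    name: str
    expected_revenue: float        # euros the business expects from the campaign
    extra_infra_cost: float        # euros it would cost to scale up beforehand
    share_lost_if_unscaled: float  # fraction of revenue at risk without scaling (0..1)
    brand_critical: bool           # True for events where failing is not acceptable


def should_prescale(campaign: Campaign) -> bool:
    """Return True when scaling up beforehand looks justified."""
    # Brand-critical events (for example Christmas) are scaled regardless of cost.
    if campaign.brand_critical:
        return True
    # Otherwise scale only if the revenue at risk exceeds the extra cloud bill.
    revenue_at_risk = campaign.expected_revenue * campaign.share_lost_if_unscaled
    return revenue_at_risk > campaign.extra_infra_cost


christmas = Campaign("Christmas", 1_000_000, 50_000, 0.30, brand_critical=True)
raffle = Campaign("Football raffle", 20_000, 5_000, 0.10, brand_critical=False)

for c in (christmas, raffle):
    print(c.name, "-> pre-scale" if should_prescale(c) else "-> accept some failures")
```

With these made-up numbers, the raffle puts 2,000 euros at risk against 5,000 euros of extra infrastructure, so the sketch accepts a small failure rate there, while Christmas is always scaled in advance.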
Are you doing this proactively? Does this mean you're running game days? You're doing chaos engineering?
You have a talk about chaos engineering in Lidl, and you took one of the links, it's
in Montevideo, in the Wolver. I talk about chaos engineering, how we do it.
So I guess you also have to have, besides the chaos engineering,
a deep understanding of
what are your most
sensitive
areas of the architecture, which are
the ones that are likely to fall over first,
so that when you do have
these unexpected loads, as you were
saying, you have a plan for what to do
if that does go down. This way,
everybody's ready.
I mean, you have one of these unexpected campaigns.
You see the one system falls over.
You say, well, that's expected.
And we have something in place to remediate that because we planned for it.
So a lot of it comes down to planning, right?
Whether or not you're planning for a known event like Black Friday.
Andy, we go back to those old ideas of stripping everything out of the website that you don't
need just to handle that scalability of the event. But this is more
of the unknown events that are going to suddenly spike up
like those local country situations.
So it really sounds like it's about being prepared, knowing your system, knowing
where your risk areas are, and having a plan for those risk areas.
I think that's one of the main issues that we have when we move to microservices.
There was no longer this guy, Superman, who knew everything and was going to fix everything because he knew the whole monolith.
So we
lost that.
We have the resilience now because it's microservices, but
I always say the same.
When our home, which
is a microservice, fails,
for the user, it's not the home
microservice, it's the whole Lidl Plus
that is not working. If the
login is not working, it's the whole Lidl Plus that
is not working.
If the payment doesn't work,
it's the whole Lidl Plus
that is not working.
And the performance engineer,
I think, or SREs,
we have this full vision
of all the products,
all the solutions,
not only microservice
by microservice.
You have the observability.
You have centralized observability,
and you have everything more or less in your mind,
maybe not product-wise, or the latest feature,
but you know all the flows.
You know where things can fail,
which is the weakest part of the chain.
How do you have to monitor and alert that?
And I think that's one of the reasons
why performance engineers are important
in organizations like this one,
that it's Agile, it's Scrum,
microservices everywhere,
because we have this global vision of everything.
And it's just because we have been running tests,
chaos engineering, performance, scalability, resilience,
so we know where the difficult parts of the solution are.
So I think that gives us a pretty good spot
in the organization.
Hey, in your slides, in your presentation, you also talk about a topic that is very dear to my heart, which is SLOs, service level objectives.
Can you help me understand how you end up with SLOs?
Who do you talk to?
How do you define them? What are good SLOs that you've seen?
How do you enforce them?
What do they mean?
So when we started with SLOs like two years ago, the CEO and the CTO gave this
task to the agile coach. And it was like, okay. But they didn't have the vision. When I landed in the project, at the start of this year, it was like, let's start from the beginning. Because you
have the SMART business objective, maybe product-wise: at
Christmas we're going to sell 50,000 porks.
But we have to put that in number of
requests,
experience of the
user, response time,
updates, that kind of stuff. You
have to translate that into
something that is more technical,
more tied to the
infrastructure itself.
That's when I joined the team creating the service level objectives,
just to translate these SMART business objectives into service levels.
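As a rough sketch of what such a translation can look like: a sales target becomes a peak request rate once you assume a conversion rate, a number of requests per visit, a campaign window, and a peak-to-average factor. The function name and all figures below are illustrative assumptions, not numbers from Lidl.

```python
def peak_requests_per_second(
    units_to_sell: int,
    conversion_rate: float,
    requests_per_visit: float,
    campaign_hours: float,
    peak_factor: float,
) -> float:
    """Translate a business sales target into a peak request rate (hypothetical model)."""
    if not 0 < conversion_rate <= 1:
        raise ValueError("conversion_rate must be in (0, 1]")
    if campaign_hours <= 0:
        raise ValueError("campaign_hours must be positive")
    # Visits needed to reach the target, given how many visits end in a purchase.
    visits = units_to_sell / conversion_rate
    # Total backend requests generated by those visits.
    total_requests = visits * requests_per_visit
    # Average rate over the whole campaign window.
    average_rps = total_requests / (campaign_hours * 3600)
    # Traffic is not flat, so size for the peak rather than the average.
    return average_rps * peak_factor


# Assumed inputs: 50,000 units, 5% conversion, 20 requests per visit,
# a 48-hour campaign, and peaks five times the average.
rps = peak_requests_per_second(50_000, 0.05, 20, 48, 5.0)
print(f"Size the system for roughly {rps:.0f} requests per second at peak")
```

The resulting rate is the kind of number a scalability objective can be written against, and it changes as soon as the business revises any of the inputs.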
And that was when, like I think all organizations do
when they have had this huge growth during four years,
we started to think about the costs, to reduce costs,
to remove the vendor lock-in, to go to the standards,
Kubernetes, OpenTelemetry.
And then it was the time to say, okay, our objective is this one.
We have to reduce the cost of this one. So, our Kubernetes
has to cost less than our web apps in Azure. Our response time has to be lower than the one that
we have in Azure. Our CPU usage lower, our requests, the experience of the users, the number of issues that become problems in
production, all those kinds of objectives. We focused on three milestones. Scalability:
we still have to scale. In the campaigns, we have to scale properly and efficiently.
Resilience: if we have a campaign and it creates an outage in one microservice,
all the other microservices have to be resilient.
We have to mitigate the outage of that microservice.
So you have to be resilient, you have to retry the requests,
blah, blah, blah. And availability.
Depending on the
severity or the
criticality of the product,
for instance, the single
sign-on has five nines.
But, I don't know,
the open-gift
campaign
at Christmas,
you don't need more than three.
So you don't have to have three
availability zones or redundancy and geo-replication and that kind of stuff. And that
was an objective: this service has five nines, so it has to be geo-replicated, availability-zone
redundant, and so on. And I think that the point is:
we have to sell
50,000 porks;
how does that impact our servers,
availability, scalability
and resilience-wise?
Because in the end
Lidl sells potatoes.
That's what we do.
With the pork,
hopefully.
Pork and potato is definitely
important.
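Availability targets like these are usually counted in "nines" (three nines is 99.9%, five nines is 99.999%). A small sketch shows why the difference drives architecture decisions such as geo-replication; the function name is a hypothetical helper, and the yearly period is an assumption.

```python
def allowed_downtime_minutes(nines: int, period_days: float = 365.0) -> float:
    """Downtime budget in minutes for an availability target of N nines."""
    if nines < 1:
        raise ValueError("nines must be at least 1")
    # N nines means an unavailability of 10^-N (e.g. 3 nines -> 0.1%).
    unavailability = 10 ** (-nines)
    return unavailability * period_days * 24 * 60


for n in (3, 5):
    print(f"{n} nines: about {allowed_downtime_minutes(n):.1f} minutes of downtime per year")
```

Three nines leaves several hours of downtime per year, while five nines leaves only a few minutes, which is hard to meet without redundancy across zones or regions.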
We have
one really big one.
It was an SLA
because it concerns the other part, the stores.
It's not within our organization.
It's GK.
It's another company.
So we have this SLA for the time it takes for the ticket to arrive to the user.
And I think it's pretty important.
If a user is in the queue and paying, how long does it take for the ticket to arrive in the application?
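An objective on ticket delivery time is typically expressed as "X% of tickets arrive within N seconds". The sketch below checks such a target against a list of measured delivery times. The 3-second threshold, the 95% target, and the sample values are all assumptions for illustration; the transcript does not state the real figures.

```python
from typing import Sequence


def slo_compliance(latencies_seconds: Sequence[float], threshold_seconds: float) -> float:
    """Fraction of deliveries that arrived within the threshold."""
    if not latencies_seconds:
        raise ValueError("need at least one measurement")
    within = sum(1 for latency in latencies_seconds if latency <= threshold_seconds)
    return within / len(latencies_seconds)


# Hypothetical delivery times, in seconds, from checkout to ticket in the app.
samples = [0.8, 1.2, 2.5, 0.9, 6.0, 1.1, 0.7, 1.4, 3.2, 1.0]

compliance = slo_compliance(samples, threshold_seconds=3.0)
target = 0.95  # assumed objective: 95% of tickets within 3 seconds
meets_target = compliance >= target
print(f"compliance={compliance:.0%}, target met: {meets_target}")
```

In practice the measurements would come from the observability platform rather than a hard-coded list, but the calculation behind the objective stays the same.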
And from that one, tickets, coupons, discounts, everything else just
appeared. You just have to know where to look. It was like, okay, the tickets,
but then if we have the ticket, we have the summary and the processing of the discounts, which
goes to the scratch, which goes to the winning moments, the scratch
that can be won after the ticket. It all came along. Once we had these big ones,
the other objectives were really easy to achieve, or to know where to look.
So yeah, we're working on that. We have to implement some more,
to make them more observable, to make them easy to reach. Right now we have them in Dynatrace,
but we want to move them somewhere else. I'm not going to say it now.
You can say it now.
We want to make them more reachable for the whole company.
and then
the training was for the whole company,
all the squads, from
product data
to the product owners, product managers.
Everyone was involved in the SLO training.
I think that was
very important. It was not only for
the technical people, but it was like,
you know the business,
tell me what we have to achieve.
Do we have to go to a new country?
Do we have to
be better in engagement?
What do we have to do business-wise?
Okay, we translate that into
the service levels.
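As a rough illustration of that translation step, the ticket example could be written down as a latency objective and checked against measured delivery times. This is only a sketch, not how Lidl implements it: the names `LatencySLO`, `compliance` and `meets_slo`, the 5-second threshold, the 99% target and the sample values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class LatencySLO:
    """A latency objective: at least `target_ratio` of events within `threshold_seconds`."""
    name: str
    threshold_seconds: float  # hypothetical value, not taken from the episode
    target_ratio: float       # hypothetical value, not taken from the episode


def compliance(latencies: Sequence[float], slo: LatencySLO) -> float:
    """Return the fraction of measurements at or below the SLO threshold.

    With no measurements there is nothing that violated the objective,
    so this sketch reports full compliance (1.0) by convention.
    """
    if not latencies:
        return 1.0
    good = sum(1 for seconds in latencies if seconds <= slo.threshold_seconds)
    return good / len(latencies)


def meets_slo(latencies: Sequence[float], slo: LatencySLO) -> bool:
    """True when the measured compliance reaches the SLO target."""
    return compliance(latencies, slo) >= slo.target_ratio


# Business question: "how long does the ticket take to reach the app after paying?"
ticket_slo = LatencySLO(name="ticket_delivery", threshold_seconds=5.0, target_ratio=0.99)

# Made-up sample of ticket delivery times in seconds.
sample = [1.2, 0.8, 4.9, 7.5]

if __name__ == "__main__":
    print(ticket_slo.name, compliance(sample, ticket_slo), meets_slo(sample, ticket_slo))
```

The same shape would apply to the coupon or discount objectives mentioned above: only the name, threshold and target change, which is what makes the follow-on objectives easy once the first one is agreed with the business.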
I'm taking a lot of notes here, and I think...
You know, a lot
I learned from you.
We learn from our guests too.
I've seen all the videos from Andy
talking about this.
I think it's a great confirmation
that what we see and what we have seen is really stuff that happens and matters in the real world.
Because we are working for a vendor and we try to be as close as possible to our end users.
And with our history, we've been in that space for a while.
But I made a note of the "how we sell 50,000 porks for Christmas".
This could be an interesting title for a conference talk, maybe.
Is it going to be Ibérico, jamón? No, it's in Romania. It's a Christmas
custom in Romania that they eat pork, and they buy half a pork. So yeah, it's a use case.
It's every year we have that. We have the same in Germany, we have fireworks. We sell the fireworks
for Germany. What happens in Austria? Because I know you're also active in Austria.
Any strange customs I should know about
my own country?
No, not really.
Not that I'm aware of.
They sell trips to the hills
so you can go sing.
The one of the fireworks...
I love the one of the
porks, because it's like
the
software solution is tied
to the warehouse, not to
the data warehouse, but to the warehouse itself,
where the
porks
are.
And the fireworks one, it's the
same. The warehouse has the
fireworks, but there's a security
chain there,
logistics, because fireworks are
flammable and you have to
deliver them in a safe
way. So it's a
really complicated flow.
It's a tricky one, but it's
pretty cool. But it only works in Germany
and only during Christmas.
I don't know what the Germans do with their fireworks.
Well, I guess they buy it probably for New Year's,
but they already buy it for their kids as a Christmas present
to then fire off maybe for New Year's.
What I want to also recap, I think, again,
what I've also seen in many organizations now
where the former performance engineers, now turned site reliability
engineers, are really the ones that are connecting the dots to all the different stakeholders to come
up with good SLOs that are tied to business objectives. I think you said it very nicely:
you have to translate smart business objectives into service level agreements or
service level objectives, and once you have kind of the first two layers figured out,
it's very easy to trickle down to the technical metrics.
And I think, folks, if you struggle with
who should be responsible, and you are an SRE,
maybe you take the responsibility on you and drive this initiative.
Because you have the overview of everything
and you should be able to talk to everyone
because you need to know what the business is planning
for your campaigns, for your capacity planning, for your scaling.
So you are in a perfect position to then also define SLAs
and then enforce them.
That was what shocked me when I was a LIAI coach
that it was like doing the task.
It was like, why are you doing that?
And in the end, they just tell me, you're the best suitable person to do it.
Just go ahead.
Just free me from this.
And I run the trainings and I try to standardize all over the organization.
And that's because of what you have just said:
we have the vision.
So we have to talk to the business units.
We have to talk to the SREs themselves,
the platform engineering guys
and the SREs from each of the domains and the products.
So we have this vision that
we have to achieve this
objective.
How are we going to
do that? How are we going to scale?
How resilient are we going to be?
How much do we have to
improve our solution or how much
do we have to
pay for the solution.
That as well.
So yes, I think that's why I took over the responsibility of taking on the SLOs in my organization.
One last question for you.
I assume you're running fully in the cloud.
Is this right?
With your Kubernetes, everything is in the cloud?
Are there any thoughts on whether at some point in time
you actually reach a size in the cloud
where it again makes sense to think about building
and pulling things back in on-premise data centers
that you may still have?
Or is the cloud really working out for you
when you've figured out how to be cost-efficient?
We are considering some of this in the data layers.
We are running the, how do you call that one?
The polyglot data architecture.
We have the data platform and this kind of stuff.
And the cost is huge.
The cost of that is huge.
And at some point, I think that we will consider it.
There are like some milestones.
Like if you reach, it's just an example,
100 million users, or when we are moving to the States, and we will have data from the States
and we will have to share, you know, the laws of the data protection are different,
so we have to store them in different places.
We will have to reconsider if we are moving some of our data storages
back to a data center.
And as well,
Lidl, well, no, Schwarz
is creating
a new cloud,
and it's
STACKIT.
STACKIT is
a German cloud,
a European cloud,
that will be
following
the standards
of European law.
And that's something that we will be using.
So it's like going back to the data center,
but like in our own cloud.
So it's more or less like a middle way of it.
I think that STACKIT will be covering
one gap that we have right now:
there are like three major players,
Google, Amazon, and Azure, Microsoft, and they are all from the States, and
at some point we'll have to think,
as Europeans, we'll have to think of
a cloud in Europe
following the
European rules. We
want to have something in Europe, I think.
And it's going to fill that gap,
hopefully. And
for us, that's good.
The data will be in our own cloud, and it's our data centers.
It will be cool.
Awesome.
Andy, that left you speechless, huh?
I think that we lost him.
Oh, no.
Andy's having some... well, he just lost audio.
Well, I will then start.
Are there any final
thoughts you wanted to get out?
We're pretty much at the end here.
Anything that you wanted to say
that you didn't get a chance to?
Or do you have any speaking
engagements coming up?
I will be
in Berlin, at DevOps Berlin,
on June 19th.
Okay, perfect.
So hopefully I will see you there.
And I'm just back.
I'm not sure if you will be coming to Berlin,
to DevOps Berlin.
I will be there.
I was actually suggesting, Almudena,
that you
should put in a CFP for
KubeCon North America.
I think that would actually be
a really cool
thing. Talking about your experience
on selling
50,000 porks
all by
Kubernetes, OpenTelemetry,
Keptn,
whatever else. For next year's one? That is in London, right?
Next year.
I could go to that one.
But North America is still...
Could be a good way to make the names Lidl and Schwarz known in the US, as you are in the market.
When I was at Dynatrace Perform, there was this woman who was like, what
is Lidl?
Well, it's huge, you know, and when I started to show her numbers, she was like, oh, this
is huge.
Yeah, it's huge.
So yeah, I think that I will take your advice and I will write a paper
like "How to sell 50,000 porks for Christmas without breaking the cloud costs".
And "How to sell fireworks to children in Germany for Christmas presents".
What can go wrong, right?
And that's more Andy's way of putting it,
not Lidl's.
Awesome. Really appreciate you being on today.
I do want to say that,
and
it's one of my favorite sayings,
and it's not really even a
saying, it's just made up: really appreciate you being on. I'm glad you're
learning from Andy. We learn from our guests like you. It's always fantastic to
have people on, so thank you so much, and thank you for doing this on such short notice.
Really, everybody, Almudena really saved our butts today. So thank you. Thanks for the invitation.
I love to be here.
Alright.
I guess that'll wrap the show. Thanks everyone.
Bye-bye.
Bye-bye.