PurePerformance - Why is it always DNS, TLS or Bad Config? This and many other learnings from Philipp Krenn
Episode Date: September 12, 2022
We all want to leverage technology to solve problems. New and shiny toys are appealing to look at, which sometimes means we lose sight of the base technologies that power most of our connected lives, such as DNS or TLS.
In this podcast we invited Philipp Krenn (@xeraa), Dev Advocate Team Lead at Elastic, to learn about DNS, TLS and other bad config changes. We learn about Log4Shell, how the Java Security Manager was a big help in fighting Log4Shell, why it's been deprecated, and also get his thoughts on CDD (Conference Driven Development).
And if you ever visit Vienna, chances are you meet Philipp dancing Waltz with tourists 😊
Show Links:
To learn more from Philipp start with
His personal website: https://xeraa.net/
Twitter: https://twitter.com/xeraa
LinkedIn: https://www.linkedin.com/in/philippkrenn
His conference schedule (past & future): https://xeraa.net/events/
Transcript
It's time for Pure Performance!
Get your stopwatches ready, it's time for Pure Performance with Andy Grabner and Brian Wilson.
Hello everybody and welcome to another episode of Pure Performance. My name is Brian Wilson and as always I have with me my very wonderful co-host Andy Grabner.
Ladies and gentlemen. Hi Andy.
Hi. I'm actually surprised that you don't come up with something more hilarious than the last time.
You know I forgot. It's been a while since we had a recording session.
And I'm like, usually when we do it back to back,
I start getting snarkier.
This time it's been a while and I'm like, no, I forgot.
Like no instant, no nothing.
What's wrong with you?
We're back to baseline.
Back to baseline, okay.
Our guest just made a mistake.
He started speaking before he was introduced.
Oh my gosh.
Oh my gosh.
We need to go.
We're going to, let me get that shock button.
No, but maybe that's actually a good segue, right?
Because maybe we do it a little unusual today.
We have a guest today and Brian,
today you're actually one US citizen against two Austrians.
I know.
Don't be afraid.
Well, you are allowed to add a little more Arnold references today,
at least twice as many as you typically do.
I'll go with Mozart, but I don't know any good Mozart trivia.
I'm very happy that we have Philipp Krenn.
I'll try to pronounce your name the English way, and I'm really sorry about that, Philipp.
Philipp, developer advocate at Elastic.
I think you can probably do a much better job introducing yourself than me introducing you,
because I already messed up your name.
So, Philipp, please introduce yourself to our audience.
Thanks a lot.
And as you could tell, once you invite me, you cannot shut me up anymore.
So I will just get started early on here.
So hi, I'm Philipp.
I generally just skip my last name because that always leads to this weird pronunciation issue.
So I have been with Elastic, the company behind Elasticsearch, Kibana, etc., for more than six years.
I was actually the first one in Austria, and I had my own country for some time.
By now, I have to share.
You owned your own country?
Well, I was the only one in Austria, so I was always jokingly saying,
I own Austria for Elastic, even though I'm not sales or anything.
But at some point, I started to share my country,
and we have a dozen others or so in Austria by now,
all in different functions since we are fully distributed.
So it's a bit of a different culture approach.
Well, I've been a developer advocate because I like to do conferences.
So before COVID, I was doing probably 220 travel days a year.
I was always going from one city in Europe to the next; that's why I know the conference season quite well.
I have started to pick up again, but by now I'm leading the EMEA team and I need to spend a lot more time on internal stuff, so I guess I'm down to like 100 travel days a year or so now.
Let's see. Hopefully, fingers crossed, COVID will stay light, or whatever it is that it's doing, so we can actually do some stuff.
And I think we also learned in general to be a bit more resourceful and do better planning around what is in person, what is virtual, what is the right mix.
Though I think we're all still trying to figure out what is the right mix. At least that's my takeaway so far.
You know, I think unrelated to our topic,
I think somebody has to write some sort of expose, tell-all style book
of life on the road as a conference speaker.
Like the good, the dirty, the everything.
I think it could be quite salacious, I imagine.
I think my trick or tip is that I'm just good at suffering and I don't care so much.
Or maybe that's the frugal approach that I was brought up in, I don't know. At least that's mine.
Andy, what is yours?
Yeah, I think, first of all, your friends should never believe that what you post on social media is the real story behind it, because typically we only post the nice things when we travel to a new city.
But there are long days, a lot of weekends also on the road, where you don't have your regular rhythm, let's say, because you don't get to do certain things on a Monday, Tuesday or Wednesday evening when you're always in a different place.
I think it's good, at least for me, to find something, though, that you can do that makes you happy.
For me, outside of doing conferences, it's dancing.
I'll try to find myself some spots in every city.
Maybe you have something, Philipp, as well.
What do you like to do on a regular basis when you travel to new places?
So when I travel, I mostly just want to see the city and just walk around a lot.
But it's true, at home, you can actually tell that we're Austrian, so I'm doing Viennese waltz.
I'm actually dancing for tourists and everything in Vienna still. I'm with a big dancing school, and, I'm a bit old by now and the others are a bit younger, but we often do performances at balls, and often for American tourists who come to visit, to show them how Vienna really is.
I'm never sure that's really how Vienna is. I don't want to bash too much on the Americans, but it seems to be very popular with Americans.
So it seems to be the Austrian wonderland that we are projecting here.
It's a bit like the nice pictures that you post on social media about conferences: you portray Vienna in a way that it was a hundred years ago, and that's how we like it.
Well, yeah, we like to pigeonhole every city in every country into a specific
time period of our ideal vision of it.
What an amazing story, because I always tell people that The Sound of Music is just an American invention.
Now you come here and completely fulfil the cliché of the Austrian waltzing in Vienna.
I have never seen The Sound of Music, though.
Really?
No Austrian has seen The Sound of Music.
I have seen it twice, I have to admit.
Maybe you have too many American colleagues. But I want to say, proper Austrians who don't interact too much with Americans, nobody even knows about The Sound of Music.
And hey, quickly picking up on something you said and kind of switching over to the topic: over the last couple of years, I think we managed to escape each other even though we were sometimes presenting at the same conferences.
But this year in July we finally saw each other in Stuttgart, at Java Forum Stuttgart.
You had a presentation on Log4j, and I think it's obviously a pretty hot topic.
But then in preparation of this podcast, I actually looked at your blog post,
which, by the way, we are going to link to.
And I was browsing a little bit through kind of your more recent presentations that you did kind of this year and last year probably as well.
And one really piqued my interest and it was called, well, for two reasons.
First of all, the title is why is it always DNS, TLS and bad configs?
And the second thing is it seems you're also a Harry Potter fan.
And because in your slides, you used a lot of references to Harry Potter.
But I'm really curious because Brian and I, we often have
conversations with either our users
or colleagues
or even in podcasts and talk
about, damn, why is it
always the N plus one query problem
that is bringing systems to a crash?
Why is it always Hibernate
that is wrongly configured and brings
a system to a crawl?
And now you are telling us three things
that I think we have not talked about, at least as much.
Why is it always DNS, TLS, and bad configs?
And I think I would like to talk about this
and just get your perspective on what you see out there,
and especially also what people can do
to kind of secure themselves
from actually these three things becoming a problem.
Yeah, so the Harry Potter reference is actually an interesting starting point. I mean, I've seen the movies and I think they're nice movies. I just wanted to make the talk a bit more fun.
And the starting point was, maybe you remember, there is this scene where McGonagall says to Harry, Ron and Hermione, why is it always the three of you when something happens?
And in the slide deck, there's also a picture where there's DNS, TLS, and bad config written on them. Why is it always the three of you?
And that's pretty much the picture I have in my head when something happens.
And then I looked through some of the big outages that happened over the last two, three, whatever years.
And it is so often these three.
I mean, people probably still remember that Facebook outage from a year ago where they took everything down,
where they allegedly had to chainsaw back into the data center because they couldn't even access anything anymore
because the system was so dead and down.
And that was a mix of DNS and bad configs.
And it just seems to be that there are these underpinnings that we often like to forget that keep everything running.
And a lot of these technologies are also super old. DNS is not new or fancy; it's this old thing that is actually quite smart. So it actually works around quite a few things.
So I recently read a blog post where somebody tried to point two CNAMEs at each other.
So a.foo.com points to b.foo.com and the other way around, to create a loop.
But DNS, for example, is smart enough to figure stuff like that out.
So DNS is not stupid.
It's just very old and badly understood, and that's, I think, why it's so often done wrong.
Or TLS has the classic problem that, (a), it's complex, and then, (b), stuff always expires and nobody checks for that.
And we could do a bit of a better job there, because it's kind of straightforward, but we always forget some important certificate somewhere in the back end that nobody checks or nobody sees, and then it runs out and everything suddenly stops.
And then there's probably nobody who has access to it and can regenerate it that weekend or whenever it happens.
So it's always a bad combination of things. But my background is partly in ops and running infrastructure, and these are things that I always feel close to my heart.
And I feel like we're often spending so much time on the fancier problems. And N plus one queries are not even the fancier problems.
But we spend so much time on, I don't know, you're doing something stupid in the application and your tracing can show it and whatever.
But oftentimes it's the cheaper and almost dumber stuff that goes wrong, and then it's not just that one page fails or one page is slow, but your entire application is dead.
Those are the kind of things in infrastructure that can hit you.
Well, also because, I guess, DNS and TLS are just so fundamental to everything. It's like, you know, if we don't have electricity anymore, that's fundamental.
Nothing works anymore. If DNS is misconfigured, you know,
you have to, you brought up the examples.
Yeah.
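The CNAME loop mentioned above (two names pointing at each other) can be caught by remembering which names were already visited while following a chain. The following is only a minimal sketch of that idea: it works on a hypothetical in-memory record table instead of real DNS queries, and the function name and the `max_hops` limit are assumptions made for illustration, not how any particular resolver is implemented.

```python
def follow_cname_chain(name, cname_records, max_hops=8):
    """Follow CNAME records starting at `name`.

    `cname_records` is a hypothetical dict mapping an alias to its target.
    Raises ValueError if the chain loops back on itself or gets too long.
    """
    seen = [name]
    current = name
    while current in cname_records:
        current = cname_records[current]
        if current in seen:
            # We have been here before, so the records form a loop.
            raise ValueError("CNAME loop: " + " -> ".join(seen + [current]))
        seen.append(current)
        if len(seen) > max_hops:
            raise ValueError("CNAME chain too long")
    # `current` has no further CNAME record, so it is the final target.
    return current
```

With the table `{"a.foo.com": "b.foo.com", "b.foo.com": "a.foo.com"}`, looking up `a.foo.com` raises an error instead of spinning forever.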
Or the other fun thing is that
that's what I have done wrong in a previous job
is where I configured like the wrong name servers
or I copied for multiple domains
and for some domains it was the wrong name server.
And then because of the timeout,
everything works and works and works until it doesn't.
And then you can just watch as stuff kind of like disappears from the internet.
And then you know that it will take the timeout again until you can actually fix it.
And so you can just sit there and watch and twiddle your thumbs to wait until stuff works again,
which is like a very interesting attribute of DNS.
It's not just like, oh, we'll just redeploy the new application
and it's fixed and done and we move on. But the time to live of DNS, for example, is something
that's kind of also like its own interesting problem. And I feel like observability is the
fancy new thing and everybody thinks in these complex tools that we have.
And then DNS and TLS are very classic monitoring things. It's like you ping the certificate: how long is it still valid? If it's less than 100 days, send somebody an email or start shouting or whatever, so it gets fixed.
It's not sexy or new or anything, but it's still the fundamentals that we get wrong so often.
That's why I kind of like the talk, just to say, oh, it's not just this fancy stuff and OpenTelemetry and all the new things that we can do and the great new world, but oftentimes it's the fundamentals that will catch up with us again.
You also did a great job in your slides, I think, to also provide examples, right? How you can write a simple synthetic test that actually validates if your certificate is still valid.
You bring up some good points on, you know, check it on a daily basis, and then 100 days before, 60 days before, whatever, you are then sending alerts and kind of raising the awareness.
I'm just wondering, you know, we are also obviously representing a monitoring or observability company, you as well, right? Why don't we make it easier then for people to automatically do these things?
We should put these checks into the system by default and just help people.
Unless you say this is counterproductive, because then people actually forget that these things exist, and then we kind of run into this situation that nobody understands these technologies anymore, and if then something breaks, nobody knows how to fix it.
Like a chicken and egg problem a little bit.
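The certificate check described above (look at how long the certificate is still valid and alert well ahead of time) fits in a few lines. This is a hedged sketch using only Python's standard library, not the synthetic test from the slides: the function names are made up, and mapping the 100 and 60 day marks from the conversation to "warn" and "critical" is an assumption.

```python
import socket
import ssl
import time


def fetch_not_after(host, port=443, timeout=5.0):
    """Connect to host:port and return the certificate's notAfter string."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_remaining(not_after, now=None):
    """Days until expiry for a notAfter string like 'Jan  1 00:00:00 2030 GMT'."""
    expires = ssl.cert_time_to_seconds(not_after)
    now = time.time() if now is None else now
    return (expires - now) / 86400


def alert_level(days, warn_days=100, critical_days=60):
    """Map remaining days to a level; the thresholds are assumptions."""
    if days < critical_days:
        return "critical"
    if days < warn_days:
        return "warn"
    return "ok"
```

Run daily, something like `alert_level(days_remaining(fetch_not_after("example.com")))` would be the whole check. Note that with the default context a certificate that has already expired fails the handshake with an `ssl.SSLCertVerificationError`, which a real check would need to treat as the loudest alert of all.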
Yeah, my take is that it's like we're always running after the new hotness and look at what I can do and what I can figure out.
And this is so cool and so smart.
It's almost like conference-driven development where you just pick up the latest and greatest from the conference.
And we often forget, like, I mean,
I think good old Nagios could do stuff like that already.
And nobody wants to use Nagios anymore.
And it's just like, it's kind of like a solved problem
and then you forget about it until it's fighting you again.
So CDD, conference-driven development,
but in your role, aren't you kind of feeding this whole thing
because you are showing new cool stuff? I guess you also talk about these fundamental things, which is important, but I assume you also show new cool stuff, and then hopefully not inspiring too many people to just look at the new cool stuff.
Yeah, so I will just point at the other side now.
It's like somebody shows the cool stuff, which is fine,
but it's not how you should build your product or your production environment.
You shouldn't just take what somebody has shown you, what looks great at first.
I think there's a fine line or mix between showing what is possible, but also only picking what is mature.
And for example, that's something that I think is important: I will generally only try to show stuff that I believe is reasonable to use in production or whatever, and not something like, oh, we built something and it feels half-baked or it solves the wrong problem. Then I will generally try not to actually demo or show that.
Like everybody, we have some things that are just in the making, or maybe that are also never going to make it out of the making, because you need to experiment. And I generally try to stay away from those.
I think that's kind of the responsible thing to do: not just show what is hot, but also what is potentially reasonable to use.
But obviously, we always like to say it depends on what is the right solution for you. So everybody will need to decide for their environment and their skills and tools and whatever what is the right thing.
So I think it's a two-sided argument here, who is responsible.
Yeah, I still love your CDD, Conference Driven Development. It's an awesome
term, and I think Brian was also chuckling
when you said this.
I also like what you said, too.
I was going to say another fundamental along with
that was you said, I'm not
going to show you something
that's not solving the right problem.
I think that's another key thing is
before demonstrating something or even before using a new tool or functionality.
It's like, well, what are you trying to solve for?
And I don't think enough people are asking that question.
They just want to use that cool new thing.
And that can apply not just in development, but any situation.
I do all this audio production and I throw in these cool plugins because they're cool.
I'm like, well, what am I actually trying to do?
I put it on to see what I could do
instead of thinking what I want ahead of time.
Anyhow, I think those go hand in hand,
the conference-driven development
and what problem are you trying to solve?
Those are fantastic.
We can end here.
It's an awesome show, but let's keep going.
Just to add on top of that,
I always say part of the job is to make people successful
and successful can be to solve their problems or to show them what is possible,
but by showing them the wrong stuff that leads into a dead end or brings them
into a spot that is not so great.
I think you're not doing yourself or your company or your product a favor by
leading people into something that looks good,
but will not stand the test of time or might not go so well in production.
I think that's part of the making people successful.
And just one other small addition, like next to the conference-driven development,
there's also the CV-driven development,
where people just throw everything into their production just to say afterwards,
like, I have used this list of tools.
And that's how we end up with these zoos of technology that people often use, which is, by the way, great for tool providers like us, because somebody needs to actually make sure that stays up and running in production afterwards.
So there's something in there for everybody.
Yeah, I think the DNS thing, there's an analogy I was thinking of, but that analogy then brought up another problem.
So the analogy I was thinking of was tickets for an event.
So before we had apps, we actually had to get printed out tickets.
And I know it happened to everybody at some point.
You do all your prep, you get all your stuff ready, whether or not you're going to go have a party in the parking lot ahead of time or whatever.
You have all your stuff ready, you arrive at the parking lot and like, oh my God,
I forgot my tickets at home. The fundamental, the DNS, the underlying thing, one of the most
critical things, that's just the basic, not all the little, I got my little hibachi grill and
all these other things. So there was a great solution for that, which was the app. Now you
don't have to worry. You have your phone, you have the app, but two things happen there.
Number one, maybe you didn't download the app
and now you're at the event,
which may or may not have poor reception.
But if the app is down, right?
Now everyone's hit.
So when you're looking at those solutions,
you know, you talk about,
can you use a different tool?
Do you use the old tools?
Do you need people who just fundamentally know
and understand these things?
I think that makes the case that
you do need those people
to fundamentally understand it, because when the new,
the easy button for those technologies fails in some way,
you have to be able to fall back on somebody,
the old guy with the spider webs on him in the corner
who still knows how to do this stuff manually.
Or you have taken 100 selfies and you're out of battery now.
Yeah.
Hey, Philipp, coming back to the presentation on DNS,
we covered DNS, we covered TLS.
You also talked about bad configs, even though, I mean, in general,
I think, you know, like DNS, you brought up your own example.
You know, sometimes, you know, it's config problems that lead to this.
Any other examples that you have where you say,
hey, I would wish that this doesn't happen anymore
and this is why I explain it to the people in my talk?
I think your N plus one is a great example
because that's also been plaguing us as an industry for a long time.
And then you have whatever abstraction for your ORM
that is potentially doing that or doing other weird stuff. So I think, yeah, it's the
little things that we tend to overlook because in theory they're solved, but in practice they're
not. Or it's like the case of the bad YAML indentation or whatever, which would be my example for bad configs.
Everybody running Kubernetes has, I don't know,
how many thousands of lines of YAML lying around somewhere,
and I don't know if all of those are correct or doing the right thing.
But I feel like there is, as an industry,
we're not moving away from making it easy to make some of those mistakes, I think.
We keep piling on new layers on top of them.
And then we kind of like forget about the old problem,
but they are still lurking down there somewhere to take stuff down.
And I guess everybody who has been running something in production
has been there to run some bad configuration.
And it's always a combination of two or three things.
Like one thing alone is normally not enough to throw you off, but it's normally two or three things combined
that just make the right mix to create utter chaos.
Could we solve this problem in a way? I'm just throwing out an idea, and OpenTelemetry brought this up with distributed tracing. However you get
to the trace let's assume we get framework vendors or framework developers
like those that are providing Hibernate
or other abstraction frameworks.
We get them to, on the one side,
instrument their frameworks
with, let's say, OpenTelemetry.
And wouldn't it be cool
if there would be another standard
on top of OpenTelemetry
where you can define bad patterns?
Because if you get a distributed trace
and that distributed trace shows me
the N plus one query problem,
one SQL statement,
and then five times,
10 times, 100 times the same,
this should be a pattern
that should kind of flag
that distributed trace later on
and say, hey,
this is something you may want to look at.
And maybe you're not even aware of it,
but based on how we intended this framework,
this is a distributed trace
that we should probably not see.
And I was wondering if, you know,
you have some knowledge now on certain frameworks
and how things behave and should behave,
and then we detect patterns if something is abnormal.
We have other countless examples of, you know,
because we've been analyzing distributed traces
for many years.
Wouldn't it be cool if we could just say,
hey, with OpenTelemetry we solved the collection problem,
but now we need to figure out a way
how to let people and framework providers define
what is normal and what is an abnormal pattern?
I guess?
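The pattern described above (one SQL statement, then the same statement five, ten or a hundred times inside a single trace) can be flagged with a simple count. This is only a sketch under stated assumptions: the span dictionaries with `kind` and `statement` keys are a hypothetical shape invented for this example, not the OpenTelemetry data model, and the default threshold of 10 is an arbitrary choice.

```python
from collections import Counter


def find_repeated_queries(spans, threshold=10):
    """Return {statement: count} for statements repeated within one trace.

    `spans` is a hypothetical list of dicts such as
    {"kind": "db", "statement": "SELECT ..."}; non-database spans are ignored.
    """
    counts = Counter(
        span["statement"] for span in spans if span.get("kind") == "db"
    )
    return {stmt: n for stmt, n in counts.items() if n >= threshold}
```

A framework provider could in principle ship the threshold alongside the instrumentation, which is the "define what is abnormal" part of the idea; whether any backend would consume such a definition is an open question in the conversation.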
I feel like we're not there yet.
I feel like OpenTelemetry has been very much about data collection.
But then once you hit the backend,
then it's like every vendor is on their own
and doing their own thing and way.
And it took a long time until the data collection was standardized
because nobody reasonable wants to have
all that vendor-specific stuff in their application
and kind of locks them into one vendor so much. So I think that was an easy sell.
Like, the sell to tell people, oh, there's this open standard for detection and figuring out what's going on, I could see that.
But I also feel like it's harder to sell to the end user, because they're like, as long as it does the right thing, I don't care how it's standardized; it's not in my application.
And I feel like for a vendor, there are not a lot of incentives to share the secret sauce, because right now I feel like that is kind of the secret sauce, what sets apart the different vendors.
I guess it would be interesting. I just feel like it's a hard starting point to say,
because that's the new unique value proposition
of the different tools that,
oh, we get OpenTelemetry data.
This is how we can search it,
create an alert on it, whatever.
There it's deeply in the vendor land.
Or maybe I'm just thinking too much
from the vendor perspective,
but I have the feeling that the incentives there
are maybe not as strong.
And from the end user, there's maybe not as much push, because it's not in your application.
I think, Andy, if I'm
understanding it right, my take on that would be that this wouldn't be
from the vendor perspective, it wouldn't be the full scope of how
the platform analyzes. There would be a small portion
dedicated to open analysis, let's call it, I don't know,
that can ingest a predefined pattern from a vendor that we know these certain patterns occur in our situation.
When you ingest your code, you can also upload this to your tool if they support it.
And your tool will still do all the cool things you do, but once we're in one section of it, we'll be able to ingest these.
And when a problem arises,
it can then cross-check it to,
is it one of these patterns?
Hey, we know it.
It's already defined.
Bam.
So it would only be on those known and existing patterns
that vendors supply
or other ones people put out there.
But then you still have to do all the magic
because, let's face it,
99% of the time,
things are hidden behind a million other things, or they are different problems. I mean, it is kind of scary how often it is the same
problem at the end of the day. But obviously it's a lot more complex.
Yeah, I mean, that was kind of my thinking, right? Coming back to an analogy, maybe: if you
think about a car, you have an engine, and the engine probably has some specifications.
The engine should be operating in a certain temperature range, right?
This is something that I can then measure and then alert on.
But if the engine is put into a car, then you need some maybe some additional logic that makes sure that whichever person is driving is not hitting a speed limit.
But the speed limit depends on where you actually drive.
So this is then where the other feedback,
let's say more intelligence comes in
and where you need more data
and then to make better recommendations
to the driver.
I'm just saying, right,
if you are providing a framework
and the framework is observable,
then why not give at least a framework
of we think as a framework provider,
the framework should be used
kind of with these constraints.
Or it should run.
It runs normally if we see this and runs abnormally if we see that.
But maybe I'm going off in the wrong direction.
Well, just to round out your car idea, I think cars have that already.
You hook up the computer to your car and for the common things,
like your oxygen sensor or whatever,
the things that they can know about that are measurable and simple, something like that is standardized.
I mean, I can definitely see a battle between vendors on this idea.
That's more of a very optimistic outlook on big money capital and sharing, which would be awesome if it happens, right?
But who knows?
And I feel like that would be like a
classic log statement, no?
Something sees that you have a
recursive invocation to some degree,
you could potentially just do a
warn logging. I feel like the
tooling is almost there.
And I'm also
slightly torn if the tool should even allow
that or should require a special flag for some of these things.
Like the N plus one query problem, it's of course hard in the end, but I feel like in theory this is a known thing: one query triggers more than 10 other queries.
That's a really weird sign normally, or not something you want to have in most cases. So maybe it
should fight back harder from the start.
I know, but it's the same with expired certificates, right? It should be a fixed problem, but yet it brings down very popular websites every year.
All right, switching gears a little bit, because, you know, we all are very interested in making systems
observable and through different means of looking at different types of data. One of your
recent talks, and this is the one I think you also had in Stuttgart, was around Log4j. I think we
don't need to explain what Log4j is and Log4Shell and the exploits because, Brian, I believe we've
covered this a little bit
in previous episodes.
But I would just like to get your take on it.
What you are, you know,
because unfortunately I haven't seen your talk.
I kind of clicked through the slides a little bit,
but I haven't seen your talk.
I'm just interested in your role as a developer advocate.
What do you advise people to avoid these problems in the future? What kind of best practices do you give them?
Yeah, so my talk is actually almost broken into two parts, because Elasticsearch, plus Logstash to some degree, have been using Log4j for logging for a long time and were almost affected by it, but especially Elasticsearch not, because of the Java Security Manager.
And that's the first part or half of the talk: looking at why it is hard to detect whether this is a problem or not a problem.
So for those who are not aware, the Java Security Manager, unfortunately this is going away, but it's basically something where you can create your own sandbox, to some degree, of what you can do from
a Java application. And the sandbox that we have put in place is that only very few packages of
Elasticsearch, for example, can do a network call. So our network library is Netty, and Netty can do
outgoing network calls or can bind to an interface and whatever. Most other pieces of the code cannot do that.
For example, our logging library cannot do networking calls,
except for DNS lookups.
But that's why, for example, in Elasticsearch it is not a remote code execution
because it just cannot fetch remote code.
It could only use code that is already around and work with that,
but it could not reach out to any system to load any remote code,
which I think is interesting how you can generally solve that problem.
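The sandbox idea above (only the network library may open connections; the logging library may write files and do DNS lookups, nothing more) can be pictured as a small allow-list that is consulted before any sensitive action. The sketch below is a conceptual model only. It is not the Java Security Manager API and not Elasticsearch's actual policy; the component names, action names and helper functions are all hypothetical.

```python
# Hypothetical policy table: which component may perform which action.
# Anything not listed here is denied by default.
POLICY = {
    "network_library": {"connect", "bind", "dns_lookup"},
    "logging_library": {"write_file", "dns_lookup"},
}


def check_permission(component, action):
    """Raise PermissionError unless the policy explicitly allows the action."""
    if action not in POLICY.get(component, set()):
        raise PermissionError(f"{component} is not allowed to {action}")


def fetch_remote_code(component, url):
    """Stand-in for a remote lookup; the check runs before any I/O happens."""
    check_permission(component, "connect")
    return f"would fetch {url}"
```

In this model, a lookup triggered from inside the logging library fails at `check_permission` before anything is fetched, which is the property described above: the vulnerable feature may still exist, but it cannot reach out to load remote code.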
So even if you have bad security issues, and like we can discuss for a long time
if a logger should even have these features or not,
how can you as an application developer or provider,
especially if you have widely used tools, protect against
stuff like that? Just to say, for example, my logging library doesn't need to write to the
network. It needs to write to a file. That's pretty much all it needs to do. It shouldn't have the
rights to do other stuff. Why is that not a more common thing? And why is it actually very
hard to get right? And why it took us a long time to get
to that point. And then of course the second part is more on the detection side, where we have
basically, I think, two angles. From the observability side we can see what
the application is doing. Is it suddenly doing network calls to some other IPs or
DNS names that it shouldn't?
That is definitely a weird sign, and if you can suddenly see it's always
doing a GET on whatever code
and then it's being executed, that's not a good
sign. And the other thing is, since
we have more security tooling around
that now as well, we can actually see the
processes, and we could see that
the Java process, for example, spawns
another shell or
whatever out of the Java process, which is also not what you commonly do in your web applications:
that you spawn out and then do another wget or whatever people are running. So we have
that second side of how to protect against that, but the first half is more about how we protect our own products
or make them more secure by default, which I think is something we as an industry also
haven't made a lot of progress in. There is a lot of theory out there on how to write more secure
programs, and almost nobody manages memory manually anymore;
even in C++
you have proper tooling around that now. So we have made some progress, but some other stuff that
is again surprisingly simple, like why don't we limit what can do network calls,
or what system calls are even possible from an application, why is that not more widely used?
And I guess it's because most of us just want to ship fast.
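The process-level detection described above could be sketched roughly like this. The `ProcessEvent` shape and the two name lists are assumptions for illustration; real security tooling has its own event schema and far richer rules.

```python
from dataclasses import dataclass


# Hypothetical process-start event; not the schema of any real product.
@dataclass
class ProcessEvent:
    parent_name: str
    process_name: str
    command_line: str


# Assumed lists for illustration only; tune them for a real environment.
SHELLS = {"sh", "bash", "dash", "zsh", "cmd.exe", "powershell.exe"}
DOWNLOADERS = {"wget", "curl"}


def is_suspicious(event: ProcessEvent) -> bool:
    # A web application's JVM rarely has a reason to start a shell or a
    # download tool, so either one is worth an alert.
    if event.parent_name != "java":
        return False
    return event.process_name in SHELLS | DOWNLOADERS


def alerts(events: list) -> list:
    # Return the command lines of every suspicious child process.
    return [e.command_line for e in events if is_suspicious(e)]
```

A rule this simple will miss plenty, but it shows why a Java process spawning a shell stands out so clearly.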
Yeah, I think you bring up a really good point.
And I think sometimes people that complain about
why is it so hard to get these permissions?
Why don't we have just everything on
because they want to ship faster?
I think these stories are then a great reminder
that there are reasons why by default
we block as much as possible.
And actually at the end you have to ask for permission you have to grant an additional permission if you need to make
a certain call uh consciously right and i think that's that's the great piece uh or the great
advice and kind of the reminder that there's a reason why we are very restrictive and i think
that's a great point. Then I also want to highlight, because maybe
not everybody is a Java expert like you are, but maybe they are responsible for fighting
things like this: you talked about the Java Security Manager, that this is a component
that allows you to define what's possible and what's not, and you said it's going away
in favor of something else?
They don't have a proper replacement. The problem
with the Java Security Manager was that it was not very widely used, and it created development
work, and it was always a bit scary for us because there are very few Java projects that are using it
heavily, so it doesn't get the exposure that we would like to see from others. And well, it's going
away, and we are working on replacements around better modularization in code
and what can invoke what,
and to have more boundaries between code.
And so if you have a vulnerability in one part,
then it will not own the entire process, for example.
So there is work in progress.
We'll see how to get to that.
We have some people who are very deep in the Java ecosystem to work on that.
I think Oracle and others are also
still building a bit more in that area.
But I think the official take is
that the Java Security Manager was interesting.
It was just too much work
for the average application to use it.
And that's kind of why it has failed.
And we're not really happy about that
because it really saved our bacon with Log4Shell, but we can also see that it's not as widely used as we would want or others on the
JVM would want so it it's maybe not the right abstraction or it's not the right thing right
now and we'll we'll see where that ends up but it was one of the very cool features of the JVM, I would say.
Yeah, what I've seen at conferences, people present,
and I guess, again, I'm not a security expert at all,
so I'm just consuming information,
and hopefully I kind of repeat it now in the correct way,
but people have shown using eBPF to basically then block calls when an application is opening up a port
or it's opening up, making requests, right?
That's one option.
And then obviously policy managers, right?
If you think about Kubernetes with policies
or also the privileges that the service account has,
you can really restrict an environment very nicely.
So what it can and what it cannot do.
It just ties into different environments, and
then it's like, oh, you need to have something on the Kubernetes level, and then you have eBPF, but
that also depends on the kernel version, and then what do you do on Windows?
Whereas in a JVM it was just one thing that you had to do once, basically. So, I
mean, I think again, like in most cases, there are like a hundred options.
It's just like, what is efficient and manageable?
We'll see.
But we already do, for example, system call filtering:
Elasticsearch couldn't fork another process.
We just don't give the process the permission, because we don't need to fork another process out of Elasticsearch.
It's not a thing.
This conversation reminds me,
not that I lived through it,
but if you think back
in the industrial revolution era,
where you had these factory floors
with conveyor belts
and spinning machine parts,
people working all under it.
You had fires burning stuff inside,
no ventilation, no lighting.
It was a very unsafe workspace,
but it was designed to get stuff out
the door quickly, right? It was designed for max output, but the danger level was tremendous. And Andy,
we've had discussions in the past, especially once DevOps came up and we started talking about
the Toyota factory model, right? But in general, there's different changes during the course of
history, whether you're going from agriculture, industrial,
to compute, they all go through these same cycles. Like N plus one moves from database
to services, right? Similar kinds of situations. And I feel like what we're discussing here
is the idea of instead of safe workspaces, safe code spaces, where
you have now, instead of the open belts flying all around, now they're encased in a case.
So if they snap, they're not going to chop someone's head off. You have ventilation. So all the exhaust going on in the place, you know,
you're monitoring your DNS, you're locking down the firewall first, you have to get the permission.
And it feels like the, you know, on the compute industry, there needs to be, you know, not quite
regulation in that terms, but there needs to be these standardizations to make it a safe space
for code or execution and all that stuff while still being
efficient. You brought up that other idea with the Java piece, where it just wasn't
performing and people weren't using it because it wasn't, you know, designed well. All right,
that's where you go back to the drawing board to find a new way to do that which people actually adopt,
because you can only get to that place if people adopt it. And it's just stewing in my mind how it's always the
same thing all over again, right? It's just a new environment, same thing though.
Let's hope it doesn't take a hundred years.
Yeah, we're on a much different time scale these days, right? Everything
is reduced in time.
On the other hand, I feel like security has been a lingering problem and
we have just gotten used to it; it just has to hit you every now and then. Are
you still using the magnetic strip on credit cards in the U.S.? In Europe I think we mostly got
rid of them, but in the U.S. you used them for a long time.
Yeah, we used them for a lot longer than you did.
Some cards still have them, but most of the times they're just...
Most stores have the chip reader or the...
What are the...
What's the little...
Yeah, the contactless one.
I think it's just more of the old stores.
I think that the bigger issue on that is,
going back to adoption, is the stores are responsible
for buying the new machine.
So if you're a small store without budget, you're like,
am I going to
pay for that? What's my incentive?
And this goes back to the same thing though, right? What's my
incentive to go ahead and use this new safe
coding thing? Log4J
was a fantastic.
I don't think we heard too much
devastation come out of it. At least I
hadn't. We knew it was there. We knew
everyone was exposed, but we didn't hear any horror stories on the news of companies getting terribly
compromised. So that was really fortunate. But it was also fantastic that it happened
because to your point, it was a real kick in the ass for security. People who weren't
paying much attention, it was everywhere then? We do need those for sure.
The thing is, right,
we probably also don't know which of the attacks. I mean, there's
constant reports. Also,
Philipp, in Austria we recently had attacks
against certain public agencies,
but nobody reports on it.
What was the attack? How did they get in?
Who knows? Maybe
they were already in, but then
these organizations just
wait for the right moment to strike right so you never know if it's just a stupid phishing attack
or if it's something sophisticated um yeah i'm not taking any bets for for our public sector
Yeah. Hey, Philipp, I want to go into the final stretch here.
We talked about past conference presentations.
And folks, again, we will put the link to your website
where you also explain why your website is called xeraa
or why your name is everywhere as xeraa,
which is interesting.
I won't spoil it here.
People should go to your website.
But on your website, you also have upcoming
conferences. And if you look there,
DevOps Days Berlin, JUG
Saxony Day, DevOps Days Portugal.
Actually, you're doing this session.
But then you have a couple of talks
where you also talk about
OpenTelemetry, the state of OpenTelemetry,
OpenTelemetry for Java developers.
And also
one that I think is really interesting,
debugging Kubernetes
operators.
And I feel
these are
all interesting topics I would quickly
like to talk about, but I want to give
you the chance to pick one of those and say
this is the session that I'm most
excited about in the upcoming
rounds, and this is the reason why.
I would pick the Kubernetes operator.
I mean, I'm excited about all of them
and I actually need to add a couple more now.
I've fallen a bit behind from updating that list.
So I think there should be more.
But the Kubernetes operators is like,
I feel like I just got very efficient at creating talks.
That's how that talk started.
Because for our stuff, we have an operator
that's getting a lot of development time.
And then there was the problem that our support team
had to support that in production with users.
And then they were like, we're not Kubernetes experts.
We don't really know much about an operator.
How do we even start supporting that?
We're Elasticsearch experts or whatever. And then they basically had an internal training, a set of scenarios for how to
debug common problems. I don't want to say I stole it, but I looked at those and I
kind of took inspiration, because basically in support they figured out: these are the
most common scenarios that we see, what people are doing wrong, and how to debug them.
And that's
what our support people should know.
And my take then is I take those materials
and maybe you don't even need to open
the support case, but you can actually figure
it out yourself. And I think that
what we do there applies
very widely to other operators.
It's like you have a
bad reference. You could actually create
more complex error scenarios, like where you do stuff across namespaces and then you hit a bug,
and then stuff gets deleted if you delete it in one namespace when it should stay in the second one.
But I think it's more about the basics. Like with most other tools, at first you're
like, I have no idea how this works, I don't even know where to start, and you're kind of in this state of shock and you're frozen. And I think
that's not only true for Kubernetes; it's like any other system. You need a couple of tools and
pointers to get started, and then you just start poking at it, and then you see, oh, this looks
wrong, or, oh, there is an error message that looks interesting, and I can go from there.
so it's more about showing people,
oh, this is not totally different.
It's like the commands are a bit different
and you need to remember maybe these five commands
around these five things.
And then you can see, oh, yeah, there's this problem
or I can see what is going wrong here.
And then sometimes you need to tie it back a bit more
to fundamental concepts in Kubernetes,
like, oh, how is storage attached or whatever.
But it's still a simple enough starting point: if you do a kubectl describe and then the resource, you often see where you even are. It's like, I don't know, in the good
old Linux days you would do a ps, or you would do a grep or whatever. So you also had a handful of
commands that you just knew and would run, and you would just start poking in the system and
then you would figure stuff out. And I think it's kind of the same for Kubernetes.
The problem with Kubernetes is that it has a few more layers, and then you can go lower, and you
might need to know more overall because it's so many layers. But just looking at the topmost
layer and the most
obvious things, I think, is not completely different from debugging something on a
plain Linux server. It's more like transitioning from the other stuff I know: okay,
I need to know a couple of commands, but then I can transfer what I've done in the past
over to this new environment.
Just because everybody needs to use Kubernetes nowadays.
And I'm still a bit unconvinced if that is true for everybody or if we're abusing or overusing that.
But it's what everybody needs to do.
So I'm kind of like following along and say,
if you fail or run into problems,
let's take a look at how you can actually dig your way out again.
And like I said, I kind of like shortcuts or efficiency.
So I'm looking at like, this is what support sees.
That's probably a good starting point to turn into a talk.
So that's where I got this from.
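As a small illustration of that "start at the topmost layer" approach, the sketch below pulls only the warnings out of a list of Kubernetes events. It assumes input shaped like the output of `kubectl get events -o json`; the sample resource names, reasons, and messages are made up, and the `warnings` helper is hypothetical rather than anything from the talk.

```python
import json

# Hypothetical sample shaped like the output of `kubectl get events -o json`;
# the resource names, reasons, and messages are made up for illustration.
SAMPLE = json.loads("""
{"items": [
  {"type": "Normal", "reason": "Scheduled",
   "message": "Successfully assigned default/demo-0 to node-1",
   "involvedObject": {"kind": "Pod", "name": "demo-0"}},
  {"type": "Warning", "reason": "FailedMount",
   "message": "Unable to attach or mount volumes",
   "involvedObject": {"kind": "Pod", "name": "demo-0"}}
]}
""")


def warnings(event_list: dict) -> list:
    # Kubernetes events carry a type of "Normal" or "Warning"; the warnings
    # are usually the quickest pointer to what went wrong.
    lines = []
    for event in event_list.get("items", []):
        if event.get("type") == "Warning":
            obj = event.get("involvedObject", {})
            lines.append(
                f"{obj.get('kind')}/{obj.get('name')}: "
                f"{event.get('reason')}: {event.get('message')}"
            )
    return lines
```

With the sample above, only the `FailedMount` entry is returned, which is the kind of pointer that then leads back to a fundamental concept such as how storage is attached.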
Really cool.
And thanks for sharing these details. And folks, again, there will be, I guess,
even more chances to see Philip in the next upcoming weeks.
It's amazing still, the things that you already have on the list.
And if there are even some still missing,
then it's even more amazing that you are able to cover all of this
because it's a lot of traveling.
Let's see where we meet the next time.
Exactly.
I was trying to get to go
to Berlin, but this didn't happen, and now something else came up. But I'm pretty sure we'll cross paths
again in real life, and then, who knows, maybe we'll find a dancing spot. Even though
I did ballroom a lot, now I'm more into Latin dance. Salsa, that's my genre.
But I only know the basic step of salsa.
Yeah, it's good.
You'll have to wear a powdered wig
and puffy sleeves, right?
Philipp, did we miss anything? Are there any final thoughts for our listeners that you would like to tell them?
No, I mean, I hope we did well for the listeners. But I think we covered a lot of
ground, from Austrian culture all the way to computer stuff. So if you ever visit
Vienna, look out for the guy in the glasses who is dancing waltz, probably in the first district somewhere
with tourists.
And run up and stick a screen with The Sound of Music
playing in front of him. This way he can't claim he's
never seen it. Force him to watch it.
Yeah, I don't need to. I only know the meme, right?
There is this where the one person is on the
I don't know, it's like
dancing in the traditional skirt or whatever.
I know that meme,
but that's pretty much all I know.
Well, it's
a very Americanized version of
that time period
for sure.
What are you going to do? That's America.
Anyway, this has been fascinating.
I blame it on the generation.
Let's blame it on the boomers, even though that wasn't them
making those. Let's still blame it on them, right? This has been fascinating. You know, you said you hope it
resonates with our audience, but as Andy and I always say, we love doing this because
of what we get out of it, right? We hope the audience comes along with us on the ride.
We don't really hear too much feedback. So I found this fascinating and really, really appreciate you
taking the time from your busy tour schedule. You'll have to open for the Rolling Stones someday.
i i'll make a shirt at some point right yeah start selling merch
and you have a tour with all the dates on the back yeah exactly
all right uh andy anything you wanted to wrap with? No, just reminding people, check out the notes.
We will put the links to the blog, to your LinkedIn, to your Twitter, to your GitHub.
Everything.
I mean, you can find everything on your website, xeraa.net.
That's X-E-R-A-A dot net.
But again, you'll find the link there as well.
And once this episode airs obviously Philip we will put it
on social media tag you
so then your followers will find it
and then our followers can then also follow you
and then hopefully everybody's happy
and we all make money
All right, thank you so much. Thanks to our listeners.
We really had a great time.
And see you on the next episode, everyone.
Bye-bye.