PurePerformance - Why is it always DNS, TLS or Bad Config? This and many other learnings from Philipp Krenn

Starting point is 00:00:00 It's time for Pure Performance! Get your stopwatches ready, it's time for Pure Performance with Andy Grabner and Brian Wilson. Hello everybody and welcome to another episode of Pure Performance. My name is Brian Wilson and as always I have with me my very wonderful co-host Andy Grabner. Ladies and gentlemen. Hi Andy. Hi. I'm actually surprised that you don't come up with something more hilarious than the last time. You know I forgot. It's been a while since we had a recording session. And I'm like, usually when we do it back to back, I start getting snarkier.

Starting point is 00:00:51 This time it's been a while and I'm like, no, I forgot. Like no instant, no nothing. What's wrong with you? We're back to baseline. Back to baseline, okay. Our guest just made a mistake. He started speaking before he was introduced. Oh my gosh.

Starting point is 00:01:08 Oh my gosh. We need to go. We're going to, let me get that shock button. No, but maybe that's actually a good segue, right? Because maybe we do it a little unusual today. We have a guest today and Brian, today you're actually one US citizen against two Austrians. I know.

Starting point is 00:01:28 Don't be afraid. Well, you are allowed to add a few more Arnold references today, at least twice as many as you typically do. I'll go with Mozart, but I don't know any good Mozart trivia. I'm very happy that we have Philipp Krenn. I'll try to pronounce your name the English way, and I'm really sorry about that, Philipp. Philipp, developer advocate at Elastic. I think you can probably do a much better job than introducing, than me introducing you,

Starting point is 00:01:58 because I already messed up your name. So, Philipp, please introduce yourself to our audience. Thanks a lot. And as you could tell, once you invite me, you cannot shut me up anymore. So I will just get started early on here. So hi, I'm Philipp. I generally just skip my last name because that always leads to this weird pronunciation issue. So I have been with Elastic, the company behind Elasticsearch, Kibana, etc., for more than six years.

Starting point is 00:02:24 I was actually the first one in Austria, and I had my own country for some time. By now, I have to share. You owned your own country? Well, I was the only one in Austria, so I was always jokingly saying, I own Austria for Elastic, even though I'm not sales or anything. But at some point, I started to share my country, and we have a dozen others or so in Austria by now, all in different functions since we are fully distributed.

Starting point is 00:02:55 So it's a bit of a different culture approach. Well, I've been a developer advocate because I like to do conferences. So before COVID, I was doing like probably 220 travel days a year. So I was always going from one city in Europe to the next. That's why I know the conference season quite well. I have started to pick up again, but by now I'm leading the EMEA team and I need to spend a lot more time on internal stuff, so I guess I'm down to like 100 travel days a year or so now. Let's see. Hopefully, fingers crossed, COVID will stay light

Starting point is 00:03:25 or whatever it is that it's doing, so we can actually do some stuff. And I think we also learned in general to be a bit more resourceful and do better planning around what is in person, what is virtual, what is the right mix. Though I think we're all still trying to figure out what is the right mix. At least that's my takeaway so far. You know, I think unrelated to our topic, I think somebody has to write some sort of exposé, tell-all style book of life on the road as a conference speaker. Like the good, the dirty, the everything.

Starting point is 00:03:58 I think it could be quite salacious, I imagine. I think my trick or tip is that I'm just good at suffering and I don't care so much, or maybe that's the frugal approach that I was brought up in, I don't know. At least that's mine. Andy, what is yours? Yeah, I think, first of all, your friends should never believe that what you post on social media is the real story behind it, because typically we only post the nice things when we travel to a new city. But there are long days, a lot of weekends also on the road, where you don't have your regular, let's say, kind of rhythm, right, that others have. You don't go and do certain

Starting point is 00:04:40 things on a Monday, Tuesday or Wednesday evening because you're always in a different place. I think it's good, at least for me, to find something, though, that you can do that makes you happy. For me, outside of doing conferences, it's dancing. I'll try to find myself some spots in every city. Maybe you have something, Philipp, as well. What do you like to do on a regular basis when you travel to new places?

Starting point is 00:05:03 So when I travel, I mostly just want to see the city and just walk around a lot. But it's true, at home, you can actually tell that we're Austrian, so I'm doing Viennese waltz. And I'm actually dancing for tourists and everything in Vienna still. I'm with a big dancing school, and I'm a bit old by now, the others are a bit younger, but we often do performances at balls, and often for American tourists who come to visit, to show them how Vienna really is. I'm never sure that's really how Vienna is, but yeah. I don't want to bash too much on the Americans, but it seems to be very popular with Americans. So it seems to be the Austrian wonderland that we

Starting point is 00:05:39 are projecting here. So it's a bit like the nice pictures that you post on social media about conferences: you portray Vienna in a way that it was a hundred years ago, and that's how we like it. Well, yeah, we like to pigeonhole every city in every country into a specific time period of our ideal vision of it. What an amazing story, because I always tell people that The Sound of Music is just an American invention. Now you come here and completely cliché the Austrian waltzing in Vienna. I have never seen The Sound of Music, though. Really? No Austrian has seen The Sound of Music. I have seen it twice, I have to admit. Maybe you have too many American

Starting point is 00:06:18 colleagues. But I want to say, among proper Austrians who don't interact too much with Americans, nobody even knows about The Sound of Music. And hey, quickly picking up on something you said and kind of switching over to the topic: over the last couple of years, I think we managed to escape each other even though we were presenting sometimes at the same conferences. But this year in July, we both finally saw each other in Stuttgart, at Java Forum Stuttgart. You had a presentation on Log4j, and I think it's obviously a pretty hot topic. But then in preparation of this podcast, I actually looked at your blog post, which, by the way, we are going to link to.

Starting point is 00:07:02 And I was browsing a little bit through kind of your more recent presentations that you did kind of this year and last year probably as well. And one really piqued my interest and it was called, well, for two reasons. First of all, the title is why is it always DNS, TLS and bad configs? And the second thing is it seems you're also a Harry Potter fan. And because in your slides, you used a lot of references to Harry Potter. But I'm really curious because Brian and I, we often have conversations with either our users or colleagues

Starting point is 00:07:29 or even in podcasts and talk about, damn, why is it always the N+1 query problem that is bringing systems to a crash? Why is it always a wrongly configured Hibernate that brings systems to a crawl? And now you are telling us three things

Starting point is 00:07:46 that I think we have not talked about, at least as much. Why is it always DNS, TLS, and bad configs? And I think I would like to talk about this and just get your perspective on what you see out there, and especially also what people can do to kind of secure themselves from these three things actually becoming a problem. Yeah, so the Harry Potter reference is actually an interesting starting point. I mean, I've seen the

Starting point is 00:08:12 movies and I think they're nice movies. I just wanted to make the talk a bit more fun, and the starting point was, maybe you remember, there is this scene where McGonagall says to Harry, Ron and Hermione, why is it always the three of you when something happens? And for me, in the slide deck, there's also a picture where there's DNS, TLS, and bad config written on them. Why is it always the three of you? And that's pretty much the picture I have in my head when something happens. And then I looked through some of the big outages that have happened over the last two, three, whatever years. And it is so often these three.

Starting point is 00:08:49 I mean, people probably still remember that Facebook outage from a year ago where they took everything down, where they allegedly had to chainsaw their way back into the data center because they couldn't even access anything anymore because the system was so dead and down. And that was a mix of DNS and bad configs. And it just seems there are these underpinnings that we often like to forget that keep everything running. And a lot of these technologies are also super old. DNS is not new or fancy. It's this old thing that is actually quite smart. So it actually works around quite a few things. So I recently read a blog post where somebody tried to point two CNAMEs at each other.

Starting point is 00:09:33 So a.foo.com points to b.foo.com and the other way around, to create a loop. But DNS, for example, is smart enough to figure stuff like that out. So DNS is not stupid. It's just very old and badly understood, and that's, I think, why it's so often done wrong. Or TLS has the classic problem that (a) it's complex and (b) stuff always expires and nobody checks for that. And we could do a bit of a better job there, because it's kind of straightforward, but we always forget some important certificate somewhere in the back end that

Starting point is 00:10:05 nobody checks or nobody sees, and then it runs out, everything suddenly stops, and then there's probably nobody who has access and can regenerate it that weekend or whenever it happens. So it's always a bad combination of things. But part of my background is in ops and running infrastructure, and these are things that I always feel close to my heart. And I feel like we're often spending so much time on the fancier problems. And N+1 queries are not even the fancier problems. But we spend so much time on, I don't know, you're doing something stupid in the application and your tracing can show it and whatever. But oftentimes, it's the cheaper and almost dumber stuff that goes wrong, and then it's not just that one page fails or one page is slow, but your entire application

Starting point is 00:10:50 is dead um those kind of things in infrastructure that can can hit you well also because i guess you know dns and tls are just so so fundamental to everything and it's like you know if i don't know if if you know we don't have electricity anymore, that's fundamental. Nothing works anymore. If DNS is misconfigured, you know, you have to, you brought up the examples. Yeah. Or the other fun thing is that that's what I have done wrong in a previous job

Starting point is 00:11:15 is where I configured the wrong name servers, or I copied them for multiple domains and for some domains it was the wrong name server. And then because of the timeout, everything works and works and works until it doesn't. And then you can just watch as stuff kind of disappears from the internet. And then you know that it will take the timeout again until you can actually fix it. And so you can just sit there and watch and twiddle your thumbs to wait until stuff works again,

Starting point is 00:11:38 which is like a very interesting attribute of DNS. It's not just like, oh, we'll just redeploy the new application and it's fixed and done and we move on. But the time to live of DNS, for example, is something that's kind of also like its own interesting problem. And I feel like observability is the fancy new thing and everybody thinks in these complex tools that we have. And then DNS and TLS are very classic monitoring things. It's like, you ping the certificate: how long is it still valid? If it's less than 100 days, send somebody an email or start shouting or whatever, so it gets fixed. It's not sexy or new or anything, but it's still like the

Starting point is 00:12:18 fundamentals that we get wrong so often. That's why I kind of like the talk, just to say, oh, it's not just this fancy stuff and OpenTelemetry and all the new things that we can do, and the great new world, but oftentimes it's the fundamentals that will catch up with us again. You also did a great job in your slides, I think, to also provide examples, right, of how you can write a simple synthetic test that actually validates if your certificate is still valid. You bring up some good points: check it on a daily basis, and then 100 days before, 60 days before, whatever, you are sending alerts and kind of raising the awareness. I'm just wondering, you know, we are also representing obviously a monitoring or observability company, you as well, right? So why

Starting point is 00:13:06 don't we make it easier then for people to automatically do these things? We should put these checks into the system by default and just help people. Unless you say this is counterproductive, because then people actually forget that these things exist, and then we kind of run into this situation that nobody understands these technologies anymore, and if something then breaks, nobody knows how to fix it. Like a chicken and egg problem a little bit. Yeah, my take is that it's like we're always running after the new hotness and look at what I can do and what I can figure out. And this is so cool and so smart.
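As an editorial illustration of the certificate check discussed above, here is a minimal Python sketch. It is not taken from Philipp's slides. The function names are hypothetical, and the 100-day warning threshold is simply the number mentioned in the conversation. The date arithmetic is kept separate from the network call so it can be tried without touching a real server.

```python
import socket
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Days left until a certificate's notAfter date.

    not_after uses the format returned by ssl's getpeercert(),
    for example "Jan  1 00:00:00 2030 GMT".
    """
    expires_ts = ssl.cert_time_to_seconds(not_after)
    return (expires_ts - now.timestamp()) / 86400


def classify(days_left: float, warn_days: int = 100) -> str:
    """Turn the remaining days into a simple alert level (thresholds are assumptions)."""
    if days_left <= 0:
        return "expired"
    if days_left < warn_days:
        return "warn"
    return "ok"


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Read notAfter from the certificate a server presents.

    With the default context, an already expired certificate makes the
    handshake itself fail with an ssl.SSLError, which is an alert in its own right.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

Run daily from a scheduler, something like `classify(days_until_expiry(fetch_not_after("example.org"), datetime.now(timezone.utc)))` would give the "send somebody an email" signal; the host name here is only a placeholder.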

Starting point is 00:13:37 It's almost like conference-driven development where you just pick up the latest and greatest from the conference. And we often forget, like, I mean, I think good old Nagios could do stuff like that already. And nobody wants to use Nagios anymore. And it's just like, it's kind of like a solved problem and then you forget about it until it's biting you again. So CDD, conference-driven development, but in your role, aren't you kind of feeding this whole thing

Starting point is 00:14:08 because you are showing new cool stuff? I guess you also talk about these fundamental things, which is important, but I assume you also show new cool stuff, and then hopefully you're not inspiring too many people to just look at the new cool stuff. Yeah, so I will just point at the other side now. It's like somebody shows the cool stuff, which is fine, but it's not how you should build your product or your production environment. You shouldn't just take what somebody has shown you, what looks great at first. I think there's a fine line or mix between showing what is kind of like possible,

Starting point is 00:14:41 but also only picking what is mature. And for example, that's something that I think is important, that I will generally only try to show stuff that I believe is reasonable to use in production or whatever, and not something like, oh, we built something and it feels half baked or it solves the wrong problem. Then I will generally try not to actually demo or show that. Like everybody, we have some things that are just in the making, or maybe that are also never going to make it out of the making, because you need to experiment. And I generally try to stay away from those. I

Starting point is 00:15:17 think that's kind of the responsible thing to do: not just show what is hot, but also what is potentially reasonable to use. But obviously, like we always like to say, it depends on what is the right solution for you. So everybody will need to decide for their environment and their skills and tools and whatever what is the right thing. So I think it's like a two-sided argument here, who is responsible. Yeah, I still love your CDD, conference-driven development. It's an awesome term, and I think Brian was also chuckling when you said this.

Starting point is 00:15:51 I also like what you said, too. I was going to say another fundamental along with that was you said, I'm not going to show you something that's not solving the right problem. I think that's another key thing is before demonstrating something or even before using a new tool or functionality. It's like, well, what are you trying to solve for?

Starting point is 00:16:10 And I don't think enough people are asking that question. They just want to use that cool new thing. And that can apply not just in development, but any situation. I do all this audio production and I throw in these cool plugins because they're cool. I'm like, well, what am I actually trying to do? I put it on to see what I could do instead of thinking what I want ahead of time. Anyhow, I think those go hand in hand,

Starting point is 00:16:29 the conference-driven development and what problem are you trying to solve? Those are fantastic. We can end here. It's an awesome show, but let's keep going. Just to add on top of that, I always say part of the job is to make people successful and successful can be to solve their problems or to show them what is possible,

Starting point is 00:16:46 but by showing them the wrong stuff that leads into a dead end or brings them into a spot that is not so great, I think you're not doing yourself or your company or your product a favor by leading people into something that looks good, but will not stand the test of time or might not go so well in production. I think that's part of the making people successful. And just one other small addition, like next to the conference-driven development, there's also the CV-driven development,

Starting point is 00:17:15 where people just throw everything into their production just to say afterwards, like, I have used this list of tools. And that's how we end up with these zoos of technology that people often use, which is, by the way, great for tool providers like us, because somebody needs to actually make sure that stays up and running in production afterwards. So there's something in there for everybody. Yeah, I think the DNS thing, there's an analogy I was thinking of for it, but that analogy then brought up another problem. So the analogy I was thinking of was tickets for an event.

Starting point is 00:17:51 So before we had apps, we actually had to get printed out tickets. And I know it happened to everybody at some point. You do all your prep, you get all your stuff ready, whether or not you're going to go have a party in the parking lot ahead of time or whatever. You have all your stuff ready, you arrive at the parking lot and like, oh my God, I forgot my tickets at home. The fundamental, the DNS, the underlying thing, one of the most critical things, that's just the basic, not all the little, I got my little hibachi grill and all these other things. So there was a great solution for that, which was the app. Now you don't have to worry. You have your phone, you have the app, but two things happen there.

Starting point is 00:18:25 Number one, maybe you didn't download the app and now you're at the event, which may or may not have poor reception. But if the app is down, right? Now everyone's hit. So when you're looking at those solutions, you know, you talk about, can you use a different tool?

Starting point is 00:18:38 Do you use the old tools? Do you need people who just fundamentally know and understand these things? I think that makes the case that you do need those people to fundamentally understand it, because when the new, the easy button for those technologies fails in some way, you have to be able to fall back on somebody,

Starting point is 00:18:54 the old guy with the spider webs on him in the corner who still knows how to do this stuff manually. Or you have taken 100 selfies and you're out of battery now. Yeah. Hey, Philipp, coming back to the presentation: we covered DNS, we covered TLS. You also talked about bad configs, even though, I mean, in general, I think, you know, like DNS, you brought up your own example.

Starting point is 00:19:22 You know, sometimes, you know, it's config problems that lead to this. Any other examples that you have where you say, hey, I would wish that this doesn't happen anymore and this is why I explain it to the people in my talk? I think your N+1 is a great example because that's also been plaguing us as an industry for a long time. And then you have whatever abstraction for your ORM that is potentially doing that or doing other weird stuff. So I think, yeah, it's the

Starting point is 00:19:52 little things that we tend to overlook because in theory they're solved but in practice they're not. Or it's like the case of the bad YAML indentation or whatever, which would be my example for bad configs. Everybody running Kubernetes has, I don't know, how many thousands of lines of YAML lying around somewhere, and I don't know if all of those are correct or doing the right thing. But I feel like, as an industry, we're not moving away from making it easy to make some of those mistakes, I think. We keep piling on new layers on top of them.
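As an editorial illustration of the "bad YAML indentation" point, the following Python sketch shows a deliberately naive pre-commit style check. It is an assumption-laden heuristic, not a YAML parser: the function name is hypothetical, it assumes a two-space indentation convention, and it will report false positives for block scalars or other valid layouts. It only catches the two cheapest mistakes, tabs in indentation (which YAML does not allow) and odd indentation widths.

```python
def lint_indentation(text: str, indent: int = 2) -> list[tuple[int, str]]:
    """Return (line number, message) pairs for suspicious indentation.

    Heuristic only: assumes the file is meant to use `indent` spaces per level.
    """
    problems = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        stripped = line.lstrip(" \t")
        # Blank lines and comment-only lines carry no structure.
        if not stripped or stripped.startswith("#"):
            continue
        leading = line[: len(line) - len(stripped)]
        if "\t" in leading:
            problems.append((lineno, "tab character in indentation"))
        elif len(leading) % indent != 0:
            problems.append(
                (lineno, f"{len(leading)} leading spaces is not a multiple of {indent}")
            )
    return problems
```

A real setup would more likely validate manifests against a schema; the sketch is only meant to show how cheap a first line of defense can be.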

Starting point is 00:20:28 And then we kind of like forget about the old problem, but they are still lurking down there somewhere to take stuff down. And I guess everybody who has been running something in production has been there to run some bad configuration. And it's always a combination of two or three things. Like one thing alone is normally not enough to throw you off, but it's normally two or three things combined that make just the right mix to create utter chaos. Could we solve this problem in a way? I'm just throwing out an idea, and OpenTelemetry brought this up with distributed tracing. However you get

Starting point is 00:21:00 to the trace let's assume we get framework vendors or framework developers like those that are providing Hibernate or other abstraction frameworks. We get them to, on the one side, instrument their frameworks with, let's say, OpenTelemetry. And wouldn't it be cool if there would be another standard

Starting point is 00:21:18 on top of OpenTelemetry where you can define bad patterns? Because if you get a distributed trace and that distributed trace shows me the N+1 query problem, one SQL statement, and then five times, 10 times, 100 times the same,

Starting point is 00:21:32 this should be a pattern that should kind of flag that distributed trace later on and say, hey, this is something you may want to look at. And maybe you're not even aware of it, but based on how we intended this framework, this is a distributed trace

Starting point is 00:21:45 that we should probably not see. And I was wondering if, you know, you have some knowledge now on certain frameworks and how things behave and should behave, and then we detect patterns if something is abnormal. We have other countless examples of, you know, because we've been analyzing distributed traces for many years.

Starting point is 00:22:03 Wouldn't it be cool if we could just say, hey, with OpenTelemetry we solved the collection problem, but now we need to figure out a way how to let people and framework providers define what is normal and what is an abnormal pattern? I guess? I feel like we're not there yet. I feel like OpenTelemetry has been very much about data collection.
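As an editorial illustration of what such a "bad pattern" definition could look like, here is a minimal Python sketch that flags the N+1 query shape in a single trace. Everything in it is an assumption: the `Span` class is a simplified stand-in rather than an OpenTelemetry SDK type, the literal-stripping regex is crude, and the threshold of 10 repetitions is arbitrary.

```python
import re
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Span:
    """Simplified, hypothetical span: just enough fields for the pattern check."""
    span_id: str
    parent_id: Optional[str]
    statement: Optional[str] = None  # SQL text for database spans, else None


# Replace quoted strings and bare numbers so that queries differing only
# in their parameters collapse into the same shape.
_LITERAL = re.compile(r"'[^']*'|\b\d+\b")


def normalize(statement: str) -> str:
    return _LITERAL.sub("?", statement).strip().lower()


def find_repeated_queries(spans, threshold: int = 10):
    """Return (parent_id, query shape, count) for shapes repeated under one parent."""
    counts = Counter(
        (span.parent_id, normalize(span.statement))
        for span in spans
        if span.statement
    )
    return [
        (parent, shape, count)
        for (parent, shape), count in counts.items()
        if count > threshold
    ]
```

A framework provider could, in principle, ship rules of this kind next to its instrumentation; how they would be standardized and exchanged between tools is exactly the open question raised in the conversation.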

Starting point is 00:22:27 But then once you hit the backend, then it's like every vendor is on their own and doing their own thing and way. And it took a long time until the data collection was standardized, because nobody reasonable wants to have all that vendor-specific stuff in their application that kind of locks them into one vendor so much. So I think that was an easy sell, to tell people, oh, there's this open standard for detection and figuring out what's going on. I could see that, but

Starting point is 00:22:57 I also feel like it's harder to sell to the end user, because they're like, as long as it does the right thing, I don't care how it's standardized, it's not in my application. And I feel like for a vendor, there are not a lot of incentives to share the secret sauce, because right now I feel like that is kind of the secret sauce that sets apart the different vendors. I guess it would be interesting. I just feel like it's a hard starting point, because that's the new unique value proposition of the different tools: oh, we get OpenTelemetry data.

Starting point is 00:23:31 This is how we can search it, create an alert on it, whatever. There it's deeply in the vendor land. Or maybe I'm just thinking too much from the vendor perspective, but I have the feeling that the incentives there are maybe not as strong. And from the end user, there's maybe not as much push, because it's not in your application. I think, Andy, if I'm

Starting point is 00:23:52 understanding it right, my take on that would be that this wouldn't be from the vendor perspective, it wouldn't be the full scope of how the platform analyzes. There would be a small portion dedicated to open analysis, let's call it, I don't know, that can ingest a predefined pattern from a vendor that we know these certain patterns occur in our situation. When you ingest your code, you can also upload this to your tool if they support it. And your tool will still do all the cool things you do, but once we're in one section of it, we'll be able to ingest these. And when a problem arises,

Starting point is 00:24:27 it can then cross-check it to, is it one of these patterns? Hey, we know it. It's already defined. Bam. So it would only be on those known and existing patterns that vendors supply or other ones people put out there.

Starting point is 00:24:38 But then you still have to do all the magic because, let's face it, 99% of the time, things are hidden behind a million other things or they are different problems. I mean, it is kind of scary how often it is the same problem at the end of the day. But obviously it's a lot more complex. Yeah, I mean, that was kind of my thinking, right? Coming back to an analogy, maybe: if you think about a car, you have an engine, and the engine probably has some specifications.

Starting point is 00:25:08 The engine should be operating in a certain temperature range, right? This is something that I can then measure and then alert on. But if the engine is put into a car, then you need some maybe some additional logic that makes sure that whichever person is driving is not hitting a speed limit. But the speed limit depends on where you actually drive. So this is then where the other feedback, let's say more intelligence comes in and where you need more data and then to make better recommendations

Starting point is 00:25:31 to the driver. I'm just saying, right, if you are providing a framework and the framework is observable, then why not give at least a framework of we think as a framework provider, the framework should be used kind of with these constraints.

Starting point is 00:25:45 Or it should run. It runs normally if we see this and runs abnormally if we see that. But maybe I'm going off in the wrong direction. Well, just to round out your car idea, I think cars have that already. You hook up the computer to your car and for the common things, like your oxygen sensor or whatever, the things that they can know about that are measurable and simple, that's something like that is standards. I mean, I can definitely see a battle between vendors on this idea.

Starting point is 00:26:13 It's not, you know, that's more of a very optimistic take on big money, capital and sharing, which would be awesome if it happens, right? But who knows? And I feel like that would be like a classic log statement, no? If something sees that you have a recursive invocation to some degree, you could potentially just do a warn logging. I feel like the

Starting point is 00:26:37 tooling is almost there. And I'm also slightly torn whether the tool should even allow that or should require a special flag for some of these things. The N+1 query problem is, of course, hard in the end, but I feel like in theory this is a known thing: one query that runs more than 10 other queries is a really weird sign normally, and not something you want to have in most cases. So maybe it should fight back harder from the start. I know, but the same goes for expired certificates, right? It should be a fixed problem, yet it brings down very popular websites every year. And this is

Starting point is 00:27:16 all right. Switching gears a little bit, because we all are very interested in making systems observable through different means of looking at different types of data. One of your recent talks, and this is the one I think you also had in Stuttgart, was around Log4j. I think we don't need to explain what Log4j is, and Log4Shell and the exploits, because, Brian, I believe we've covered this a little bit in previous episodes. But I would just like to get your take on it. What you are, you know,

Starting point is 00:27:52 because unfortunately I haven't seen your talk. I kind of clicked through the slides a little bit, but I haven't seen your talk. I'm just interested in your role as a developer advocate. What do you advise people to do to avoid these problems in the future? What kind of best practices do you give them? Yeah, so my talk is actually almost broken into two parts, because Elasticsearch, plus Logstash to some degree, have been using Log4j for

Starting point is 00:28:20 logging for a long time and were almost affected by it, but especially Elasticsearch was not, because of the Java Security Manager. And that's what the first part or half of the talk is about: looking at why it is hard to detect whether this is a problem or not. So for those who are not aware, the Java Security Manager, which unfortunately is going away, is basically something where you can create your own sandbox, to some degree, of what you can do from a Java application. And the sandbox that we have put in place is that only very few packages of Elasticsearch, for example, can do a network call. So our network library is Netty, and Netty can do outgoing network calls or can bind to an interface and whatever. Most other pieces of the code cannot do that.
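As an editorial illustration of the sandbox idea, here is a minimal Python sketch of a deny-by-default permission table. It models the concept only and is neither the Java Security Manager API nor Elasticsearch's actual policy: the component names and permission strings are hypothetical, and the DNS-lookup exception for logging mirrors what Philipp mentions in the conversation.

```python
# Hypothetical permission names, chosen for illustration only.
NETWORK = "network"
DNS_LOOKUP = "dns_lookup"
FILE_WRITE = "file_write"

# Deny by default: a component may only do what is listed for it here.
POLICY = {
    "transport": frozenset({NETWORK, DNS_LOOKUP}),  # the networking layer
    "logging": frozenset({FILE_WRITE, DNS_LOOKUP}),  # may write files, not sockets
}


def is_allowed(component: str, permission: str) -> bool:
    """True only if the policy explicitly grants the permission."""
    return permission in POLICY.get(component, frozenset())


def require(component: str, permission: str) -> None:
    """Raise instead of silently continuing when a component oversteps."""
    if not is_allowed(component, permission):
        raise PermissionError(f"{component!r} is not allowed to use {permission!r}")
```

The design choice worth noticing is the default: an unknown component or an unlisted permission is refused, so a logging library that suddenly wants to open a network connection fails loudly rather than fetching remote code.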

Starting point is 00:29:06 For example, our logging library cannot do networking calls, except for DNS lookups. But that's why, for example, in Elasticsearch it is not a remote code execution, because it just cannot fetch remote code. It could only use code that is already around and work with that, but it could not reach out to any system to load any remote code, which I think is interesting for how you can generally solve that problem. So even if you have bad security issues, and like we can discuss for a long time

Starting point is 00:29:34 if a logger should even have these features or not, how can you as an application developer or provider, especially if you have widely used tools, can protect against stuff like that. Just to say, for example, my logging library doesn't need to write to the network. It needs to write the file. That's pretty much all it needs to do. It shouldn't have the rights to do other stuff. Why is that not a more common thing? And why is it actually very hard to get right? And why it took us a long time to get to that point. And then of course the second part is more on the detection side where we have

Starting point is 00:30:11 basically I think two angles where from the observability side where we can see what is the application doing. Is it like it's suddenly doing network calls to some other IPs or DNS names that it shouldn't, which is definitely a weird sign and you can suddenly see it's always doing a get on whatever code and then it's being executed, so that's not a good sign. And the other thing is, since we have more security tooling around

Starting point is 00:30:38 that now as well, we can actually see the processes, and we could see that the Java process, for example, spawns another shell or whatever out of the Java process, which is also not what you commonly do in your web applications, that you spawn out and then do another wget or whatever people are running. So we have that second side of how to protect against that. But the first half is more like: how do we protect our own products, or make them more secure by default? Which I think is something we as an industry also

Starting point is 00:31:12 haven't made a lot of progress in. There is a lot of theory out there on how to write more secure programs. And, I mean, very few or almost nobody manages memory manually anymore; even in C++ you have proper tooling around that now. So we have made some progress. But some other stuff that is, again, surprisingly simple, like why don't we limit what can do network calls, or what system calls are even possible from an application, why is that not more widely used? And I guess it's because most of us just want to ship fast. Yeah, I think you bring up a really good point.
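
The process-level detection mentioned a moment earlier (a Java process spawning a shell or a download tool) can be sketched as a simple rule over process-start events. This is a minimal sketch under stated assumptions, not how any particular security product works: the `ProcessEvent` shape and the two name lists are hypothetical and would need tuning for a real environment.

```python
from dataclasses import dataclass

# Assumed lists for illustration only; real rules would be broader and tuned.
WATCHED_PARENTS = {"java"}
SUSPICIOUS_CHILDREN = {"sh", "bash", "wget", "curl"}


@dataclass(frozen=True)
class ProcessEvent:
    parent_name: str  # name of the process that started the child
    child_name: str   # name of the newly started process


def is_suspicious(event: ProcessEvent) -> bool:
    """Flag a watched parent (e.g. a JVM) that starts a shell or download tool."""
    return (
        event.parent_name in WATCHED_PARENTS
        and event.child_name in SUSPICIOUS_CHILDREN
    )


def suspicious_events(events):
    """Return only the events that match the rule, preserving their order."""
    return [e for e in events if is_suspicious(e)]
```

A shell started by `sshd` would pass through untouched, while `java` starting `wget` would be flagged, which matches the "not what you commonly do in your web applications" signal from the conversation.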

Starting point is 00:31:48 And I think sometimes people that complain about why is it so hard to get these permissions? Why don't we have just everything on because they want to ship faster? I think these stories are then a great reminder that there are reasons why by default we block as much as possible. And actually at the end you have to ask for permission you have to grant an additional permission if you need to make

Starting point is 00:32:10 a certain call, consciously, right? And I think that's the great advice, and kind of the reminder that there's a reason why we are very restrictive. I think that's a great point. Then I also want to highlight, because maybe not everybody's a Java expert like you are, but maybe they are responsible for fighting things like this: you talked about the Java Security Manager, that this is a component that allows you to define what's possible and what's not. And you said it's going away in favor of something else? Or they don't have a proper replacement. So the problem with the Java Security Manager was that it was not very widely used, and it created, like, both development

Starting point is 00:32:50 work, and it was always a bit scary for us, because there are very few Java projects that are using it heavily, so it doesn't get the exposure that we would like to see from others. And, well, it's going away, and we are working on replacements around better modularization in code and what can invoke what, and to have more boundaries between code. And so if you have a vulnerability in one part, then it will not own the entire process, for example. So there is work in progress.

Starting point is 00:33:20 We'll see how to get to that. We have some people who are very deep in the Java ecosystem to work on that. I think Oracle and others are also still building a bit more in that area. But I think the official take is that the Java Security Manager was interesting. It was just too much work for the average application to use it.

Starting point is 00:33:38 And that's kind of why it has failed. And we're not really happy about that, because it really saved our bacon with Log4Shell. But we can also see that it's not as widely used as we would want, or others on the JVM would want. So it's maybe not the right abstraction, or it's not the right thing right now, and we'll see where that ends up. But it was one of the very cool features of the JVM, I would say. Yeah, what I've seen at conferences, people present, and I guess, again, I'm not a security expert at all, so I'm just consuming information,

Starting point is 00:34:17 and hopefully I kind of repeat it now in the correct way, but people have shown using eBPF to basically then block calls when an application is opening up a port or it's opening up, making requests, right? That's one option. And then obviously policy managers, right? If you think about Kubernetes with policies or also the privileges that the service account has, you can really restrict an environment very nicely.

Starting point is 00:34:40 So what it can and what it cannot do. It just ties into different environments, and then it's like, oh, you need to have something on the Kubernetes level, and then you have eBPF, but that also depends on the kernel version, and then what do you do on Windows? Whereas in a JVM it was just one thing that you had to do once, basically. So, I mean, I think again, like in most cases, there are like a hundred options. It's just: what is efficient and manageable? We'll see.

Starting point is 00:35:13 But we already have, for example, system call filtering: Elasticsearch couldn't fork another process. We just don't give the process the permission, because we don't need to fork another process out of Elasticsearch. It's not a thing. This conversation reminds me, not that I lived through it, but if you think back in the industrial revolution era,

Starting point is 00:35:30 where you had these factory floors with conveyor belts and spinning machine parts, people working all under it. You had fires burning stuff inside, no ventilation, no lighting. It was a very unsafe workspace, but it was designed to get stuff out

Starting point is 00:35:46 the door quickly, right? It was designed for max, but the danger level was tremendous. And Andy, we've had discussions in the past, especially once DevOps came up and we started talking about the Toyota factory model, right? But in general, there's different changes during the course of history, whether you're going from agriculture, industrial, to compute, they all go through these same cycles. Like N plus one moves from database to services, right? Similar kinds of situations. And I feel like what we're discussing here is the idea of instead of safe workspaces, safe code spaces, where you have now, instead of the open belts flying all around, now they're encased in a case.

Starting point is 00:36:22 So if they snap, they're not going to chop someone's head off. You have ventilation. So all the exhaust going on in the place, you know, you're monitoring your DNS, you're locking down the firewall first, you have to get the permission. And it feels like, in the compute industry, there needs to be, not quite regulation in those terms, but there need to be these standardizations to make it a safe space for code or execution and all that stuff while still being efficient. You brought up the other idea with the Java piece, where it just wasn't performing and people weren't using it, well, because it wasn't designed well. All right, that's where you go back to the drawing board to find a new way to do that which people actually adopt,

Starting point is 00:36:59 because you can only get to that place if people adopt it. And it's just stewing in my mind how it's always the same thing all over again, right? It's just a new environment. Same thing, though. Let's hope it doesn't take a hundred years. Yeah, we're on a much different time scale these days, right? Everything is reduced in time. On the other hand, I feel like security has been a lingering problem, and we have just gotten used to it; it just has to hit you every now and then. Or, are you still using the magnetic strip on credit cards in the US? In Europe, I think we mostly got rid of them, but in the US you used them for a long time. Yeah, we used them for a lot longer than you did.

Starting point is 00:37:45 Some cards still have them, but most of the times they're just... Most stores have the chip reader or the... What are the... What's the little... Yeah, the contactless one. I think it's just more of the old stores. I think that the bigger issue on that is, going back to adoption, is the stores are responsible

Starting point is 00:38:02 for buying the new machine. So if you're a small store without budget, you're like, am I going to pay for that? What's my incentive? And this goes back to the same thing though, right? What's my incentive to go ahead and use this new safe coding thing? Log4J was a fantastic.

Starting point is 00:38:17 I don't think we heard too much devastation come out of it. At least I hadn't. We knew it was there. We knew everyone was exposed, but we didn't hear any horror stories on the news of companies getting terribly compromised. So that was really fortunate. But it was also fantastic that it happened because to your point, it was a real kick in the ass for security. People who weren't paying much attention, it was everywhere then? We do need those for sure. The thing is, right,

Starting point is 00:38:48 we probably also don't know which of the attacks. I mean, there's constant reports. Also, Philipp, in Austria we recently had attacks against certain public agencies, but nobody reports on it. What was the attack? How did they get in? Who knows? Maybe they were already in, but then

Starting point is 00:39:03 these organizations just wait for the right moment to strike, right? So you never know if it's just a stupid phishing attack or if it's something sophisticated. Yeah, I'm not taking any bets for our public sector. Yeah. Hey, Philipp, I want to kind of go into the final stretch here. We talked about past conference presentations. And folks, again, we will put the link to your website, where you also explain why your website is called xeraa, or why your name is everywhere as xeraa,

Starting point is 00:39:40 which is interesting. I won't spoil it here. People should go to your website. But on your website, you also have upcoming conferences. And if you look there, DevOps Days Berlin, JUG Saxony Day, DevOps Days Portugal. Actually, you're doing this session.

Starting point is 00:39:53 But then you have a couple of talks where you also talk about OpenTelemetry, the state of OpenTelemetry, OpenTelemetry for Java developers. And also one that I think is really interesting, debugging Kubernetes operators.

Starting point is 00:40:10 And I feel these are all interesting topics I would quickly like to talk about, but I want to give you the chance to pick one of those and say this is the session that I'm most excited about in the upcoming rounds, and this is the reason why.

Starting point is 00:40:27 I would pick the Kubernetes operator. I mean, I'm excited about all of them and I actually need to add a couple more now. I've fallen a bit behind from updating that list. So I think there should be more. But the Kubernetes operators is like, I feel like I just got very efficient at creating talks. That's how that talk started.

Starting point is 00:40:48 Because for our stuff, we have an operator that's getting a lot of development time. And then there was the problem that our support team had to support that in production with users. And then they were like, we're not Kubernetes experts. We don't really know much about an operator. How do we even start supporting that? We're Elasticsearch experts or whatever. And then they basically had an internal training, or like scenarios of how to

Starting point is 00:41:11 debug common problems. And I don't want to say I stole it, but I looked at those and I kind of took inspiration, because basically in support they figured out: these are the most common scenarios that we see, what people are doing wrong, and how to debug them. And that's what our support people should know. And my take then is: I take those materials, and maybe you don't even need to open the support case, but you can actually figure

Starting point is 00:41:35 it out yourself. And I think that what we do there applies very widely to other operators. It's like, you have a bad reference. You could actually create more complex error scenarios, like where you do stuff across namespaces and then you hit a bug, and then stuff gets deleted if you delete it in one namespace but it should stay in the second one. But I think it's more about the basics. Like with most other tools, at first you're

Starting point is 00:42:02 like, I have no idea how this works, I don't even know where to start, and you're kind of in this state of shock and you're frozen. And I think that's not only true for Kubernetes; it's like any other system. You need a couple of tools and points to get started, and then you just start poking at it, and then you see, oh, this looks wrong, or, oh, there is an error message that looks interesting, and where can I go from there. So it's more about showing people, oh, this is not totally different. It's like the commands are a bit different and you need to remember maybe these five commands

Starting point is 00:42:32 around these five things. And then you can see, oh, yeah, there's this problem, or I can see what is going wrong here. And then sometimes you need to tie it back a bit more to fundamental concepts in Kubernetes, like, oh, how is storage attached or whatever. But it's still a simple enough starting point that if you do a kubectl describe and then the resource, you often see where you even are. It's like, I don't know, in the good old Linux days you would do a ps, or you would do a grep or whatever. So you also had a handful of

Starting point is 00:43:05 commands that you just knew and you would run, and you would just start poking in the system, and then you would figure stuff out. And I think it's kind of the same for Kubernetes. The problem with Kubernetes is it has a few more layers, and then you can go lower, and you might need to know more overall because it's so many layers. But just looking at the topmost layer and the most obvious things, I think, is not completely different from debugging something on a plain Linux server. It's more like transitioning from: this is the other stuff I know, and then, okay, I need to know a couple of commands, but then I can kind of transfer what I've done in the past,

Starting point is 00:43:42 I can transfer over to this new environment. Just because everybody needs to use Kubernetes nowadays. And I'm still a bit unconvinced if that is true for everybody or if we're abusing or overusing that. But it's what everybody needs to do. So I'm kind of following along and say, if you fail or run into problems, let's take a look at how you can actually dig your way out again. And like I said, I kind of like shortcuts or efficiency.
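
The "handful of commands" idea from this part of the conversation can be sketched as a small helper that builds a first-look triage list for one resource. Only `kubectl describe` is named in the conversation; the other two commands, the `triage_commands` helper, and the example resource name are assumptions added for illustration. The helper only builds argument lists and does not run anything.

```python
def triage_commands(kind: str, name: str, namespace: str) -> list[list[str]]:
    """Build a first-look set of kubectl invocations for a single resource."""
    return [
        # Named in the conversation: status, conditions, and recent events.
        ["kubectl", "describe", kind, name, "-n", namespace],
        # Assumption: dump the full object to check spec and references.
        ["kubectl", "get", kind, name, "-n", namespace, "-o", "yaml"],
        # Assumption: list namespace events, oldest first.
        ["kubectl", "get", "events", "-n", namespace,
         "--sort-by=.lastTimestamp"],
    ]


if __name__ == "__main__":
    # "quickstart" is a hypothetical resource name.
    for argv in triage_commands("elasticsearch", "quickstart", "default"):
        print(" ".join(argv))
```

Each list could be handed to `subprocess.run` on a machine with cluster access; keeping the commands as data makes it easy to print them as a checklist first.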

Starting point is 00:44:10 So I'm looking at like, this is what support sees. That's probably a good starting point to turn into a talk. So that's where I got this from. Really cool. And thanks for sharing these details. And folks, again, there will be, I guess, even more chances to see Philipp in the next upcoming weeks. It's amazing still, the things that you already have on the list. And if there's even more still missing,

Starting point is 00:44:35 then it's even more amazing that you are able to cover all of this, because it's a lot of traveling. Let's see where we meet the next time. Exactly. I was trying to get to Berlin, but this didn't happen, and now something else came up. But I'm pretty sure we'll cross paths again in real life, and then maybe, who knows, maybe we'll find a dancing spot, even though I did ballroom a lot, but now I'm more on Latin dance. Salsa, that's my genre. But I only

Starting point is 00:45:06 know the basic step of salsa. Yeah, it's good. You'll have to wear a powdered wig and puffy sleeves. Right, Philipp, did we miss anything? Are there any final thoughts for our listeners that you would like to tell them? No, I mean, I hope we did well for the listeners. But I think we covered a lot of ground, from Austrian culture all the way to computer stuff. So if you ever visit Vienna, look out for the guy in the glasses who is dancing waltz, probably in the first district somewhere with tourists. And run up and stick a screen of Sound of Music

Starting point is 00:45:50 playing in front of him this way. He can't claim he's never seen it. Force him to watch it. Yeah, I don't need to. I only know the meme, right? There is this where the one person is on the I don't know, it's like dancing in the traditional skirt or whatever. I know that meme, but that's pretty much all I know.

Starting point is 00:46:07 Well, it's a very Americanized version of that time period for sure. What are you going to do? That's America. Anyway, this has been fascinating. I blame it on the generation. Let's blame it on the boomers, even though that wasn't them

Starting point is 00:46:23 making those. Let's still blame it on them, right? This has been fascinating. You know, you said you hope it resounds for our audience, but as Andy and I always say, we love doing this because of what we get out of it, right? We hope the audience comes along with us on the ride. We don't really hear too much feedback. So I found this fascinating and really, really appreciate you taking the time from your busy tour schedule. You'll have to open for the Rolling Stones someday. I'll make a shirt at some point, right? Yeah, start selling merch, and you have a tour with all the dates on the back. Yeah, exactly. All right, Andy, anything you wanted to wrap with? No, just reminding people, check out the notes.

Starting point is 00:47:09 We will put the links to the blog, to your LinkedIn, to your Twitter, to your GitHub. Everything. I mean, you can find everything on your website, xeraa.net. That's X-E-R-A-A dot net. But again, you'll find the link there as well. And once this episode airs, obviously, Philipp, we will put it on social media and tag you, so then your followers will find it

Starting point is 00:47:30 and then our followers can then also follow you, and then hopefully everybody's happy and we all make money. All right, thank you so much. Thanks to our listeners. We really had a great time. And see you on the next episode, everyone. Bye-bye.
