PurePerformance - 034 Monitoring at Facebook & How DevOps Works with Goranka Bjedov

Episode Date: April 24, 2017

In this second episode with Goranka Bjedov from Facebook, we learn details about how Facebook monitors their infrastructure, services, applications and end users, why they built certain tooling, and how and who analyzes that data. We then shifted gears to development, where we learned how the onboarding process for developers works, and that Goranka herself made her first production deployment within her first week of employment. Join us and learn a lot about the culture that drives Facebook Engineering.

Transcript
Starting point is 00:00:00 It's time for Pure Performance. Get your stopwatches ready. It's time for Pure Performance with Andy Grabner and Brian Wilson. Hey, we're back still with Goranka from Facebook. And hopefully you listened in to the first episode, where I thought it was just enlightening to hear how Facebook is doing performance engineering, capacity management, how they are deploying new features fast to figure out even faster whether they should keep these features, kill them, or improve them, using New Zealand as the test bed, as many other companies in the world do as well. So we're still here because I still have a couple of questions.
Starting point is 00:01:00 Brian and Goranka, are you still with me? Sure. I'm still here. Awesome. So, I have two different topics that I would like to cover, just quickly pointing them out: large-scale production monitoring. You said, obviously, you built your own tools, because for that scale it's hard to find vendors that actually build these tools. I would like to, on the one side, understand kind of what are the key metrics, what are the key things you look at, especially knowing that you have multiple layers of services. What are the things you look at in production, so that we can all learn a little bit about the key metrics and what you do with them? And the second topic goes more towards some of the DevOps best practices. I know we heard a little bit about how you do rollouts and that you allow teams to do their own stuff. But I have a couple more questions on how DevOps kind of works at Facebook. The topic about the Dev Karma, I know one of your colleagues has been promoting that.
Starting point is 00:01:59 But first, kind of going into performance monitoring in production: how do you get a handle, how do you get an overview of the performance of that many thousands of servers that run so many thousands, maybe millions, of instances of services of different types that are related to each other and depend on each other? From a high-level view, what are you looking at? So, actually, let me tell you first how we approached the problem, because I think we've done this right. And it doesn't happen very often.
Starting point is 00:02:36 We mostly do things and then we kind of make them right, but this one we actually designed correctly from the beginning. So we have these three tiny little daemons that run on every single one of our machines in the data center. And the first daemon, the one that I like the most, is Dynalog. And that's because when I joined Facebook, that was the thing that I was working on. And it's this really tiny daemon, and the only thing it does is collect pretty much all of the performance data, everything that the Linux kernel knows about, on a per-second basis, and export it to a Thrift port, 1777, but it doesn't matter. And so the only thing it does is it collects the data. So anything, for example, in /proc or, you know, if you know of the kernel data structure taskstats, anything that is in there, Dynalog collects and just kind of sends it out.
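A minimal sketch of that per-second collection loop might look like the following. The parsing covers only the aggregate CPU line of /proc/stat, and the `publish` callback stands in for the real daemon's Thrift export; the function names here are illustrative, not Dynalog's actual API:

```python
import json
import time


def parse_cpu_counters(stat_text):
    """Extract the aggregate CPU counters from the text of /proc/stat."""
    for line in stat_text.splitlines():
        if line.startswith("cpu "):
            fields = line.split()
            names = ["user", "nice", "system", "idle",
                     "iowait", "irq", "softirq"]
            return dict(zip(names, map(int, fields[1:8])))
    return {}


def collect_forever(publish, interval=1.0):
    """Once per interval, snapshot kernel counters and hand them to a
    subscriber callback; the real daemon exports over a Thrift port."""
    while True:
        with open("/proc/stat") as f:
            counters = parse_cpu_counters(f.read())
        publish(json.dumps({"ts": time.time(), "cpu": counters}))
        time.sleep(interval)
```

The point of the design is visible even in the sketch: the collector only collects and publishes; everything else (dashboards, alerts) subscribes downstream.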
Starting point is 00:03:38 The other things can now subscribe to those, you know, to those basically streams, and can consume them, do whatever they want. We have a second little daemon that just wakes up every 10 seconds, takes an atop snapshot, and puts it in a file. And that's the only thing it does. And the third one is really an interesting one. And it may shock people listening to this. We call it Strobelight.
Starting point is 00:04:04 And it basically wakes up. I'm not going to say how often. It kind of rolls a very large multi-sided die, you know, a whole bunch of them. And then if the number comes up correctly, it crashes the machine and it grabs all of the stack traces of everything. And those three daemons are basically the bottom of our pyramid of monitoring. On top of that, each one does one thing and does one thing well, right?
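The dice-roll idea just described can be sketched in a few lines. This hypothetical version samples in-process Python stacks rather than whole-machine traces, but the shape is the same: roll on every wake-up, and only on a hit pay the cost of capturing everything:

```python
import random
import sys
import traceback


def maybe_capture_stacks(sides=1_000_000):
    """Roll a `sides`-sided die; on a hit (and only then), grab a stack
    trace for every thread in this process. Across a large fleet, even a
    tiny per-roll probability yields a steady stream of samples."""
    if random.randrange(sides) != 0:
        return None  # no hit: do nothing, the cost is one random number
    return {
        thread_id: traceback.format_stack(frame)
        for thread_id, frame in sys._current_frames().items()
    }
```

The design choice worth noting: because the roll happens independently on every machine, nobody has to coordinate which hosts get profiled, yet the aggregate sample rate across the fleet stays predictable.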
Starting point is 00:04:32 So we've used the Unix principle, you know, pick one thing to do and then do it well and you will be okay. And so on top of it, we have built different teams, built things that they need. We have from different dashboards to alerts to, you know, whatever you want to think about, we have. One of my colleagues on the capacity team has combined Dynalog with the data available in something that we call SMC, but basically it tells you which machines work together as groups and has created these dashboards that tell us, you know, how is the load distributed across those groups of machines, especially as they're distributed across the world and across our data center. And so we can easily see, hey, for this particular product, you seem to have a lot more load in, I don't know, Oregon than you have in Sweden.
Starting point is 00:05:27 Why? What is going on? What are your load balancing rules that are causing this to happen? This is also for dependency detection, obviously, right? Correct, yes. But then a different engineer, and so the first one was built by Hau. The second one, Dennis, on the other hand, goes now into these subgroups of servers and basically does all of the automatic performance analysis that you could look at. So he looks at these servers automatically and we call it
Starting point is 00:05:57 Perf Doctor. And it just basically says, hey, here are the things that I have heuristically determined could potentially be problems for you, related to CPU, related to memory, related to maybe disk, related to flash, related to anything else. And so you come in there and you find, hey, you have a lot of LLC misses, or, oh my God, you know, you have a lot of interrupts. An engineer wouldn't necessarily even think about how do I figure out if I have a lot of interrupts, but you come to Perf Doctor, which is automatically tied to the other thing that we call Overwatch. And, you know, look at
Starting point is 00:06:36 this and you go like, geez, you know, I have these red things over here. What do they mean? And you can contact your capacity and performance engineer, and we can work on fixing those immediately. And in the process, a lot of my back-end engineers like learning, you know, some of this stuff, because not everybody can be an expert performance engineer. I like this a lot. Just to interrupt you quickly: we have a similar thing, right? I mean, obviously we built a performance monitoring product, and within Dynatrace we know how to use the product, but many of our customers out there might not be performance experts. So what we actually did, we automated problem pattern detection. And same thing, we automatically say, hey, since the last deployment, on these transactions you have the N plus one query problem to the database, or you have a heavy
Starting point is 00:07:23 asynchronous threading, and it increased by 50 percent since the last deployment. And so, as you said, not everybody is an expert who can (a) understand all the patterns out there and (b) even know where to look and how to analyze the data. So we just bubble it up. I like Perf Doctor, that's a nice one. Yeah, I think Dennis did a phenomenal job with it. It's a really fantastic tool and he's still adding features to it. You know, I mean, think about it: if I have a whole bunch of servers with flash in there, I don't necessarily have to be an expert on, you know, when is this flash going to run out of its write cycles and stuff. Why would I have to worry about that? Because, hey, are you going to also make me worry about exactly how my NICs are behaving and so on? You cannot today require,
Starting point is 00:08:24 you know, an engineer to know all of that and still have them move fast, right? Because keep in mind that the rule is: move fast. And so what we do is we provide all of these tools that allow people to look at what we think is wrong. And then, you know, what you will find out is that not every engineer is interested in learning all this stuff, but typically, you know, one or two per team will jump in and say, well, you know, what does this mean? Why are you saying this? Like, what's this LLC thing? Why, you know, you're telling me to use huge pages.
Starting point is 00:08:55 I don't even know what, you know, what are you talking about? And those are the ones that you sit down with and you explain, hey, here is how this works. And now see, like, compile it this way and see what you get. Or, you know, if you switch to a later version of the compiler, you could potentially get this win or that win. And, as I said, not every engineer will be as excited about it as we are, but all you need is one or two, and, you know, you move kind of that stuff upstream, and people start monitoring and tracking their own stuff. It works really well.
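At its core, a heuristic checker like the Perf Doctor described above reduces to a rule table applied to each host's counters. A toy sketch, with made-up metric names and thresholds (the real tool's rules are not public):

```python
def diagnose(counters, rules=None):
    """Return human-readable findings for every counter that trips a
    threshold rule. Metric names and limits here are illustrative only."""
    if rules is None:
        rules = [
            ("llc_miss_rate", 0.20,
             "high LLC miss rate: consider huge pages or data-layout changes"),
            ("interrupts_per_sec", 50_000,
             "interrupt storm: check NIC IRQ affinity"),
            ("flash_write_wear_pct", 80,
             "flash nearing its write-cycle budget: plan replacement"),
        ]
    findings = []
    for metric, threshold, advice in rules:
        value = counters.get(metric)
        if value is not None and value > threshold:
            findings.append(f"{metric}={value}: {advice}")
    return findings
```

An engineer who has never heard of LLC misses still gets an actionable sentence; the expertise lives in the rule table, which the performance team can keep extending.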
Starting point is 00:09:31 It's kind of like you provide performance engineering as a service. Correct. But we're also very lucky, right? To answer your second question, the one that you said you wanted to discuss: we have a class of engineers that we call production engineers. I think in the industry they are commonly called site reliability engineers, SREs. I think our guys do more than just site reliability, so we like calling them product or production engineers. And those are the engineers who both know the code and are system administrators, and are shrinks on the other side and know how to deal with people.
Starting point is 00:10:12 And so they're kind of really rare, hard to find. And I love working with them because in general, they are the people who every large team of developers will have a subset of production engineers dedicated to just sort of working with them and helping them, you know, maintain the product, deploy the changes. They will try and catch things early on. And I tend to work very, very closely with them. They are, like if a production engineer comes to me and says, look, I'm in real trouble and I need, you know, X machines, most of the time, I don't even question
Starting point is 00:10:50 it. I will just do what they ask me to do. If a developer comes to me and says, I'm in real trouble and I need X machines, most of the time I just laugh, right? Because I know, it's like, come on, you don't even know what you're talking about, right? So the big difference is, you know, because, hey, like any other company, you try to hire smart people. Smart people will look for easy solutions. If that weren't the case, we'd still be living in caves, right? And so a smart developer will look at a problem and say, you know what, I need more servers. That's the easy thing for them.
Starting point is 00:11:22 A production engineer, on the other hand, has far more of a conflict of interest. If they have more machines to manage, that increases the difficulty in their job. And so they have vested interest in actually analyzing the situation and saying, do I really need those additional servers or could I just recompile this in a different way and get the benefits of this or that?
Starting point is 00:11:44 And so that's why I tend to trust production engineers by default, and I tend not to trust developers by default. And by the way, my teams know that, right? We kind of joke about it, and that's perfectly okay. Wow. So the Dynalog, then the atop, the Strobelight, the Perf Doctor; you mentioned Overwatch. That means these are all there to collect data. Now, if you're responsible for this large infrastructure, when do your alarm clocks go off? So luckily, the way we look at capacity, capacity should never be an emergency, right? I even have a sign on the desk that says your lack of planning is not my emergency. We monitor all of these things.
Starting point is 00:12:37 We very closely monitor especially large services to see how they're behaving, so that we can jump in and fix things. But, you know, occasionally things will happen in the world that will cause alarms for us, right? In, I think, 2012 (don't quote me, it was either 2011 or 2012), there was this huge flood in Thailand. And it turns out that in order to make a hard drive, there is this small piece of it that is only produced in Thailand. And so when all of the factories flooded, we kind of knew that at that point in time there was going to be a shortage of disks
Starting point is 00:13:17 around the world. Right now, I think pretty much everybody knows that there are, you know, memory shortages in the field and stuff. And so when that happens, we basically have an ongoing alert of trying to figure out how do we mitigate, you know, something that is happening in the world. So what do we do? There are not going to be new disks to buy. So, hey, let's see what are the disks that we currently have in our machines. Can we extend the life of some of those? What state are they in? Can we reuse them in a different way? You know, what can we buy on secondary markets? And it's one of those things where you get closely involved with all of the other teams.
Starting point is 00:13:58 You know, in that situation, I am going to work very closely with my purchasing teams, the VMO, and try to find out what can we buy and for how much. What can we, you know, cannibalize from our own fleet, right? So here are the machines that are coming to end of life; are there pieces there that I could reuse, and stuff like that? And so in those situations, yeah, you tend to work, you know, a little bit more. But none of us carry pagers; none of us are on call in the sense that if the phone goes off at 3 a.m., you know, you have to jump. We do have times, and I know I've mentioned it, there are times when we will run experiments and take down a whole, what we call a region, a whole bunch of data centers in a particular area.
Starting point is 00:14:48 But those are usually planned. So far, we haven't taken a whole region down accidentally. When a test like that is going on, pretty much a subset of capacity and performance engineers will be kind of on call the whole time and available, monitoring, you know, what is going on and how things will finish. We've actually had one time when something like this happened accidentally, but we were prepared, right? If you remember Hurricane Sandy. Oh, yeah. So Hurricane Sandy ended up ripping up a
Starting point is 00:15:26 lot of cables between Europe and the U.S. And, you know, pro tip, and sort of a plug for an ex-colleague of mine: Carlos Bueno wrote a phenomenal blog post about, you know, where the cables go, and compared the current layout of all the cables that we use for internet communication with the trade routes that the ships used to take 100 to 200 years ago. And you basically get a one-to-one match of all of these things,
Starting point is 00:16:25 which is kind of pretty fascinating if you think about it. And so the cables are all coming to the US from Europe, basically kind of in one or two places. And Sandy went through one of those and ripped the cables, and kind of left one of our regions without... kind of left Europe really disconnected from the rest of the stuff. And so at that point in time, you know, you sit down and you see what you can do. How do you reroute traffic in different ways? I have a truly phenomenal networking team.
Starting point is 00:16:51 Love those guys. And so we managed to get through everything without any problems. And, of course, we knew that Sandy was coming. We had a couple of days' warning. And so we kind of analyzed what are the things that could be impacted. I have to admit, I don't know if anybody thought about the cables being pulled. I certainly didn't hear of somebody doing it. But we considered the situation that the storm shifts a little bit and takes out, you know, our whole data
Starting point is 00:17:36 center in this region. And so we prepared for that, and then the rest of it just kind of followed through. Wow. Hey, now quickly to my second topic: DevOps at Facebook. I mean, you mentioned that developers can basically, you know, push code through fast; you encourage that. Now, if I'm a developer, if I would start tomorrow, what am I allowed to do? Am I already allowed to do changes that make it all the way to production? What are my safety nets and what are my responsibilities?
Starting point is 00:18:22 Oh, absolutely. So when you start at Facebook as an engineer, you start in a boot camp. We're kind of an unusual company in the sense that when you join Facebook, you don't necessarily know which team you will work on, because you get to make a pick. You decide who you want to work with. So do you want to work on News Feed or on ads or on search, or is it Android that, you know, is the stuff that you want to work on? But you get to spend three months... sorry, not three months, six to eight weeks in boot camp. And during your boot camp time, you will end up working in all aspects of all of the code bases. To give you an example, I remember the first time I started and they gave me a couple of bugs. We have this separate class of bugs that we call bootcamp bugs, which are relatively simple, so a bootcamper can fix them, but they require you to get into the code base and sort of figure out how things are working.
Starting point is 00:18:22 And so I looked at my first bootcamp bug and it was in the PHP code base. And so I went to my bootcamp mentor and I said, somebody made a mistake. You see, I'm a performance person. I'm a C++ person. I mean, there is no way you want me writing in PHP, right? It's like, are you kidding me?
Starting point is 00:19:01 I mean, I despise PHP. There aren't enough words in the English language to express my distaste for PHP. And, you know, my bootcamp mentor said, like, and that is exactly why we gave you this. You'll see, you don't have only one, you have multiple, because you still need to know what the code base looks like. And, you know, we just wanted to make you cry, right? And he wasn't thinking about me programming in PHP.
Starting point is 00:19:33 He was talking about me looking at what the code looks like, right? And so, yeah, my code change ended up landing in production within a week. And hopefully the changes are small and you're not going to take the site down during that time. What would happen is, you know, on, I think, Friday or Saturday, the code would be pushed so that we Facebook employees would basically end up using this new Facebook. And then, so we use it for a couple of days, and then on Tuesdays
Starting point is 00:20:37 we pushed to everybody. Again, this was like six years ago. And so it happened frequently that, you know, you'd log on on Saturday or Sunday and there would be like, hey, is the top bar missing for everybody? Right? Or something like that. And then you fix it before it goes out; you know, that was the whole point. But yeah, within the first six to eight weeks you're going through bootcamp. In the first part of that, you get a random set of bugs to work on that will probably toss you into front-end code and into some back-end code. And then after about three to four weeks, you kind of decide, you know, to spend more time in one code base and work closer with those teams. And at the end of your bootcamp time, you decide, hey, I would like to work on News Feed.
Starting point is 00:21:18 And the News Feed team says, like, yeah, you seem okay, you've done okay and we collaborate well with you. And so you move to the News Feed team. Or, you know, likewise: I would really like to, I don't know, work on Instagram. And you joined Facebook, yeah, but Instagram is where it's at for me; you know, I'm for some reason in love with Django and Python and this is where I want to go. And so you join the Instagram team. You have that choice, but you start pushing things into production immediately. We find that it is amazing how quickly people lose the fear. If you push a small thing and you see how the whole thing works, you suddenly sort of relax
Starting point is 00:22:11 a little bit, because you realize, you know, there are other people looking over these things. And so if I make a mistake, most likely, or hopefully, it's going to be caught during the code review. If not, it's going to be caught along the way through the pipeline. And I think it frees people up a little bit to be freer, because, interestingly enough, and I actually completely agree with this, the worst thing you can have is when people are afraid to do things and make mistakes. That paralyzes you and you just make no progress. You know, that's very similar, in the vein of an artist, right? Trying to create something, there's all the self-censorship and self-doubt of putting something out or continuing with the project because of that fear of what's going to happen when it goes out. And in a similar way with this, you have the idea of: I can put this out
Starting point is 00:23:02 without fear, because I know there are checks and balances. And it probably gives developers a freer rein to create or experiment with new ideas that they might not have if they had to think of it all the way through. Absolutely. And, you know, I think the second thing with that is the culture. And I hope every, you know, coding, programming, engineering shop takes this very seriously: never blame the person that made the mistake. It isn't that person's fault. If somebody brings all of Facebook down accidentally on the inside, that's not that person's fault. That's the fault of all of us who have been there for years and have not realized that we have created a system that has this one single point of failure that can take it down. And the fact that the person ran into it,
Starting point is 00:23:27 that's not that person's fault. That's our fault, right? Don't blame the person that has just been there for three weeks. I mean, what the heck does he or she know? Yeah. So what I would also like to understand: if I'm pushing my feature out, then am I responsible? Am I responsible to keep monitoring it in production from a... is it successful? Is it not successful?
Starting point is 00:23:27 Is it performing well? Or is this a service that you deliver? Or is it something that I have to then sit down and say, okay, I believe this is crap. Nobody needs it. And then rip it out. Or I believe, oh, this is really cool, but now I need your help to make it more efficient. How does this work? Have you ever met a developer who pushed something out and said,
Starting point is 00:24:32 I think this is crap and we should pull it out? Never. Never happens, right? So one of the things that I do before they push out: I ask for a sort of written or verbal commitment on what is considered a success. Because it is interesting how people will say, well, you know, if people like it, then we'll have it stick around. And it's like, okay, well, what does that mean? I need measurable things. So, you know, you can tell me, okay, an increase of, I don't know, 10 percent more people using the feature is a success. Okay. You know, because otherwise I can't judge that. But I want that number up front. And, you know, what's failure? If it's less than three
Starting point is 00:25:20 percent of people using the feature, is that a failure? You have to commit to that before we push the product. Because otherwise, you know, you push the product, and no matter what the numbers are, the developer will declare it a success, and I will look at it and go, like, well, this should have been better. And so it's nice to have that agreement up front. And that doesn't mean that once things are pushed, and let's assume that you end up with, say, two and a half percent, so it looks like failure, but there are other indicators that show that if you only change this thing here and that thing there, it could be a lot better: well, fine, let's go ahead and do that. But you have to agree on how you will evaluate. Not how you will measure; that goes without saying, we all have to agree on how we measure things. But the question is, what do we
Starting point is 00:26:06 call success? What do we call failure? What is in between? And a lot of times when I find people arguing and disagreeing, it's because they have very different standards on what we mean by very good. And with performance, the typical requirement is: it has to perform well and it has to be secure. Well, what the heck does that mean? And so you need to define those things. One last thing, because I remember you sent me a video link after we met in New Zealand. One of your colleagues, he came up... I mean, he's famous, it seems, within Facebook. Yeah. One thing that I liked is what he
Starting point is 00:26:54 talked about: the dev karma, or the karma of a developer. Yeah. So if I got this, if I understood this correctly, then in general you first start trusting developers to do a good job, but then they have to prove over and over again that they are not misusing that trust, meaning they have a high karma, but that karma gets lowered in case they do something stupid. Right. Correct. You know, so Chuck, Chuck was the head of release engineering, right? And so, you know, he has a team of people working for him. He likes those people. They're his people. He doesn't show it much, but he does like them. And so, you know, you're a developer and you push crappy code out and you walk out.
Starting point is 00:27:30 And now these guys have to stay till midnight or till one o'clock. Again, this is a long time ago, when we were pushing the code changes. Yeah. You know, you've just lost some karma points. From that point on, they have full rights to say: you don't get to add anything to the trunk or to a build without being there. The rule is, or the rule used to be: when your code is being pushed and, you know, when your changes are going out, you stick around to support your changes. And if you don't do that, yeah, you will lose some karma points. I joke, I do something similar with performance.
Starting point is 00:27:55 You know, you can occasionally swindle me out of a few machines. But once you do that, I will remember it. And the next time, I will make you work that much harder even for sort of legitimate requests. You know, and I always joke, because, like anywhere else, most of the developers tend to be men, right? And I always joke with them and I say, like, look, I'm a woman and I'm from the Balkans. I will remember this 500 years from now, sorry.
Starting point is 00:28:16 And in our 10th life, I will hold you accountable for this. So just, you know, just do the right thing and we'll all be happy. You know, and so part of our culture is it isn't top-down; we have a sort of self-policing culture of engineers. Like, I'm not a manager. I'm an individual contributor.
Starting point is 00:28:50 But I have this power to sort of jokingly talk with people and say: don't, just don't do this, because I will hold you accountable. And I do hold a grudge. And people know that. And so it's so much better than if, you know, I have to go to my manager, who goes to his manager and so on, and then it comes down and says, well, this person is doing something that, you know, the person over there doesn't think is right. It's very direct: these are machines of mine, and I'm giving them to you because I think it's the right thing for Facebook. But if I find out that you are doing something that isn't okay, and doing it intentionally,
Starting point is 00:29:15 I'm less likely to trust you in the future. I think that's perfectly reasonable. Wow. Brian, I think this was, both sessions, extremely, extremely delightful. Yeah, just amazing to hear how companies like Facebook deal with the way we should deliver and build software. And I think, hopefully, some very good thoughts
Starting point is 00:29:36 and ideas that spark with our listeners. And yeah, if I want to sum it up for this episode, Brian, do you want to take a stab at it, or shall I? No, you're the summarizer. Go on. The summarizer.
Starting point is 00:30:08 I'm the summarizer. The summarizer, exactly. I'll be back. What I really liked about this episode, you know, the two areas we discussed: the way you monitor large-scale environments. You said you built your own purpose-built tools; do one thing and do it really well. The Dynalog, the taking of snapshots every 10 seconds, the Strobelight. Building automatic problem detection into your monitoring system, because not everybody is a performance expert, so bubble it up.
Starting point is 00:30:50 That's great. So, understanding a little bit how monitoring works at large scale. And then what I just loved to hear is how developers get onboarded, how code gets pushed through: you have the boot camp, where you start with bug fixing, getting familiar with the code base, making your first production changes. You said within the first week you have your first production change. And eventually you figure out in which area you really want to work. And by that time, after the boot camp, after six to eight weeks, you already got to know that team, and they can then basically, hopefully, agree that you will be a good addition to that team. And I guess the biggest thing that I take away from this
Starting point is 00:31:24 is: what is success? Because success means something different to different people. And so if a development team comes up with a new idea, everybody needs to have an agreement on how we can actually measure success later on, so we can decide if it's a good thing to move forward or not. And I just like this. And having it as a written statement, to hold people accountable for later on as well, which is phenomenal. Yeah, and Andy, building off of that success thing, I read today there was a... are you familiar with the actor Shia LaBeouf?
Starting point is 00:32:03 He was in the Transformers movies and some of those others. Yeah, familiar with the name. He had some movie, and it opened in one theater in England and sold one ticket. Yeah. And, you know, one of the excuses, I should say, made in the article was that, oh, it was also released direct to, you know, streaming and all that at the same time. So that might have... But, you know, just the definitions of success, you know, selling one ticket. Just tying that in. And, you know, I'm not going to try to resummarize anything you said. I think this is probably one of the episodes that I'm going to go back
Starting point is 00:32:03 while I'm editing it and really, really listen to, cause there's so much in here. Um, just so much to think about and chew on mentally. Um, but in, in a little bit of a humorous way, the, what I'll say is the one thing I got out of it is that I'm very glad to hear that employees at Facebook do not wear pagers because if they're worried Because if they're using pagers, yeah, they're being... I know it's a term of speech, but yeah. It's a figure of speech, but a long time ago. Oh, that was it, right. You know, 2010 or so, one of our employees played a prank on one of the sites that used to try always
Starting point is 00:32:50 and come and report early on Facebook. And so he basically pegged their, knew all of their IP addresses. And what would happen is when they would try to come, they would get this information that Facebook is trying to release a new fax product and so on. And so they ended up actually reporting on it, which, of course, Evan was more than thrilled with and stuff like that. And then eventually they realized what was going on.
Starting point is 00:33:19 But yeah, pagers, fax, you know, I'm very old. Anyhow, definitely would love to just thank you so much for joining today. Really, really enlightening. It's always been a mystery. There's always those clouds of mysteries behind, you know, the other companies like Facebook and the other large behemoths that are dealing with such large user bases and such large server footprints and services like how does it all get done and getting to peel and i know there's a lot of tech blogs and out there but getting to speak and hear it in in plain terms is is just fascinating absolutely
Starting point is 00:33:57 fascinating so really huge thanks for for coming on today you know thank you for inviting me uh you know and uh i hope that the conversation ends up being useful to your listeners. Me too. It's useful to me too, as well. Yeah, can't say thanks enough. Andy, anything else? Or Greg, any last words? Are there any conferences or anything you'd like to plug of any appearances coming up? Not really. I tend to do one or two events a year because it can just get overwhelming, and I still have a lot of work that needs to be done at Facebook, so nothing to plug.
Starting point is 00:34:37 All right. Andy, anything to plug? When is this airing? I think sometime this month. Okay. Well, I will be on the West Coast in two weeks at the AWS Summit. This month being April. Yeah, sorry.
Starting point is 00:34:54 The month being April, exactly. And the big plug that I want to make is in June, actually June 1st, we have our Dev1 conference. It's a Dynatrace developers conference for developers and operations. June, actually June 1st, we have our DevOne conference. It's a Dynatrace developers conference for developers and operations. So it actually came out of some feedback that we received internally where developers, especially developers within Dynatrace, wanted to get more educated on what's happening out there. And so we actually organized this one-day event, like a DevOps day, but we call it DevOne, so DevOne.at.
Starting point is 00:35:27 And we also open it up to the public now. So in case anybody is in Linz, Austria or somewhere in the area, check it out, DevOne.at. And actually, I will be giving – I was asked to do the keynote there, and I would like to repeat, Greinke, some of the things that you've publicly stated uh because one of the things they ask me because people what they ask me is like how does how does devops work in big organizations how does software engineering work in big organizations
Starting point is 00:35:56 and i think you just brought up some really great concepts that i would like to to rephrase so that's thanks again for helping me with my keynote problem and you'll also be performing at the sands casino right with a carrot top to rephrase. So thanks again for helping me with my keynote. No problem. And you'll also be performing at the Sands Casino, right, with Carrot Top? I'll send you the video. I'll send you the video with me, with us on stage with Lederhosen and Dindlo.
Starting point is 00:36:18 Fantastic. All right. And once again, any feedback anybody has, we'd love to hear any. You can reach us via Twitter at Pure underscore DT, or you can always send us an email if you want to be old-fashioned, which is kind of funny, isn't it? At pureperformance.dynatrace.com. And thank you all for listening, and we'll talk to you soon.
Starting point is 00:36:40 Thanks. Goodbye, everyone. Bye. Bye. Bye. Bye.
