a16z Podcast - Anatomy of the SolarWinds Hack: Who What Where When How

Starting point is 00:00:00 Hi everyone. Welcome to the a16z Podcast. I'm Sonal and we're sharing this episode that just dropped on 16 Minutes here today because it's all about the SolarWinds hack, one of the largest, at least publicly known and so far, hacks of all time. It's widely relevant to all, especially given ripple effects, including the Wall Street Journal reporting just yesterday that according to a recent government investigation, 30% of both private sector and government victims linked to the hack had no direct connection to SolarWinds. So it's going to have ripple effects for quite some time. So we did an anatomy of a hack in this long explainer episode, a teardown of the specifics we know so far, what went down, and what we all need to know whether you're a big company, small company, or individual. For quick context, before I introduce our experts, over 18,000 customers downloaded compromised software. This is as reported in December, though obviously it now goes well beyond them. Those customers include several large government agencies, which we covered on 16 Minutes last year. Private sector victims include companies like Cisco, Intel, Microsoft, Nvidia, Deloitte, VMware, Belkin, and others.

Starting point is 00:01:07 The broad consensus, per a statement issued by the Office of the Director of National Intelligence, the FBI, the Department of Homeland Security, and the National Security Agency, is that Russia was most likely the origin of the hacking, and more specifically that the Cozy Bear group, also known as APT29, overseen by Russia's intelligence service, was responsible. That's just a super quick high level because we're going to actually go deeper and break down the who, when, how, and the chess game of it all. So now let me quickly introduce our experts. Our in-house expert is a16z operating partner for security and former CISO, Joel de la Garza, and our special expert guest is Stephen Adair, the president of Volexity, an information security firm that does

Starting point is 00:01:46 incident response and forensics, including memory forensics, and they've responded to multiple cases of this. Their team actually put out several detailed posts on this and more. But first, Stephen, can you summarize what happened? Obviously, we'll continue to dig in on the details throughout the episode, but also what category of hack this is. Start with the basics. Yeah, sure. So SolarWinds is a company that creates network and system management software that's used

Starting point is 00:02:11 really heavily by tens of thousands of organizations around the world. So it's used by giant commercial companies, Fortune 500. It's used by small organizations, managed service providers, and governments. So it's a piece of software used to manage these really sensitive, important assets. So think about the IT teams and people who want to watch what's going on on key systems, on network devices and things that are really important within a network. Their product is called Orion. That's their flagship product.

Starting point is 00:02:38 And what happened is that SolarWinds was basically breached. How exactly, you know, that's not really been published. We don't know. But attackers were able to compromise SolarWinds and get into what's called the build process of this product. So essentially the development of this software that's downloaded and used by all these organizations. They were able to get into SolarWinds' networks and modify that build process. And what's interesting and notable about this is they didn't go in and modify the source code. What they did is, think about it: if you're on an assembly line and someone made a change, like, early on,

Starting point is 00:03:08 and then they all put it together. They actually waited until the very end, the very last step of compiling this package to make this software that goes out. And they monitored it, they watched it, they looked at it, they learned, they tested, and they ended up compiling in a backdoor, which would give them access to the systems running SolarWinds Orion, for anyone who installed the update or downloaded it freshly since they did this. So they were able to modify SolarWinds and push out this update to organizations all around the world. Basically, they'd create a shopping list and selectively target who it was that they wanted to go into and basically break into and further their access. They could look through

Starting point is 00:03:43 and see, oh, this company or this government agency, I'm very interested in them. They could actually activate and walk right into their network. And they're already sitting in and going into a very sensitive part of that network. So in short, it's what's called a supply chain compromise, where they were able to get into the build process, insert themselves and the backdoor into this legitimate software, and expand their access, and do it very stealthily for many, many months until, you know, FireEye came forward and figured this all out in December 2020.

Starting point is 00:04:10 Right. And just for quick, even more high-level context, this is playing out against the broader landscape of, for many years now, companies have obviously been using various providers of third-party and cloud software and services. We'll delve into this whole notion of a supply chain hack, what it means, what it means for the future of security. But the thing I want to really pull on from what you said is that this was very unusual because they didn't go for the source code. They kind of waited for the updates. And then they were very targeted as opposed to just sort of spray and pray.

Starting point is 00:04:43 So in your assessment of all the hacks that you've seen out there, and Joel, I want to hear your thoughts too here, is this really a sophisticated hack? Because obviously in our show, we not only tease apart what's hype, what's real, but I often wonder if that word gets thrown about very casually. Yeah, so in our opinion, this aspect of it is certainly one of the more sophisticated that we've seen. And it's not necessarily that there aren't a lot of smart people around the world, good and bad, that could pull off something similar.

Starting point is 00:05:09 It's, you know, one, the fact that they did. Two, they did it so strategically. And three, you know, even if they had gone in and modified the source code, people would still be talking about how sophisticated it was. But they took it up a notch and basically said, yeah, if we modify this code, someone's watching it, or they audit it, or someone's watching the check-in process. Basically, they went to a system where none of that mattered anymore. And they just kind of bypassed all that and went like straight for the jugular.

Starting point is 00:05:33 And what I would argue is a much more difficult way to go about it, but a lot more likely to meet with success and go undetected. And I think they gambled correctly in this case. I mean, I think with these kinds of operations, and this is ultimately an espionage, you know, nation-state, professional type operation from my perspective, the duration and the extent to which these things can run undetected is usually the indicator of how sophisticated they are. And so like these long-running, you know, really successful campaigns that avoid detection really belies like a level of sophistication. Because operational security, right, like covering your tracks, is actually just about as hard as getting in. And so, you know, the fact that they exercised the ability to cover their tracks for so long,

Starting point is 00:06:15 to know where to insert in the process, and to lay low is just indicative of a level of discipline that you don't necessarily see in a lot of attackers. Not just get in, but be able to cover their tracks, which is what both of you guys say. And by the way, we've only talked about the duration of when the hack was revealed by FireEye and that it had been several months before. Do you guys have specifics on what the latest data point is in that timeline?

Starting point is 00:06:36 Yeah, the first, essentially what they did was an experiment early on, and this has been posted publicly. The code in SolarWinds Orion was modified in late 2019, where basically they made some initial modifications, which actually didn't do anything malicious or put in a backdoor or allow any type of access. Software went out, and they basically were able to prove, like, hey, I succeeded at doing this.

Starting point is 00:06:55 It existed. No one noticed anything. And essentially waited at some point to move on to phase two, which was, okay, I can get in, I can go undetected. I can have it build. It all works. Stuff makes it into production. No one notices.

Starting point is 00:07:07 And they said, okay, well, I'm satisfied with that. Now it's time to go for broke and put the actual code in there and open the floodgates. What you just described, Stephen, sounds exactly the way a company builds a product, like, hey, we're going to test it out. We're going to try an experiment, an MVP, a minimum viable product, if you will. Then we'll, based on that, decide how to deploy it and target it and blah, blah, blah. I mean, I hate to say that, but that's exactly what you just described sounded like. Yeah, and honestly, it wouldn't surprise me if they had done some way of trying to basically clone their development environment too and probably tested this, I would guess, probably pretty thoroughly before they even ran the tests within their network.

Starting point is 00:07:44 So they were incredibly savvy in certain ways in terms of how targeted they were and the choices they made. In the Microsoft blog post, one line in particular really struck me. It said that the threat actors were savvy enough to avoid giveaway terminology like backdoor, keylogger, et cetera. Instead, they gave their tampered code an innocuous name, Orion Improvement Business Layer, that would fit right into a marketing brochure. This is from an Axios post summarizing it. The attack's crucial door-opening exploit was a small chunk of, quote, poisoned code, which is what Microsoft dubbed it, all of five lines long, or roughly 160 characters. And then Ina Fried at Axios goes on to comment, which I had to chuckle, even though it's sad,

Starting point is 00:08:28 was this could well be the most damage per character yet achieved in the short history of cyber warfare. So I am curious if you have any thoughts on some of those honestly quite clever things that they did to hide undetected. And any more specifics you could share there. And then we'll go into the step-by-step in a moment, too. The fact that they're not naming variables and naming things with terms that are commonly used in attacks is mostly a credit to the existing kind of antivirus and anti-malware industry. You've got a lot of tools that are out there that are looking for this stuff. And you would imagine any adversary that's relatively sophisticated is going to run their changes

Starting point is 00:09:00 through all those tools to make sure they don't get detected before they deploy it. And so that's just table stakes for this kind of activity. It doesn't really show any kind of real sophistication. Of course, it just depresses me to hear that. And we'll talk about this at the end, which is what companies and people can do, because I'm like, great, the better and better we get, the more and more sophisticated they get. And it just becomes this like never-ending back-and-forth, back-and-forth escalation.

Starting point is 00:09:23 Espionage 101. Yeah, to be completely honest, that stuff doesn't surprise me, especially when their job is to blend in as much as possible. But I'll add one thing and make sure that we give credit: some of the analysis and things we're talking about today are obviously from a lot of the security community coming together and publishing a lot of details. It's been great. But one of the other things that they did is they actually used an existing file that is part of SolarWinds Orion, that's there legitimately. It was there five years ago, was there two years ago. It's there right now. But they actually repurposed that exact config file.

Starting point is 00:09:53 They created a specific value and said, if this is a three, you shouldn't begin and you're basically turned off. And they use values in fields within this. So they leveraged that file that's already being read and used by the program to then also inform it on some of what it should do. So they use native existing files and functionality and things that are very innocuous looking. And then they did a couple other steps beyond that that are pretty stealthy, although they're not necessarily rocket science. They're very uncommon. One of them is the fact that this backdoor, once it's loaded, wouldn't start its beaconing or calling out for this DNS activity, which I know we haven't explained yet. But basically, the mechanism by which it actually gives that avenue of control back into

Starting point is 00:10:28 the system, you have to meet certain criteria before it would even, you know, beacon. So, for example, if you weren't domain joined, meaning you're less likely to be an actual corporate asset. You're someone testing it on a computer. You're a workstation at home. You're not even going to pass the sniff test. But what they then do is they actually set a timer. And so it might be actually up to two weeks before it actually starts doing anything. It might be under scrutiny from QA or a build, or someone might be looking at it when they first install it to make sure it's not malicious. So they actually say, hey, I'm just going to wait two weeks. I'm in this environment. This is for the long haul. I'm not in a rush to immediately get access to these systems. So that's an

Starting point is 00:11:02 interesting aspect. It's actually fairly uncommon to see malware that is on any timer of significance or driven by a specific event that's not likely to happen very soon. The other thing that was really interesting, the malware basically would activate when a certain response was given to its query. Hey, go connect to this domain name or go connect to this website. And those domains that they used were actually domains that had expired. One of the telltale signs when you're looking at malware and things is like, oh, it was just registered last week or last month or earlier today.

Starting point is 00:11:33 So this would pass that sniff test all day long. Some of them had existed for five or six years. They might even have like a website. They picked up infrastructure that had a history to it. They actually owned and controlled these domains. They weren't like hacked domains or things like that where they were using compromised infrastructure. So just kind of an interesting note on that. It's interesting and honestly a little creepy.
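As a rough illustration of the "just registered last week" heuristic mentioned above, and of why aged domains slip past it, here is a minimal Python sketch. The function name, the 30-day threshold, and the dates are all assumptions made up for this example; they are not taken from any tool or investigation discussed in the episode.

```python
from datetime import date


def is_newly_registered(registered_on: date, seen_on: date,
                        min_age_days: int = 30) -> bool:
    """Flag a domain whose registration is younger than min_age_days
    at the time it was observed (hypothetical threshold)."""
    return (seen_on - registered_on).days < min_age_days


# Hypothetical observation date and registration dates.
seen = date(2020, 7, 1)
fresh_domain_flagged = is_newly_registered(date(2020, 6, 25), seen)
aged_domain_flagged = is_newly_registered(date(2015, 3, 10), seen)

print(fresh_domain_flagged)  # a week-old domain trips the check
print(aged_domain_flagged)   # a domain with years of history does not
```

A check like this catches the typical crimeware pattern of freshly registered domains, but a domain with five or six years of history passes it, which is the point being made in the conversation.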

Starting point is 00:11:53 I got goosebumps while you were talking because it makes me think of every long game, the patience and waiting and stalking that really skilled predators do. And I don't mean to glorify it by any means, but I'm just sharing that what you just shared in technical terms, it gave me goosebumps quite literally. I don't know how you think about it. When we first saw this in July last year, we had, I think, three domains that we had seen used in an actual attack.

Starting point is 00:12:20 And as we looked into them, we said, wow, like we kind of noticed this. We said, yeah, these things have a real history, what the hell's going on here? And then we found a way to find more of their infrastructure even if we hadn't seen it used in an attack. And they all had this in common. Like, we had a way in which we could figure out and find some of the infrastructure from some mistakes that they had made. That's why in our post,

Starting point is 00:12:38 we actually were able to provide a lot of indicators, like, DHS included that in their list and everything. But other than that, each one of the domains we looked into, we just instantly knew at that point. I mean, we already knew we were dealing with an advanced threat actor, but we were kind of thinking to ourselves, like, these guys have really stepped it up a notch. This was actually the third time we had dealt with them

Starting point is 00:12:54 in an incident response engagement. But this was like a little bit different than the other two rounds. There's a number of things that just made it stand out, and that was definitely one of them. This might be the first a16z Podcast Network show to be optioned for a movie. I'm just going to say it right here on air. Joel, anything to add to that before I switch into the detailed step by step?

Starting point is 00:13:13 I mean, only if Matthew McConaughey plays me. I listen to him on the Calm app every other night or so. Yeah, no, I mean, I think that that's exactly it. Just the level of preparation and just the long game that these guys are playing. You know, the malware stuff is pretty common on the financial crimeware type side, right? People trying to steal money. But those actors typically register domain names within a day. And it's just all very fishy and suspicious.

Starting point is 00:13:38 But to see someone build these like really advanced, large, complicated infrastructures, years ahead of using it, it just belies a real level of sophistication. You don't really see every day. Okay, so just a recap for listeners where we are and where we're going. We've covered what happened at a high level, including some of what's hype, what's real, and interesting or undercovered in the media. You did a great job summarizing, Stephen. But now let's spiral into that a bit deeper and fill in some blanks.

Starting point is 00:14:02 that you haven't covered, both technical details, like you mentioned the beacon, DNS, I want all of it, how folks figured things out, so we can then know what the open questions still are, ripple effects and implications, and then more on supply chain compromises and what we can all do. But I especially want to know the anatomy of how they got access to the emails, but start from the very beginning of the timeline. Yeah, so the story of the SolarWinds supply chain compromise obviously starts with SolarWinds. And that's probably where some of the question marks are currently, and they might remain that way.

Starting point is 00:14:34 They were breached sometime, at least as of late 2019, and then ultimately, what came out later, in May of 2020, pushed out an actual backdoored version of their software, a backdoor meaning a piece of software that shouldn't be there that allows this foreign adversary to have control or remote access into these systems. So we're talking late May that that happened. From the cases we've been involved in and things that have been published publicly, we're seeing that a lot of the threat activity started in June and July.

Starting point is 00:15:02 The SolarWinds software would send out this DNS query. So when you want to go to a website, you want to go to a16z.com, you type that in. There's a system called DNS. It says, hey, where is this located? My DNS server says, oh, it's located over here. It's kind of the basis by which you can find things on the internet. So you're not memorizing these numeric IP addresses.

Starting point is 00:15:24 So the malware, all it did, once it finally activated, it waited between 10 and 14 days before it would start creating these DNS queries. It would do these DNS queries from the SolarWinds Orion server, and those DNS queries contained encoded data. And if you decoded that data, it gave different information, one piece of which was information about the network that that machine is joined to. So in the example of, say, Microsoft, it might show microsoft.com or microsoft.internal, or, you know, for one of these government agencies, it might say treas.gov, but it would give this indicator

Starting point is 00:15:58 so the attackers could actually see who these victims were. Because remember, they're indiscriminately pushing out this software to essentially tens of thousands of machines. That is an untenable thing to manage and go and manually look at everything and try and actually install software and do something of significance.
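To make the idea of DNS queries that "contained encoded data" more concrete, here is a small Python sketch of how a network name could ride along in the first label of a queried hostname and be recovered by whoever reads the query. This is only an illustration under stated assumptions: the conversation does not describe the encoding the malware actually used, so plain base32 is used as a stand-in, and the function names and the `c2.example` and `corp.example` domains are hypothetical.

```python
import base64


def encode_label(ad_domain: str) -> str:
    """Pack a network name into a DNS-safe label.
    Base32 is a stand-in here, not the real scheme."""
    raw = base64.b32encode(ad_domain.encode("ascii")).decode("ascii")
    return raw.rstrip("=").lower()


def decode_label(label: str) -> str:
    """Reverse encode_label: restore the padding, then base32-decode."""
    padded = label.upper() + "=" * (-len(label) % 8)
    return base64.b32decode(padded).decode("ascii")


def victim_from_query(qname: str) -> str:
    """Recover the network name from the first label of a queried name."""
    first_label = qname.split(".", 1)[0]
    return decode_label(first_label)


# Hypothetical query name, as it might appear in a DNS log.
qname = encode_label("corp.example") + ".c2.example"
print(victim_from_query(qname))
```

The same property cuts both ways: once the encoding is understood, anyone with historical DNS logs can decode the queries to see which networks were reporting in.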

Starting point is 00:16:12 And their goal is to stay under the radar and not get caught. And now they have to decide who it is they want to go after further. So they probably have a shopping list that they started with, and they probably have a new shopping list of things, like they're walking through the grocery store

Starting point is 00:16:23 and didn't even know they wanted that. But now they know they do. And they essentially issued commands that allowed them to initiate this backdoor on who it was that they wanted to attack. And they did this through a specific DNS response called a CNAME value. So it says, hey, where is this host name?

Starting point is 00:16:37 It responds back. They would actually send a specific response to prep it so that the malware would be waiting to know that next time something happens, it should take a specific option and open the backdoor. They would respond with these domains. And these domains would basically be the control points

Starting point is 00:16:50 where the attackers would then have hands on keyboard. A human is doing this at this point. Someone says, I'm ready to take a look at this system. Now the hackers that are, you know, behind this are actually involved. And they're saying, now I want to look around and figure out, is this a test machine? Is this a real network I'm interested in? Is this a lab environment? Is this a staging environment?

Starting point is 00:17:09 You know, things like that. And they can figure out, is this the real deal? Does this have access to what I want? Do I want to proceed? And they did this for we don't know how many organizations. And that's the real scary part in all this: you have all these people that have come forward, and they're like big companies or these government agencies, and that's just the ones we know about.

Starting point is 00:17:26 I don't think anyone has a real notion of the size and scope of where they took a further interest and then actually did something. In our particular case, we got permission to write up and share details of our incident investigation. The attackers were very focused on getting access to email of specific individuals.

Starting point is 00:17:41 So their goal was to maintain access, move around, get what they need. Having access to specific individuals and what they're writing, who's sending to them, what they're communicating, was a key focus of what they're doing. We're able to see that they did that. The interesting part, in kind of stepping away slightly from SolarWinds and why the intel community and law enforcement say it's likely tied to Russia, APT29 or the Dukes, is that we've been tracking it as a group we call Dark Halo, just because we've dealt with APT29 on many occasions in the past,

Starting point is 00:18:11 but we just have no real way to link the two. But what was interesting to us is the story of this group didn't start with SolarWinds. We worked three separate incidents involving the SolarWinds attackers, what we called Dark Halo. So this is a story that starts well before and has multiple other avenues. We had actually dealt with them back in 2019. We had an organization we were doing work with, and we kicked the group out. They went away. In our initial response, we had determined they'd been in that organization for four to five

Starting point is 00:18:38 years prior. They came back in Q1 2020 through an Exchange Control Panel vulnerability, you know, the mail server. It had a vulnerability that attackers would take advantage of. They got back in and stole email for certain individuals. They were kicked out and removed again. That's what we did. And then they came back.

Starting point is 00:18:53 A third time, with SolarWinds, in July of 2020. Again, we didn't have a good way to prove it. And we took steps and put mitigation in place to deal with it. So to say, hey, how did they get into SolarWinds or where else are they operating? Well, this isn't their only trick. They have a lot of tricks up their sleeve. They've been able to do this and operate for quite some time. Wait, so how did you make that link across those separate incidents, that it was the same group? I'll tell you.

Starting point is 00:19:15 And something interesting is, if we had worked them at three different organizations, we actually wouldn't have come to the conclusion that this was a single threat group. We wouldn't have linked the three things. Any advanced attacker, anyone in the network, they have certain commands and things that they're going to do. But they changed enough between each of the attacks, the actual techniques, the tools, whether it's custom malware or a commercial script

Starting point is 00:19:35 or a public script like Nishang or a pen testing framework or these different toolings or a web shell. They changed it between each one of the attacks, to where it would be very non-obvious that it's the same group. But what they did is they went after the email of the same people each time, and why we are 100% certain it's the same group is when they would steal email,

Starting point is 00:19:53 they would only take a certain amount of email. They would specify, I want all the email since the last time I took it. Oh, so it's like incrementally building on the total. Oh, my God, that's so fascinating. I keep going, yes. So in early 2020, they got back in and they said, okay, well, I want all the email for these particular users

Starting point is 00:20:08 since a specific date in 2019. And then when they came back in through the SolarWinds vulnerability, they basically said, hey, I want every email for these people. And I only want it starting from this specific date range, starting in early 2020. So each time they came back, they asked for the email since the last time they did it. So in the one case, obviously, they had intimate and previous knowledge. The other cases we worked, they didn't have as much knowledge.
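The linking logic described here, where each theft asks only for mail since the previous theft, can be sketched as a simple consistency check across incidents. This is a hypothetical Python illustration, not Volexity's actual method or tooling: the `Incident` type, the one-day tolerance, and every date below are invented for the example.

```python
from datetime import date
from typing import List, NamedTuple, Optional


class Incident(NamedTuple):
    name: str
    theft_date: date                  # when mail was taken (hypothetical)
    requested_since: Optional[date]   # start of the requested date range


def ranges_chain(incidents: List[Incident], tolerance_days: int = 1) -> bool:
    """True when every later theft starts its date range where the
    previous theft left off, within tolerance_days."""
    ordered = sorted(incidents, key=lambda i: i.theft_date)
    for earlier, later in zip(ordered, ordered[1:]):
        if later.requested_since is None:
            return False
        gap = abs((later.requested_since - earlier.theft_date).days)
        if gap > tolerance_days:
            return False
    return True


# Invented dates that mimic the pattern described in the conversation.
incidents = [
    Incident("2019 intrusion", date(2019, 9, 15), None),
    Incident("early 2020 Exchange", date(2020, 2, 10), date(2019, 9, 15)),
    Incident("mid 2020 SolarWinds", date(2020, 7, 20), date(2020, 2, 10)),
]
print(ranges_chain(incidents))
```

The value of a signal like this is that it survives a complete change of tooling: the malware and scripts differed between intrusions, but the requested date ranges only make sense if the requester knew when the last collection happened.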

Starting point is 00:20:31 They had to work their way in and kind of figure out the lay of the land. And so we're dealing with the same group in all three incidents. That's an interesting tidbit. I was about to say I still have goosebumps. That's incredible. That was so good, Stephen. Pretty impressive analysis and work there. The thing that really jumps out to me is this is something that is linked together over a four-plus

Starting point is 00:20:52 year campaign, trying to maintain persistent access to the communications of high value individuals. I think the other thing that really jumps out to me is that they have a big data problem. They got access to tens of thousands of computers and potentially thousands of organizations. It sounds like the kind of analysis that Stephen has done is pretty unique. There aren't a whole lot of people in the world that can do that sort of thing. And so this is probably an incident that we'll be continuing to understand for the coming months, if not maybe years. There's probably going to be a really long tail on that. These people are still out there. They're still operating. What are they doing now? That's

Starting point is 00:21:27 particularly concerning. It's interesting because Martin Casado, our general partner, who's also a security expert, he mentioned to me that he thinks it's super interesting how interactive the attackers are during the attack, because it's obviously a very sophisticated team of people gathering data and making chess moves in real time. And it's so fascinating because when we report and talk about and communicate these types of attacks, we kind of make it seem like it's malware that does all the work, but it's really the people that are at the center of it. And then on the other side of it, you have this whole interesting dance on your end as sort of this forensics expert with your team going in and trying to figure it out and the puzzles and everything involved.

Starting point is 00:22:08 Well, you know, I heard chess is popular now. Yeah, that Queen's Gambit, right? This is exactly like playing a game of chess. The difference is that you don't see the moves immediately. They get revealed over time. And then you're left kind of piecing other things together. That's exactly the analogy. Yeah, I definitely agree that their goal was to actually not have their moves or what they did ever be understood.

Starting point is 00:22:29 We noticed the versions of their software that were downloaded. There was an update to SolarWinds Orion. I believe it was in August of 2020. And that version wasn't backdoored anymore. Didn't have the malicious code. So we initially speculated, oh, did the bad guys remove it? Did SolarWinds find it? Did it inadvertently get removed?

Starting point is 00:22:44 And we didn't know how it was going down at the time. So they removed the code. They got in, got all this access and basically said, I'm going to try and remove this now and like fly under the radar. So if they had their way, they would have pulled off like the perfect caper, done all this stuff. No one would have known how it happened. And then the Orion product basically would have nothing malicious in it. So just kind of like an interesting other thing that they did. It is. It's a very vivid contrast to the analogy of chess, especially given the popularity of Queen's Gambit, when you see them recording their moves and the spectators watching. It's a real contrast to this idea that you're literally making the move, peeling it back, making the move, peeling it back. It's really stunning. Okay. So my next question, before we talk about some things we can expect to see moving forward: what are some of the open questions still on the table? Like we know SolarWinds was compromised, but the big open question there is obviously we don't know how. Then the second big thing in the Microsoft post that I saw, and Steven Sinofsky pointed this out, which is, you know, they do this outline, but we still don't know how the signed code was signed. So that whole idea of signing the code is a bit of a mystery still.

Starting point is 00:23:46 I want to hear from you guys, what are your open questions or what are the open questions the industry is still looking at or that people should or shouldn't look at? Sure, yeah. So how was SolarWinds compromised? Obviously, one of the open questions. You could spend as much time and resources as you want. You could have infinite resources, and you may not ever be able to answer that question because that system's gone. It was wiped. All the logs are gone. It was never logged or it happened five years ago.

Starting point is 00:24:08 So I would say the scariest part of this is people are finding out about this in December for something that was operationally live in May. They had a long head start in breaking into different organizations and doing that shopping list. And there are going to be, and there have been, from this very group, and as a result of the SolarWinds compromise, more supply chain breaches. Some people are breathing a sigh of relief: I didn't run, you know, SolarWinds Orion software. I'm safe. That's not necessarily true. We're not trying to sow fear, uncertainty, and doubt that everything is untrusted, which, you know, arguably, you'd need to go to typewriters and pigeons now, but it's IT companies, it's security

Starting point is 00:24:46 companies, it's managed service providers, it's managed security service providers. There's these different people that were running SolarWinds that then had this level of access to either directly get into networks, get into email, get into authentication systems, or to provide software or software updates or software downloads. They 100% certainly had access to numerous networks and systems that would allow them to rinse and repeat SolarWinds, probably on numerous different scales in numerous different ways. It doesn't have to be through a build-time compile. It could be they changed a download. They changed an update process. They took keys or secrets or remote access protocols or passwords that got them into like other networks or other systems.

Starting point is 00:25:26 So the scary part is that the supply chain compromise here is just causing a chain reaction that's probably already impacting other organizations that have no idea. I think that's one of the biggest questions: who else was victimized that we don't know about, and what do they do? So what you're basically describing is like this complex adaptive system, like everyone sort of networked and connected, and trying to tease apart the scope and ripples of this is going to take ages. And we might never, ever get to the bottom of all of that because of that connectivity. It's interesting because General Paul Nakasone, or Nakasone, I'm not quite sure how to pronounce it.

Starting point is 00:26:02 He heads both the NSA, the National Security Agency, and the military's U.S. Cyber Command. One of the things that they talked about is that developing a coherent, unified picture, what you just described, Stephen, of the extent of the breaches has been difficult. The challenge is that, quote, he's expected to know how all the dots are connected, but doesn't know how many dots there are or where they all are, which is kind of a distillation of what you just described. What are the other open questions that are on the table? For me, the big open question with all of these really sophisticated breaches, the first one, is how many stupid things led up to this? Like, how many ridiculously easy-to-solve

Starting point is 00:26:39 problems like applying security patches or using two-factor authentication? Like, how many of those kinds of things we know we should always do are responsible for this is always front of mind when we see this, because I think when you double-click on these, a lot of the time it starts off in a fairly innocuous way, which is like someone guessed an account or someone got access to some account. But as this event shows you, if you give a sophisticated actor a toehold in your organization, they're just going to run through it. So that's the first one. And then the second one is we think of these breaches, because of just the way the media covers them and the fact that they kind of show up sporadically, we think of them as like events in time that have a start and

Starting point is 00:27:16 finish. But in reality, these groups are still running and we're still chasing them. You don't know the implications of any of this stuff for a while. Like, you don't know if they were getting into the Department of Energy to read, you know, Rick Perry's old emails or if they were getting in there to steal futuristic bomb designs. Maybe there's going to be some new weapon that pops up in 15 years and it's, like, linked to this breach. And we've seen from these breaches, like if you go all the way back to some of the first ones that have been publicly reported, you know, we've often seen that the goal of these is either to spy on individuals and get some kind of intelligence there or to steal the designs for things that people want to go recreate. Right. And don't forget

Starting point is 00:27:52 that oftentimes, I think we often forget to talk about when we talk about intelligence, it's often in the form of blackmail, right? Like, we're not just talking about stealing IP and obvious secrets because a lot of people dismiss this as, oh, email, I just book events and share like photos with the family in my email. I don't think they realize that it's such a vector to all these ways of really exposing who you are. It's your identity in many ways. So that's another way to think about that too. Absolutely. Anything else on the open question side? So a bunch of other secondary breaches are now being reported on. Some of the Microsoft stuff, you saw that there were people creating reseller accounts or trying to get reseller access to

Starting point is 00:28:28 people's Office 365 enterprises. And then there were certificates that were compromised for things like Mimecast and maybe perhaps other services that are out there. And so, like, this picture starts to emerge that there are lots of fires that just started burning. And it's always really difficult to tell if it's one fire massing together or just a bunch of different people that are acting independently. That's actually something I wanted to really quickly touch on

Starting point is 00:28:50 before we go into the rest of this. Because the thing that was confusing to me is, okay, so I read the Microsoft post. You know, like there's some intrusions that there's a partner for Microsoft, actually, that handles cloud access services. We don't know how connected or not connected it is. Then you have a reseller gaining access

Starting point is 00:29:07 to Microsoft customers' Azure accounts. Then you have this reported Russian state-sponsored effort exploiting a VMware flaw that the NSA warned about last month, that takes advantage of a recently announced vulnerability in VMware Workspace ONE Access, Access Connector, Identity Manager, et cetera. And this is according to the NSA, that they've had at least one case where they've successfully accessed protected systems by exploiting the flaw. And then you have, like, you know, one after another, and they issued a patch. I mean, I'm reading all these at the same time and I'm like, is it all the same thing or not? And I think that's what you're saying, Joel,

Starting point is 00:29:41 about we don't know if it's all one fire or a bunch of fires. And do you guys have any thoughts on how to connect those dots, if at all? So as a general statement, I would say what we know about this hacker that we call Dark Halo, the people behind the SolarWinds attack, they're extremely adept in methods that allow them to gain access to email or systems involved with email. So things like trying to get access to an Office 365 or Azure AD environment through a partner organization or by stealing some SAML tokens or some kind of authentication mechanism or trying to get access through, you know,

Starting point is 00:30:14 some other possibly through a vendor to get access to that same data or to email data, essentially by any means necessary. I would say all of those are very on par with what we've seen this attacker do and focus on and what others have seen. Very good chance that they are related. But even if they weren't, it just kind of underscores that there's a lot of people trying to get access to this data. And now you need to focus a lot more on the cloud, on the technologies that are used to secure the cloud or that have access into it. And the things and places where people don't always look because it's new to them or they

Starting point is 00:30:48 never looked at it or they didn't know to look at it. So I think this event will actually end up advancing security in many ways because it's causing people to think about and do things that they weren't realizing before. And as you can see, the bar has been set higher to where they can't walk right in the front door anymore, right? They're not easily able to get right into these organizations by compromising, you know, the core network or the system administrator and the other ways which you get there. So in some ways, it's a sign that security has improved a lot, but also that there's a

Starting point is 00:31:15 massive amount of work to do at the same time. It makes me, again, think of the chess analogy. And when you have a player that comes to the table that has a set of moves, like patterns that are well beyond what the human mind can even comprehend. And that makes me think a little bit of even like AlphaGo playing Go with the real chess player in Korea and how, you know, the system made moves that they considered very alien, that a human being would never have done, but that still followed the rules of the game, the constraints of the game that is, and yet were completely novel. And you just keep seeing more and more moves kind of grow and become more and more sophisticated on both sides, even as we may improve. Like, there are going to be alien moves at

Starting point is 00:31:53 some point. But to be completely honest, they're undoubtedly highly skilled and disciplined, which, if you think about it, okay, if we go back to the chess analogy, you know, are they a master? Are they a grandmaster? In some ways, you guess, okay, they're a grandmaster. But most of their opponents are unranked. So they have this, like, kind of lower skill, and their strategy is easier. But then they've been able to go to these people where maybe their security and their defenses are much higher ranked. And they're using that skill set, that knowledge and that kind of cat and mouse to still get

Starting point is 00:32:21 into those organizations. But to have to do that, that shows that people have leveled up quite a bit, which is a good thing for these companies, the security industry. But at the end of the day, they still managed to either capture that king or get them to knock it down. I guess no one's really thrown in the towel. No one has surrendered that I've seen so far, but I would say they're winning a lot of matches

Starting point is 00:32:39 and they're playing a lot of them simultaneously. Right, but they're not, to be clear, to your point, an alien player, like an AlphaGo, there's still moves that are human, just very skilled. At least from what we've seen, but who knows, like, what we're missing though, right? Right. Okay, so now the big picture questions.

Starting point is 00:32:56 We've covered what happened, how it happened, the details. We talked about this, you know, phenomenon of supply chain attacks, chain of chains, what it means. I would love to hear what you think about this when you think about the broader trends at play. Yeah, absolutely. I think on the podcast several times,

Starting point is 00:33:11 I know I sound a bit like a broken record, but we've talked about the biggest challenge being securing the supply chain and how all these businesses that are becoming software businesses are actually becoming reliant on other people's software. And so it's not just a matter of the stuff that you write to run your company.

Starting point is 00:33:27 It's also the matter of the stuff that your suppliers are writing. And as everyone knows, security is really difficult and it's hard to secure your own things and then having to worry about the security of your suppliers is adding an additional layer of complexity. And so over the last couple of years, there's been a lot of investment in trying to understand third-party risk management, vendor risk management, how to glue these things

Starting point is 00:33:48 together. There are several different approaches, everything from private systems that will look for vulnerabilities and report on the risk. There are publicly available standards. Different trade groups are trying to develop their own standards for security, and then certain vendors are trying to come up with their own standards. There is no easy answer. And so what you've got is a lot of different approaches that are being tried and a lot of experimentation that's taking place.

Starting point is 00:34:11 This is probably the first breach with such a size, scale, and scope. So this is kind of the watershed moment for that third-party risk management. And there's any number of other suppliers that are out there that are in very similar positions, right? And it could be a company like SolarWinds or it could be an open source repository that a bunch of people are building into their applications. There are any number of different ways. The thing that's really difficult for me, based on where I sit and what I see, is if you play through all the different potential solutions that are out there, it's really hard to know which one of them would have actually prevented this. So like, if I went to any of SolarWinds' customers

Starting point is 00:34:47 and said, hey, what's your vendor risk review report on SolarWinds? You know, before the breach, I'm sure they would have said it was a wonderful company. It was doing everything. They passed our review. They answered our questionnaire. You know, they've got the people hired. They have a program. And so it really comes down to how do you actually measure these things and how do you measure the risk in that third party and how do you effectively mitigate against it? The third-party risk or the vendor risk management or how someone evaluates this, it can only go so far, right? Like how would you evaluate SolarWinds and the Orion product differently than you would Microsoft Windows and Defender and how it updates and things like that, right? So there are limitations to what you can do. I mean, you can audit them or find out their code review process and all that stuff.

Starting point is 00:35:26 And they could have passed with flying colors. Or does your checklist say, are you looking for advanced adversaries, you know, injecting themselves into your build process at the highest levels of sophistication and espionage? But even if they check yes to that, which they might not, they probably don't have an effective way and mechanism to do that. One of the things that Alex Stamos, people tend to overquote him, but he did have a good tweet about this, which is, quote, there was no good reason for most enterprise software products to talk to random internet hosts all day.

Starting point is 00:35:54 It might be time to move on to an outbound network permission model for Windows servers. So connections only allowed to domains in a signed manifest plus internet as defined in GPO. Is that the right thing to do? Should people be air gapping? Like, what should people be doing? We deal with sophisticated breaches all the time. And this can even apply for, like, firmware and other stuff. But that is a recommendation that Volexity has been giving organizations for years and years and years. And it's often in an incident that we say, hey, your domain controller, for example, doesn't need to be able to talk to the internet. There are obviously exceptions to the rules and everything. But usually this can be defined, especially with next

Starting point is 00:36:30 generation firewalls or modern firewalls, you can define what is actually needed and allow them to do those things and not allow them to do anything that's not explicitly required. And that's a model that is least privilege, just like the least access type model. That's a little bit harder, depending on your organization, to enforce for users and workstations where you need to browse the web and do all this stuff. And that's what content filters and certain restrictions are for, you know, unless you're in like a DOD environment or something where it's a lot more locked down, but that's usually accepted in a lot of commercial organizations. And the server is where an attacker, if they're going to install malware and do things, will usually go, because that's where the supply

Starting point is 00:37:04 chain, that's like one of the big areas, gets to it, and those are the machines that are not at home or requiring a VPN. They're always on. They don't get rebooted frequently. That's where malware gets installed a lot because it's something that they can count on and it's regular. Being able to prevent that and limit what those can do, that model, if that had been put in place for organizations with SolarWinds, in this specific instance, it would have mitigated that threat. Now, if I start thinking outside the box and this attacker used DNS, well, what if they had done command and control and had done that all over DNS? So the SolarWinds server talks to its local DNS server. Local DNS server goes out to the internet. If they had

Starting point is 00:37:41 modified this malware and actually did all the command and control over DNS instead of doing it over this connection, that paradigm and that shift would have been a lot more difficult to mitigate, but that's the type of issue and security item we need to think about. You could proactively try to address that or say, hey, that's a lower likelihood and I'll address it if that happens. But by and large, it's a best practice with regards to minimal access, specifically for servers connecting to the internet and different resources. It's funny talking about this, because it's like the history of the security industry is the history of unreasonable requests. I know that a lot of people are jumping up and down

Starting point is 00:38:13 talking about, like, don't let production talk directly to the internet. And if you worked at a bank, you know, for the last 20 years, that's been the case, right? Like highly regulated industries and people that have invested heavily in security have always focused on doing these rather idiosyncratic things that don't make a lot of sense, but made a lot of sense to people who've either come from an incident response or a deep security background. You know, back in the 90s, I remember being involved in strenuous debates about why you need to encrypt traffic moving within your data center. And everyone thought it was the most asinine thing because it's a private link. You've got MPLS. No one's going to listen to you. And then Snowden releases documents.

Starting point is 00:38:50 And it became really obvious why you want to encrypt your data within your data center. So this is just another example where people have been giving best practice advice saying, hey, you need to make sure that random servers, random production systems can't just talk arbitrarily to the internet. And the response to that has generally been, well, that's an unreasonable request, that takes a lot of work. I don't know that we necessarily want to do it. And there was never a particularly great reason or piece of evidence to point to to say, well, this is why. So this is why you want to limit that access. And there's probably a list of other things that are equally unreasonable requests that security people would ask you to do.
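
The outbound permission model discussed above, where servers may reach only explicitly approved destinations, can be sketched in a few lines. This is a minimal illustration and not any vendor's implementation: the `EgressPolicy` class, the role name, and the `vendor.example` domains are all hypothetical, and a real deployment would enforce this in a firewall or proxy rather than in application code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EgressPolicy:
    """Default-deny outbound policy for one server role (hypothetical model)."""
    role: str
    allowed_suffixes: frozenset

    def permits(self, hostname: str) -> bool:
        # Normalize, then allow only exact matches or true subdomains.
        host = hostname.rstrip(".").lower()
        return any(
            host == suffix or host.endswith("." + suffix)
            for suffix in self.allowed_suffixes
        )


def denied_destinations(policy: EgressPolicy, hostnames: list) -> list:
    """Return the destinations this policy would block, in input order."""
    return [h for h in hostnames if not policy.permits(h)]


# Hypothetical policy: a monitoring server that only needs vendor updates.
MONITORING_POLICY = EgressPolicy(
    role="monitoring-server",
    allowed_suffixes=frozenset({"updates.vendor.example"}),
)

if __name__ == "__main__":
    observed = [
        "updates.vendor.example",
        "cdn.updates.vendor.example",
        "random-host.example",
    ]
    for host in denied_destinations(MONITORING_POLICY, observed):
        print(f"{MONITORING_POLICY.role}: would block outbound to {host}")
```

The suffix check requires a leading dot so that a lookalike such as `evilupdates.vendor.example` does not slip through on a plain string match.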

Starting point is 00:39:24 And eventually they're going to have their "this is why" moment. But something that Joel mentioned earlier, which I think is really important, is a lot of organizations aren't doing blocking and tackling. They don't have two-factor authentication on the remote access to their network. They're using weak passwords. They're not patching. They don't know where their assets even are. Their build process is not secure. They don't even do code auditing or check in their code.

Starting point is 00:39:45 I mean, there's a lot of low-hanging fruit for most organizations. And they haven't even been able to kind of get some of the basics in. But I think a big problem that a lot of organizations, whether that's a government, commercial organization, or really anyone, whether they're a small company or these massive companies with huge budgets, a problem that they're facing is if you had certain security data, you could immediately and very easily answer, did I have a problem? One, did I run that vulnerable software? Maybe you're patching. You're like, oh, I don't know. Maybe I never ran it, and I skipped a version. If you had all your DNS queries logged, and the responses, you would say, did I get a CNAME?
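
As a rough sketch of the question being asked here, the snippet below scans DNS logs that contain both queries and responses for a known-bad domain, including cases where it only shows up as a CNAME answer. Everything about the log is an assumption: the JSON-lines layout, the field names (`qname`, `rtype`, `answer`), and the `bad-updates.example` indicator are hypothetical stand-ins for whatever a given resolver or sensor actually records.

```python
import json


def matches(name: str, indicator: str) -> bool:
    """True if name equals the indicator domain or is a subdomain of it."""
    name = name.rstrip(".").lower()
    indicator = indicator.rstrip(".").lower()
    return name == indicator or name.endswith("." + indicator)


def find_hits(log_lines, indicator: str) -> list:
    """Return log records whose query name or CNAME answer hits the indicator."""
    hits = []
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        queried = matches(record.get("qname", ""), indicator)
        cname_answer = (
            record.get("rtype") == "CNAME"
            and matches(record.get("answer", ""), indicator)
        )
        if queried or cname_answer:
            hits.append(record)
    return hits


# Hypothetical log sample; real logs would be read from a file or a SIEM export.
SAMPLE_LOG = [
    '{"ts": "2020-06-02T10:00:00", "client": "10.0.0.5", "qname": "abc123.bad-updates.example", "rtype": "CNAME", "answer": "c2.other.example"}',
    '{"ts": "2020-06-02T10:00:05", "client": "10.0.0.9", "qname": "www.vendor.example", "rtype": "A", "answer": "192.0.2.10"}',
    '{"ts": "2020-06-03T08:30:00", "client": "10.0.0.5", "qname": "alias.vendor.example", "rtype": "CNAME", "answer": "x.bad-updates.example"}',
]

if __name__ == "__main__":
    for hit in find_hits(SAMPLE_LOG, "bad-updates.example"):
        print(hit["ts"], hit["client"], hit["qname"], "->", hit["answer"])
```

Logging only the query side would miss the third record in the sample, which is the reason the discussion stresses capturing responses as well.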

Starting point is 00:40:20 Did I even call out to that command and control activity? There are certain logs from the endpoints that SolarWinds was instrumented on, this event log data. If you had been capturing that data, you could answer that question. Most companies do capture that data, don't they? It depends. If you went into SMBs and mid-sized businesses, even some large businesses, I would say a lot of them aren't actually logging or keeping DNS logs. And if they are keeping DNS data, it may not be query and response. And event logs, the vast majority of organizations don't have a centralized and long-running retention

Starting point is 00:40:49 policy for event logs. But even if they do, their data retention, how long they were keeping this data, did not go back far enough. They actually had data. They had data going back 30 days. They had data back 60 days, 90 days. So they're finding out in December about a breach and set of activity that happened and potentially initiated in May. And, oh, I kept all this great data, but I can only go back three months. And three months from December is September. And for a breach that happened in June and July, that's, in some respects, useless. That's a scary place to be in, to not know if you were compromised or, if you were, when it started or what happened or where did they go. How did they pivot?
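
The retention arithmetic described above is simple enough to check directly. The sketch below assumes specific calendar days (mid-May for the start of activity, mid-December for discovery) purely for illustration; the conversation only gives the months, and the function names are made up for this example.

```python
from datetime import date, timedelta


def earliest_visible(discovered: date, retention_days: int) -> date:
    """Oldest date still covered by logs on the day the breach is discovered."""
    return discovered - timedelta(days=retention_days)


def blind_days(activity_start: date, discovered: date, retention_days: int) -> int:
    """Days of attacker activity that fall before the retained log window."""
    gap = (earliest_visible(discovered, retention_days) - activity_start).days
    return max(gap, 0)


# Assumed dates for illustration only; the source gives months, not days.
ACTIVITY_START = date(2020, 5, 15)
DISCOVERED = date(2020, 12, 15)

if __name__ == "__main__":
    for retention in (30, 60, 90, 365):
        window_start = earliest_visible(DISCOVERED, retention)
        missing = blind_days(ACTIVITY_START, DISCOVERED, retention)
        print(f"{retention:>3}-day retention: logs start {window_start}, "
              f"{missing} days of activity not covered")
```

Under these assumed dates, only a retention window longer than the gap between initial activity and discovery leaves nothing uncovered, which is the point being made about 30, 60, and 90 day policies.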

Starting point is 00:41:25 It's a missed opportunity and probably a bit scary for some of these companies: I was collecting all the right data, but I didn't have it for long enough. So I don't actually know. Wow. We're helping a lot of companies right now to see what resources they have. We specialize in memory forensics, acquiring memory from their SolarWinds server or acquiring disk artifacts or full disk images, you know, any log sources. And if we have some stuff, then we can potentially go in and say, doesn't look like it or definitely, yes, you were. You know, we see these items that clearly indicate that you got a second-stage breach and you need to expand this out. But we can't give anyone, if there's limited data, a confirmed clean bill of health.

Starting point is 00:41:59 It's a little bit like going to the doctor and having like maybe a continuous glucose monitor for the last year, but you only have the data for the last three weeks stored. And it's sort of like, okay, here's where it's happening. I'm getting sick, but I only have the three weeks. It's just like a really tough thing to figure out. I want to break this down by advice for big companies, like large enterprises, advice for small and medium-sized businesses, and advice for consumers. So let's start with the big companies, because the best threat actors, they understand the reality of modern enterprise IT.

Starting point is 00:42:31 What are pieces of advice or mindsets even that you have to offer for how chief security officers, CEOs, leaders should be thinking about the implications of this for their business? I mean, I've spent a lot of my career in big companies. I think the thing to do right now is to think about strategy. Like the tactics are great and there's going to be a lot of people chasing a lot of actions over the next days, weeks, months. But I think the strategic view of how an organization wants to think about security matters. As we start to understand what happened and how it happened, we'll consistently see in some organizations that

Starting point is 00:43:02 security either wasn't funded, it wasn't empowered, it didn't have a remit to act. It may have been under assault. People often view security as being a cost center as something that, you know, contributes to the lack of performance in a business, and that is an attitude that is still quite popular. So I would say that, like, it's really going to be about figuring out strategically where does security sit, what's the right amount to spend on it, how do you effectively empower it, and then how do you partner and build security into your business so that it's something that helps enable it versus something that holds it back? Yeah, generally, no one really thinks like security is not important. I don't think we ever hear that. Now, action may speak louder than words

Starting point is 00:43:40 sometimes. But I think a lot of people think about, oh, it's an afterthought. I'm going to add it later or, yeah, yeah, we'll do that one day. And I think, like, our main advice to a lot of these different organizations, whether it's a startup or a mid-sized company, or a company that's growing really rapidly, is not necessarily that they need to come out of the gate and have every imaginable security product, that they need to be auditing all their source code on day one, that they need to have everything locked down and the latest firewalls and this filter and all these EDR products. But it's like, think about that stuff. Are you doing two-factor? Are you lazy? I don't need to put, you know, two-factor on my Salesforce account

Starting point is 00:44:14 where all my most sensitive contacts and information is in my organization. Or, yeah, I don't really need to put it on email. It's like, it's easier if everyone can just log straight in, or I'm just going to share this root, you know, Amazon key to get into AWS because that's just how our organization's growing and we're not formal. There are things that people can do, best practices, actions that organizations can take. See what you can do now, see what you can do along the way, and put that on your radar so you're not in a position where you're starting from scratch or trying to investigate a breach or figure out if you even had a breach. We all knew what we should have done. And we knew that two years ago. And we run into that a lot. Don't wait till later.

Starting point is 00:44:52 And now advice for consumers, like just day-to-day people like family members, et cetera. What would your advice be for how to think about things like this? We wrote a really excellent blog post last year called 16 Things You Can Do to Protect Yourself. and I would strongly recommend that people do all of those 16 things. It's all really basic stuff, and it starts with two-factor authentication, patching your systems, and goes all the way down to how you want to think about securing

Starting point is 00:45:17 your potential social media accounts, etc. Yeah, we issued some guidance, and it's a couple of different sections of prevention and detection, and then remediation, if you have an actual threat or concern. From the prevention side, prevent unnecessary access from your servers, like your SolarWinds server or other devices, from talking to the internet. That's a prevention mechanism. You know, monitor your assets,

Starting point is 00:45:35 to kind of see where they're logging in from. If you have that kind of centralized logging or like a SIEM, same thing: make sure you're capturing, either from event logging or your endpoint security products, that actual commands being run on the system are being logged, because that can be pivotal and critical to, one, detection. But even if you're not actively monitoring it, you can go back and say, hey, what commands were run on this server? That's not consistent with what our system admin or the typical activity would do.
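
One way to picture the "go back and ask what commands were run" step is a comparison of logged command lines against a baseline of what administrators normally run. The sketch below is an assumption-heavy illustration: the `BASELINE` set, the sample command lines, and the idea that logs arrive as plain command-line strings are all hypothetical, and the parsing is deliberately naive (it does not handle executable paths that contain spaces).

```python
import re
from collections import Counter

# Hypothetical baseline of executables the admins of this server normally run.
BASELINE = frozenset({"systemctl", "tail", "df", "ps"})


def executable(command_line: str) -> str:
    """Extract the executable name from a logged command line (naive parsing)."""
    tokens = command_line.strip().split()
    if not tokens:
        return ""
    # Drop any directory prefix, accepting both / and \ as separators.
    return re.split(r"[\\/]", tokens[0])[-1].lower()


def unusual_commands(command_lines, baseline=BASELINE) -> Counter:
    """Count executables seen in the logs that are not in the baseline."""
    counts = Counter()
    for line in command_lines:
        exe = executable(line)
        if exe and exe not in baseline:
            counts[exe] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical log extract for one server.
    logged = [
        "/usr/bin/tail -f app.log",
        "whoami",
        "whoami /all",
        "C:\\Tools\\adfind.exe -f objectclass=user",
    ]
    for exe, count in unusual_commands(logged).most_common():
        print(f"{exe}: {count} run(s) outside the baseline")
```

A result from a check like this is only a starting point for review, since legitimate one-off administration also falls outside any fixed baseline.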

Starting point is 00:45:59 But take a look at your mail server. Look at where your email is going because that's where the attackers, I believe they're way ahead of the game with regards to the things that they can do in Office 365 and Azure AD where they are so familiar with the administrative commands and what to do from a sysadmin aspect. They're able to do a bunch of things

Starting point is 00:46:16 and hide in ways that people have never even thought about and encountered, and it's not necessarily like they're ghosts or it can't be found. People just don't know to even look for it. And then just from the general remediation perspective, once a device is backdoored or compromised, it's an untrusted system now. Don't just, like, roll back to an earlier version, or say I'm just going to upgrade to the new version.

Starting point is 00:46:36 We say, hey, blow that whole system away. Start with a fresh, clean install. And if you're putting SolarWinds Orion back on it, download the newest version that's not backdoored and start everything from scratch. If anything was used on that server, if your SolarWinds setup for Orion had credentials, change all those passwords and make sure those passwords aren't similar to, like, old passwords that you've used. And the other thing, too, is kind of any sensitive API keys, integrations, and things. We saw two-factor bypassed to get into email by this threat actor because they had taken a secret key and were able to generate cookies and skip into the email system while not actually being challenged for two-factor. You've got to think about the stuff that someone could steal if they're in your network related to this, but also that advice extends well beyond this threat actor and SolarWinds specifically.

Starting point is 00:47:18 That's great. I'll include links to Volexity's blog post as well as the 16 things that you can do to secure yourself in the show notes. Bottom line it for me. What's your takeaway? It's consistent with what we've been saying for a while now. The hardest problem to solve is third-party risk, and this is probably the most significant third-party breach that we've seen in history. And so I think it's going to take us months to really understand what happened and probably years to fix it. Thank you so much, you guys, for joining this episode of 16 Minutes, which is a 3x 16 Minutes. Definitely. Thanks for having me. Yeah, thank you so much. And Stephen, it seems that we're always catching up when the world is burning down.
