Screaming in the Cloud - Security Made Simple in the Data Economy with Mark Curphey

Starting point is 00:00:00 Hello, and welcome to Screaming in the Cloud, with your host, Chief Cloud Economist at the Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud. This episode is sponsored in part by LaunchDarkly. Take a look at what it takes to get your code into production.

Starting point is 00:00:39 I'm going to just guess that it's awful because it's always awful. No one loves their deployment process. What if launching new features didn't require you to do a full-on code and possibly infrastructure deploy? What if you could test on a small subset of users and then roll it back immediately if results aren't what you expect? LaunchDarkly does exactly this. To learn more, visit launchdarkly.com and tell them Corey sent you and watch for the wince. If your mean time to WTF for a security alert is more than a minute, it's time to look at Lacework. Lacework will help you get your security act together for everything from compliance service configurations to container app relationships, all without the need for PhDs in AWS to write the rules.

Starting point is 00:01:28 If you're building a secure business on AWS with compliance requirements, you don't really have time to choose between antivirus or firewall companies to help you secure your stack. That's why Lacework is built from the ground up for the cloud. Low effort, high visibility and detection. To learn more, visit lacework.com. Welcome to Screaming in the Cloud. I'm Corey Quinn. A recurring theme of a lot of my nonsense has been finding hapless companies who have not been adequate stewards of the data with which they have been entrusted and giving them the ignominious S3 bucket negligence award. That seems to be something that isn't well appreciated in some areas, so I

Starting point is 00:02:06 figured let's have a conversation about that in a bit more depth. Today's episode is sponsored slash promoted by Open Raven, and I'm joined by Mark Curphey, their co-founder and chief product officer. Mark, thanks for joining me. Thanks for having me. So let's start at the very beginning. As a co-founder and chief product officer, that means that you're one of those folks who very early on presumably had part of the idea, if not the entire idea, for what the company does. What is Open Raven and where did you folks come from? What problem were you able to solve?

Starting point is 00:02:38 Sure. So actually, it's an interesting story. I had previously done an application security company called SourceClear that I sold to CA. My co-founder, Dave Cole, was the early product guy at a company called CrowdStrike, which recently IPO'd, and Dave and I had always wanted to work together but really didn't know what to go do. And the honest truth is we decided to go be good capitalists and went out and asked our chief security officer friends, what's the biggest problem that you've got? And resoundingly it came back: I don't know where my data is. I don't know what type of data I have. I don't know how it's being protected. And data breaches are happening all the time.

Starting point is 00:03:14 And it's probably the big thing that I'm going to get fired for. So frankly, Dave and I rubbed our hands together and said, I think we can make money off of that and solve a meaningful problem. And hence, the Open Raven company as it is now. Which is absolutely something that is increasingly in the public eye. Well, we'd like to hope at some point people just shrug, give up, assume that everything about them is public and that's the end of privacy to some extent and get on with their lives. At least that's the negative story. I like to believe that on some level, getting better than we are today is possible. And what infuriates me and why I started giving out S3 bucket negligence awards personally, isn't because you wound up getting breached. It's,

Starting point is 00:03:56 I view that on some level as being akin to taking an outage. It happens to everyone on some level and you have to prepare for it as best you can. All right, I get that. One of the problems that we tend to see from all corners is that companies that wind up getting breached are in many cases exposing data that isn't theirs, that no one consented to have handled by these folks. We see it in some cases with some of the credit reporting agencies and some of the data brokers. And it's not always S3 buckets, but it is the consistent drumbeat of companies not being adequate stewards of the data that has been entrusted to their care. Yeah, I mean, look, it's certainly true that a lot of people have breach fatigue. This stuff's been going on for years and years and

Starting point is 00:04:40 years. I think that the S3 negligence awards, the bucket wall of shame, go back to DEF CON, the hacker conferences, with the Wall of Sheep for passwords. It's not necessarily a new phenomenon. And I would also say that whilst with S3, you open The Register and every day there's an S3 bucket thing, it's certainly not only S3. We know that. We've been doing some profiling of things and Elastic and MongoDB and everything else is hanging out there. But I guess with buckets it sort of tends to be so easy just to make them open and host data on them in the first place. But I think you're right, like companies that have data, whether it's knowingly capturing

Starting point is 00:05:16 it or processing it, you have a duty of care of managing and holding someone else's data. And it just feels like people don't take that duty of care seriously enough, right? And what's more is that you'll often see a company get breached and, oh, your data has been subject to a data breach. And ideally, you wind up getting that notification before you read about it in the papers. And a lot of the companies that you do business with that contact you are very quick to point the finger of blame at a third-party contractor. Well, I didn't hire the third-party contractor. You did. And if you're not willing to wind up owning up to that, well, you're effectively trying to outsource the work, which is fair, and the blame, which is not. How do you stand on that?

Starting point is 00:05:59 Yeah, well, it's a system that we happen to be using, but it was someone else's problem that the default configuration was their problem, right? I mean, also, Corey, I can tell you, I have a lot of friends in the forensics industry who deal with incident response. And still to this day, the vast majority of data breaches are never reported. Like I know of breaches that have happened in major public companies where all the breach laws are such that they should have notified their customers and they should have notified the authorities and it just doesn't happen. So it's one of those problems that I think is like, yeah, it's like the iceberg problem, right? And to a large extent, it's kind of an interesting one in that when someone notifies their customers,

Starting point is 00:06:34 they're doing it from transparency. And whilst I think you and I would both appreciate that and place more trust in those companies, the reality is a lot of the public wouldn't. And so the incentives aren't necessarily aligned there, right, around why they should do it. I would take it even a step further than that. I would argue that, I don't know if it's a majority, but a significant number is going to be met on some level with, good Lord, no, why would we want to do that? We're happier not knowing. And that depresses me. Absolutely true. I've been building security tools for 20 years. And you'd be surprised at the number of people who say, if I deploy your tool, even as a trial, and we find that we have problems, then I'm legally responsible for going and dealing with it, so I won't touch it. The other thing that's kind of related to that is that the security guys are incredibly busy as well. And the security

Starting point is 00:07:28 tools historically generate lots and lots of noise, very low signal, lots and lots of noise. And so the security teams look at it and they go, oh my gosh, I'm going to get a whole bunch more noise that I have to go deal with. Right. And a bunch more work that I have to go do. Like, can I bury my head in the sand? Like, sure. And it happens. That's just the reality of the world we're living in, unfortunately. It is. And for better or worse, I think that it's a world that we're sort of stuck in to some extent. Do you think that the drumbeat of open S3 buckets that have been misconfigured containing sensitive data, it feels like we aren't seeing as many of those as we once did, but is that just because people aren't reporting them? Is it something that is going away slowly but surely, or is it just as

Starting point is 00:08:10 bad as it's ever been, but it's not making headlines anymore? So I'm actually building a tool to profile the AWS IP space for all the open buckets and all of the open Elasticsearch and MongoDB things. There's a few of those that are out there like GrayhatWarfare, right, which you can go search for S3 only, but not all the other stuff. I think when I looked up there the other day, I want to say there were about 750,000 open buckets. Now, of course, you know, not all of those open buckets are a problem. A number of them should be open, right? That's kind of why they're there, et cetera. So it's... Oh, I keep getting alerts constantly about open buckets that I have that are intentionally open. And I get alerts in the console, and I get emails at least quarterly. And this bucket that starts with the word assets dot and then some domain. Yeah, how about that? That is, in fact, designed to be open. On some level, I wind up, for a few of them, just slapping a CloudFront distribution in front of it, not because I need it, just because I want it to stop nagging me. Of course, that's the signal-to-noise problem again, right? And honestly, that's part of the reason

Starting point is 00:09:07 why Open Raven's doing very well, is that all these companies have been hiring people to go around and chase down open buckets, which are designed by functionality to be open in their organizations, but don't contain any sensitive data. So I don't know. Look, to answer your question around the S3 bucket thing, I think, again, like breaches, people have got a bit of fatigue, and, you know, there's only so many articles The Register can do with, hey, S3 bucket open to the world and, what is it, you know, 27 terabytes of data or 900 terabytes of data. It's like, you know, it's not necessarily one-upmanship on the headlines anymore. But, you know, my gut tells me the faster we deploy things, the whole kind of DevOps movement in many ways is moving against the security grain, and rightly so. So to think that the problem is getting better I don't think is reasonable.
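
Editor's sketch: the signal-to-noise point above (open buckets that are open by design and hold nothing sensitive should not page anyone) can be illustrated with a small filter. This is not Open Raven's implementation. The `Bucket` record, the `intended_public` flag, the data class labels, and the bucket names are all hypothetical, chosen only to show the idea of alerting on the combination of exposure and content rather than on exposure alone.

```python
from dataclasses import dataclass, field

# Hypothetical labels a data classifier might emit; not taken from any real product.
SENSITIVE_CLASSES = {"financial", "pii", "credentials"}


@dataclass
class Bucket:
    name: str
    public: bool                    # readable from the internet?
    intended_public: bool = False   # did the owner declare it open by design?
    data_classes: set = field(default_factory=set)  # what a scan found inside


def needs_attention(bucket: Bucket) -> bool:
    """Return True only for public buckets that are worth a human's time."""
    if not bucket.public:
        return False
    holds_sensitive = bool(bucket.data_classes & SENSITIVE_CLASSES)
    # Alert when sensitive data is exposed, or when nobody declared the bucket open.
    return holds_sensitive or not bucket.intended_public


# Made-up inventory for illustration.
inventory = [
    Bucket("assets.example.com", public=True, intended_public=True, data_classes={"images"}),
    Bucket("finance-exports", public=True, data_classes={"financial"}),
    Bucket("internal-logs", public=False, data_classes={"pii"}),
    Bucket("scratch", public=True),
]

alerts = [b.name for b in inventory if needs_attention(b)]
print(alerts)
```

In this toy inventory the intentionally public assets bucket stays quiet, while the exposed financial bucket and the undeclared open bucket are flagged.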

Starting point is 00:09:59 And then I think that Amazon in some ways have done good things with the security policy and all of those types of things. But they're largely designed for greenfield environments. And when you step off the reservation, I mean, gosh, do you think the average developer is going to be able to figure out how to configure that XML policy? On an average S3 bucket, of course not. They're just going to make the damn thing open and they're just going to move on and do their job. So we've got to figure out how to get better tooling, easier, secure by default. And then, like you said, we've got to figure out ways to reduce the noise so that people can act on signal and not get bombarded by noise and just shut it down. My approach to cutting through that noise on open S3 buckets, as I tweeted out a couple of times now,

Starting point is 00:10:41 is to just copy a few petabytes of data into the open buckets. My operating theory is that while you're going to ignore a politely worded email from a security researcher, you're probably not going to ignore a bill that is 80 times larger than it's supposed to be at the end of the month. That seems like it might be, among other things, legally fraught. What's your approach at Open Raven to solving this particular problem? Well, so for us, it's all about improving the signal to noise, right? So it's about, you know, setting up a policy that you can say on this bucket, this is the type of data that I'm expecting. These are the security controls I'm expecting. If it deviates from that in any way, go let me know. And then we use OPA, Open Policy Agent, to go check that and go send alerts or pump stuff out

Starting point is 00:11:27 to a firehose, pump it to a security event system or whatever. So in general, that's it. It's like define what good's meant to be and then let me know when good is not occurring so I can go figure out how to deal with it. And of course, you create generic policies like, look, I never want to see financial data on a bucket that's open to the Internet and that's unencrypted. Or I never want to see something with a CIDR range of 0.0.0.0/0 through some security group or something. So that way, essentially what companies get to do is sort of encode their intended use policy and alert when that's not there. So for us, it's all about that. Part of what we see, and I've seen this in the security industry for the last 15 or 20 years, is there's textbook security and then there's the real world security. You go look at

Starting point is 00:12:08 a lot of kind of textbook security solutions and they're fine. They work absolutely fine. I worked at Microsoft for a long time and I was always amazed at how everything worked perfectly at Microsoft. And then when you stepped off the reservation, nothing worked properly. But everyone in Microsoft would be scratching their heads going like, well, it all works fine here. It's like a developer saying, hey, it works on my laptop, right? It was only when I committed it to CI that the problem occurred. So it's that same thing. And you've got to design stuff for the real world. You also say it goes beyond just S3 buckets, which I believe for a while there, was it Elasticsearch or was it Mongo that had a default password of change me or something

Starting point is 00:12:42 horrifying like that? Yeah. One of those, I forget which one it was. Elastic's a big offender for sure. Mongo's a big offender for sure. But I mean, you also see like Jenkins servers that are sat out there, right? I mean, that camera thing, was it the Verkada thing recently? Like that was a CI server that happened to have a script that had access to loads of things. But the number of Jenkins servers that are accessible through the internet is shocking. It's not just buckets for sure. It definitely becomes a weird thing. I don't know if there's a fix here. I really don't, longer term. But instead of looking forward for a minute, let's go back and visit the past for a bit. You were the founder of the OWASP reporting list. What is OWASP? Is it a list? I'm most familiar with the OWASP Top 10, but I'm certain

Starting point is 00:13:25 you'll have a better story on that than I will. Yeah, no, no, no. Top 10 was a small thing. So I was running software security at Charles Schwab early 2000s, 2001, before it was kind of a really big thing. And we used to get vendors coming in trying to sell me products. And honestly, it was kind of a joke, right? Like, at market open, we'd have 8 million accounts, like a trillion dollars in assets. And people would come in and try and sell me a web application firewall, whose maximum throughput was like, you know, 0.01% of my market open traffic and things. But there was nothing out there on the internet to go point to and to say, well, this is good,

Starting point is 00:13:59 right? It was basically me versus a vendor coming in. And so I said, right, this is kind of crap, right? And I got together with a bunch of other people that were also doing similar things, some other people at some other banks, some other people in other companies. And I said, right, I'm going to go publish something. And I wrote it over a weekend, literally wrote this guide called the OWASP guide. And it was basically a set of principles around software security, like least privilege and nothing sophisticated, but

Starting point is 00:14:25 it was those types of things and published it. And then OWASP was basically born, right? So it's the Open Web Application Security Project. Then it got a lot of traction because a lot of people had signed up and a lot of people were then starting to reference this to build their own application security programs. Over time, of course, OWASP got very successful. I think it's like 40,000 people or something like that, that turn up and there's conferences all around the world and chapters all around the world. And there's lots and lots of projects that have taken place. One of which is the Top 10 that you referenced that a lot of people know of. And the Top 10

Starting point is 00:14:59 has been around, I want to say, since like 2004 or something like that. I don't know. I have to go back and check with history. It's hardly changed since 2004. And you can have a good conversation around why that is. But yes, that's the history of OWASP. And now, of course, you have this list that doesn't seem to have changed significantly in a while. I mean, back when I was starting up the Meanwhile in Security podcast and newsletter with Jesse Trucks, we talked about that being one of the key problems is everyone wants to know how to handle security and cloud. But if we take a look at how a lot of application vulnerabilities exist, that list hasn't materially changed. If anything, the advent of cloud has fixed some security issues in that you're not allowed to muck with them anymore.

Starting point is 00:15:40 Data center physical security is no longer a vector for most folks who are all in on a public cloud provider. But you're also dealing with this other problem of, okay, now it's a list of enumerated S3 buckets, for example. And if you misconfigure that, it's something that's globally known. And I guess it removes the security through obscurity argument insofar as there ever was one. Have things changed in a time of cloud, or is it just the same thing with new labels on it? Well, I mean, there's a couple of things, right? So you've got to ask yourself, what is the OWASP Top 10 a top 10 of, right? Is it the top 10 most popular issues, the top 10 most severe issues, the top 10 voted by security people? No one's ever really been able to get to that apart from an arbitrary top 10. And I don't want to take anything away from

Starting point is 00:16:23 it because the Top 10 has been incredibly useful in getting to developers, giving them a tangible like 10 things, go focus on these 10 things and you'll raise the bar. So that's kind of piece number one. But like, you know, has it changed? Well, should you have expected it to change? Depends on what you believe it's based on. If you do look at them though, like, no, things like injection and broken authentication

Starting point is 00:16:44 and sensitive data exposure, those things haven't changed because they're just general things and they're going to be around forever, right? You think sensitive data exposure is going to go away? It doesn't matter what technology we change. It's always going to be there. For me though, what's kind of interesting about it, and maybe I'm a bit of a skeptic about it, is that you can eradicate total classes of problems, I believe, by changing patterns. So a good example is like, look, if you go use one of these modern development frameworks, application frameworks, like it's built in inherently, and the same with a lot of SQL injection problems that you used to see all over the place, right?
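
Editor's sketch: the claim that a changed pattern can remove a whole class of problem is easiest to see with SQL injection. The example below uses Python's standard `sqlite3` module and an in-memory table; the table, the user names, and the hostile input string are made up for illustration and do not come from the episode. It contrasts building SQL by string concatenation with passing the input as a bound parameter, which is the pattern most modern frameworks apply by default.

```python
import sqlite3

# In-memory database with a made-up table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "admin"), ("bob", "user")],
)

hostile_input = "nobody' OR '1'='1"

# Old pattern: splice the input into the SQL text.
# The quote inside the input rewrites the WHERE clause.
unsafe_sql = "SELECT name FROM users WHERE name = '" + hostile_input + "'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safer pattern: a placeholder. The driver treats the input purely as a value.
safe_rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", (hostile_input,)
).fetchall()

print(len(unsafe_rows), len(safe_rows))  # prints "2 0"
```

The concatenated query returns every row because the input changed the query's logic; the parameterized query returns nothing because no user has that literal name.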

Starting point is 00:17:15 Like you'd have to intentionally go create those problems for the most part now. And I think the cloud's done the same, right? It's taken a lot of problems away. It's abstracted them into a service. It's abstracted them into a pattern. And the pattern can then go away. So, you know, back to the S3 thing, like, I think there's hope, right? Because, you know, if you can make a change upstream, I mean, you've probably seen recently all these damn, you know, supply chain attacks, right? Like, and the bad guys are going further and further upstream where

Starting point is 00:17:42 they can affect things downstream. The good news about all of that is if you can figure out upstream the way to go secure it, everything downstream gets secured as well. So I think with a lot of these things, if we can stop trying to play whack-a-mole or put the finger in the dike, if we can start thinking about patterns and ways to go solve them at a class-of-problem level, then we stand a chance of fixing them. This episode is sponsored in part by ChaosSearch. As basically everyone knows, trying to do log analytics at scale with an ELK stack is expensive, unstable, time-sucking, demeaning, and just basically all around horrible.

Starting point is 00:18:16 So why are you still doing it, or even thinking about it, when there's ChaosSearch? ChaosSearch is a fully managed, scalable log analysis service that lets you add new workloads in minutes and easily retain weeks, months, or years of data. With ChaosSearch, you store, connect, and analyze, and you're done. The data lives and stays within your S3 buckets,

Starting point is 00:18:40 which means no managing servers, no data movement, and you can save up to 80% versus running an ELK stack the old-fashioned way. It's why companies like Equifax, HubSpot, Klarna, AlertLogic, and many more have all turned to ChaosSearch. So if you're tired of your ELK stack falling over before it suffers, or of having your log analytics data retention squeezed by the cost, then try ChaosSearch today and tell them I sent you. To learn more, visit chaossearch.io.

Starting point is 00:19:10 I sure hope you're right. I mean, in an ideal world, you will be, but I have so much trepidation around all this, and I don't know how it's going to wind up playing out. And I hope that it's going to go well, but it just feels like you're constantly railing against the tide. And I don't know how to wind up addressing that. I really don't. I wish I

Starting point is 00:19:32 did. Is there anything you can say that helps me be more optimistic about this? I mean, you're right. Look, I'm no longer in the application security business after spending 15 or 20 years in there, because I just gave up trying to convince developers to care about security. I just, and I don't blame the developers. They've got another job to go do and security is too hard. So, you know, like for me, it was just pushing molasses uphill. And I think to your point, like, yeah, why would you expect anything different if we carry on doing the same thing? And the reality is we're moving faster and faster. We're making it easier and easier to deploy things. We're getting

Starting point is 00:20:09 more and more complex systems. Why would you expect anything different? So yeah, I don't think you're skeptical for a bad reason. No, for better or worse, we still wind up having these problems. I don't know how to solve it. I really don't. I mean, look, for the reality, if you go back to the old days, like the old school, like honestly, I'm a bit of an old person, right? But you go back to like, you know, some of the military things used to be like trust, but verify. That model works incredibly well, right? You trust people are going to do the right thing, you verify they've done the right thing.

Starting point is 00:20:40 That means you don't hinder the speed, but you go back and check and if anything happens, you come back and it's like accepting things. One of the other ones around that was, like, it's people, process, and tools; people, process, and technology. And again, technology is never going to solve the problem of security. It's a people problem. Like, you know, you can't patch stupidity and all of those phrases. But if someone gives someone access to a local root account or whatever the thing is, it doesn't matter how many other security controls you've got. I mean, I've seen it in cloud environments, as I'm sure you have. Someone goes and creates a security group, 0.0.0.0/0, so they can access the thing from home and don't have to come in and go through all of the other control points. And it's just the way stuff works, right?
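
Editor's sketch: the "verify" half of trust but verify, applied to the security group example above, can be as small as a check for rules whose source range covers the whole internet. The rule dictionaries below are a made-up shape for illustration, not the format returned by any cloud provider's API, and the group names are hypothetical. Only Python's standard `ipaddress` module is used.

```python
import ipaddress


def is_open_to_world(cidr: str) -> bool:
    """True if the CIDR covers every address, e.g. 0.0.0.0/0 or ::/0."""
    return ipaddress.ip_network(cidr, strict=False).prefixlen == 0


def find_world_open_rules(rules):
    """Return (group, port) pairs for inbound rules open to the whole internet."""
    return [(r["group"], r["port"]) for r in rules if is_open_to_world(r["cidr"])]


# Hypothetical inbound rules; the dictionary keys are an assumption of this sketch.
rules = [
    {"group": "sg-web", "port": 443, "cidr": "10.0.0.0/8"},
    {"group": "sg-home-access", "port": 22, "cidr": "0.0.0.0/0"},
    {"group": "sg-office", "port": 22, "cidr": "203.0.113.7/32"},
]

flagged = find_world_open_rules(rules)
print(flagged)
```

Run on a schedule, a check like this lets people keep moving quickly while still surfacing the shortcut someone took to reach a machine from home.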

Starting point is 00:21:17 So if you have that, if you take that mentality of people, process and technology and trust, but verify, and use the right technologies and build the right process around it, then you could at least manage the risk. The risk is never going to be zero, but you can at least manage the risk to an acceptable level. Let's pivot a little bit and talk about the flip side of data security. And that comes down to privacy. There's been a bunch of regulatory efforts around that. GDPR, for example, California has its own version of that that's going out. And there's also a growing school of thought that thinks on some level we're post-privacy. Where do you stand with that?

Starting point is 00:21:55 Yeah, I mean, look, the privacy regulations are raging right now, right? You've got GDPR, you've got CPRA, the California one, you've got HIPAA, the Health Insurance Portability and Accountability Act, and they're all over the world. Japan has them, Australia has them, they're all over the place. And I think the US now is talking about having a central breach law around privacy data. The great challenge is that we're all becoming a data economy and companies are all becoming data companies. And so they want to gather more and more data. And, you know, the reality I think is that this whole stuff around cookie consent, I just think it's just nonsense. Like when was the last time you said, hey, I'm not

Starting point is 00:22:29 going to consent to using my cookies. It's kind of like back in the old days when you said, hey, I'm not going to allow JavaScript to run on my browser, right? Like all of a sudden nothing works and you're like, oh, I'll succumb. But then before you know it, data is, it's been overreached, right? Like you probably saw the Alexa the other day that has the radar so it can watch you sleep in your bed, right? You know, sure, of course, they're not going to use that data for anything bad. But next time a breach happens or some clever data science person decides to go correlate something, I don't know what it might be in the middle of the night, it happens. So I think what you're starting to see is regulators and legal people who don't really understand technology regulating to prevent those bad things happening.

Starting point is 00:23:09 And then technology trying to figure out how to go and meet those regulations, but meeting it with the absolute minimum bar versus trying to figure out what the actual intention is. And I think you're going to see a bigger and bigger gap. I mean, look at what happened with third party cookies as an example, right? Like the whole third party cookie thing. We saw, what was it, the CORS headers. We saw anti-cross-site scripting headers because all of those things started happening. And then, you know, what does everyone do? They just go call a tracking pixel, right?

Starting point is 00:23:34 And then all the marketing automation tools carry on working as before. So, I mean, I think you've got a balance between technology working as intended in certain good use cases, and then people using that for their own use cases, which, you know, either break or push over the line of privacy. I don't know. How do you see it? I think on some level, it's not necessarily that people care necessarily that some company in the aggregate knows what they're doing. There are some that do, and I'm not disputing that. But for most of us, I don't necessarily care if Google, for example, knows what I browse on the internet. I care much more if you personally know what I personally am browsing on the internet. So there's a question of, once they have that data, do I really care that much about what they do with

Starting point is 00:24:21 it in aggregate? Not really. Do I care what they do on an individualized basis? Kind of, yeah. And do I care if they're then making that individualized data available to third parties? Absolutely. Yeah. It comes down to what the use of that thing is. Now, I know that I am not going to win friends with that particular argument myself. And I get it.

Starting point is 00:24:43 In an ideal world, I think that advertising should be something radically different than it is. There are advertisements in this podcast, for example, and they're catering to an audience that cares about the topics we talk about on this podcast. But I have no tracking data of who listens to this other than raw download numbers and rough GeoIP by continent.

Starting point is 00:25:04 It's not something that is ever going to be attributed, at least from where I sit, to individual listeners, nor would I want it to be. Yeah, but look, here's why I might be able to convince you otherwise of that. In China, there is a well-known place called the Beijing Genomics Institute. And the Beijing Genomics Institute do genetic engineering, and not necessarily for good, right? So it's not necessarily to find cures for things. It's also for other nefarious purposes. And the Beijing Genomics Institute acquire DNA data from US hospitals, US healthcare systems, when you get your blood checked.

Starting point is 00:25:40 Now, that data is supposedly aggregated. But once you can start pulling apart DNA strands, you can start identifying people at different levels. And I think that's the danger. There's been a lot of cases where de-anonymizing information is possible. And so you're making the assumption that that data is generally anonymized and used for the right reasons, but there's been case after case where that's not the case. So maybe you'll change your mind on that, Corey. I don't know. Maybe. I also, on some level, feel like I'm fighting a losing battle against the tide. Yeah. Yeah. My wife says, aren't you worried about your credit card going missing? And I'm like, I'm sure it's in many, many databases at this point. I rely on Visa at that point. Well, that's also a separate problem too. I mean, this idea of, oh, your identity was stolen because someone else has opened a credit card in your name or stolen your credit card. My very honest response to that is, oh, so you weren't cautious about who you decided to lend money to and validate they were the person you thought. And you're trying to make this my problem because why exactly?

Starting point is 00:26:37 Yeah. I mean, look, in those cases, that's why it's the corporate's responsibility to deal with those issues, right? I guess it's the same with social security numbers and that they're out there in so many places and the internet, they're pushed around in so many different ways, aren't they? I think we've got to start moving into some of these zero trust kind of protocols and zero knowledge ways

Starting point is 00:26:53 and all of that type of thing in the future. Indeed. And I think that there's one thing that every corporate entity listening to this or representative of the same can agree on, and that is they prefer this conversation to remain hypothetical and aimed at the abstract, not at them right after they've had a data breach, which of course brings us back to Open Raven and how it aims at these things. You do have a, at the time

Starting point is 00:27:16 of this recording, it is still upcoming, a paper coming out contrasting what you have built with, I believe it's Amazon Macie? That's right. Yep. Yep. So when Dave and I founded the company, we went out, like I said, and we asked everyone, what's the biggest problem? And it was data security. And then when you broke that down, it broke down into, let me know where my data stores are. So, you know, do I have buckets? Do I have stuff in RDS? Do I have stuff in file systems, et cetera? What type of data do I have there? How is that data being protected? Access control and encryption and all those things, and who has access to it? So it basically broke down to those things. Those things haven't changed at all. So think of that piece number two as what type of data do I

Starting point is 00:27:54 have as being data classification. Amazon have a service called Macie, which works on S3. So we've built that feature. Now, lucky for us, as it turned out, you get a few really good breaks in the startup world, is that Amazon Macie, it turns out, is not very good and incredibly expensive and very, very slow. So frankly, the way we market it is cheaper, faster, and better than Macie. And we believe in transparency of that. Every vendor will say we're way better than everything, right? So we've kind of done what you would do with a clinical trial in that we have basically built a, you know, here's the test,

Starting point is 00:28:28 here's exactly what we're going to test for, kind of like laying it out in an academic paper. Here is the data. So you can go rerun the test yourself and here are the results. And we know that we are way, way, way more accurate than Macie. We're deployed as Lambda functions

Starting point is 00:28:41 so we can scale up and run much, much faster than Macie. And then certainly way, way cheaper than Macie. But that wouldn't surprise you at all in that case, would it? No. Even after their massive recent price reduction, it was still, okay, that is in fact still incredibly expensive across the board. I mean, my argument with the original Macie and its pricing was I had a customer at that point eyeing it and doing some math. And yeah, okay, first month would have been $76 million to run it in their existing stuff, which was significantly more than at that point their annual AWS bill. So it was, okay, let's go with option B, which is literally anything except that, and you'll save money. Even a data breach

Starting point is 00:29:21 wouldn't have been that disastrous compared to the pricing story. And now they've cut it to 20% of that, but that's still an eight-figure bill to run these analytics on their data set. And that's not tenable. And on some level, it becomes the differentiated value of doing that isn't there for customers. If I wound up running all of the various security services that AWS offers on an environment, it's pretty clear that it would cost more than the data breach would. Well, it doesn't even work. Even if the cost thing was put aside, like one of our customers tried it, I think they

Starting point is 00:29:53 spent a million and a half on a trial in a month, and it found 30 first names in a credit card database. I mean, it's kind of crazy. And when you pick it apart underneath the hood, it's a giant regex, essentially, and just doesn't really, really work. I mean, the reality is that that thing was built. It was actually a, it was originally an In-Q-Tel project, which is the funding arm of the US intelligence agencies.

Starting point is 00:30:13 It was called Harvest.ai. It was an acquisition that they bought in and it was built a long, long time ago. If you want to do data classification today, you have to be able to not only identify structured, unstructured, and semi-structured data. And it comes in all places, right? And it goes into all file formats. In S3 buckets, it's Parquet files, which are at the back end of Lake Formation and lakehouses and things like that.

Starting point is 00:30:36 But when you find a piece of data, you've got to be able to go and validate, is that data real? I mean, take an AWS API key as an example. It's very easy to go figure out how to push that thing into that format, but is it a real key? Whereas if you use validators, go log into an AWS API and you'll get a return that will say, is this a valid key? And which account is it associated with? And so we've done both in terms of the accuracy of identifying the information first, the tests that we've got show in general, we are twice or three times more accurate than Macie, basically, on finding the initial piece of data. But then we have these validators. So you get a

Starting point is 00:31:08 credit card, go call a credit card API. Is it a real credit card or is it just a 16-digit int? And you can go check that stuff. Data classification has moved on since that stuff was there. So even if the pricing thing was fixed, and as you point out, it certainly isn't, it's just not a good option for people. And then, you know, the kind of second piece to that is that the majority of customers that we see and people are looking at things like Snowflake. I mean, if you look at these data platforms, Databricks, Cloudera, Snowflake in particular, you know, they're built on top of AWS services, but people are moving data to those places. So it's not just an S3 problem.
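
Editor's sketch: the "find it, then validate it" idea described above can be illustrated in a few lines of Python. This is a minimal sketch under stated assumptions, not OpenRaven's implementation. The conversation describes calling a live credit card API as the validator; here the Luhn checksum stands in as an offline substitute, which only shows that a number is well-formed, not that the card exists. The names CARD_CANDIDATE, luhn_valid, and classify are hypothetical and chosen for illustration.

```python
import re

# Stage 1 (pattern match): 16 digits, optionally grouped in fours
# by spaces or hyphens. This alone is the "giant regex" approach.
CARD_CANDIDATE = re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if not digits:
        return False
    total = 0
    # Walk right to left, doubling every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def classify(text: str) -> list[tuple[str, bool]]:
    """Stage 2 (validate): pair each candidate with its checksum result."""
    return [
        (match.group(0), luhn_valid(match.group(0)))
        for match in CARD_CANDIDATE.finditer(text)
    ]


if __name__ == "__main__":
    sample = "card 4111 1111 1111 1111 and id 1234567812345678"
    for candidate, plausible in classify(sample):
        print(candidate, "plausible card" if plausible else "just a 16-digit int")
```

A regex-only scanner would report both strings in the sample as credit cards. The checksum step drops the second one, and a live validator of the kind described in the conversation would narrow the results further.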

Starting point is 00:31:42 As I said, it's about people putting data in Elasticsearch, in RDS, in file systems. The data is everywhere in backups, like all of the stuff gets pushed up into backups and stuff as well. And so you've got to have a service which goes and checks it. We decided to go compete with S3 and beat Macie first, but that's certainly not where the tool and technology is going. Now, for better or worse, it would seem not. Thank you so much for taking the time to speak with me. If people want to learn more about OpenRaven, what you're doing, how you're doing it, where can they find you? Yeah, OpenRaven.com is the best place to go. We've also got a pretty exciting open source tool coming up soon, which is called Magpie. Magpie is a cloud security posture manager. So think of it as we'll go out and check all of the security settings on your AWS environment. And so we're releasing that open source around the end of April as well. So keep an eye out for Magpie. We're taking the core out of OpenRaven that does all of the discovery across the orgs, pulls back all the attributes or the IAM or the security groups,

Starting point is 00:32:41 and then allows you to go write security rules on top of that, not data rules, which is what the OpenRaven platform does, but security rules. So also go check that out, but all linked off of OpenRaven.com. And we'll, of course, put links to that in the show notes. Wonderful. Thank you so much for taking the time to speak with me today. I really appreciate it. No, thank you very much, Corey. Much appreciated. Mark Curphey, co-founder and chief product officer at OpenRaven. I'm cloud economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice.

Starting point is 00:33:15 Whereas if you hated this podcast, please leave a five-star review on your podcast platform of choice, along with a comment enumerating all of the S3 buckets you have inadvertently left open. If your AWS bill keeps rising and your blood pressure is doing the same, then you need the Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business, and we get to the point.

Starting point is 00:33:54 Visit duckbillgroup.com to get started. This has been a HumblePod production. Stay humble.
