Screaming in the Cloud - Security Made Simple in the Data Economy with Mark Curphey
Episode Date: April 13, 2021
About Mark
Mark Curphey is the co-founder at Open Raven, a cloud native data security company. Mark’s fingerprints can be found all over the security industry, but perhaps most visibly from his role as the founder of OWASP. His contributions range from his time as a hands-on application security director at Charles Schwab, Product Unit Manager of Microsoft’s MSDN program and his more recent role as founder and CEO of SourceClear. Mark’s obsessed with building elegant products that solve hard problems for discerning customers.
Links:
Open Raven: https://www.openraven.com/
Transcript
Hello, and welcome to Screaming in the Cloud, with your host, Chief Cloud Economist at the
Duckbill Group, Corey Quinn.
This weekly show features conversations with people doing interesting work in the world
of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles
for which Corey refuses to apologize.
This is Screaming in the Cloud.
This episode is sponsored in part by LaunchDarkly.
Take a look at what it takes to get your code into production.
I'm going to just guess that it's awful because it's always awful.
No one loves their deployment process. What if launching new features didn't require you to do a full-on code and possibly infrastructure deploy?
What if you could test on a small subset of users and then roll it back immediately if results
aren't what you expect? LaunchDarkly does exactly this. To learn more, visit launchdarkly.com and
tell them Corey sent you and watch for the wince.
If your mean time to WTF for a security alert is more than a minute, it's time to look at
Lacework. Lacework will help you get your security act together for everything from
compliance service configurations to container app relationships, all without the need for PhDs
in AWS to write the rules.
If you're building a secure business on AWS with compliance requirements,
you don't really have time to choose between antivirus or firewall companies to help you secure your stack.
That's why Lacework is built from the ground up for the cloud.
Low effort, high visibility and detection.
To learn more, visit lacework.com.
Welcome to Screaming in the Cloud. I'm Corey Quinn. A recurring theme of a lot of my nonsense has been finding hapless companies who have not
been adequate stewards of the data with which they have been entrusted and giving them the
ignominious S3 Bucket Negligence Award. That seems to be something that isn't well appreciated in some areas, so I
figured let's have a conversation about that in a bit more depth. Today's episode is sponsored
slash promoted by Open Raven, and I'm joined by Mark Curphey, their co-founder and chief product
officer. Mark, thanks for joining me.
Thanks for having me.
So let's start at the very beginning.
As a co-founder and chief product officer, that means that you're one of those folks
who very early on presumably had part of the idea, if not the entire idea, for what the
company does.
What is Open Raven and where did you folks come from?
What problem were you able to solve?
Sure.
So actually, it's an interesting story.
I had previously done an application security company called SourceClear that I sold to CA. My co-founder, Dave Cole, was the early product guy at a company called CrowdStrike,
which recently IPO'd, and Dave and I had always wanted to work together but really didn't know
what to go do. And the honest truth is we decided to go be good capitalists and went
out and asked our chief security officer friends, what's the biggest problem that you've got?
And resoundingly it came back that I don't know where my data is. I don't know what type of data
I have. I don't know how it's being protected. And data breaches are happening all the time.
And it's probably the big thing that I'm going to get fired for. So frankly, Dave and I rubbed
our hands together and said, I think we can make money off of that and solve a meaningful problem.
And hence, the Open Raven company as it is now.
Which is absolutely something that is increasingly in the public eye.
Well, we'd like to hope at some point people just shrug, give up, assume that everything about them
is public and that's the end of privacy to some extent and get on with their lives. At least
that's the negative story. I like to believe that on some level,
getting better than we are today is possible. And what infuriates me and why I started giving out
S3 Bucket Negligence Awards personally, isn't because you wound up getting breached. I view
that, on some level, as being akin to taking an outage. It happens to everyone on some level
and you have to prepare for it as best you can. All right, I get that.
One of the problems that we tend to see from all corners is that companies that wind up getting
breached are in many cases exposing data that isn't theirs, that no one consented to have handled
by these folks. We see it in some cases with some of the credit reporting agencies and some of the
data brokers. And it's not always S3 buckets, but it is the consistent drumbeat of companies not being adequate stewards
of the data that has been entrusted to their care.
Yeah, I mean, look, it's certainly true
that a lot of people have breach fatigue. This stuff's been going on for years and years and
years. I think that the S3 negligence awards, the bucket wall of shame, go back down
to DEF CON, the hacker conferences, it's all the wall of shame from passwords. It's not necessarily
a new phenomenon. And I would also say that whilst, with S3, you open The Register and every day there's
an S3 bucket thing, it's certainly not only S3. We know that. We've been doing some profiling of
things and Elastic and MongoDB and everything else is hanging out there.
But I guess buckets sort of tend to be so easy just to make them open and host data
on them in the first place.
But I think you're right, like companies that have data, whether it's knowingly capturing
it or processing it, you have a duty of care of managing and holding someone else's data.
And it just feels like people don't take that duty of care seriously enough, right?
And what's more is that you'll often see a company get breached and, oh, your data has been
subject to a data breach. And ideally, you wind up getting that notification before you read about it
in the papers. And a lot of the companies that you do business with that contact you are very quick
to point the finger of blame at a third-party contractor. Well, I didn't hire the third-party
contractor. You did. And if you're not willing to wind up owning up to that, well, you're effectively
trying to outsource the work, which is fair, and the blame, which is not. How do you stand on that?
Yeah, well, it's a system that we happen to be using, but it was someone else's problem that
the default configuration was their problem, right? I mean, also, Corey, I can tell you, I have a lot of friends in the
forensics industry who deal with incident response. And still to this day, the vast
majority of data breaches are never reported. Like I know of breaches that have happened in
major public companies where all the breach laws are such that they should have notified their
customers and they should have notified the authorities and it just doesn't happen. So
it's one of those problems that I think is like, yeah, it's like the iceberg problem, right?
And to a large extent, it's kind of an interesting one in that when someone notifies their customers,
they're doing it from transparency. And whilst I think you and I would both appreciate that and
place more trust in those companies, the reality is a lot of the public wouldn't. And so the
incentives aren't necessarily lined up there, right, around why they should do it.
I would take it even a step further than that. I would argue that I don't know if it's a majority, but a significant is going to be met on some level with,
good Lord, no, why would we want to do that? We're happier not knowing. And that depresses me.
Absolutely true. I've been building security tools for 20 years. And you'd be surprised at the number of people who say, if I deploy your tool, even as a trial, and we find that we have problems,
then I'm legally responsible for going and dealing with it, so I won't touch it.
The other thing that's kind of related to that is that the security guys are incredibly busy as well. And
security tools historically generate lots and lots of noise: very low signal, lots and lots of noise.
And so the security teams look at it and they go, oh my gosh, I'm going to get a whole bunch more
noise that I have to go deal with. Right. And a bunch more work that I have to go do. Like,
can I bury my head in the sand? Like, sure. And it happens. That's just the reality of the world we're living in, unfortunately.
It is. And for better or worse,
I think that it's a world that we're sort of stuck in to some extent. Do you think that the
drumbeat of open S3 buckets that have been misconfigured containing sensitive data,
it feels like we aren't seeing as many of those as we once did, but is that just because people
aren't reporting them? Is it something that is going away slowly but surely, or is it just as
bad as it's ever been, but it's not making headlines anymore?
So I'm actually building a
tool to profile the AWS IP space for all the open buckets and all of the open Elasticsearch and
MongoDB things. There's a few of those that are out there, like GrayhatWarfare, right, which you can
go search, for S3 only, but not all the other stuff. I think when I looked up
there the other day, I want to say there were about 750,000 open buckets. Now, of course,
you know, not all of those open buckets are a problem. A number of them should be open, right? That's kind of why
they're there, et cetera. So it's...
Oh, I keep getting alerts constantly about open buckets
that I have that are intentionally open. I get alerts in the console, and I get emails at least quarterly. And this bucket that starts with the word assets dot and then some domain. Yeah, how about that? That is, in fact, designed to be open. On some level, I wind up, for a few of them, just slapping a CloudFront distribution in front of it, not because I need it, just because I want it to stop nagging me.
Of course, that's the signal-to-noise problem again, right? And honestly, that's part of the reason
why Open Raven's doing very well, is that all these companies have been hiring people to go
around and chase down open buckets which are open by design
in their organizations, but don't contain any sensitive data. So I don't know. Look, to answer
your question around the S3 bucket thing, I think, again, like with breaches, people have got a bit of fatigue, and, you know, there's only so many articles The Register can do with,
hey, S3 bucket open to the world, and what is it, you know, 27 terabytes of data or 900 terabytes of data.
It's like, you know, it's not necessarily one-upmanship on the headlines anymore. But, you
know, my gut tells me the faster we deploy things, the whole kind of DevOps movement in many ways is moving against the security grain, and rightly so.
So to think that the problem is getting better I don't think is reasonable.
And then I think that Amazon in some ways have done good things with the security policy and all of those types of things.
But they're largely designed for greenfield environments.
And when you step off the reservation, I mean, gosh, do you think the average developer is going to be able to figure out how to configure that XML policy
on an average bucket? Of course not.
They're just going to make the damn thing open and they're just going to move on and do their job.
So we've got to figure out how to get better tooling,
easier, secure by defaults. And then, like you said, we've got to figure out ways to reduce the noise so that people can act on signal and not get bombarded by noise and just shut it down.
My approach to cutting through that noise on open S3 buckets, as I tweeted out a couple of times now,
is to just copy a few petabytes of data into the open
buckets. My operating theory is that while you're going to ignore a politely worded email from a
security researcher, you're probably not going to ignore a bill that is 80 times larger than it's
supposed to be at the end of the month. That seems like it might be, among other things, legally
fraught. What's your approach at Open Raven to solving this particular problem?
Well, so for us, it's all about improving the signal to noise, right? So it's about,
you know, setting up a policy that you can say on this bucket, this is the type of data that I'm expecting. These are the security controls I'm expecting. If it deviates from that in any way,
go let me know. And then we use OPA, Open Policy Agent, to go check that and go send alerts or pump stuff out
to a firehose, pump it to a security event system or whatever. So in general, that's it. It's like
define what good's meant to be and then let me know when good is not occurring so I can go figure
out how to deal with it. And of course, you create generic policies like, look, I never want to see
financial data on a bucket that's open to the Internet and that's unencrypted.
Or I never want to see something with a CIDR range of 0.0.0.0/0 on some security group or something.
So that way, essentially what companies get to do is sort of encode their intended use policy and alert when that's not there.
So for us, it's all about that.
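The "define what good is meant to be, then tell me when it deviates" idea can be sketched roughly as follows. This is only an illustration in plain Python, not Open Raven's implementation and not an Open Policy Agent (Rego) policy; the names `BucketState`, `BucketPolicy`, and `find_violations`, and the data-class labels, are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BucketState:
    """Observed facts about one bucket (hypothetical shape)."""
    name: str
    public: bool
    encrypted: bool
    data_classes: frozenset = frozenset()


@dataclass(frozen=True)
class BucketPolicy:
    """What 'good' is meant to look like for one bucket."""
    allowed_data_classes: frozenset
    allow_public: bool = False
    require_encryption: bool = True


def find_violations(state, policy):
    """Return a list of human-readable deviations from the intended policy."""
    violations = []
    unexpected = state.data_classes - policy.allowed_data_classes
    if unexpected:
        violations.append(f"unexpected data: {sorted(unexpected)}")
    if state.public and not policy.allow_public:
        violations.append("bucket is public")
    if policy.require_encryption and not state.encrypted:
        violations.append("bucket is unencrypted")
    # Generic rule from the conversation: never financial data on a bucket
    # that is both open to the internet and unencrypted.
    if "financial" in state.data_classes and state.public and not state.encrypted:
        violations.append("financial data on a public, unencrypted bucket")
    return violations
```

Under this sketch, an intentionally open assets bucket with no sensitive data and a policy that allows public access produces no alerts, which is the noise reduction being described; a public, unencrypted bucket holding financial data produces several.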
Part of what we see, and I've seen this in the security industry for the last 15 or 20 years, is there's textbook security and then there's the real world security. You go look at
a lot of kind of textbook security solutions and they're fine. They work absolutely fine. I worked
at Microsoft for a long time and I was always amazed at how everything worked perfectly at
Microsoft. And then when you stepped off the reservation, nothing worked properly. But
everyone in Microsoft would be scratching their heads going like, well, it all works fine here.
It's like a developer saying, hey, it works on my laptop, right? It was only when I committed
it to CI that the problem occurred. So it's that same thing. And you've got to design stuff for the real
world.
You also say it goes beyond just S3 buckets, which I believe for a while there,
was it Elasticsearch or was it Mongo that had a default password of change me or something
horrifying like that?
Yeah. One of those, I forget which one it was. Elastic's a big offender for sure. Mongo's a
big offender for sure. But I mean, you also see, like, Jenkins servers that are sat out there,
right? I mean, that camera thing, was it the Verkada thing recently? Like, that was a CI server
that happened to have a script that had access to loads of things. But the number of Jenkins
servers that are accessible through the internet is shocking. It's not just buckets for sure.
It definitely becomes a weird thing. I don't know
if there's a fix here. I really don't, longer term. But instead of looking forward for a minute,
let's go back and visit the past for a bit. You were the founder of the OWASP reporting list.
What is OWASP? Is it a list? I'm most familiar with the OWASP Top 10, but I'm certain
you'll have a better story on that than I will.
Yeah, no, no, no. Top 10 was a small thing.
So I was running software security at Charles Schwab early 2000s, 2001. Before it was kind of
a really big thing. And we used to get vendors coming in trying to sell me products. And honestly,
it was kind of a joke, right? Like, at market open, we'd have 8 million accounts, like a
trillion dollars in assets.
And people would come in and try and sell me a web application firewall, which maximum
throughput was like, you know, 0.01% of my market open traffic and things.
But there was nothing out there on the internet to go point to and to say, well, this is good,
right?
It was basically me versus a vendor coming in.
And so I said, right, this is kind of crap,
right? And I got together with a bunch of other people that were also doing similar things,
some other people at some other banks, some other people in other companies. And I said, right,
I'm going to go publish something. And I wrote it over a weekend, literally wrote this guide
called the OWASP guide. And it was basically a set of principles around software security,
like least privilege and nothing sophisticated, but
it was those types of things and published it. And then OWASP was basically born, right? So it's
the Open Web Application Security Project. Then it got a lot of traction because a lot of people
had signed up and a lot of people were then starting referencing this to build their own
application security programs. Over time, of course, OWASP got very successful. I think it's
like 40,000 people or something like that, that turn up and there's conferences all around the world and
chapters all around the world. And there's lots and lots of projects that have taken place.
One of which is the top 10 that you referenced that a lot of people know of. And the top 10
has been around, I want to say, since like 2004 or something like that. I don't know. I have to
go back and check with history. It's hardly changed since 2004. And you can have a good conversation around
why that is. But yes, that's the history of OWASP.
And now, of course, you have this list that
doesn't seem to have changed significantly in a while. I mean, back when I was starting up the
Meanwhile in Security podcast and newsletter with Jesse Trucks, we talked about that being one of
the key problems is everyone wants to know how to handle security and cloud. But if we take a look at how a lot
of application vulnerabilities exist, that list hasn't materially changed. If anything, the advent
of cloud has fixed some security issues in that you're not allowed to muck with them anymore.
Data center physical security is no longer a vector for most folks who are all in on a public cloud provider. But you're also dealing with this other problem of, okay, now it's a list
of enumerated S3 buckets, for example. And if you misconfigure that, it's something that's globally
known. And I guess it removes the security through obscurity argument insofar as there ever was one.
Have things changed in a time of cloud, or is it just the same thing with new labels on it?
Well, I mean, there's a couple of things, right? So you've got to ask yourself,
what is the OWASP Top 10 a top 10 of, right? Is it the top 10 most popular issues,
the top 10 most severe issues, the top 10 voted by security people? No one's ever really been
able to get to that apart from an arbitrary top 10. And I don't want to take anything away from
it because the top 10 has been incredibly
useful in getting to developers, giving them a tangible like 10 things, go focus on these
10 things and you'll raise the bar.
So that's kind of piece number one.
But like, you know, has it changed?
Well, should you have expected it to change?
Depends on what you believe it's based on.
If you do look at them, though, no: things like injection and broken authentication
and sensitive data exposure,
those things haven't changed because they're just general things and they're going to be around
forever, right? You think sensitive data exposure is going to go away? It doesn't matter what technology
we change. It's always going to be there. For me though, what's kind of interesting about it. And
maybe I'm a bit of a skeptic about it is that you can eradicate total classes of problems, I believe
by changing patterns.
So a good example is like, look, if you go use one of these modern development frameworks,
application frameworks, like it's built in inherently and the same with a lot of SQL injection problems that you used to see all over the place, right?
Like you'd have to intentionally go create those problems for the most part now.
And I think the cloud's done the same, right?
It's taken a lot of problems away.
It's extrapolated them into a service.
It's extrapolated them into a pattern. And the pattern can then go away. So,
you know, back to the S3 thing, like, I think there's hope, right? Because, you know, if you
can make a change upstream, I mean, you've probably seen recently all these damn, you know,
supply chain attacks, right? Like, and the bad guys are going further and further upstream where
they can affect things downstream. The good news about all of that is if you can figure out upstream the way to go secure it, everything downstream gets
secured as well. So I think with a lot of these things, if we can stop trying to play whack-a-mole
or put the finger in the dike, if we can start thinking about patterns and ways to go solve them
at a class-of-problem level, then we stand a chance of fixing them.
This episode is sponsored in part by ChaosSearch.
As basically everyone knows,
trying to do log analytics at scale with an ELK stack
is expensive, unstable, time-sucking, demeaning,
and just basically all around horrible.
So why are you still doing it,
or even thinking about it,
when there's ChaosSearch?
ChaosSearch is a fully managed, scalable log analysis service
that lets you add new workloads in minutes
and easily retain weeks, months, or years of data.
With ChaosSearch, you store, connect, and analyze, and you're done.
The data lives and stays within your S3 buckets,
which means no managing servers, no data movement,
and you can save up to 80%
versus running an ELK stack the old-fashioned way. It's why companies like Equifax, HubSpot,
Klarna, Alert Logic, and many more have all turned to ChaosSearch. So if you're tired of your ELK
stack falling over before it suffers, or of having your log analytics data retention squeezed by the
cost, then try ChaosSearch today
and tell them I sent you.
To learn more, visit chaossearch.io.
I sure hope you're right.
I mean, in an ideal world, you will be,
but I have so much trepidation
around all this,
and I don't know how it's going to wind up playing out.
And I hope that it's going to go well,
but it just feels like you're constantly
railing against the tide. And I don't know how to wind up addressing that. I really don't. I wish I
did. Is there anything you can say that helps me be more optimistic about this?
I mean, you're right. Look, I'm no longer in the application security business after spending 15
or 20 years in there, because I just gave up trying to convince developers to care about security.
I just, and I don't blame the developers.
They've got another job to go do and security is too hard.
So, you know, like, for me, it was just pushing molasses uphill.
And I think to your point, like, yeah, why would you expect anything different if we carry on doing the same thing?
And the reality is we're moving faster and faster. We're making it easier and easier to deploy things. We're getting
more and more complex systems. Why would you expect anything different? So yeah, I don't think
you're skeptical for a bad reason.
No, for better or worse, we still wind up having these problems.
I don't know how to solve it. I really don't.
I mean, look, for the reality, if you go back to the old days, like the old school,
like honestly, I'm a bit of an old person, right?
But you go back to like, you know, some of the military things used to be like trust,
but verify.
That model works incredibly well, right?
You trust people are going to do the right thing, you verify they've done the right thing.
That means you don't hinder the speed, but you go back and check and if anything happens,
you come back and it's like accepting things. One of the other ones around that was like,
it's people, process, and tools; people, process, and technology. And again, technology is never going
to solve the problem of security. It's a people problem. Like, you know, you can't patch stupidity
and all of those phrases. But if someone gives someone access to a local root account or whatever
the thing is, it doesn't matter how many other security controls you've got. I mean, I've seen it in cloud environments, as I'm sure you have. Someone
goes and creates a security group, 0.0.0.0/0, so they can get to the thing from home and don't have to
come in and go through all of the other control points. And it's just the way stuff works, right?
So if you have that, if you take that mentality of people, process and technology and trust,
but verify, I think use the right technologies and build the
right process around it, then you could at least manage the risk. The risk is never going to be
zero, but you can at least manage the risk to an acceptable level.
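The "verify" half of trust-but-verify for the open security group case could look something like this small sketch. It assumes the ingress rules have already been exported as plain dictionaries; the keys `group`, `port`, and `cidr` and the function names are hypothetical, and only the standard-library `ipaddress` module is used.

```python
import ipaddress


def is_open_to_world(cidr):
    """True when the CIDR covers every address (0.0.0.0/0 or ::/0)."""
    network = ipaddress.ip_network(cidr, strict=False)
    return network.prefixlen == 0


def world_open_rules(rules):
    """Return the ingress rules that allow traffic from anywhere."""
    return [rule for rule in rules if is_open_to_world(rule["cidr"])]
```

A periodic job could run `world_open_rules` over the exported rules and report what it finds, so people keep moving quickly while the check happens after the fact.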
Let's pivot a little bit and talk about the flip side of data security. And that comes down to
privacy. There's been a bunch of regulatory efforts around that. GDPR, for example,
California has its own version of that that's going out. And there's also a growing school
of thought that thinks on some level we're post-privacy. Where do you stand with that?
Yeah, I mean, look, the privacy regulations are raging right now, right? You've got GDPR,
you've got CPRA, the California one, you've got HIPAA, the Health Information Privacy Protection
Act, and they're all over the world. Japan has them, Australia has them, they're all over the place.
And I think the US now is talking about having a central breach law around privacy data.
The great challenge is, is that we're all becoming a data economy and we're all,
companies are all becoming data companies. And so they want to gather more and more data.
And, you know, the reality I think is that this whole stuff around cookie
consent, I just think it's just nonsense. Like when was the last time you said, hey, I'm not
going to consent to using my cookies. It's kind of like back in the old days when you said, hey,
I'm not going to allow JavaScript to run on my browser, right? Like all of a sudden nothing
works and you're like, oh, I'll succumb. But then, before you know it, the data collection has
overreached, right? Like you probably saw the Alexa the other day that has the radar so it can watch you sleep in your bed, right? You know, sure, of course,
they're not going to use that data for anything bad. But next time a breach happens or some clever
data science person decides to go to correlate something, I don't know what it might be in the
middle of the night, it happens. So I think what you're starting to see is that you're starting to
see regulators and legal people who don't really understand technology regulating to prevent those bad things happening.
And then technology trying to figure out how to go and meet those regulations, but meeting it with the absolute minimum bar versus trying to figure out what the actual intention is.
And I think you're going to see a bigger and bigger gap.
I mean, look at what happened with third party cookies as an example, right?
Like the whole third party cookie thing.
We saw, what was it, the CORS headers.
We saw anti-cross-site scripting headers because all of those things started happening.
And then, you know, what does everyone do?
They just go call a tracking pixel, right?
And then all the marketing automation tools carry on working as possible.
So, I mean, I think you've got a balance between technology working as intended in certain good use cases, and there are people
using that for their own use cases, which, you know, either break or push over the line of
privacy. I don't know. How do you see it?
I think on some level, it's not necessarily
that people care necessarily that some company in the aggregate knows what they're doing. There
are some that do, and I'm not disputing that. But for most of us, I don't necessarily care if Google, for example, knows what I browse on the internet.
I care much more if you personally know what I personally am browsing on the internet. So there's
a question of, once they have that data, do I really care that much about what they do with
it in aggregate? Not really. Do I care what they do on an individualized basis?
Kind of, yeah.
And do I care if they're making then that individualized data available to third parties?
Absolutely.
Yeah.
It comes down to what the use of that thing is.
Now, I know that I am not going to win friends with that particular argument myself.
And I get it.
In an ideal world, I think that advertising should be something
radically different than it is.
There are advertisements in this podcast, for example,
and they're catering to an audience
that cares about the topics we talk about on this podcast.
But I have no tracking data of who listens to this
other than raw download numbers
and rough GeoIP by continent.
It's not something
that is ever going to be attributed, at least from where I sit, to individual listeners,
nor would I want it to be.
Yeah, but look, here's why I might be able to convince you otherwise of
that. In China, there is a well-known place called the Beijing Genomics Institute. And the Beijing
Genomics Institute do genetic engineering, and not necessarily for good, right?
So it's not necessarily to find cures for things. It's also for other nefarious purposes.
And the Beijing Genomics Institute acquire DNA data from US hospitals, US healthcare
systems, when you get your blood checked.
Now, that data is supposedly aggregated.
But once you can start pulling apart DNA strands, you can start identifying people at different levels. And I think that's the danger. There's been a lot of cases where de-anonymizing information is possible. And so you're making the assumption that that data is generally anonymized and used for the right reasons, but there's been case after case where that's not the case. So maybe you'll change your mind on that, Corey. I don't know.
Maybe. I also, on some level, feel like I'm fighting a losing battle against the tide.
Yeah. Yeah. My wife says, aren't you worried about your credit card going missing? And I'm
like, I'm sure it's in many, many databases at this point. I rely on Visa, right, at that point.
Well, that's also a separate problem too. I mean, this idea of, oh, your identity was stolen because
someone else has opened a credit card in your name or stolen your credit card.
My very honest response to that is, oh, so you weren't cautious about who you decided to lend money to and validate they were the person you thought.
And you're trying to make this my problem because why exactly?
Yeah.
I mean, look, in those cases, that's why it's the corporate's responsibility to deal with those issues, right?
I guess it's the same with social security numbers and that they're out there in so many places
and the internet,
they're pushed around in so many different ways, aren't they?
I think we've got to start moving
into some of these zero trust kind of protocols
and zero knowledge ways
and all of that type of thing in the future.
Indeed.
And I think that there's one thing
that every corporate entity listening to this
or representative of the same can agree on,
and that is they prefer this conversation
to remain hypothetical and aimed at the abstract, not at them right after they've had a data breach, which
of course brings us back to Open Raven and how it aims at these things. You do have, at the time
of this recording it is still upcoming, a paper coming out contrasting what you have built with,
I believe it's Amazon Macie?
That's right. Yep. Yep. So when Dave and I founded the company, we went out, like I said,
and we asked everyone, what's the biggest problem? And it was data security. And then when you broke
that down, it broke down into, let me know where my data stores are. So, you know, do I have buckets?
Do I have stuff in RDS? Do I have stuff in file systems, et cetera? What type of data do I have
there? How is that data being protected? Access control and
encryption and all those things, and who has access to it? So it basically broke down to those things.
Those things haven't changed at all. So think of piece number two, what type of data do I
have, as being data classification. Amazon has a service called Macie, which works on S3. So we've
built that feature. Now, lucky for us, as it turned out, and you get a few really
good breaks in the startup world, Amazon Macie, it turns out, is not very good and
incredibly expensive and very, very slow. So frankly, the way we market it is cheaper, faster,
and better than Macie. And we believe in transparency of that. Every vendor will say we're
way better than everything, right? So we've kind of done what you would do with a clinical trial
in that we have basically built a,
you know, here's the test,
here's exactly what we're going to test for,
kind of like laying it out in an academic paper.
Here is the data.
So you can go rerun the test yourself
and here are the results.
And we know that we are way, way,
way more accurate than Macie.
We're deployed as Lambda functions
so we can scale up and run much, much faster than Macie.
And then certainly way, way cheaper than Macie.
But that wouldn't surprise you at all in that case, would it?
No.
Even after their massive recent price reduction, it was still, okay, that is in fact still incredibly expensive across the board.
I mean, my argument with the original Macie and its pricing was I had a customer at that point eyeing it and doing some math. And yeah, okay, first month would have been $76 million to run it in their existing stuff,
which was significantly more than at that point their annual AWS bill. So it was, okay, let's go
with option B, which is literally anything except that, and you'll save money. Even a data breach
wouldn't have been that disastrous compared to the pricing story. And now they've cut it to 20% of that, but that's still an eight-figure bill to run these
analytics on their data set.
And that's not tenable.
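The arithmetic behind that "eight-figure" remark can be checked in a few lines. This is only a sketch using the two numbers quoted in the conversation (the $76 million first-month estimate and the cut to 20% of that); the variable names are illustrative and nothing here is taken from actual AWS pricing.

```python
# Figures quoted in the conversation, not from any AWS price list.
original_first_month_usd = 76_000_000  # quoted first-month estimate
remaining_percent = 20                 # "cut it to 20% of that"

# Integer arithmetic avoids floating-point rounding on the dollar amount.
reduced_usd = original_first_month_usd * remaining_percent // 100
figures = len(str(reduced_usd))

print(f"Reduced estimate: ${reduced_usd:,} ({figures} figures)")
```

So on the quoted numbers, the reduced estimate is still in the tens of millions for a single month.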
And on some level, the differentiated value of doing that isn't there for customers.
If I wound up running all of the various security services that AWS offers on an environment,
it's pretty clear that it would cost more than the data breach would.
Well, it doesn't even work.
Even if the cost thing was put aside, like one of our customers tried it, I think they
spent a million and a half on a trial in a month, and it found 30 first names in a
credit card database.
I mean, it's kind of crazy.
And when you pick it apart underneath the hood, it's a giant regex, essentially, and
just doesn't really, really work.
I mean, the reality is that that thing was built.
It was actually a, it was originally an In-Q-Tel project, which is the funding arm of the US
intelligence agencies.
It was called Harvest.ai.
It was an acquisition that they bought in and it was built a long, long time ago.
If you want to do data classification today, you have to be able to not only identify structured, unstructured, and semi-structured
data.
And it comes in all places, right?
And it goes into all file formats.
In S3 buckets, it's Parquet files, which are at the back end of Lake Formation and
lakehouses and things like that.
But when you find a piece of data, you've got to be able to go and validate, is that
data real?
I mean, take an AWS API key as an example.
It's very easy to go figure out how to push that thing into that format, but is it a real key? Whereas if you use validators,
go log into an AWS API and you'll get a return that will say, is this a valid key? And which
account is it associated with? And so we've done both. In terms of the accuracy of identifying the
information first, the tests that we've got show that, in general, we are two to three times more
accurate, basically, on finding the initial piece of data. But then we have these validators. So you get a
credit card, go call a credit card API. Is it a real credit card or is it just a 16-digit int?
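The difference between matching a format and validating a finding can be sketched roughly as below. This is not Open Raven's implementation: the pattern, the function names, and the sample text are all hypothetical, and the Luhn checksum is swapped in as an offline stand-in for the "call a credit card API" step described here. A number that passes Luhn is only plausible, not confirmed as a real issued card.

```python
import re

# Hypothetical candidate pattern: any standalone run of exactly 16 digits.
CANDIDATE = re.compile(r"\b\d{16}\b")

def luhn_ok(number: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        digit = int(ch)
        if i % 2 == 1:          # double every second digit from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0

def classify(text: str) -> list[tuple[str, bool]]:
    """Pair each regex match with the result of the checksum validator."""
    return [(match, luhn_ok(match)) for match in CANDIDATE.findall(text)]

sample = "order 4111111111111111 shipped, tracking 1234567890123456"
for number, plausible in classify(sample):
    print(number, "plausible card" if plausible else "just a 16-digit int")
```

Both numbers match the pattern, but only the first survives the validator, which is the gap between a regex-only scan and a scan with validation.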
And you can go check that stuff. Data classification has moved on since that stuff was
there. So even if the pricing thing was fixed, and as you point out, it certainly isn't,
it's just not a good option for people. And then, you know, the kind of second piece to that is that the majority of customers
that we see are looking at things like Snowflake.
I mean, if you look at these data platforms, Databricks, Cloudera, Snowflake in particular,
you know, they're built on top of AWS services, but people are moving data to those places.
So it's not just an S3 problem.
As I said, it's about people putting data in Elasticsearch, in RDS, in file systems. The data is everywhere, including in backups; all of this stuff gets pushed up into backups as well. And so you've got to have a service which goes and checks it. We decided to go compete on S3 and beat Macie first, but that's certainly not where the tool and technology is going.
Now, for better or worse, it would seem not. Thank you so much for taking the time to
speak with me. If people want to learn more about Open Raven, what you're doing, how you're doing it,
where can they find you?
Yeah, OpenRaven.com is the best place to go. We've also got a pretty
exciting open source tool coming up soon, which is called Magpie. Magpie is a cloud security
posture manager. So think of it as we'll go out and check all of the security settings
on your AWS environment. And so we're releasing that open source around the end of April as well.
So keep an eye out for Magpie. We're taking the core out of Open Raven that does all of the
discovery across the orgs, pulls back all the attributes or the IAM or the security groups,
and then allows you to go write security rules on top of that, not data rules,
which is what the Open Raven platform does, but security rules. So also go check that out,
but it's all linked off of OpenRaven.com.
And we'll, of course, put links to that in the show notes.
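As a rough illustration of what a security rule written on top of discovered attributes might look like, here is a minimal sketch. It does not use Magpie's actual rule format, which is not described in this conversation. The function name and sample group IDs are hypothetical, and the input dictionaries are assumed to follow the shape of the EC2 DescribeSecurityGroups response (`IpPermissions`, `IpRanges`, `CidrIp`). IPv6 ranges are not checked.

```python
def open_to_world(group: dict, port: int) -> bool:
    """Return True if the group allows IPv4 traffic from anywhere on `port`."""
    for perm in group.get("IpPermissions", []):
        if perm.get("IpProtocol") == "-1":      # "-1" means all protocols and ports
            covers_port = True
        else:
            low, high = perm.get("FromPort"), perm.get("ToPort")
            covers_port = low is not None and high is not None and low <= port <= high
        from_anywhere = any(
            r.get("CidrIp") == "0.0.0.0/0" for r in perm.get("IpRanges", [])
        )
        if covers_port and from_anywhere:
            return True
    return False

# Hypothetical discovery output for two security groups.
groups = [
    {"GroupId": "sg-example-1", "IpPermissions": [
        {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}]},
    {"GroupId": "sg-example-2", "IpPermissions": [
        {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
         "IpRanges": [{"CidrIp": "10.0.0.0/8"}]}]},
]

flagged = [g["GroupId"] for g in groups if open_to_world(g, 22)]
print(flagged)
```

The rule only looks at configuration attributes, not at the data itself, which is the distinction drawn here between security rules and data rules.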
Wonderful. Thank you so much for taking the time to speak with me today. I really appreciate it.
No, thank you very much, Corey. Much appreciated.
Mark Curphey, co-founder and chief product officer at Open Raven.
I'm cloud economist Corey Quinn, and this is Screaming in the Cloud.
If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice.
Whereas if you hated this podcast, please leave a five-star review on your podcast platform of choice,
along with a comment enumerating all of the S3 buckets you have inadvertently left open.
If your AWS bill keeps rising and your blood pressure is doing the same,
then you need the Duckbill Group.
We help companies fix their AWS bill by making it smaller and less horrifying.
The Duckbill Group works for you, not AWS.
We tailor recommendations to your business,
and we get to the point.
Visit duckbillgroup.com to get started.
This has been a HumblePod production. Stay humble.