Storage Developer Conference - #13: Mobile and Secure - Cloud Encrypted Objects using CDMI
Episode Date: August 1, 2016...
Transcript
Hello, everybody. Mark Carlson here, SNIA Technical Council Chair. Welcome to the SDC
Podcast. Every week, the SDC Podcast presents important technical topics to the developer
community. Each episode is hand-selected by the SNIA Technical Council from the presentations
at our annual Storage Developer Conference. The link to the slides is available in the show notes at snia.org/podcast.
You are listening to SDC Podcast Episode 13.
Today we hear from David Slik, Technical Director with NetApp,
as he presents Mobile and Secure Cloud Encrypted Objects Using CDMI
from the 2015 Storage Developer Conference.
My name is David Slik. I work for NetApp.
And today I'm going to be presenting about encryption security in the cloud.
And I want to thank you for coming.
There's a whole bunch of really interesting other sessions going on in this slot.
I know I was looking through the schedule and saying, oh, I'd want to go to those other sessions myself. So I appreciate you all being here.
So today we're going to talk about data security in the cloud and some of the challenges and some
of the areas that we really still need to standardize in order to make interoperable
and end-to-end systems. So right now, organizations and individuals, I certainly count myself in this
bucket. I suspect many of you do as well. We're concerned about a bunch of things around the
cloud. The cloud's very compelling. But what happens if the cloud provider's compromised?
If you think about the volume of data in the cloud and the value of the data in that cloud, that makes these cloud providers such a ripe target. I don't even want to think about the individual, institutional,
and government efforts going on right now to compromise and access data in the major cloud
providers. Just such a big target. Huge respect for the security teams of those organizations.
They probably have one of the biggest challenges outside of the governments.
What happens if your cloud account is compromised? Because even if their system is completely secure, your credentials can slip out. And this is actually really critical, because with a lot of these cloud systems, it's just a secret key that has to float around.
All your apps have to have this secret key programmed into them.
If your app's compromised, if your disk's compromised, it may not just be what's local to that system that gets compromised.
It may be those credentials that then allow someone to go in and get access to all
the data you have in the cloud. Likewise, as we heard about in the last presentation, if someone
can get into your device when it's running, if they can exploit a buffer overflow, etc., and get
into your running system, they can go and they can look for keys, they can get access to it, and they
can just fool the cloud into thinking the legitimate client's going and accessing the data.
So that's another common path that an attacker can take to compromise a system
and then escalate their privileges up to gaining access to all of your data in the cloud,
not just the data that resides locally.
And then finally, this is one we've seen once and this
won't be the last time, what happens if the cloud provider goes out of business? The last thing you
want to see is a situation where all the hardware is being sold off to who knows who with your data
on it. You don't know where that's gone. And in a lot of regulatory environments, there are a lot of areas of concern about the traceability of where this data goes. It's the new tapes-off-the-back-of-the-truck problem. In all of these scenarios,
these can result in massive data breaches, and they're difficult to detect. This is a liability
that can sit there, you know, like a sword hanging over your head.
You never know when it's going to fall.
You never know if it's going to fall.
And that is a very difficult thing to have to operate in,
especially as legal policies and precedents make people way more responsible
for the consequences of being insecure.
For example, in the medical arena, which is where I originally came from,
HIPAA states that if it can be proven that your organization is not following best industry
practices for security, you as a company are now liable. So there's a lot of lawyers looking at
that as a target-rich environment too. So ultimately, we as an industry
need to explore and look at ways that we can move that bar, raise that bar, to help everyone be more
secure where it's appropriate, to protect against these risks. So I'm going to start by talking about what we mean when we talk about keys and data security in the cloud. And I'll go through a quick taxonomy.
So the first one that we see very often in the cloud,
this is pretty much the state of the art today,
is what we call cloud-managed encryption.
I take my data in plain text.
I send it to the cloud over a secure tunnel.
TLS is the industry standard. It's encrypted over the wire. If someone's sitting on the wire looking at it, they can't do much. They can do traffic analysis and other things, which are very useful, but they can't actually look at the plain text unless they've gone and started spoofing certificates and some of these much more difficult attacks that, once again, you don't tend to see unless it's a government actor.
Once it gets to the cloud, at the other end of that secure tunnel, it goes back into the clear.
They process your data, and they store your data. And, once again, best practice is that when they store that data onto disk, the data is encrypted at rest. What that means is if a disk goes bad,
if someone walks out with a disk, etc.,
there isn't a risk there.
This is best practices.
If your cloud provider isn't doing this today,
find another cloud provider.
This is the baseline.
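Just to make that concrete, here's roughly what cloud-managed encryption looks like from the client's side: a plain PUT over TLS, with the provider asked to encrypt at rest using keys the provider holds. This is a minimal sketch using boto3 against S3 as one example; the bucket and object names are made up.

```python
import boto3

# Cloud-managed encryption, seen from the client: the plaintext travels
# over TLS, and the *provider* encrypts it at rest with keys it holds.
s3 = boto3.client("s3")  # credentials come from the environment

with open("q3-report.pdf", "rb") as f:
    s3.put_object(
        Bucket="example-bucket",        # hypothetical bucket
        Key="reports/q3-report.pdf",    # hypothetical object name
        Body=f,                         # plaintext leaves the client
        ServerSideEncryption="AES256",  # ask the cloud to encrypt at rest
    )
```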
When you have cloud providers that have multiple sites,
let's say spanning multiple geographies,
they have a presence in North America,
they have a presence in Europe, and you're moving data, specifically, you're instructing them to move data from cloud location to cloud location, the cloud provider does the same thing. They send it over a secure TLS tunnel: it was decrypted when it came off the disk, it's encrypted over the wire through that tunnel, and the plain text in the new location goes back onto disk, encrypted at rest. Then, when someone accesses it in Europe, it goes through another tunnel, this time to their client endpoint; it's secure on the wire, and they get the plain text. So, there's a number of advantages and a number
of disadvantages with this.
The advantages are, this is really simple.
We understand how this works.
We can implement it.
Everybody's been looking at these algorithms.
You know, TLS is probably one of the most tested.
We still find problems.
Look at Heartbleed, etc.
The other thing is, you don't need to change your clients.
Your clients just do standard HTTP or standard S3 protocols, etc.
Your client doesn't have to know about the encryption.
All of this is happening transparently to you.
But it has some pretty significant disadvantages.
The first one is the cloud provider has all the keys. They have the
keys to those sessions for your TLS security. They have the keys that they use to encrypt it on the
disk. So if you're wanting to protect against your cloud provider, against a compromise of the cloud
provider, against someone who's a knowledgeable attacker in that arena, not going to protect you there.
If your cloud is compromised,
they can bypass all your access controls,
they can bypass all the encryption,
and they can bypass all the audit.
The other area is, this is really inefficient.
If we flip back to the diagram,
there's 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 encryption and decryption operations going on here. That's a lot of CPU cycles and a lot of electrical power being used. So that's a final aspect that, just as an engineer, really rubs me the wrong way. That's a lot of wasted effort. We can do
better. So what's the next level? This is what people are doing who are really paranoid about security, who are in the "I don't trust anybody" camp. What they do is they take the data at the edge and they encrypt it. They encrypt it with their own key, and they don't give that key to the cloud. That encrypted data goes out to the cloud. Typically it's sent to the cloud over another secure tunnel
because we just use TLS everywhere.
If you're not using TLS everywhere,
you shouldn't be designing these systems.
But you don't have to because it's already encrypted,
and some systems take advantage of that.
Once it gets to the cloud, it's already encrypted.
They don't need to encrypt it a second time to store it at rest.
You know, once again, in common systems, it probably gets encrypted again. Now it's doubly encrypted,
one with the cloud provider's key, one with your key. When it gets moved to the next site,
it's still encrypted with your key. And when it goes to the edge, when someone requests it,
it's still ciphertext. It's ciphertext with your key, and the client has to get that key.
And that is the big drawback with this system.
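To make the edge model concrete, here's a minimal sketch of encrypting at the edge before anything leaves your premises, using AES-GCM from the Python cryptography library. A real system would also need key storage, rotation, and the packaging format we'll get to shortly; the file and object names are illustrative.

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

# Edge encryption: the key is generated and kept at the edge; the cloud
# only ever sees ciphertext.
key = AESGCM.generate_key(bit_length=256)   # stays with the data owner
nonce = os.urandom(12)                      # 96-bit nonce, never reuse per key

with open("q3-report.pdf", "rb") as f:
    plaintext = f.read()

ciphertext = AESGCM(key).encrypt(nonce, plaintext, None)

# Only nonce + ciphertext go to the cloud; the key never does.
# upload("example-bucket", "reports/q3-report.pdf", nonce + ciphertext)
```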
The advantages, keys remain private, cloud provider never sees the keys.
Even if the cloud's compromised, they don't have your keys,
so therefore you're not compromised.
You can assume the cloud's public.
You can just post your data wherever.
If your algorithm and your key length and your implementation
are secure to the levels that we understand with cryptographic algorithms,
best practices today,
there should be no reason why you shouldn't be willing to post your ciphertext on the front page of the New York Times.
That's the whole point. If you have to hide it, if your security comes from obscurity, you're not doing it right.
And this is, they talked a little bit about this in the last session, white box versus black box.
Good security, the knowledge of the white box, does not make your system less secure.
So this is the most efficient if you're running your cloud in that mode,
because you don't need to do all these encrypt, decrypt, encrypt, decrypt.
And you can get audit, because every time a client at that endpoint needs to access that data,
they have to get your key.
Well, who are they going to get your key from?
You. You're the only person who has it. So therefore, you have a one-to-one relationship
in that anytime someone accesses your data, you had to hand them the key. And when you go to hand them the key, you verify who you're handing the key to and that you trust them, and you're going to keep a record of it. So now all of a sudden, you have that first-access audit,
which is not something you could guarantee in the previous environments.
Well, why doesn't everybody do this?
Well, it's hugely complex.
I don't know how many people in the room have been involved in PKI deployments,
which was probably our last attempt to do this at scale.
Didn't go very well.
Unfortunately, we have two things working against us.
First of all, complex things are more expensive
and tend to do less,
especially when you're layering in a lot of security.
The second thing is complex things
tend to have more security holes.
So you ultimately get to a system, just like any software system: as the complexity of the system approaches an equilibrium where introducing fixes causes more bugs, it's the same with security. As your complexity gets to a certain level, introducing more security introduces more security vulnerabilities, and you'll reach a stability level. So the ultimate lesson is, don't make things more complicated than you absolutely must. It's not done until it's as simple as possible and no simpler. And that's my
philosophy because that works great for bugs and for security. So these edge systems have to be
involved in and participate with key management. That's a big deal. We really have problems here
and we are only just starting to get
standards for dealing with managing these keys, but how are we going to get these out
to all these edge clients? How are we going to do that in a secure way? How are we going
to do that in a cloud environment? We've got a big gap.
The other thing is that a lot of people put things into the cloud not just to use it as
a dumb conduit to get from A to B,
but because they want to take advantage of all the compute capabilities
and all the services offered by the cloud.
If you're encrypting everything and keeping all the keys and storing it up into the cloud,
you can't take advantage of a lot of these services anymore.
How could the cloud do virus scanning for you if everything's ciphertext?
The cloud needs to look inside your objects in order to see if there's a virus there.
How could the cloud do big data analytics?
How could the cloud do transcoding, processing, analysis, all these different things we love about the cloud?
I can take 10,000 instances and spin them up for an hour and do something across all my data.
Well, not if it's all encrypted.
So hybrid edge cloud key management is where you take the data,
you encrypt it with your key, you send it off into the cloud.
The cloud stores it encrypted as you've encrypted.
But then if you want the cloud to do something with your data,
whether that's, say, on an EC2 instance that you control, you can share your keys for the data you want processed in the cloud with that instance, and then that instance can go and access the plain text under your control. Other providers, such as Dropbox, have provided mechanisms by which they
can integrate with your enterprise key manager. So when someone goes to access an object,
they call back to you and say, give me the key. That means you can cut off at any time and you've
got that audit, etc. So this sort of hybrid thing, this isn't theoretical. All of these three models are being done today.
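The hybrid model hinges on that callback: when something in the cloud needs plaintext, it asks your key manager for the key, and you decide, log, and can revoke. Here's a toy sketch of such a key-release endpoint; the route, field names, and policy store are all invented for illustration, and this is not any vendor's actual protocol.

```python
from flask import Flask, abort, jsonify, request

app = Flask(__name__)

# Hypothetical stores; in practice these would front an enterprise key
# manager rather than live in memory.
WRAPPED_KEYS = {"obj-123": "base64-wrapped-key..."}
ALLOWED = {("obj-123", "analytics-instance-7")}

def audit(decision, obj, requester):
    print(f"audit: {decision} {obj} -> {requester}")  # stand-in for real audit

@app.route("/key-release", methods=["POST"])
def key_release():
    req = request.get_json()
    obj, requester = req["object_id"], req["requester"]
    if (obj, requester) not in ALLOWED:
        audit("DENY", obj, requester)
        abort(403)                 # revocation = drop the tuple from ALLOWED
    audit("RELEASE", obj, requester)
    return jsonify({"wrapped_key": WRAPPED_KEYS[obj]})
```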
So what are the advantages here?
Well, with the hybrid model, you, the edge, own the keys, and you manage the keys.
The cloud can still get access to them if and when needed.
You, the edge, make those decisions about access control. Clients can access plain
text or ciphertext, depending on the scenario. You can revoke access. Uh-oh, they just called
up. There's a compromise. Shut down that connection. No one can get the keys anymore. But what
are some of the disadvantages? Well, we're back to the scenario where you need to trust the cloud, which in some cases, you know, comes down to a value-versus-risk analysis. There's lots of times where the value is a lot more than the risk, and you can architect things to make that more secure.
And you need some sort of a protocol for how the cloud talks to your key server. We've seen a couple of proprietary mechanisms. Dropbox has one, as I mentioned. AWS has one now. No standardization going on here.
So what are our gaps? There's two big gaps.
How do I securely get keys from the source
to the edge in a cloud world? And how do I
securely get keys from the source
to a cloud provider in these scenarios?
One we've tried and haven't been able to scale.
The second one we're just starting to do,
but there's a bunch of proprietary, non-interoperable ways to do this.
My philosophy with standards is,
if no one's doing it, you're inventing something new.
Don't standardize. Standards should never invent things.
The second philosophy is, if everybody's doing it in a consistent way and it's widespread, it's too late. Don't try to standardize it then. You're just going to end up having something that's already there become a standard. So when do you get the most value out of standards? Right in that window when people say, yeah, there's a lot of value.
We're not 100% sure how to do this, but we all know we want to do it.
And not very many people have it implemented and fully deployed yet.
So we're in that stage.
We're in that stage where this is becoming an interesting problem that's ripe for standardization. So what do we need to standardize to make this be able to be fully interoperable
and to allow these sorts of use cases to be widely deployed? Well, let's talk about the actors first.
We have the data source, the person who owns, controls, and originates that data.
We've defined that they're going to want to keep
ownership of the keys.
Of course, you could have more complex situations
where I delegate that to a third party, etc.
But let's just draw a box around where the data comes from
and say that includes the data and the key management.
I have my internal filers,
I have my KMIP server, I have my cloud gateway, and what I'm doing is I'm taking each object and
when I go to put it in the cloud, I go through my cloud gateway, I generate a key, I encrypt it,
I register it, and send it up in the cloud. There we are. Then we have, as another actor, the data client.
These are people out there with their mobile phones,
their laptops, other enterprise systems,
other businesses that are sharing data.
This crosses administrative jurisdictions.
This has devices that float around and get lost and stolen,
employees leaving the company, other organizations.
There's a huge range of these, and you should include in this that your clients may include people attacking you.
Then there's what we call a blind cloud. This is a cloud that, by virtue of how it's being used, cannot see into what you are storing. And then there's a trusted cloud, which can see what you're storing. And the reason why we emphasize trusted cloud is because ultimately, once it is normal for them to decrypt your data and see the contents, you have to have some degree of trust. Once they can get to the plain text, there's not really much you can do from a technology standpoint.
I did about five years of work in DRM.
That is always a losing battle.
Once they've got the keys, they've got the plain text, they've got the data. So there has to
be some degree of trust. No technology will ever get rid completely of the need for trust. And we
see this especially in the healthcare industry. The only reason why the healthcare industry works
is because there's some degree of trust. A lot of times there's too much trust. A lot of times
there's misplaced trust.
But we can never solve the trust problem with technology. Trust is always the foundation.
So I'm going to have some degree of trusted providers, but I want to be able to have untrusted ones too. So what sort of threats? We need to consider the threat of the network-borne attacker,
the AT&T fiber tap that sends all my data going between sites off to somewhere else.
The person sitting in the network, plugged into Ethernet,
watching traffic they shouldn't really be looking at.
The compromised device in your enterprise that may be doing the same.
Second threat: compromised clouds.
It hasn't happened at a large scale yet, but it's only a matter of time. For all we know, certain government agencies are happily using the clouds to collect information on us. Third, data clients. This is often
the way in. If you compromise the client, you can get the keys.
You can get into the cloud account.
This is an area where I think there's a huge risk.
If I was doing architecture for cloud providers, I wouldn't have the tokens being just a secret key.
Anyone with this magic number can get in.
It's too easy to copy.
I'd have a much more complicated way of storing
that. But once again, remember what I said earlier about complexity. I have to be careful I don't
introduce more security vulnerabilities there. And then finally, people will attack you at the
source. If you have your data all in the clear, if your key management system is attackable, that becomes the treasure trove.
All keys do is concentrate your insecurity.
Before, all your data was insecure.
It was kind of diffuse value.
When you encrypt everything and have everything with keys
and put them all in one place,
that's going to become the highest value target
because if someone can compromise your key management system, they've got the keys to the kingdom. So lots of different things we have to
worry about. So let's start with the data source. From a standardization standpoint, right now we
have no way of encrypting the data in an interoperable way. If I encrypt the data, I can encrypt the
data in tons of different ways, and I can store it in the cloud in ways that the client then has
to know about, and the cloud then has to know about. So the number one thing is we need an interoperable, consistent way just to package this encrypted data and deal with all the little things like: What algorithm is it using? What key length is it using? How is the data laid out? How is it encoded? How do you even identify that it's an encrypted object versus random data, which, if it's encrypted, is what it should look like? There's a couple of standards out there. How many folks in the audience are familiar with the CMS standard, which came out of the S/MIME work?
So there's an example of a standard that's been fairly widely deployed. Unfortunately, when you start digging into it, it goes all the way back into the X-series of protocols, and it's not very well defined. But it's a standard, and it's interoperable, and it's out there working in the field. So that's one way we could do it: we can use CMS envelopes, and then we can store those into the cloud.
There's some new approaches for this as well.
Microsoft and a number of other folks have been putting together a set of standards.
These are now IETF RFCs called JOSE, JavaScript Object Signing and Encryption.
And it's a way to represent signed and encrypted data in JSON.
So here's another way, potentially, to say if you want to have interoperable encrypted data flowing through your clouds,
follow these RFCs.
And then the client can discover and do the right thing.
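As a sketch of what that self-describing property buys you: a JWE from the JOSE RFCs carries the algorithm, encryption mode, and key ID in its protected header, so any JOSE-aware client knows what it's looking at before it ever has the key. This uses the jwcrypto library as one implementation; the key ID and payload are made up.

```python
import json
from jwcrypto import jwe, jwk

# A self-describing encrypted object: the protected header travels with
# the ciphertext and names the algorithm, mode, and key id.
key = jwk.JWK.generate(kty="oct", size=256, kid="tenant-key-2015-09")

token = jwe.JWE(
    plaintext=b"patient record ...",
    protected=json.dumps(
        {"alg": "A256KW", "enc": "A256GCM", "kid": "tenant-key-2015-09"}
    ),
)
token.add_recipient(key)
serialized = token.serialize(compact=True)  # header.key.iv.ciphertext.tag

# Any client can parse the header without the key; decryption needs it.
receiver = jwe.JWE()
receiver.deserialize(serialized)
receiver.decrypt(key)
assert receiver.payload == b"patient record ..."
```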
You know how to interpret it, and you can have system-to-system interoperability. So that's really important, because we want to have a goal that your objects be portable across clouds. We don't want it to be such that there's, you know, an S3 way of doing things and an Azure way of doing things. To really get the value of clouds,
you want to have the data representation be interoperable. And this is actually not as
hard as it sounds, because one of the things that cloud providers do is they guarantee
that they're not going to go into the contents of the data you store and change it. So if
I decide I'm going to use JOSE to store encrypted objects,
Amazon's not going to change it, Microsoft's not going to change it,
and therefore, as long as me as a data source and my clients all expect this
and all support it as a standard, we're going to have this interoperability.
Whether that's originally gone into S3 and then mirrored over to Azure
and then S3's dropped, etc.,
that data representation found inside every object that's encrypted remains constant.
So that's important.
The second thing is that I want to note that while the rest of this presentation
is going to talk about how we're looking at adding this to CDMI,
all the concepts that I'm presenting here are generic and applicable to any cloud system out there.
So this isn't CDMI specific.
Now, CDMI, just for those who aren't familiar with it, is the Cloud Data Management Interface. It's a management-centric standard
that overlays on top of your LUNs, your file systems, NFS, CIFS shares, your objects.
It's designed to work with all these object stores.
So this isn't something that's designed to be specific to any one system.
So the other area we need to standardize, now that we've got a standard way to store this, is how do I get the keys? I'm at the edge; how do I get that access key? And in the cloud world, this is a challenge
because it may be portable, it may be constantly dynamically changing, you may have lots of
different clouds. This particular standardization work came out of an EU electronic health records
initiative where you're looking at tying together all the EU
countries and providing access to out-of-EU infrastructure. We're talking potentially
thousands of different clouds all working together. So you can't feasibly have a system
where every client is going to be able to talk to every cloud. But in an electronic medical
record, your data can be spread across dozens, maybe even hundreds of different clouds. So
you can't have the every-to-every. You have to have some sort of cloud-resident, cloud-mediated system for key access. Otherwise, it becomes impractical. And finally, audit, because audit's
the key. Especially in the healthcare world, all the trust is bound together with audit.
And that's what gives you the confidence that your system's working and that your trust
indeed is worthy of the trust you're giving it. So, for the data client, the flip side
of how can I provide access to the keys is how can I get access to the keys.
Downloading the encrypted data is easy, right?
We have the standard representation.
When I go to get an object, I do a get.
I get the ciphertext.
It comes down.
Now I need a mechanism to get the keys.
And how is audit requested and managed?
Because ultimately, you want audit to extend past the "I gave you the key" point.
You want to be more sure, and I say more sure
because we're on the client now.
This client may be compromised.
This client may be altered.
You want to be more sure that the client's doing the right thing
and dropping the key according to the policy
and not passing the key on to other people, etc.
So, for the trusted cloud,
the trusted cloud role is very similar to the role of an edge
because it's acting as an edge.
It is getting your keys and then being able to do access with your data.
It needs to be able to request the keys.
It needs to be able to allow a trusted client access to both plain text and ciphertext.
And, once again, audit.
So, very similar problem set.
So, what are some of the common themes?
We need an over-the-wire, self-describing, encrypted object format.
Luckily, there's a number to pick from.
We need a way to distinguish plain text versus ciphertext requests. So if I'm
a client, sometimes I want the plain text, sometimes I want the ciphertext. The cloud has to know the
difference. Because if I ask for the ciphertext, the cloud's going to do something very different.
It's going to say, yeah, you look like you're legit. Here's the ciphertext. But if I ask for
the plain text, the cloud's going
to be like, oh, I'm going to maybe pass your information on to the source along with the
request information, and it's going to do a lot more validation of who you are before it hands
the key back to the cloud provider, which then decrypts the data on your behalf and sends you
the plain text. And this is something people actually want to do,
because sometimes they trust the cloud more than they trust the endpoint.
And by revealing the key to the cloud upon request,
rather than revealing the key all the way to the endpoint,
the cloud's able to do that decryption, and the key's never revealed to the end user.
So you really have to think about where you want those keys going. And just
as another aside, whether you're doing key per object or key for
groups of objects also matters here. Because if you have a key for
every object and someone leaks, discloses a key,
you've compromised one object. But if you use
a key for a whole bunch of your objects
and that key leaks out or is disclosed,
they now have access to a whole bunch of objects.
So the trend we've seen is moving towards key per object,
but from a scalability standpoint,
we're having trouble just managing keys
for a bunch of encrypted hard drives and tapes,
let alone keys for our millions to billions of files and objects.
So it's a challenge.
It's a scaling challenge.
But I think it's ultimately one we're going to have to do.
And there's folks out there who have solved that problem.
Okay.
So how do you securely request a key?
How do you securely receive a key?
How do you specify what events need to be audited?
And how do you report back those audited events?
Those are the key parts of a standard.
So if you had these things standardized,
this would make it far easier to implement these edge-based encrypted transfers
that really move us to the next level in terms of being able to securely use clouds.
So we're taking a stab at this inside the SNIA Cloud Technical Working Group. We're working
with a number of different groups, some healthcare groups out of the EU, some groups here in the US
that have been researching this, a number of vendors, on what's called the encrypted object CDMI extension.
And this is ultimately a set of guidelines saying,
we recommend, if you're using CDMI,
and you want to store an object that's encrypted into a CDMI cloud,
here's how we recommend you do it.
And this includes a bunch of conventions.
What MIME type should you use?
What's the content?
We haven't yet decided on CMS versus JOSE.
There's a lot of advantages to both,
and this is exactly the sort of debates that go on inside a standards organization.
How do we deal with metadata?
Because you may encrypt the contents, but if the metadata says patient name: XYZ, date of birth: XYZ, diagnosis: fatal, that can be just as important to protect as the actual content, some lab report that shows why it's fatal.
So you can't just look at the data, you have to look at the metadata as well.
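Here's a sketch of what storing such an object through CDMI could look like, with placeholders for the conventions still being debated: a hypothetical MIME type, base64 value transfer, and only non-sensitive metadata left in the clear. The endpoint URL is made up; the Content-Type and version header are standard CDMI.

```python
import base64
import requests

# A sketch of storing an encrypted object via CDMI. The MIME type and
# metadata conventions shown are placeholders; picking the real ones is
# exactly what the draft extension is deciding.
jwe_bytes = b"eyJhbGciOiJBMjU2S1ci..."  # a serialized JOSE envelope (truncated)

body = {
    "mimetype": "application/jose+json",  # placeholder, not the final choice
    "valuetransferencoding": "base64",
    "value": base64.b64encode(jwe_bytes).decode(),
    # Sensitive metadata (patient name, diagnosis, ...) belongs inside the
    # encrypted envelope; only non-identifying hints stay out here.
    "metadata": {"enc_object": "true"},
}

resp = requests.put(
    "https://cloud.example.com/cdmi/records/record-001",  # hypothetical endpoint
    json=body,
    headers={
        "Content-Type": "application/cdmi-object",
        "X-CDMI-Specification-Version": "1.1",
    },
)
resp.raise_for_status()
```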
So that's just encrypted objects.
The meat of the problem is what we call delegated access control.
Right now, if I go and I do a get from the cloud,
the cloud makes an access control decision.
It looks at my credentials, who I say I am, how I prove I am who I am, and says yes or no. And then if it says yes, it lets me in. If it says no, it doesn't, and hopefully does a little bit more than that. With delegated access control,
the concept is that the cloud can delegate that access control decision to someone else.
That delegation can be to another cloud.
That delegation can be all the way back to the source of the data.
And this is key for two reasons.
First of all, it's at that access control decision point
where it's logical to decide if or if not to reveal the key. Who has the key? In this case,
only the originating source. So therefore, you need to delegate that all the way back to the
originating location. But because the cloud's involved now, we don't need every edge client
knowing about every source. I just talk to my cloud. I don't have to know that
it's coming from hospital A or hospital B and country C or country D or from somewhere completely
different. I just talk to my cloud and I do that in a standard and interoperable way. And the cloud
knows how to delegate and it can talk to hospital A and hospital B and country C and country D and
all the different organizations.
So all of a sudden now we've reduced a many-to-many problem into a many-to-one problem and a one-to-many problem.
And that's a lot easier and a lot simpler.
So what do you need to do to be able to have delegated access control?
Well, first of all, you need something to attach to an object, to say: if you want to make an access control decision about this object, here's where you need to delegate it to. You need a source locator, a pointer back to who can actually make that decision. You also need metadata to tell it how to make that decision, where to go to, etc., how to manage this process.
You need a way to provide plain text and key access,
i.e., I as a client may say,
give me the ciphertext, give me the plain text, give me the key.
These are three different operations
that need slightly different processing on the cloud side.
You need a way to redirect, because some people are so paranoid about some data, they don't even want to give you the key that's used for that object through the cloud at all.
They want it that when you ask for the key,
they're like, ooh, you're asking for a key for A.
Well, let me make A' with a key just for you, and then redirect you to A' and give you the key for A'.
Now they've gone from key per object to key per access.
That's another use case people have.
If they really want to make sure that if you leak the key, you aren't able to do any more damage.
And this often is also used for the purposes of watermarking.
A lot of DRM systems use this or similar concepts to actually provide traceability.
So if someone leaks something, they know you can trace it back to who made that original request. Oh, so that movie floating
around on BitTorrent,
that was bought by so-and-so.
A lot of companies still
embed that sort of purchase information
in media
assets so that they
have that traceability.
Okay, so we have
redirection. You need to deal with key
expiry and caching controls.
These are just best practices. If I'm an edge client and I've got the key, there should be information to say, you know, just keep the key for two minutes and then purge it. That's your policy for managing how caching and expiry are done.
And then finally there's audit.
What should be audited?
Do I need an audit each time they view it? Do I need an audit
of what they view? Do I need an audit to tell the
system that that key was deleted? Once again, because
you never can fully trust your edge systems, this
is advisory, but it's very valuable still.
Because you do have a better degree of assurance that the right thing happened.
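Pulling those pieces together, the delegation metadata attached to an object might look something like the sketch below. Every field name here is invented to show the shape of the idea; the draft extension defines the actual names and semantics.

```python
# Hypothetical delegated-access-control metadata for one object.
dac_metadata = {
    "dac_uri": "https://source.hospital-a.example/dac",  # who decides
    "dac_certificate": "...PEM...",    # so responses can be verified
    "key_cache_ttl_seconds": 120,      # purge the key after two minutes
    "audit": {
        "on_key_release": True,        # report every disclosure to the source
        "on_key_purge": True,          # advisory: edge reports deletion
        "audit_uri": "https://source.hospital-a.example/audit",
    },
}
```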
Okay, so that's what we're looking at standardizing.
So let's walk through a couple examples.
So here's an example of a client requesting the ciphertext of an encrypted object from the cloud.
So client sends to the cloud,
get this object. Contents of that object are encrypted,
standardized the way we talked about earlier. The cloud goes and says, okay, I'm
looking at the credentials and the information
provided by the client. I look at their identity. I validate their identity.
I go and I compare that against the ACLs. Yes, they're allowed to get the data.
I'm going to send it back to them. Ciphertext delivered. That's one
of the most basic scenarios using the encrypted
object standardization. Let's get a little more complicated.
So, I want to get the ciphertext and I want to get
the key.
This is the most common scenario.
So, I as an edge send a request to the cloud.
And I include two pieces of information.
This may be all encapsulated in one set of information.
But I provide what I have to prove to the cloud that the cloud should listen to me.
I also include information that should prove that I'm worthy of having the key disclosed to me.
And finally, because that key is going to travel from the source of the key to me, I need a way to allow that traversal, since the information is flowing through the cloud, to be such that I'm the only one who can read that key information. So typically
the way this is done is I would include my public key in the request
such that
when the key is encrypted with my public key,
I'm the only one that can decrypt it.
And of course, you need to do various things
to make sure that someone doesn't substitute
somebody else's public key
and protect against man-in-the-middle attacks.
This stuff always becomes tricky
when you're looking at cryptographic protocols
as opposed to algorithms, which are even more tricky.
So the cloud gets this request and says,
oh, I cannot make an access control decision to allow access to this data,
and I don't even have the key to disclose.
I've got to forward it on.
So the cloud looks at where that request has to be routed.
It might be through another cloud or direct.
Packages up a request for that access control operation and embeds the information the client provided.
Then forwards that information to the originating source.
The originating source can then unwrap that package, that request,
verify that it actually came from a cloud that it trusts,
take the information about the user and say,
yes, I'm going to grant access. In the case of healthcare applications, the business arrangement, the trust relationship, might be:
If they're a physician, if they can attest that they're a physician,
and there's a clinical medical need to access this record,
I'm going to give you access.
So the source is able to look at that.
You know, Hospital X is attesting that their employee,
such and such is indeed a physician,
not an administrator, not a clerical staff, not a visitor,
a physician, and they need clinical access. Okay, I have a chain of trust in my business
relationships between the organizations. I'm going to trust that relationship. And okay,
now I can go to my key management system. I can check out the key. I can then package that key up such
that it's encrypted for that specific requesting and accessing client. I can return that response
to the access control request, which has two things. Yes or no, and if yes, here's the
key. The cloud can then send the ciphertext and the key
protected once again for that client's access to the client.
The client can then unwrap the key and decrypt the data.
So that's a little bit more complicated, but we've solved a big problem, in that now clients can access this data
wherever it is, floating between clouds
without having to care and know
about where it came from.
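The wrapping step in that flow is worth seeing in miniature. The client sends its public key with the request; the source, once it grants access, encrypts the object key to that public key, so the key transits the cloud unreadable to it. A sketch with RSA-OAEP from the Python cryptography library; real deployments must also bind the public key to the client's identity, or the man-in-the-middle substitution mentioned above becomes possible.

```python
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import padding, rsa

# Client side: generate a keypair and send the public half with the request.
client_priv = rsa.generate_private_key(public_exponent=65537, key_size=2048)
client_pub_pem = client_priv.public_key().public_bytes(
    serialization.Encoding.PEM, serialization.PublicFormat.SubjectPublicKeyInfo
)

oaep = padding.OAEP(
    mgf=padding.MGF1(hashes.SHA256()), algorithm=hashes.SHA256(), label=None
)

# Source side: after approving the delegated request, wrap the object key.
object_key = b"\x00" * 32  # stand-in for the real per-object AES key
wrapped = serialization.load_pem_public_key(client_pub_pem).encrypt(
    object_key, oaep
)

# Client side: only the private-key holder can unwrap, even though the
# wrapped key flowed through the cloud.
assert client_priv.decrypt(wrapped, oaep) == object_key
```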
So, let's take this a little further.
What happens if I'm a client and I just
want to get the plain text? I don't want to have
to deal with all this key management stuff,
but I still want this delegation.
Well, once again, instead of asking for ciphertext, I ask for plain text.
The cloud goes through the exact same process we talked about just before.
Delegations required.
Package up a request.
Include the client detail.
But instead of putting in the client's information for disclosing it, the cloud puts in its own information. Instead of saying, I want you to give a key that's going to go to this endpoint, it says, I want you to give a key that's going to go to me, the cloud, to do an operation.
Goes out to the remote system.
The remote system looks at it.
Okay, yep, that's a physician.
I can trust that.
I'm going to go and allow this.
Packages up the key.
Response goes to the cloud.
The cloud unwraps that key, the cloud does the decryption, and the plain text goes to the end user, an unmodified client.
Once again, because the cloud's doing it,
it can then honor the caching and key purging
and audit requirements as requested by the source.
So, what happens if you want to do key per request with a redirect?
Everything happens except
when the originating data source gets the request,
it creates a new object, that A prime,
encrypted with a new key,
and it returns that new key and a redirect.
That redirect goes back to the cloud, back to the client.
The client says, I've got a key, but I don't have any ciphertext,
but I have instructions on where to get the ciphertext.
The client then makes another get to the cloud for A prime,
and that data flows down, and we've used our same framework for solving this problem.
And finally,
what if they want the plain text and it's key-per-access? As we looked at earlier, the request for plain text goes all the way back to the source, the source creates A prime with a new key and sends that back, the cloud unwraps the key, the cloud says, oh, I need to go and get A prime, the cloud gets A prime, decrypts it, and passes on the plain text. So these two features, redirection
and cloud versus edge access,
can work together.
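From the client's side, the key-per-access variant might look like the sketch below: the response to the request carries a wrapped key plus a redirect to A prime, a copy encrypted just for this access. The headers and status handling are invented for illustration; they're not the draft's actual wire format.

```python
import requests

# Key-per-access with a redirect, seen from the client.
resp = requests.get(
    "https://cloud.example.com/cdmi/records/record-001",
    headers={"X-Requested-Content": "ciphertext-and-key"},  # hypothetical
    allow_redirects=False,
)

if resp.status_code == 302:
    wrapped_key = resp.headers["X-Wrapped-Key"]       # hypothetical header
    a_prime = requests.get(resp.headers["Location"])  # fetch A', not A
    # Unwrap with the client private key and decrypt, as in the earlier
    # sketches:
    # plaintext = unwrap_and_decrypt(wrapped_key, a_prime.content)
```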
So to summarize this up, why is this important?
Why are people excited about this?
Why would we encourage you, if you're interested in this, to get involved?
Well, ultimately, edge systems can use clouds
as untrusted repositories.
I really do like the model
where I can let the cloud
worry about keeping my data
and making my data accessible,
but I don't have to worry
about the cloud deciding
who can access my data
and protecting my data against all these
different faults that can result in security breaches. Just as an engineer, that feels a lot more architecturally elegant. Just like keys concentrate your protection, your security, this sort of delegation of access control decisions also concentrates and eliminates a lot of potential threats.
So, second reason.
You can have trusted edge clouds with untrusted clouds.
If the U.S. government ran a cloud for data storage,
would you want to put all your plain text in it?
I mean, I'm Canadian, so I trust my government.
I probably shouldn't, but I've been down in the U.S. for too long.
But if you could have your own personal cloud
that federated with that GovCloud,
and everything that went in there was encrypted,
all of a sudden now this becomes a lot more viable model.
And this sort of thing's happening. Government clouds
are being built for medical record data, because we need
interchange between all these different health providers. So it's really
important to bake in security at these sorts of levels.
And so you don't have to trust the intermediary. Because I know
that for myself, if I don't even have to trust
the cloud, it's a lot easier decision to use it.
If I have to trust that cloud, all of a sudden
now, I may be at risk.
Third item.
This allows the clients in the clouds
to be abstracted from the provider.
And this is a huge area.
This is where standards work the best.
When systems talk to each other,
this is the low-hanging fruit for standardization.
You can allow access control decisions
for distributed cloud operations to be locally controlled. So I can concentrate all those decisions, yes, no, back to me or back to
my delegate. And you can now start to collect all these audit events. An operation happens on someone's iPhone. An operation happens on someone's enterprise client. An operation happens in cloud A, in cloud B, in an EC2 instance, in cloud C, etc.
All those audit messages can start to flow to you.
And standardizing that audit collection is also really, really important.
Because audit is the underpinning of most of the legal frameworks around being able to demonstrate security.
So how can we get involved?
If you're interested, we have encrypted object and delegated access control extensions currently in draft, published up on the web for public review. And we have a technical working group, the Cloud Storage Technical Working Group, where this work happens. So if any of you here are members of SNIA, you can just join. If you're not members of SNIA,
I'd recommend you consider joining. It's relatively easy to do. We're a very accessible
standards organization. And we have weekly calls and bi-monthly face-to-face meetings where we go through and continue to iterate and improve
on the various extensions.
We're designing everything such that,
although this is specific to CDMI,
the implementation and the standardization
is not specific to CDMI.
So this could just as easily be added to Swift or S3, etc.
Standards work best when lots of people implement them.
So we're not just, you know, we're the cloud technical working group. We're not the CDMI
technical working group. So we do Swift work, we do S3 work as well. So if you're interested in
how this technology can apply to different clouds, please get involved. We also have plugfests for system and interoperability testing, and we're actually having a cloud plugfest here today. For example, we're testing a deduplication
extension for cloud client data deduplication for efficient transfers. So what's going to happen
with respect to this encryption and delegated access control work is that as people get interested and implement it, it becomes more finalized. We don't actually add anything to the standard until we have at least one implementation, and we don't merge it formally into the standard (it stays as an annex before that) until we have two implementations that demonstrate interoperability.
So we're not there yet.
We have folks working on proving out and testing this technology.
If it's something you're interested in,
please don't hesitate to get involved and drop us a note.
So right on time.
Thank you and questions?
The CDMI extension you're talking about, does it support all five models?
That's correct. The draft version that is publicly on the web, I don't believe it talks about the model where you have key-per-access, but all the plumbing's there.
CDMI has redirects.
We have fields where that data can be sent over,
and the next draft's going to have that explicitly called out
saying, here's a redirect field.
David, you mentioned the health care system.
Yep.
What do you think is one of the biggest challenges facing health care?
There's so many local, state, government regulations,
and then you're talking about integrating these different associations of health.
What do you do with the security challenge?
Healthcare is a really complicated and difficult IT security challenge.
They have some of the worst operational security of any organization I have ever been involved in.
And that's because, fundamentally, they don't care about security.
Security gets in the way and security kills people.
So if you go into a hospital, the doors aren't locked.
Because if a door is locked, someone might die if someone needs to get through that door.
And that's their attitude. Everything related to security in healthcare organizations is based on don't get in the way.
And this extends all the way to computers.
Then you have the fact that computers are largely an afterthought in the healthcare organization.
Once again, don't get in the way. So as a consequence, you have environments that have unpatched old systems
sitting on wide-open networks. It's just a mess. People are trying to improve it, and there are a lot of best practices, but you see such a wide spectrum. A lot of times, the only thing preventing a compromise is that a lot of people don't target healthcare organizations, because that's just low.
They're there to help you. Don't pick on them.
I'd rather not depend on the honor of hackers for security, but it's a huge problem. Just to emphasize, I have never seen worse security.
Maybe some of the industrial control folks running our power plants and dams and that sort of stuff,
they have pretty bad security too, but it is a huge problem. And building countrywide and
cross-country integrated electronic medical record systems is one of the biggest challenges of IT systems.
We're going to see a lot of spectacular failures before we finally get this right.
They're trying to build something of the complexity of the web.
So from my standpoint, if we can help provide some building blocks that allow them to easily do it in a more secure way, that's how we help improve security.
Why are all our emails not encrypted going across the Internet?
It's because there weren't standards and tools available to make it transparent. If we had had that infrastructure back in the early days, I think we would have seen things be a lot more secure.
You're good. Question?
Yes, so there's about 30 different implementations, a combination of commercial vendors, open source, and academic.
There's a bunch of companies that are using it internally.
Like, for example, some of this healthcare work being done in Europe, they're using
it internally for data management and exchange.
At SNIA, we have a webpage that lists the various vendors that we know about.
Now once again, I want to emphasize CDMI is a management
protocol. So a lot of these vendors will support S3,
Swift, CDMI, LUNs, CIFS, NFS, etc.
Where's the key manager?
So typically the key manager is owned and run by the organization that owns the data and is charged
with securing it. So if I'm a hospital, the key management system would sit inside that hospital.
If I'm a group of hospitals that has a centralized data center model,
it would sit in one of those centralized data centers.
And does KMIP play a role in this?
Yes. So KMIP is, in my opinion, the preferred key management standard
that people should be using as the key management role in these standards.
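For instance, the per-object keys in these flows could be created and checked out of a KMIP server. Here's a minimal sketch using PyKMIP as one client; the server address and key name are placeholders, and the TLS/certificate configuration a real deployment needs is omitted.

```python
from kmip.core import enums
from kmip.pie.client import ProxyKmipClient

# The data source creates one key per object in its own KMIP server and
# checks it out at access time.
with ProxyKmipClient(hostname="kms.hospital-a.example", port=5696) as client:
    uid = client.create(
        enums.CryptographicAlgorithm.AES, 256, name="record-001-key"
    )
    key = client.get(uid)  # retrieved when an access is approved
```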
And, you know, this isn't just a problem with organizations.
I have this problem personally.
I have data at my house.
I have data on my devices.
I'd love to be able to run a cloud and have my own keys. And if I want to access a document from my house, I can go with my phone, talk to Amazon or whichever public cloud provider I use, and have that key access tunneled through the cloud, such that I don't have to punch a hole in my perimeter for any device to go and ask for a key. This would be something that I think is even viable for sync-and-share and home private cloud type applications.
So we see this pattern way more than just medical records.
Well, thank you very much for coming.
Thanks for listening. If you have questions about the material presented in this podcast, be sure to join our developers mailing list by sending an email to developers-subscribe@snia.org. Here you can ask questions and discuss this topic further with your peers in the developer community. For additional information about the Storage Developer Conference...