Storage Developer Conference - #5: Object Drives: A New Architectural Partitioning

Episode Date: May 2, 2016

...

Transcript
Starting point is 00:00:00 Hello, everybody. Mark Carlson here, SNIA Technical Council Chair. Welcome to the SDC Podcast. Every week, the SDC Podcast presents important technical topics to the developer community. Each episode is hand-selected by the SNIA Technical Council from the presentations at our annual Storage Developer Conference. The link to the slides is available in the show notes at snia.org/podcast. You are listening to SDC Podcast Episode 5. Today we hear from me as I present on Object Drives, A New Architectural Partitioning, from the 2015 Storage Developer Conference.
Starting point is 00:00:43 I'm Mark Carlson. I'm with Toshiba. I'm also the co-chair of the Object Drive Technical Work Group in SNIA. And that's the talk today. It's an official SNIA tutorial. As a SNIA tutorial, there's a bunch of legalese, an abstract; you probably read this in order to come here. So I want to just jump right in: what are object drives? Some of them have key value semantics, sort of an object store among others. In some cases there's actually hosted software down on the
Starting point is 00:01:36 drive and we'll talk about that. But the key thing is instead of using transport-based protocol like SCSI, we're talking about more of a network-based protocol such as IP, TCP, IP, HTTP. And the actual channel interconnect moves to an Ethernet network. If you're interested in getting, you know, helping us with this work, here's the URL to the actual work group. You need to be a SNEA member, of course, but that's easily fixed. A little bit of money. So that's what we're going to talk about today. An example of the key value type of interface is kinetic. There was a kinetic birds of feather session in here last night. The metadata is not really part of the value if you will by higher level
Starting point is 00:02:50 software right so the the key value interface is not expected to be used directly by end-user applications and and this is similar to the existing block interfaces that we have in that respect. You're going to have something sitting in front of that key value interface. It could be a database, for example. The semantics map very well to something like a Cassandra database. There's a big Cassandra conference going on here, if you haven't noticed. And then there's higher level examples of object drives as well.
Starting point is 00:03:28 CDMI is the Cloud Data Management Interface. There's been several talks on that this week. In the case of a higher-level interface, it does include metadata, and it is intended to be used by end-user applications and is used by end user applications. So you don't need a database in front of it. You don't need a separate file system or other sort of higher level objects. But because of this level of abstraction,
Starting point is 00:03:57 the interesting thing is that there's many more degrees of freedom in the implementations that are behind those abstractions, right? The hosts don't need to manage the mapping of objects to a block storage device and an object drive may manage how it places objects in the media. In fact there may be several kinds of media in the object drive. You might have some flash, You might have some regular disk drive media. You might have shingled media, right? And so the drive itself is able to do a lot of things
Starting point is 00:04:35 that current drives don't do, not necessarily because of the interface, but because of this abstraction concept. So what is driving the market for object drives? Well, there are a number of scale-out solutions out there that are meant to scale out, not up, right? And so incrementally you add another node and another node. When you do so, you're not just adding capacity, you're
Starting point is 00:05:05 adding performance. You're adding the ability to have objects in multiple places so that increases the availability, etc. Some of the examples of that include scale-out file systems as well as things like Ceph and Swift, which are all open source. But there's commercial examples as well. Scality is one, Cleversafe is another. You probably think of a few examples yourself. But these storage nodes are not typically at the drive level. The storage nodes are at the level of a box,
Starting point is 00:05:43 typically a server-type box or a tray that fits in a rack. And those drives, the existing SCSI or SATA drives, are usually front-ended by some sort of service software that runs down there as a storage node. So who would buy object drives? Well, certainly system vendors and integrators, just like today with SCSI drives, SATA drives. But what the abstraction I'm talking about does is it really simplifies the software stack. No longer do you need a volume manager, right?
Starting point is 00:06:23 No longer do you need a file system in order to take advantage of that. So the hyperscale data centers, certainly they're all about cost, right? Driving the cost out of the components that they buy and expected that this kind of a drive is just going to be as much a commodity as the current drives are. And they're not willing to pay anything for software either. They're all looking for open source software. And the key thing is when you start doing that, you need to hire developers, right? And those developers can then use this new kind of interface.
Starting point is 00:07:05 They're not locked in to anything that's come before. And then a lot of enterprising IT shops are also trying to follow this hyperscale model. They're actually hiring developers. They're using open source. They're trying to get commodity-cheap stuff in there. I don't know how many folks watched Gleb Budman's talk this morning, but his whole thing is cost, right? He doesn't
Starting point is 00:07:33 want to pay too much for a screw or a connector, right? So this is appealing. All these customers here are appealing. There's other customers, but it's not really something that you would see a consumer buy necessarily unless he was putting together his own NAS box for example. So there are some issues with the current sort of server node server based nodes you're going to use a commodity server the direct attached storage you're going to have CPU and networking there those have to be properly sized for the number of drives you have behind there and the kind of throughput and performance that you want for those storage nodes. And so they do use today
Starting point is 00:08:27 commodity servers and therefore they consume more power and more complex to manage than what you might buy from an array vendor let's say. And though they are less expensive to acquire they still require higher long-term ownership. So it's programmers that are using that open source software don't come free, right? So, you know, it could be a win for you overall versus buying a high-margin storage array,
Starting point is 00:08:59 but it's not necessarily cheaper in the long term. So how do object drives kind of move this forward? Well, the CPU power is now moved to the drive, right? And the idea there is you can optimize it to the task, which is IO, and it doesn't need to be general purpose in the case of something like a key value drive for example. And this is similar to what happened with dumb cell phone processors. They got bigger and bigger. I saw an article today that a single core iPhone 6 plus beats a single core of the latest
Starting point is 00:09:40 Mac Power. Now of course Mac Power has more cores but those processors on cell phones have gotten bigger and bigger over time of the latest Mac Power book. Now, of course, Mac Power has more cores, but those processors on cell phones have gotten bigger and bigger over time as you download apps to them, right? So for smartphones, as those apps increase their needs, that cell phone resources have grown. And this is something that object drives
Starting point is 00:10:01 would be able to take advantage as well. You could put a bigger and bigger CPU on there. Let's say if you're using solid state media, you're going to need a bigger CPU than if you're using SMR media. And then the whole thing, the whole drive is approved, and it's managed as such. And they're shipping millions of these, just like they're shipping millions of disks today.
Starting point is 00:10:27 And that's going to drive down the cost of those drives, regardless of whatever the interface is. And it enables those resources to match and tune with each other in size appropriately. And one of the things that we're doing is trying to standardize a little bit higher Ethernet speeds. Because in the case of solid state, maybe 10 gig Ethernet is fine. In the case of SMR, maybe 1 gig is fine. But there's mixed media kind of drives that are envisioned that would fall in between
Starting point is 00:10:59 those and the cost of that Ethernet interface is a key component of the cost of the drive. And I'll talk about that in a bit. But there are some things that object drives do not solve. Management complexity. You're still going to have to manage a lot of these things. And in fact, if you were managing each storage node, now you're going to manage individual drives. Instead of a server component that sort of abstracts and aggregates the management of the disk drives,
Starting point is 00:11:32 now each drive is independently accessible through its own IP address. There's still no management software necessarily included with the drives. So one of the things we're doing in SNE is concentrating on management first right how do you manage 10,000 of these things in a single data center how do you manage a hundred thousand of these things right and and it is a best place to do out scale up management is obviously above the individual storage node but you want to you want to have things like automation aggregation you
Starting point is 00:12:11 want to do operations across multiple of these things you don't want to have have to upgrade all of them at the same time when you tip firmware. We don't solve end-to-end security. Obviously, data is secured on the drive, and whatever software is talking to that drive secures the data on the drive. But it doesn't mean that necessarily all the credentials of all the users, of all the data that's on that drive, have to be in the drive. You don't want to be going and adding passwords to different drives for each user. So that needs to be
Starting point is 00:12:51 handled at a higher level as well. So there is no secure multi-tenancy in reality, but further upper layers are going to handle that for us. Yeah, Chris? You could say that it's obvious that data is secure in the drive as opposed to cryptic stuff. No, I mean, you could have multiple users of a single drive and they each authenticate separately and their data isn't accessible by other users of that drive. That's kind of typical.
Starting point is 00:13:23 So really the user of the drive is talking to it. Whatever system is talking to it, correct, and the software that's running on that. It could be a file system. It could be Ceph, right? It could be Swift, right? Swift is handling the authentication and authorization, let's say, of the end user.
Starting point is 00:13:44 It goes out to Keystone, grabs a token, uses that to access the data. That could be on several of these object drives, right? And, you know, sharding or whatever, right? So if, for example, I were to break into your data center and pull one of these drives out and bring it home, and I knew the keys, I could still read that. So the question is, if the drive goes out the back of the data center for repair or other means,
Starting point is 00:14:14 the data is not necessarily protected on there unless the software that's using the drive encrypts the value in a key value scheme. That's always true. But that's not what I mean by end-to-end security. Ceph is putting the encrypted shards on those drives, not the end user. So that's why we say that higher level software typically be trusted for authentication and access trial. We do have basic security support
Starting point is 00:14:46 on the drive as well as secure transport if needed. And then you know there is no end-to-end integrity at this point. Something like T10 diff does that for the SCSI protocols and certainly at this point we're not talking about you know having that kind of integrity check all through the path of the network, for example. Yeah? Objects like Swift or Datalisk where they have these kind of processes that can actually scrub the data and make sure the integrity is...
Starting point is 00:15:20 That's right, the higher level software. So the question is, or the comment is that Swift does this already. It will go out and scrub the various values, for example, and make sure that nothing's changed. It can, as part of its data or value, include a checksum of some sort. Is the separation of concerns an issue, mean uh eventually yeah a lot of this stuff could move down to the drive if possible if you put a big enough cpu there right so let me start with an example and this is uh with uh seph osds and and again this looks like a
Starting point is 00:16:02 desk side system but it these things are typically trays in a rack, right? They might be based on open compute project designs that are open source hardware out there, commodity suppliers of those. And inside are regular disk drives here. And then you've got some services that are running around the network as well. Some of them themselves are scaling out, some of them cluster and so forth. But when you look at object drives, you're really looking at a bunch of drives,
Starting point is 00:16:39 Ethernet connected and a switch in one of these trays. So because there's no server part of that, the cost of course is simpler. But not only that, all the drives are separate nodes on the network, so they're all communicating sort of on the back end amongst each other and on the front end out to the ultimate client. So we've looked at different ways to do this. The traditional hard drive, of course, is interconnected via SAS or SATA and SCSI T10 determines the protocol that's that. There is limited routability with that. There are high development costs because you have to develop drivers. As new technologies come along, like SMR, you have to go and change all your software. And typically these are attached to a single host, not networked, right?
Starting point is 00:17:38 And then T10 SCSI protocol itself is very low level, right? It's not designed for lossy network connectivity. In fact, it's not typically used in multi-client concurrent use cases. And then if you look at the kinetic key value drive, it's just got an Ethernet interface there. It's got an object API that you basically specify the key, specify the value, and that gets stored. You want to access it, you provide the key, you get the value back.
Starting point is 00:18:12 Pretty simple. There are some limitations to the size of the value in there. There are some limit, you know, key space or key name space kind of restrictions as well. But it is fully routable. or key namespace kind of restrictions as well, but it is fully routable. And it is a lower development cost because it's an interface that programmers are already familiar with.
Starting point is 00:18:35 They're already using, like I said, Cassandra. There's 2,000 people here at the DWARFS R conference. It is intended for multi-client access so that multiple different systems can go into a single drive and store their data there. And then there's a redundant ethernet port so that if one switch fails, let's say, you could get to the same drive through another switch.
Starting point is 00:19:02 And the Kinetic Protocol itself is now determined by an open source project that's in the Linux foundation if you google kinetic open source project you'll find the wiki page for it we're just getting started on that but it is a higher level key value interface it's designed for lossy network connectivity and so forth. And then you can actually have software running down on the drive if you have a big enough processor there, right? These days when you go to buy a processor, it's almost impossible to buy a single-core processor anymore. So you get the low-end processor and it's got four cores in it in some cases.
Starting point is 00:19:48 So if you're doing that, if you're putting that on the other end of the Ethernet connection, then you might have a custom application running down there. You might have Linux or some other embedded operating system there. You might even have a Docker system down there
Starting point is 00:20:04 where you can put a container. But that software down there can then use the standard block interface to talk to devices back here that are using SAS, SATA, NVMe or whatever, right? And then some combination of disk or flash down there. So that would be if you wanted to ship something like a Ceph drive, for example, right? Where you've loaded a configured version of software there that you want to ship to all your customers, right? And this will work really well from the drive manufacturers if they want to support that kind of embedded custom application. That's why we call it pre-configured,
Starting point is 00:20:49 because in this particular use case, we're not really looking at being able to download your own code. You can also load a kinetic API down there as your custom map as well. And you can have it running side by side with Ceph. It can also run in a container, or it can be sort of a minimal Linux, sort of just enough operating system kind of approach as well. But the custom application doesn't need to modify itself in this case because he sees your standard stack of stuff.
Starting point is 00:21:30 And then a provisionable in-storage compute drive really has applications that you can download. And we talked about this last night in the Internet of Things Birds of Feather session, whereby the data here may be getting colder over time, and you don't need as much of that CPU to do the I O what can use a CPU for you can you can put other things down there right and so what you provision for this drive can change over time now not everybody has object drives in their hands today.
Starting point is 00:22:05 These things are coming along and one of the approaches that we've seen out there is people are putting these sort of interposers in front of the standard drives today. And it can be a single drive, Each drive has its own interposer. An interposer could maybe connect to several drives in the same tray, but it would basically give you the same existing drive to be used as Ethernet connected object drives, allows connections of drives to be virtualized, and then you can be running Kinetic there, a custom app that's using the existing block interfaces that everybody knows and somewhat loves. Or you could actually have new kinds of applications
Starting point is 00:22:54 that use a key value interface. A key value interface, this would be new software that you're using key value and then talking to that. The advantage here is that you're eliminating quite a bit of the stack within the Linux kernel. No file system, no SCSI drivers, you're talking key value library in that case. So SNI has come up with some terms for these different use cases. The first one is a key value protocol object drive and the idea here is that when you're
Starting point is 00:23:35 just doing key value and you're not actually loading software down there, the resources that you require aren't really much more than what's currently on the shipping SCSI and SATA drives. Yes, you do need probably a little bit more memory. You do need a little more CPU horsepower. But it's a very simple mapping to the underlying storage. You have a table somewhere where the key is, and it lists the logical block address range, right? So that when you get the key from the customer or the client, you look up in that table, you know which logical block addresses to send back as part of the value. Pretty simple,
Starting point is 00:24:17 right? The in-storage compute object drive is these other use cases where you have enough CPU and memory for some object node software to be embedded on the drive or to be downloaded to the drive right and it you know if it's if it's embedded you know tip of this can be factory installed and shipped as a Ceph drive or a Swift drive right if it's a general-purpose download that means that Facebook wants to put their own software on that, right? So you ship them an empty drive, but with enough resources for Facebook
Starting point is 00:24:51 to put their object node software down there, right? It may have additional requirements, such as you don't want all simp ending lists there. For example, when you go and provision a Ceph node with a storage server today, they'll recommend that part of your media include a solid state drive because you'll get better performance for some of the operations that are going on there. So definitely we'll see some of these in storage compute drives have sort of a mixed media underlying that CPU. But again, whether you're in this camp or this camp, in both cases that interface abstracts the recording technology.
Starting point is 00:25:40 So when we go to HAM or all these other new media technologies, when we go to persistent memory, that interface stays the same. It's Ceph. It's kinetic. It's something that is going to get extra performance from those new media types. But you don't have to change the software that's above it. So digging down into the key value protocol object drives, I did mention it eliminates existing part of that usual storage stack, the block drivers, the logical volume managers, the file systems, and most importantly all the associated bugs and maintenance costs
Starting point is 00:26:24 and license fees that go along with it. So it is a greenfield kind of thing where we don't expect existing applications to work with these unless you actually have something that is very close to the key value protocol that Kinetic already has. Existing applications do need to be rewritten or adapted. And then the firmware is upgraded as an entire image as the key value protocol changes over time. And of course, hyperscale customers are already doing this, right? They're already creating their own apps. They're already creating their own, right? They're already creating their own apps, right?
Starting point is 00:27:06 They're already creating their own middleware. They're already creating their own storage software. They have their own sharding techniques that go across these drives, etc., right? And then, like I say, the key value organization of data is already growing in popularity. So if you have something like Cassandra or something key value already, using a key value drive is not a big step. The other one, drilling down on the in storage compute drives, it has the same value as the key value protocol. Plus, you don't need a separate server to run that object node service.
Starting point is 00:27:46 In Kinetic, you would, right? You'd still need a server to run Ceph, but it would be a smaller server because you knocked a whole chunk of the stack out. And then, because of that, because you don't need that separate object server, the scaling is smoother, right? Now a Ceph object node is a drive.
Starting point is 00:28:07 So you're adding object nodes when you add additional drives. And then if you want to add additional features into the object node software, they can all be deployed independently. So Ceph already has a way to upgrade drives incrementally, and that can be leveraged as well. And then there's fewer hardware types that need to be maintained for selected use cases.
Starting point is 00:28:34 So you have a bunch of spare object drives. You don't need a bunch of spare servers. You don't need a bunch of spare chassis. You may have to replace the Ethernet switch that's in that chassis at some point. But in general, the failure domains are more fine-grained, right? So the whole storage node with a dozen drives in it doesn't fail at once, alright? Each drive fails. So, you know, what you're seeing is the failure is more smooth. The failure scaling is more smooth.
Starting point is 00:29:09 You can fence off an individual drive instead of 12 drives at once. And then I mentioned this before. As data on a drive becomes colder, that CPU and memory becomes less utilized. In other words, I'm not having to service so many requests as I did initially when the drive was empty and things were filling up. So then the ability to host your software then allows you to put new features perhaps in the drive, some of the data services. You can actually extract metadata from the data down there
Starting point is 00:29:49 perhaps. You can perform preservation tasks moving things from Microsoft Word, whatever it is, to you know two years from now and three years from now. You can add these other data services you know use it as a spare drive for the second copy or third copy of the data that you have in there. You can use it for archiving, retention. When the data expires you can remove it. You have software down there that actually removes the data after a certain period of time. And then, as I said last night on the Internet of Things, you can put some data analysis down there.
Starting point is 00:30:32 You can analyze the old data. You can have microservices that get loaded down there that then analyze the data that they're looking for. So the Ethernet connectivity. Ethernet speeds currently standard at 1 in 10. The object drives are high value, low margin devices, so the current 10 gig is just really too expensive in the near term for those low-cost per gigabyte drives, right? And yet, if you add certain media types behind there,
Starting point is 00:31:15 one gig might not be enough. So we're looking at 2.5 and 5 gig. There's an effort that Paul Van Suler, is your name, Paul? is leading in SFF 8601 that's specifying auto negotiation using existing silicon implementation. So we know that two and a half and five are like half and a quarter or quarter and a half of the 10 gig. So the idea there is to just upgrade the firmware and the switch to do this auto-negotiation.
Starting point is 00:31:52 You might connect at one gig first and then sort of find out which side can do what speed and then reconnect at a higher speed and get things done there. And that's a very short-term effort. Paul is promising to get this done by next week, right? But you've got a draft now, right? That's right. And then we're hoping to get an effort kicked off in 802.3
Starting point is 00:32:29 to actually standardize a single-lane 2.5 and 5 gig speeds that people can put in in silicon. Currently they have what's called a steady group, which is sort of investigating the market costs involved, that kind of stuff and we're hoping that that becomes an actual standard
Starting point is 00:32:55 effort in November or something like that when they make that decision yeah yeah and they're getting pushback because I'm sure decision. Yeah. So yeah, contact your Ethernet vendor and tell them he needs to get involved. So this would be the first time that I get a message that the storage always doubles and network and stuff always goes up. Yeah, yeah, yeah.
Starting point is 00:33:38 So that seems to be quite a stretch. I get that you have a whole lot of points that you're going to solve. But if you look at what's happened with SAS, it did. It doubled to three and then six, right? But it's kind of hanging out at six for a while now. And that may be the sweet spot for the current spinning media, and especially with SMR. That's why we're interested in the 5 gig. We know that that could be a very good sweet spot for these kind of dry form factor medias. Now, At some point, if these are solid state and have persistent memory in them and the CPU is going like crazy, you might need 40 gigabytes, right? Yeah.
Starting point is 00:35:05 That's right. Yeah. Yeah. Where would you typically place this object right in a data center? Will this be in server or out in a separate rack or in the applications? So how do these get deployed in a data center? Will they be similar to the direct attached storage?
Starting point is 00:35:30 They don't need to be. They can be just drives in these trays that go into a rack. And they have an Ethernet switch probably on that tray. They may have one Ethernet switch per N trays, right? Depending upon the uplink speed from that Ethernet switch. So. So I'm just trying to understand why 10 gig is a complex choice.
Starting point is 00:35:59 10 gig is great for coming out of the tray, but from the switch to the drive, it's way overkill. And it, you know, how much does a 10 gig interface cost? Take that and add it to the cost of every drive. And a rack full of drives, it makes a big
Starting point is 00:36:18 difference. Right? So are you saying that every drive will have this interface? Yes. The question is, will every drive have this interface? Yes. Every drive will have a 2.5 or 5 gig or 1 gig interface. So why can't you do this in a storage node? And that's what we have today, right? You have to do that. But I think there's a better question here.
Starting point is 00:36:49 Okay. A better question is why Ethernet? Why Ethernet? And more than that, why IP? Why TCP IP? Right. You know, if you're looking at different bus speeds than anything else,
Starting point is 00:37:01 in most of the cases, these disks are all going to be in the same routing domain anyway. They're not going to need to be routed. I mean, you have some piece of equipment that's managing them, and it has to talk to them. So you don't need IP. You can do it on Ethernet, raw Ethernet, as an option. Or you can have a simpler network, even. I mean, Ethernet's pretty simple these days.
Starting point is 00:37:24 But, I mean, the point is you could have some other mechanism for talking to them. It doesn't have to be SATA, it doesn't have to be the things we're familiar with, but honestly, some of these things, NVRAM coming out, why not make that flexible so you can use RDMA? Yes. You said in RDMA, the same thing. And yeah, okay, that's a much faster way of going, but it might be more expensive or not.
Starting point is 00:37:54 But that piece of flexibility to say, we're not going to define that it's got to be this type of network interface. Right. So Chris is making the point that it doesn't necessarily need to be Ethernet, right? And so we're not restricting ourselves to just Ethernet or TCP. But we know that there's a market for the Ethernet-connected drives at least, right? But the TWIG itself is looking at things like PCIe, NVM attached ones, you know, so we will, the concept of object drive is not tied to Ethernet, right?
Starting point is 00:38:42 But we want to make sure there's interoperability with Ethernet as well, right? So when drive manufacturers ship Ethernet ones, we want to make sure those are interoperable. When they ship PCI ones, we'll have something for those. InfiniBand ones, I don't know. Yeah, absolutely. So we're not locking anything in. What we are doing is as people want to ship these things, we want to make sure there's interoperability between them. As you write software for them, you shouldn't tie yourself either to particular networking technologies. What do you think the latency floor is for Ethernet connector mics? The question is, what is the latency floor for Ethernet connector drives, right? That's a good question.
Starting point is 00:39:29 And it's the speed of light, obviously. But it's going to depend on the media, right? It's going to depend on how fast that CPU can turn around requests, right? So there's going to be a wide range of latencies
Starting point is 00:39:44 involved in these, and you're going to be a wide range of latencies involved in these, and you're going to say what the requirements are to the drive manufacturers, right? Does that make sense? Why do you think there's other protocols for other manufacturers that might be able to take you down to 100 mics? Yeah, so he's saying that the protocols demand something around 50 to 100 microseconds instead of
Starting point is 00:40:10 milliseconds these days. And the protocol may fall over at that point. Ethernet switch vendors are turning the switching mics down to sub-microseconds. Yeah. So Julian points out that Ethernet switches itself
Starting point is 00:40:24 are sub-microseconds. Yeah. Yeah. So Julian points out that Ethernet switches itself, or sub-microseconds. So, yeah. Yeah, in the back. So the question is, I think it's around kinetic, for example, that what is the limitation on the value size? I think it's one megabyte right now, currently. It's not necessarily fixing stone but again that's a protocol that this Linux open source foundation is doing so you know maybe that's the limitation maybe it's not but it's something certainly that that you might want to consider right in the case of a Ceph drive then
Starting point is 00:41:22 it's a Ceph limitation. In the case of a Swift drive, it's going to be Swift's limitation on object size and object numbers. Whether they work good with small objects or big objects, I can't tell you. Yeah? So from what I understand from here is that we are actually increasing the total number of the sources that are going to be provisioned for object storage. So you're going to have more CPU because every disk will have more CPU. You're going to have more network bandwidth because every disk will have more bandwidth. And I see the benefit that as you, it gives you scale out.
Starting point is 00:42:01 You add more disks, you get more, like, the capacity increases for processing. Incrementally, right. Smoothly, incrementally, yeah. But isn't there a downside to this as well, where we don't know about the costs of these things? Presumably it's like having a small server on each disk, and the cost of this will increase compared to what you have today where obviously the disks are only disks and they don't have so so the question is around cost and the added resources that these things require and sort of what's your cost per gigabyte as a result yes right but this has been going on for a while, right? I mean, solid-state costs more than spinning rust, right? And how do they sell solid-state?
Starting point is 00:42:52 They say it's a cost per IOP, right? Not cost per gigabyte. So you can imagine these things are like a combination of IOPs and gigabytes, right? So there will be a sweet spot for some of these tasks that you want to do down there where it actually makes financial sense right so you will get you know they have a certain cost per gigabyte and you will have a certain cost per iop and and what it but what it allows is you to sort of tweak that and between the cost per gigabyte of a disk and the cost per gigabyte of a solid state, but the IOPS also between the cost of a disk and the cost
Starting point is 00:43:32 of a solid state. And the fact that you're saving a bunch of money on things like software, maintenance, management, deployment, through stocking, those are all the factors that make that, make customers like Facebook want to pay that little bit higher cost per gigabyte. So you get a little bit more IOPS or they get a little less other costs around the system. Does that make sense? Yeah. I think SW is next. So does the fact that you're hosted in the Linux Foundation mean that you're tied to the running Linux? So there's a fact that we're hosted in... So Kinetic is hosted in Linux Foundation.
Starting point is 00:44:17 And Kinetic doesn't care. I mean, in the case of Kinetic, it doesn't matter what... You could be running, what is it, Wind River down there, who cares, right? In the case of a hosted software thing, that's not being done in the Linux Foundation, that's going to be a choice that, you know, SNEA hasn't gotten to the point where we're specifying operating systems down there. No, I wasn't going to discuss SNEA. Yeah, no, no. Keep in mind
Starting point is 00:44:46 kinetic is everything's abstraction behind the kinetic interface. So they don't care. But in the case of in storage compute you're going to want to go where all the software is. We can put Windows Server down there
Starting point is 00:45:01 and we'll talk. Nanoserver. Who's next? Go ahead, Julian. Is there a standardization on whether or not the objects presented by these drives are immutable or not? Is there a standard on whether the objects are immutable or not? In the kinetic protocol, no.
Starting point is 00:45:20 But other protocols, I expect Swift eventually to handle immutable objects at some point. So Kinetic allows you to change an object without pumping and deleting? Yes. Kinetic allows you to change the value for a given key. But you have to change the entire value, you can't override it? Correct, I think. Yeah, currently, yeah. Morali? Yes. Correct, I think. Yeah, currently, yeah.
Starting point is 00:45:47 Raleigh. Yeah, so extending your connectivity, right? One thing, do they have those speeds? What about the infrastructure that connects them to the speeds? Do they support those speeds? The what? You don't support those speeds. How do you make that key connect them to large programs?
Starting point is 00:46:06 Actually, we are talking to the switch vendors. They're in that 802.3, pushing for this as well. Some switch vendors. The whole ecosystem has to come together? Yes. Well, it comes together in what Facebook buys, right? They buy a tray with an Ethernet switch in it and a bunch of drives. They may buy it from, they may buy the drives from the disk drive vendors, they may buy the chassis from Supermicro and some other vendors. Those guys may incorporate an Ethernet port. But But there's existing shipping trays for this already at one gig.
Starting point is 00:46:49 I don't know about 2.5. I know there are all of them. All of them. Yeah, but that's what we're talking about. That's why we think this existing silicon thing is important. I haven't heard anything like that. But there's discussions going in their OCP group, which their founder Yeah. As a hyperscaler, right?
Starting point is 00:47:26 Could be Amazon, could be Google. We have time for a couple more questions. Correct. Yes, that's a key point. Right. And all they've done is they've taken 10 and divide by 4 and say, okay, there's your single-link 2.5. Other questions?
Starting point is 00:47:55 Okay, well, this is an old slide, but I want to thank the Object Drive Twig. Got some contributions from David Slick and Robert Quinn and Paul Suler. Thank you. Thank you very much. Thanks for listening. If you have questions about the material presented in this podcast, be sure and join our developers mailing list by sending an email to developers-subscribe at sneha.org.
Starting point is 00:48:26 Here you can ask questions and discuss this topic further with your peers in the developer community. For additional information about the Storage Developer Conference, visit storage-developer.org.
