Storage Developer Conference - #103: PCI Express: What’s Next for Storage

Episode Date: July 29, 2019

...

Transcript
Starting point is 00:00:00 Hello, everybody. Mark Carlson here, SNIA Technical Council Co-Chair. Welcome to the SDC Podcast. Every week, the SDC Podcast presents important technical topics to the storage developer community. Each episode is hand-selected by the SNIA Technical Council from the presentations at our annual Storage Developer Conference. The link to the slides is available in the show notes at snia.org/podcasts. You are listening to SDC Podcast, Episode 103. All right, good morning. My name is Devendra Das Sharma, and I work for Intel. I'm a PCI-SIG board member. I'm going to talk about PCI Express, what's coming up in general for the technology, and more specifically for the storage. Here's a brief agenda. We'll talk about how PCI Express has evolved, how it delivers power-efficient performance,
Starting point is 00:01:06 significant number of RAS enhancements that we have done over the years, IO virtualization, different form factors, and the compliance program by the PCI-SIG. The PCI-SIG is a fairly huge body with 750 member companies worldwide, and has developed the IO technologies for more than three decades, right? And we'll talk about that. So fairly healthy ecosystem that exists with the PCI-SIG and with the PCI Express technology that we drive. Back in 1992, this is what the PCI, PCI-X-based system used to look like.
Starting point is 00:01:57 It was a bus-based system. You had multiple PCI devices sitting on the bus. You had a CPU connected to a host bridge. Memory used to be there. And this was more of a PC-centric technology. Graphics was not on PCI bus. It was an AGP bus. And then you got either PCI devices
Starting point is 00:02:16 hooked directly to the host bridge, or you had bridges followed by bridges, and then you had a bunch of PCI devices hanging off the bridge. So we started in 1992 with PCI. There were about five to six generations of evolution in that technology. You started off with around 32 bits, 33 megahertz, went to 64 bits, 33 megahertz, 64 bits, 66 megahertz,
Starting point is 00:02:40 every single time effectively doubling the bandwidth on the bus, and went for five generations, six if you count the QDR data rate that happened with PCI. And at some point, bus-based systems ran out of bandwidth. You cannot deliver enough bandwidth when multiple devices coexist on a bus. So around 2004, we moved from a bus-based interconnect to a links-based interconnect, which is the PCI Express. It's a full duplex differential signaling because it's much more pin-efficient. You can deliver a lot of bandwidth with a lot less pins.
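As a rough, back-of-the-envelope illustration of that doubling and of the pin-efficiency point (round numbers, not figures from the slides):

```latex
% Peak bandwidth of the shared parallel bus is roughly width x clock.
\begin{aligned}
\text{PCI, 32 bits at 33 MHz:}  &\quad 32 \times 33\,\text{MHz} \approx 133\ \text{MB/s} \\
\text{PCI, 64 bits at 33 MHz:}  &\quad 64 \times 33\,\text{MHz} \approx 266\ \text{MB/s} \\
\text{PCI, 64 bits at 66 MHz:}  &\quad 64 \times 66\,\text{MHz} \approx 533\ \text{MB/s} \\
\text{PCIe Gen 1, one lane:}    &\quad 2.5\,\text{GT/s} \times \tfrac{8}{10} \approx 2\ \text{Gb/s} \approx 250\ \text{MB/s per direction}
\end{aligned}
```

A single lane needs only two differential pairs of signal pins per direction, while every device on the shared bus has to carry the full bus width, which is why the serial link wins so decisively on bandwidth per pin.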
Starting point is 00:03:17 When we moved to the links-based interconnect, we naturally could not keep all of the backwards compatibility that existed with the bus-based system at the hardware level. Silicon-based compatibility or even form factor-based compatibility, we had to break those. But what we did was we maintained the software-based backwards compatibility. You can still take a PCI driver and it will run on the PCI Express-based system. And we also maintained the architectural producer-consumer ordering model
Starting point is 00:03:51 for data consistency. Those are the critical pieces that carried along. We evolved on top of that, but the fundamental basics of it was continued forward, so you could completely interoperate even through this transition. And during that, you know, the transition basically looked like you got CPU, you got your root complex, notice that graphics becomes PCI Express, and then these are direct PCI Express links in which you have networking, storage, or you could go through switches and build your hierarchy of PCI Express,
Starting point is 00:04:21 or you could even put a PCI bridge in order to manage that transition, because not everybody would have moved from PCI bus-based devices to PCI Express-based links. So that's how the transition was managed. It was fairly successful from that perspective, especially with the bridges and especially with software-based backwards compatibility. It was a fairly smooth transition. Around 2004, we had the first generation of PCI Express-based systems out and available in the marketplace. And then evolution has continued.
Starting point is 00:04:57 From there, we have moved into an SOC-based methodology where everything gets integrated into the CPU, and we'll see a picture of that. But effectively, everything goes into a CPU complex here. You've got PCI Express links coming out directly from the CPU, and currently we are in the fifth generation of the technology. We double the bandwidth every single generation, right? So 2004 through now,
Starting point is 00:05:20 we are on the verge of delivering the fifth generation of the technology, so 32x in terms of the bandwidth per pin. Every single generation we double the bandwidth. And during this journey, the entire compute landscape has seen a lot of changes. We have moved from PC-based systems to mostly handheld devices and a bunch of devices that are out there, right? The things and devices connected to data centers
Starting point is 00:05:52 and the edge and everything. So the compute landscape has changed significantly, and all throughout this, PCI Express has remained as the ubiquitous IO technology that is driving this revolution. In the context of storage, what has happened is that because of all of these things and devices, billions of them that are out there,
Starting point is 00:06:15 there is an explosion in the data. And with data, you have to have three things that need to happen. You have to store more, you have to move more, and you have to process more. And that is what is triggering this, what we call, virtuous cycle of growth. So from a storage perspective, there is a data explosion
Starting point is 00:06:36 that is driving SSD innovations and adoption. And you will see that, you know, if you look into PCI Express versus other types of interconnects that connect to the storage, we are on the upswing. Whether it is number of units or whether it is the number of petabytes
Starting point is 00:06:55 and the CAGRs are already there. But fundamentally what's happening is when you have this volume of data, when you have to process so much, what is the interconnect that can deliver you the lower latency, the higher bandwidth, right? And bandwidth that is scalable bandwidth, not just from a speeds and feeds point of view.
Starting point is 00:07:17 In PCI Express, you can move single lane, two lanes, four lanes, eight lanes, all the way to 16 lanes and deliver a lot of the bandwidth. In a low latency manner coming from the CPU, the natural choice is PCI Express. And with NVM Express, that move happened very rapidly. And that is what is moving a lot of this move
Starting point is 00:07:38 into the PCI Express-based links. When that happens, when fundamentally storage is no longer the bottleneck, there's a lot of pressure on the networking side because you need to move the data faster. So there is more of a revolution going on there. The transition from 10 to 40 to 100 to 200 is happening at a much more faster pace than what probably anybody would have projected four or five years back. And when that happens,
Starting point is 00:08:05 the other thing that is happening is you have to process the data. So you've got a lot of AI, neural networks, all of those kinds of things that are coming in, and you see a bunch of accelerators coming up. All of that is driving a lot of bandwidth demand on the IO, which is the virtuous cycle that I referred to. So whether it is in the context of storage or any other application,
Starting point is 00:08:30 we will take a look into this in more detail. There is a pictorial version of it, but PCI Express 3.0 came out in 2010. Products are out in 2011, and I will show a picture of that. 4.0, 2017. 5.0, we are expecting the 0.9 version of the spec at the end of this year, and a final version of the spec sometime in Q1 of 2019. So this picture speaks to that particular evolution that I was referring to. Around
Starting point is 00:09:00 2004, we came out with Gen 1. At that time, you had CPUs connected through the front side bus. And this is, you know, in the server context; normally we refer to the highest volume configuration, which is the two socket server. So everything here, up until this point, is normalized with respect to a two socket based system. So CPUs are connected through the front side bus. You've got your hub here that is connecting to memory, which is an ASIC, and then you've got a bunch of PCI Express lanes coming out. At that time, there were about 28 lanes coming out. So if you
Starting point is 00:09:36 do it on a per CPU socket basis, you get 14 lanes per CPU socket. Notice that IO has moved to differential signaling in this time frame, whereas the coherency link is still on a bus based system. So IO has preceded where CPU to CPU connectivity had been, and this is not just one vendor; it is across the board, across different vendors, if you look into it in that time frame. From 2.5 gigatransfers per second, PCIe Gen 2 products came out in 2007 in the market. And, you know, by this time frame,
Starting point is 00:10:18 notice that memory has moved to the CPU, right, in order to get more memory bandwidth. Demand is going up, so you need to deliver not just more cores, but more memory bandwidth. You've got to feed the beast. And then these are the coherency-based links, and you still had an IO hub kind of a concept, which would basically take the coherent link and transfer it into PCI Express. Notice the lane count. It has gone up from 14 lanes per socket to either 18 or 36 lanes per socket, depending on a rich IO kind of a topology, in which case you will deliver 36 to 72 lanes
Starting point is 00:10:50 of PCI Express. Those that don't need as much, you will deliver 36 lanes of PCI Express. Doubling the bandwidth, and also more than doubling the number of lanes, depending on your usage model. PCI Express Gen 3-based products come out in 2011, and you got CPUs connected through coherency links, and you got memory. And just like memory had moved to the CPU, PCI Express moves to the CPU. Huge move from an interconnect technology point of view. In order to get into the CPU, you have to be the ubiquitous IO of choice. If there are five or six contenders, you're not going to put them on the CPU.
Starting point is 00:11:28 You will put some kind of a translator chip and then have IO coming out of it, right? So by then, it's already well established that, you know, on a platform, you're not going to spend the real estate, the power, and all of that through a different component. With Moore's Law,
Starting point is 00:11:43 as you're getting more and more die area on the CPU, you move things into the CPU. That's when IO gets integrated into the CPU. The lane count increases yet again. At the start it was 40 lanes, and today, if you look at it, a lot
Starting point is 00:12:00 of people are offering more than 100 lanes of PCI Express coming out of the CPU socket. Huge, huge number of lanes coming out of the CPU socket. And again, if you look into the usages, networking tends to be, you will put one networking device, typically a per-slot bandwidth consumer, which is the per-slot bandwidth that is being shown here. Storage will be lots of storage devices, lots of SSDs on the system, so it's a fan-out consumer. More number of lanes is what it wants to consume,
Starting point is 00:12:31 and also it's an aggregate IO bandwidth consumer. You are going to aggregate your bandwidth across multiple devices, and accelerators are in both the categories. They can consume, each of them can consume as much bandwidth as you can deliver, and people are putting multiple accelerators in the system. Combination of all of these three are moving both the vectors forward.
Starting point is 00:12:52 Number of lanes per socket and bandwidth per pin, which is the frequency, speeds and feeds. Both of them are getting pushed up. Gen 4 timeframe, you have 16 gigatransfers per second. That's when the spec comes out. And then Gen 5 is 32 gigatransfers per second. As I said, the 0.9 version, which is pretty much when things are more or less stable, is expected by the end of this year, and Q1 2019 is when the 1.0 will come out. So we are doubling the data rate every single generation. Now you'll notice that 5 to 8 is not quite the doubling of the data rate, but what happened was we changed the fundamental encoding mechanism.
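A quick worked version of the encoding arithmetic he walks through next (the 1.25 quoted in the talk is the rounded efficiency gain; the exact ratio is closer to 1.23):

```latex
\frac{128/130}{8/10} \approx \frac{0.985}{0.8} \approx 1.23,
\qquad
\frac{8\ \text{GT/s}}{5\ \text{GT/s}} = 1.6,
\qquad
1.23 \times 1.6 \approx 2
```

So Gen 2 to Gen 3 still roughly doubles the delivered bandwidth per pin, even though the raw signaling rate only went from 5 to 8 gigatransfers per second.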
Starting point is 00:13:34 We moved from an 8-bit 10-bit encoding to 128-bit, 130-bit encoding. So if you take the encoding efficiency, it's roughly a 1.25 gain, multiplied by 1.6 in the data rate, and you've got a 2x improvement in the bandwidth per pin. The reason we did that is that we wanted to still maintain the 20-inch, two-connector connection in the server without requiring any retimer kind of a device, or without requiring expensive
Starting point is 00:14:14 material, or without requiring extra power, right? Any of those trade-offs you can make and run the link at 10 gigatransfers per second. But in the interest of making the technology not become a niche, we decided to take the hit on the logic side and put an extra encoder on the logic, which, again, with Moore's law, is not a big deal, but keep the channel reach the same. For Gen 3 to Gen 4, you either have to have better materials, shorter channels, or retimers. And this gave us enough time to go and make the changes in the entire industry, because what used to be lower loss, expensive materials
Starting point is 00:14:50 are now becoming more and more mainstream on the board. So a lot of people can route on their servers by making the channels a little shorter without requiring retimers. And those that need the extra length have got retimer-based devices to extend the channel reach. In addition to the speeds and feeds, you will see that, you know, we are introducing
Starting point is 00:15:11 new things in terms of the protocol. You know, around the Gen 2 timeframe, we introduced IO virtualization. So this is where you had multiple virtual machines running on the server. And in order to give customers or users the experience as if each VM owns its own device, so we virtualized the devices. So that way you can do a direct association between a VM and a virtual function or a VM or a virtual device. Effectively, a device can present itself as multiple virtual devices, and you can associate them. And PCI Express, as a specification, enabled that by making all of those I-O virtualization
Starting point is 00:15:53 extensions on the base specification itself. So those are part of that, and this is when products started reflecting IO virtualization in the root complex, right, as well as on the device side. When Gen 3 came along, you will notice that there are atomic ops, caching hints, lower latency, and fundamentally what's happening there is, in this time frame, accelerators are becoming popular,
Starting point is 00:16:21 and you want to be able to have a way to offer accelerators atomic operations, not just to the host memory, but also amongst themselves through the CPU host. So the PCI Express base specification got modified to introduce the notion of atomic operations, so you could atomically modify things, right? Different atomic operations were introduced. We introduced the notion of caching hints and lower latency. Especially with IO moving into the CPU, what can happen is you can now take advantage of the
Starting point is 00:16:52 caching hierarchy inside the CPU. That way, descriptors and things like that, which you are reusing often, can reside in the caching hierarchy. You don't have to go to the system memory, or even worse, you don't have to go
Starting point is 00:17:08 through the coherent link to the system memory on the other side to access those. So for things that you are going to access more frequently, you could take advantage of hints on the link, and the processing element there would try to keep them in its caching hierarchy, because you give the hint that you're going to have locality of reference
Starting point is 00:17:28 as far as those addresses are concerned. And effectively, what that has enabled us to do is make the transition for, for example, networking fairly smooth, because we could store all the descriptors locally in the last level cache, and then the cores can access them without having to go to the memory.
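To put rough numbers on why that locality matters, here is a toy calculation; the latency figures are ballpark assumptions for illustration, not measurements from the talk:

```python
# Toy arithmetic: per-packet time budget at line rate versus where the
# descriptor lives. The latency numbers below are ballpark assumptions.

LINE_RATE_GBPS = 10            # 10 GbE, the example used in the talk
PACKET_BITS = 64 * 8           # minimum-size 64-byte packet (framing overhead ignored)

packet_time_ns = PACKET_BITS / LINE_RATE_GBPS   # 10 Gb/s is 10 bits per nanosecond
print(f"time budget per 64B packet: {packet_time_ns:.1f} ns")   # ~51 ns

descriptor_cost_ns = {          # assumed round-trip cost to reach the descriptor
    "last-level cache (hint kept it warm)": 30,
    "local DRAM": 100,
    "DRAM across the other socket's coherent link": 200,
}

for location, cost in descriptor_cost_ns.items():
    verdict = "fits the budget" if cost < packet_time_ns else "blows the budget"
    print(f"{location}: ~{cost} ns -> {verdict}")
```

With small packets, only the cache-resident case leaves any headroom, which is why the hints pay off at 10, 40, and higher gigabit rates.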
Starting point is 00:17:47 And that has enabled us to deliver 10 gigabit networking as well as 40 and all of that very smoothly, right? Way back, when people were looking into TCP offload on Ethernet, the thinking there was there is going to be so much processing involved, there is going to be so much going back and forth to the system memory involved that
Starting point is 00:18:10 it's better to do all of that processing outside. With all of these changes, we haven't seen that becoming the bottleneck. We have been able to deliver the line rate on PCI Express. Improved power management becomes an interesting concept in this time frame because what has happened is during this compute evolution, handhelds have become very popular. And with PCI Express being a PC-based technology, we always had low power states. But our low power states were in the milliwatts range. And if you have your system connected to a power supply,
Starting point is 00:18:49 milliwatts is a fairly good low-power state to be in. But once you are on your battery-charged cell phone or whatever, milliwatts doesn't quite cut it. You need to be in the microwatts range, and that's where MIPI was doing a pretty good job. And at some point, we had even toyed with the notion of doing the PCI Express protocol on the MIPI M-PHY. M-PCIe was the concept that got introduced.
Starting point is 00:19:15 However, that did not become very popular because PCI Express, we went ahead and fixed the power saving state and introduced a very aggressive L1 substate notion that took our power consumption down in the idle system to microwatts range. With all of these changes, you will see that
Starting point is 00:19:33 we are always not just trying to double the data rate, but looking into the type of applications that are coming up, and we are making the right set of changes in the IO technology to be able to match with those needs. These are the things that are coming up, and we are making the right set of changes in the IO technology to be able to match with those needs. So these are the things that are coming up, and then in addition to power enhancements, we have form factor and usage models. We'll talk about that. So fundamentally
Starting point is 00:19:56 we double the bandwidth every three to four years, and we make all the changes that we need to get the ecosystem going. PCI Express is a layered architecture, and that helps us make these transitions go through, right? So you'll see that, you know, the software, we try to preserve it.
Starting point is 00:20:23 There are enhancements, but we try to preserve that. The transaction layer is a split transaction, packet-based protocol, a very modern concept introduced since the PCI Express Gen 1 days. Credit-based flow control, virtual channels to guarantee quality of service. The data link layer is responsible for reliable data transport services through CRC, retry, ACK/NAK, all of those things. So these are very high reliability links. And if you have done a failure-in-time analysis, you will see that on this link pretty much nothing gets through undetected.
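A minimal sketch of the replay idea behind that reliability, using sequence numbers plus a retry buffer. This is a toy model only; real PCIe uses a 12-bit sequence number, a 32-bit LCRC, and ACK/NAK DLLPs, none of which are reproduced here.

```python
import zlib

class Transmitter:
    """Keeps a copy of every TLP until the far end acknowledges it."""
    def __init__(self):
        self.next_seq = 0
        self.replay_buffer = {}                    # seq -> payload

    def send(self, payload: bytes):
        seq = self.next_seq
        self.next_seq += 1
        self.replay_buffer[seq] = payload
        return (seq, payload, zlib.crc32(payload))  # what goes on the wire

    def on_ack(self, seq):                         # ACK covers everything up to seq
        for s in list(self.replay_buffer):
            if s <= seq:
                del self.replay_buffer[s]

    def replay(self):                              # after a NAK: resend unacked TLPs in order
        return [(s, p, zlib.crc32(p)) for s, p in sorted(self.replay_buffer.items())]

class Receiver:
    """Checks CRC and sequence number; NAKs anything bad or missing."""
    def __init__(self):
        self.expected = 0
    def receive(self, seq, payload, crc):
        if crc != zlib.crc32(payload) or seq != self.expected:
            return ("NAK", self.expected)          # ask for a replay from here
        self.expected += 1
        return ("ACK", seq)                        # good TLP, hand it up to the transaction layer

tx, rx = Transmitter(), Receiver()
print(rx.receive(*tx.send(b"MemWr #0")))          # ('ACK', 0)
tx.on_ack(0)
seq, payload, crc = tx.send(b"MemWr #1")
print(rx.receive(seq, payload, crc ^ 1))          # corrupted on the wire -> ('NAK', 1)
for frame in tx.replay():
    print(rx.receive(*frame))                      # replayed cleanly -> ('ACK', 1)
```

Because dropped or corrupted packets are caught and replayed at this layer, the layers above never see the loss, which is what keeps the failure-in-time numbers so low.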
Starting point is 00:20:56 Your FIT rate is going to be significantly lower than 1. Let's put it in that way. Orders of magnitude lower than one, right? The logical PHY and the electrical PHY are there for physical information exchange, interface initialization and maintenance. And so because these are very specific functionalities, whenever we do the data rate change,
Starting point is 00:21:21 we just change this part. The rest of them remain the same. Whenever we make a protocol enhancement, we just make the changes here. The rest of them can remain the same. So you can make the changes in different parts without having to move everything at the same time. And then the mechanical is the specific one, which is a market segment-based form factor. Because you don't really expect your form factors to be the same: your form factor in a smartphone is very different than a form factor in a 2U chassis, for
Starting point is 00:21:52 example, right? Very different form factors. So there are different types of form factors which PCI Express allows, but all of them work with the same silicon. You can take the silicon that came out in PCI Express Gen 1, single lane, and it's guaranteed to interoperate with a PCI Express Gen 5 device, which is 16 lanes. Of course, it will operate at the least common denominator, which is it will work as Gen 1 and by 1, but you've got
Starting point is 00:22:23 interoperability across the board. And that's a guarantee by the way the specification has been done. PCI-SIG recently formed a liaison with the SD Association, and we are very excited about it. It has got a huge volume in the IoT and handheld segment, and SD is looking to start using PCIe 3.0
Starting point is 00:22:55 to deliver higher data rate, and as you see that, and that's called SD Express, and it's going to take advantage of our compliance program, which we will talk about. It's a big, big volume, right? So we have an MOU and we are doing this jointly. It'll run on PCI Express technology. PCI Express, it delivers the power-efficient performance,
Starting point is 00:23:30 and from a scalability point of view, you got variable widths. You got by 1, by 2, by 4, by 8, and by 16. Technically there is also a by 12 and a by 32, but those don't really exist from a market point of view, so you can just ignore those. So width-wise you can scale, right?
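To make that width-and-speed scaling concrete, here is a small calculator using the published per-generation signaling rates and encodings. The results ignore packet and flow-control overhead, so treat them as approximate per-direction upper bounds.

```python
# Approximate usable bandwidth per direction:
#   lanes x (GT/s per lane) x encoding efficiency / 8 bits per byte.
GENERATIONS = {                # data rate in GT/s, encoding efficiency
    "Gen 1": (2.5,  8 / 10),
    "Gen 2": (5.0,  8 / 10),
    "Gen 3": (8.0,  128 / 130),
    "Gen 4": (16.0, 128 / 130),
    "Gen 5": (32.0, 128 / 130),
}

def bandwidth_gbytes_per_s(gen: str, lanes: int) -> float:
    rate, eff = GENERATIONS[gen]
    return lanes * rate * eff / 8

for gen in GENERATIONS:
    row = ", ".join(f"x{w}: {bandwidth_gbytes_per_s(gen, w):6.2f} GB/s"
                    for w in (1, 2, 4, 8, 16))
    print(f"{gen}: {row}")
# Gen 1 x1 comes out around 0.25 GB/s; Gen 5 x16 around 63 GB/s per direction.
```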
Starting point is 00:23:53 Single lane, two lanes, four lanes, eight lanes, 16 lanes. Frequency-wise, we have got five generations, like we talked about. And low power, as we were talking about earlier: we have a very rich set of link and device states. On the link side we got different link substates, and I was talking about L1 substates, and how those reduce the idle power consumption to microwatts. From a device
Starting point is 00:24:23 power state perspective, we have got a rich set of device power states, so that way not only is the link idle, but also you can put the device into a lower power state. And there are platform level optimizations and hooks, like dynamically you can allocate power. When you want to give more power to one device and less to the other device, you could do that dynamically in the platform. There are optimized buffer flush and fill kinds of mechanisms. The basic idea there is you want all of your IO as well as processing to happen within a certain window and then have the rest of the platform be at rest or in the idle state.
Starting point is 00:25:00 That way this is trying to coordinate when multiple devices are being active. That way the platform can save more power. It doesn't help you if you have got 10 devices in the system, and at any given point of time, somebody is using the link, and the other nine of them are in a low-power state. So this is trying to coordinate that activity. And this goes along with that. Every device reports how much latency it can tolerate.
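A toy version of that coordination logic: the platform collects what every device says it can tolerate and picks the deepest idle state whose exit latency still fits. The numbers here are invented for illustration; the real mechanisms are the latency tolerance reporting and optimized buffer flush/fill features described in the spec.

```python
# Pick the deepest platform idle state whose wake-up latency stays below the
# tightest latency tolerance reported by any device. All numbers are made up.

reported_tolerance_us = {      # what each device says it can tolerate
    "NVMe SSD": 300,
    "NIC": 75,
    "GPU (idle)": 1000,
}

idle_states = [                # (name, exit latency in microseconds), shallow to deep
    ("active", 0),
    ("shallow idle", 10),
    ("deep idle", 50),
    ("deepest idle", 400),
]

budget = min(reported_tolerance_us.values())                       # 75 us, set by the NIC
chosen = max((s for s in idle_states if s[1] <= budget), key=lambda s: s[1])
print(f"latency budget {budget} us -> platform can enter '{chosen[0]}'")
# If the NIC later reports a looser tolerance, the same logic lets the
# platform drop into deeper states and save more power.
```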
Starting point is 00:25:27 So that way, the platform can orchestrate the power savings going through the entire platform. We have very low active power. So idle power, standby power, is, like I said, in the microwatts. Active power is in the 5 picojoules per bit range. It's the best in the industry. Absolutely the best amongst any competing standard that can exist, right. And this is what you get with 700-plus companies: with an IO that becomes ubiquitous, people will innovate. And, you know, when you've got more and more people innovating,
Starting point is 00:25:57 you're going to get the best numbers. And that helps, right. That's a good feedback loop that's going on. And the vibrant ecosystem with IP providers, so that way you can go and buy world-class IP from different vendors. And a lot of the IHVs tend to just focus on what they do the best. PCI Express, because it has got all of this ecosystem support,
Starting point is 00:26:22 you can get IPs, you can get validation infrastructure, both pre-Silicon and post-Silicon. So SIG has got a good compliance program, so that part of it is taken care of. That way it helps people focus on the other things where they can add the value. PCI Express offers a rich set of RAS features, reliability, availability, and serviceability.
Starting point is 00:26:43 All the transactions are protected by 32-bit CRC. Practically nothing passes through that. With link-level retry, it can even cover dropped packets. So it is a very reliable link, and it has been that way since the Gen 1 days. There is hierarchical timeout support, and by that what we mean is
Starting point is 00:27:03 there are different levels of timeout in different parts of the hierarchy. So not all of them will be timing out at the same time, and you don't want multiple places where timeouts are happening, then you don't know what happened where, right? It's hierarchical in its nature, right? The one at the lowest level of the hierarchy
Starting point is 00:27:22 will time out first, then the next level will time out, because there is a bigger timer there; it's all a timer-based mechanism. There is a very well-defined algorithm for different error scenarios. We are very careful about not having things like what we call error pollution and all of those kinds of things, which means that if there is an error, there is only one type of error that will get reported.
Starting point is 00:27:45 Because if there are multiple errors that happen for the same thing, then you just don't know what happened when. And not only that, you will also report exactly what happened if it is a transaction type of an error. You've got the header that caused the error to be logged. So there is a very elaborate set of advanced error reporting mechanisms, with logging and all of that information.
Starting point is 00:28:04 Everything from the physical layer to the link layer to the transaction layer has its own error reporting mechanism. Whenever a lane fails more often, you can go for a degraded link width, because we support all of those widths in a mandatory manner. If you have a by 16 link and one lane is failing often, then you will go down to a by 8. If you've got something again failing,
Starting point is 00:28:28 you can go down to a by 4, to a by 2, to a by 1. So you've got all of those support, and then support for hot plug. You can do either planned hot plug, or you can do surprise hot plug, and the spec supports both. With the storage devices moving to PCI Express, we introduced the notion of downstream port containment and enhanced downstream port containment.
Starting point is 00:28:56 And the basic idea there is, if you have a root complex and you've got a PCI hierarchy, and each of them can be directly hooked to the root complex or go through switches, the error or failure in one of the SSDs, you don't want them to bring down the other SSDs, and hence the notion of the downstream port containment.
Starting point is 00:29:17 So what happens is that, in these kinds of usage models, asynchronous removal of the SSD is fairly common, so what happens to the transactions that are outstanding to that SSD? If I have reads, for example,
Starting point is 00:29:32 that are outstanding to an SSD and they start timing out in my root port, then what happens is that if we didn't have all of these, I have no idea who it was targeting to, so I'm going to bring down the entire hierarchy so my storage would become inaccessible. So this basically provided the enhancements and it mandates that you're supposed to keep track of things on a per
Starting point is 00:29:54 device kind of a basis, right, at a very high level. That's the idea here. So it defines that mechanism and it tries to prevent the potential spread of data corruption while trying to bring the link back up. IO virtualization. So this is the other aspect, which is we all know that we need virtualization to reduce system cost and power, to improve the efficiency of the infrastructure and make sure that our infrastructure gets used more often. You don't want to have idle infrastructure sitting in your data center or even in your desktop or laptop or whatever, or handheld for that matter. Single-root IO virtualization happened in 2007, and this allowed multiple VMs, right, each of them is like an independent OS, to coexist in the same system. All of them get orchestrated through the IOMMU, and you are taking a device, this is the PCIe device, this is a cartoon diagram of that. There's a physical function with multiple virtual functions, and this one gets
Starting point is 00:31:07 assigned to that VM, that one to that VM, and so on; effectively you're time slicing a device. Or, if it is storage, you could think of it in terms of allocating different units of storage to the VMs, right. So you are time or space slicing the device, and all of them get orchestrated through the virtual machine monitor, which controls the config accesses, but the rest of the DMA accesses happen directly
Starting point is 00:31:33 between these. And what PCI Express provides is this notion of physical function and virtual function, and it created the notion of the IOMMU. If you want to cache the IOMMU translations locally, it allowed that. So it allowed for the notion of what is a host physical address versus a guest physical address, and you can ask for the translations from the root port and locally cache them, and if the root port wanted to purge a particular TLB entry, you would get that command and then you would respond, right?
Starting point is 00:32:16 So that's the address translation services that it provided. It also provided what is known as access control services. So that way, imagine you have a switch hierarchy and a bunch of devices underneath that. You don't want the devices to be talking directly to each other, especially if they're allotted to different VMs. If the translation happens in the root complex, effectively what happens is everything goes back up and then comes down. That way you get the real physical translation happening in the root complex, and then you're taking it back down.
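Tying this back to the physical function and virtual function split described above: on a Linux host, a PF that supports SR-IOV typically exposes its VFs through sysfs, so a minimal sketch of turning them on looks like the following. The device address is a placeholder and the VF count is arbitrary.

```python
# Minimal sketch of enabling SR-IOV virtual functions on a Linux host.
# 0000:3b:00.0 is a placeholder PCI address; run as root on a PF that
# actually supports SR-IOV. Each VF then shows up as its own PCI device
# that can be handed to a VM (for example via VFIO device assignment).
from pathlib import Path

PF = Path("/sys/bus/pci/devices/0000:3b:00.0")

total = int((PF / "sriov_totalvfs").read_text())     # how many VFs the PF advertises
print(f"PF advertises {total} virtual functions")

(PF / "sriov_numvfs").write_text("4")                 # instantiate 4 VFs

# The new VFs appear as virtfn0..virtfn3 symlinks under the PF.
for vf in sorted(PF.glob("virtfn*")):
    print(vf.name, "->", vf.resolve().name)
```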
Starting point is 00:32:46 So it defined all of those kind of the do's and the don'ts and codified in the base specification as to how things are supposed to work once you define that you are capable of doing ATS, which is the address translation service, or ACS, which is the access control service. Also we introduced the notion of page request interface,
Starting point is 00:33:07 and recently we introduced the notion of what we call PASID, the process address space ID, to support direct assignment of IO to the user space. So that way, you don't have to go through an indirect mechanism or go through the driver to access the IO device. You could directly have the user space be exposed to the IO. And that way, IO can directly talk in the user space. So all of those mechanisms have been built into the spec. And these are more recent than the 2007 over there. So over time, these kinds of things are moving on to take advantage of or to enable the emerging usages
Starting point is 00:33:47 that we see in the broader ecosystem. Switching gears now to the form factors. We have a range of form factors, right? So you got the low power NVMe, which is the M.2 and U.2 kind of form factor. These are mostly for your client kind of devices. You got the server performance NVMe, both as an add-in card form factor as well as your U.2. You also have the high profile, the server performance NVMe with low profile but taller form factors, right? A by 8 add-in card. And the power goes from low to high, naturally.
Starting point is 00:34:32 And you know, so is the amount of bandwidth that you get, as well as the capacity that you get. And there is the EDSFF family of form factors, and people are continuing to innovate in this area, so that way you are going to have petabytes and petabytes of storage with your servers. So some of them are from the PCI-SIG, some of them are from different other places that are coming through, but they all run on PCI Express technology.
Starting point is 00:35:00 And this is where, again, the ecosystem will innovate based on the usage model, and we are happy to have that. Again, switching gears now, PCI-SIG has a very good compliance program. And SIG delivers the compliance program directly. So there are workshops that happen throughout the year, multiple workshops in multiple geos across the world. People come there with their devices. And the compliance program is fairly extensive. You test everything from electrically, whether your transmitters are having the right set of
Starting point is 00:35:44 voltage and timing parameters, to checking your receivers, whether they can tolerate a stressed eye and all of that, to the physical layer, whether you link up properly under error conditions, to the link layer, where it will inject errors in the link and make sure that you can recover from that, to the transaction layer, all the way to the software stack, right? It's a fairly extensive set of programs.
Starting point is 00:36:06 So think of it as you got the specs. From the specs, the SIG takes that and produces the compliance and interoperability test specs. These are formal specs, right, that come out. And they happen, you know, there is a little bit of lag, clearly. You know, your base spec cannot come out and the C&I test spec will be there, like, right away. One to two quarters later, they will come out. Then you've got your test hardware and software that will come from that. And
Starting point is 00:36:34 from there, you're going to run the entire compliance program, like I said, at each and every one of those layers. There is an extensive set of compliance tests. We will test it across different speeds, feeds, all of that. And there is clearly a pass or a fail. If you fail, not the end of the world. You go back, fix it, come back. A lot of the people that I know just use the SIG compliance program
Starting point is 00:37:01 to do the testing, because if you're a small company, you don't want to have the infrastructure, you just take it to the compliance program, figure out where all you failed, you know, try to debug it there, come back, turn the silicon and go back and do the testing again. So.
Starting point is 00:37:20 So the basic goal is that we want a predictable path for design compliance. And again, just because you passed compliance doesn't necessarily mean that you will interoperate, but your chances of interoperability go up significantly. That's the whole idea here. And given that you have an ecosystem of 750 companies, this is something that you need to do in order to make sure that people understand
Starting point is 00:37:47 what it takes for something to interoperate. Otherwise, it's very hard to get different people innovating at the same time to interoperate, right? If you have an open slot, which is PCI Express, this is the thing that you need to do in order to make sure that people have a good customer experience, right? So in conclusion,
Starting point is 00:38:07 from an IO perspective, PCI Express is the ubiquitous IO that goes across the entire compute continuum. And if you are on the CPU, you will have, of course, different CPUs targeted for the different segments, but they all have PCI Express coming out of them. It's a single standard.
Starting point is 00:38:31 We do not have different standards for handheld, for desktop, for server. Same standard, same silicon. Different form factors, yes, but one standard, right, across the board, and that helps us focus our attention to deliver the best in class, predominant direct interconnect. And again, scalable bandwidth, both in terms of frequency and width. Low power, definitely; you know, that's ingrained into us, right? We need to be low power, not just for handheld but
Starting point is 00:39:06 also even for server. Server demands low power because if you consume a lot of power on the IO, you have less power left on the compute. People want to have more power on the compute because that's where you are getting a lot of the compute done. Everybody
Starting point is 00:39:22 is being squeezed to give more power back. So it's a common theme, right, across the board. High performance, of course, and predictable performance growth, spanning five generations with a very robust and mature compliance program, interoperability program, and the devices are everywhere, right? You can make
Starting point is 00:39:47 a PCI device and connect anywhere and figure out whether it works or not. So that helps in terms of the development. Not to mention the availability of a wide range of IPs and infrastructure, like testing infrastructure and all of that. All right. Thank you very much. Thanks for listening. If you have questions about the material presented in this podcast, be sure and join our developers mailing list by sending an email to developers-subscribe@snia.org. Here you can ask questions
Starting point is 00:40:28 and discuss this topic further with your peers in the Storage Developer Community. For additional information about the Storage Developer Conference, visit
