Storage Developer Conference - #103: PCI Express: What’s Next for Storage
Episode Date: July 29, 2019...
Transcript
Hello, everybody. Mark Carlson here, SNIA Technical Council Co-Chair. Welcome to the
SDC Podcast. Every week, the SDC Podcast presents important technical topics to the storage
developer community. Each episode is hand-selected by the SNIA Technical Council from the presentations at our annual Storage
Developer Conference. The link to the slides is available in the show notes at snia.org/podcasts.
You are listening to SDC Podcast, Episode 103.
All right, good morning. My name is Debendra Das Sharma, and I work for Intel.
I'm a PCI-SIG board member. I'm going to talk about PCI Express, what's coming up in general for the technology, and more specifically for the storage.
Here's a brief agenda. We'll talk about how PCI Express has evolved, how it delivers power-efficient performance,
the significant number of RAS enhancements that we have done over the years, IO virtualization, different form
factors, and the compliance program run by the PCI-SIG. The PCI-SIG is a fairly huge body with 750 member companies worldwide,
and it has developed IO technologies for more than three decades,
and we'll talk about that.
So it's a fairly healthy ecosystem that exists with the PCI-SIG
and with the PCI Express technology that we drive.
Back in 1992, this is what a PCI or PCI-X-based system used to look like.
It was a bus-based system.
You had multiple PCI devices sitting on the bus.
You had a CPU connected to a host bridge.
Memory used to be there.
And this was more of a PC-centric technology.
Graphics was not on the PCI bus; it was on an AGP bus.
And then you got either PCI devices
hooked directly to the host bridge,
or you had bridges followed by bridges,
and then you had a bunch of PCI devices
hanging off the bridge.
So we started in 1992 with PCI.
There were about five to six generations of evolution in that technology.
You started off with around 32 bits, 33 megahertz,
went to 64 bits, 33 megahertz, 64 bits, 66 megahertz,
every single time effectively doubling the bandwidth on the bus,
and went
for five generations, six if you count the QDR data rate that happened with PCI.
And at some point, bus-based systems ran out of bandwidth.
You cannot deliver enough bandwidth when multiple devices coexist on a bus.
So around 2004, we moved from a bus-based interconnect to a links-based interconnect, which is the PCI Express.
It's full-duplex differential signaling, because it's much more pin-efficient.
You can deliver a lot of bandwidth with a lot fewer pins.
As for all of the backwards compatibility that existed with the bus-based system: when we moved to the link-based interconnect, you naturally cannot make things backwards compatible in hardware.
Silicon-based compatibility or even form factor-based compatibility, we had to break those.
But what we did was maintain the software-based backwards compatibility.
You can still take a PCI driver and it will run on a PCI Express-based system.
And we also maintained
the architectural producer-consumer ordering model
for data consistency.
Those are the critical pieces that carried along.
We evolved on top of that,
but the fundamental basics of it carried forward,
so you could completely interoperate
even through this transition.
And during that, you know, the transition basically looked like you got CPU, you got your root complex,
notice that graphics becomes PCI Express, and then these are direct PCI Express links in which you have networking, storage, or you could go through switches and build your hierarchy of PCI Express,
or you could even put a PCI bridge in order to manage that transition,
because not everybody would have moved from PCI bus-based devices to PCI Express-based
links. So that's how the transition was managed. It was fairly successful from that perspective,
especially with the bridges and especially with software-based backwards compatibility.
It was a fairly smooth transition. Around 2004, we had the first generation
of PCI Express-based systems out
and available in the marketplace.
And then evolution has continued.
From there, we have moved into an SOC-based methodology
where everything gets integrated into the CPU,
and we'll see a picture of that.
But effectively, everything goes into a CPU complex here.
You've got PCI Express links coming out directly from the CPU,
and currently we are in the fifth generation of the technology.
We double the bandwidth every single generation, right?
So 2004 through now,
we are on the verge of delivering
the fifth generation of the technology,
so 32x in terms of the bandwidth per pin.
Every single generation we double the bandwidth.
And during this journey, the entire compute landscape has seen a lot of changes.
We have moved from PC-based systems to mostly handheld devices
and a bunch of devices that are out there, right?
The things and devices connected to data centers
and the edge and everything.
So the compute landscape has changed significantly,
and all throughout this,
PCI Express has remained as the ubiquitous IO technology
that is driving this revolution.
In the context of storage, what has happened is that because of all of these things and devices,
billions of them that are out there,
there is an explosion in the data.
And with data, there are three things that need to happen.
You have to store more, you have to move more, and you have to process more.
And that is what is triggering this,
what we call, virtuous cycle of growth.
So from a storage perspective, there is a data explosion
that is driving SSD innovations and adoption.
And you will see that, you know,
if you look into PCI Express versus other types of interconnects
that connect to the
storage, we are
on the upswing.
Whether it is the number of units or the number of petabytes,
the CAGRs are already there.
But fundamentally what's happening is
when you have
this volume of data, when you have to process
so much, what is the interconnect that can deliver you
the lower latency, the higher bandwidth, right?
And bandwidth that is scalable bandwidth,
not just from a speeds and feeds point of view.
In PCI Express, you can move single lane,
two lanes, four lanes, eight lanes,
all the way to 16 lanes
and deliver a lot of the bandwidth.
In a low latency manner coming from the CPU,
the natural choice is PCI Express.
And with NVM Express, that move happened very rapidly.
And that is what is moving a lot of storage onto PCI Express-based links.
When that happens, when fundamentally storage
is no longer the bottleneck,
there's a lot of pressure on the networking side because you need to move the data faster.
So there is more of a revolution going on there.
The transition from 10 to 40 to 100 to 200 is happening at a much faster pace
than what probably anybody would have projected four or five years back.
And when that happens,
the other thing that is happening is you have to process the data.
So you've got a lot of AI, neural networks,
all of that kind of things that are coming in,
and you see a bunch of accelerators coming up.
All of that is driving a lot of bandwidth demand on the IO,
which is the virtuous cycle that I referred to.
So whether it is in the context of storage
or any other application,
we will take a look into this in more detail.
There is a pictorial version of it,
but PCI Express 3.0 came out in 2010.
Products were out in 2011, and I will show a picture of that.
4.0 came out in 2017.
For 5.0, we are expecting the 0.9 version of the spec by the end of this year, and the final version sometime in Q1 of 2019.
So this picture speaks to that particular evolution that I was referring to. Around
2004, we came out with Gen 1. At that time, you had CPUs connected through the front side bus.
And this is in the server context; normally we refer to the highest-volume configuration, which is the two-socket server.
So everything up until this point is normalized with respect to a two-socket-based system.
So CPUs are connected through front side bus. You've got your hub here that
is connecting to memory, which is an ASIC, and then you've got a bunch of PCI Express
lanes coming out. At that time, there were about 28 lanes coming out. So if you
do it on a per-CPU-socket basis, you get 14 lanes per CPU socket.
Notice that IO has moved to differential signaling in this time frame, whereas
the coherency link is still a bus-based system. So IO has preceded where
CPU-to-CPU connectivity had been. And this is not just one vendor; this is across the board,
across different vendors, if you look into it in that time frame.
From 2.5 gigatransfers per second, we moved to 5, and
PCIe Gen 2 products came out in the market in 2007.
And, you know, by this time frame,
notice that memory has moved to the CPU, right,
in order to get more memory bandwidth.
Demand is going up, so you need to deliver not just more number of cores,
but more amount of memory bandwidth. You've got to feed the beast. And then these are
the coherency-based links, and you still had an IO hub kind of a concept, which was basically
which would take the coherent link and transfer it into PCI Express. Notice the lane count.
It has gone up from 14 lanes per socket to either 18 or 36 lanes per socket, depending
on whether you have a rich IO kind of topology, in which case you will deliver 36 to 72 lanes
of PCI Express. Those that don't need as much, you will deliver 36 lanes of PCI Express.
Doubling the bandwidth, and also more than doubling the number of lanes, depending on
your usage model. PCI Express Gen 3-based products come out in 2011,
and you got CPUs connected through coherency links, and you got memory. And just like memory
had moved to the CPU, PCI Express moves to the CPU. Huge move from an interconnect technology
point of view. In order to get into the CPU, you have to be the ubiquitous IO of choice.
If there are five or six contenders,
you're not going to put them on the CPU.
You will put some kind of a translator chip
and then have IO coming out of it, right?
So by then, it's already well established
that, you know, on a platform,
you're not going to spend the real estate,
the power, and all of that
through a different component.
With Moore's Law,
as you're getting more and more die area on the CPU,
you move things into
the CPU. That's where IO gets integrated into the CPU.
The lane count increases yet again.
Starting out, it was 40 lanes,
and today a lot of people are offering more than 100 lanes
of PCI Express coming out of the CPU socket.
Huge, huge number of lanes coming out of the CPU socket. And again, if you look into the usages, networking tends to be a per-slot bandwidth consumer:
you will put one networking device in a slot, and that is the per-slot bandwidth that is being shown here.
Storage will be lots of storage devices,
lots of SSDs on the system,
so it's a fan-out consumer.
More lanes is what it wants to consume,
and it's also an aggregate IO bandwidth consumer.
You are going to aggregate your bandwidth
across multiple devices,
and accelerators are in both the categories.
They can consume, each of them can consume
as much bandwidth as you can deliver,
and people are putting multiple accelerators in the system.
Combination of all of these three are moving both the vectors forward.
Number of lanes per socket and bandwidth per pin, which is the frequency, speeds and feeds.
Both of them are getting pushed up.
In the Gen 4 timeframe, you have 16 gigatransfers per second. That's when the
spec comes out. And then Gen 5 is 32 gigatransfers per second. As I said,
0.9 is pretty much when things are more or less stable, and we are expecting that
by the end of this year. Q1 2019 is when the 1.0 will come out. So we are doubling the data rate every single generation.
Now, you'll notice that 5 to 8 gigatransfers per second is not quite a doubling of the data rate,
but what happened was we changed the fundamental encoding mechanism.
We moved from 8b/10b encoding to 128b/130b encoding.
So if you take the encoding efficiency, it's 1.25 gain
multiplied by 1.6 in the data rate,
and you've got a 2x improvement in the bandwidth per pin.
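As a back-of-the-envelope check of that arithmetic, here is a small sketch (not from the talk) that multiplies the encoding-efficiency gain by the data-rate gain:

```python
# Back-of-the-envelope check of the Gen 2 -> Gen 3 bandwidth math described above.
gen2_rate, gen3_rate = 5.0, 8.0            # gigatransfers per second
eff_8b10b, eff_128b130b = 8 / 10, 128 / 130  # fraction of raw bits that carry data

encoding_gain = eff_128b130b / eff_8b10b   # ~1.23x (rounded to 1.25 in the talk)
rate_gain = gen3_rate / gen2_rate          # 1.6x

print(f"encoding gain:  {encoding_gain:.2f}x")
print(f"data-rate gain: {rate_gain:.2f}x")
print(f"net gain per pin: {encoding_gain * rate_gain:.2f}x")  # ~1.97x, i.e. roughly 2x
```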
The reason we did that is that we wanted to
still maintain the 20-inch two-connector connection
in the server without requiring any retimer kind of a device or without requiring expensive
material or without requiring power, extra power, right? Any of these trade-offs you
can make and run the link at 10 gigatransfers per second. But in the interest of making the technology not become a niche,
we decided to take the hit on the logic side, put an extra encoder on the logic, which is,
again, with Moore's law, that's not a big deal. But keep the channel reach the same.
But for Gen 3 to Gen 4, you have to have better materials. And this gave us
enough time to go and make the changes in the entire industry,
because what used to be lower loss material,
expensive material,
are now becoming more and more mainstream on the board.
So a lot of people can route on their servers
by making the channels a little shorter
without requiring retimers.
And those that need the extra length
have got retimer-based devices
to extend the
channel reach. In addition to the speeds and feeds, you will see that, you know, we are introducing
new things in terms of the protocol. You know, around the Gen 2 timeframe, we introduced IO
virtualization. So this is where you had multiple virtual machines running on the server. And in order to give customers or users the experience as if each VM owns its own device,
we virtualized the devices.
So that way you can do a direct association between a VM and a virtual function or a virtual device.
Effectively, a device can present itself as multiple virtual devices,
and you can associate them.
And PCI Express, as a specification,
enabled that by making all of those IO virtualization
extensions part of the base specification itself.
And this is when products started reflecting
IO virtualization in the root complex
as well as on the device side.
When Gen 3 came along, you will notice that there are atomic ops,
caching hints, and lower latency. Fundamentally what's happening there is,
in this time frame, accelerators are becoming popular,
and you want to have a way to offer an accelerator atomic
operations, not just to host memory, but also amongst themselves through the CPU host.
So the PCI Express base specification got modified to introduce the notion of atomic operations,
so you could atomically modify things. Different atomic operations were introduced.
We introduced the notion of caching hints and lower latency.
Especially with IO moving into the CPU,
what can happen is you can now take advantage of the
caching hierarchy inside the CPU.
That way, descriptors and things like that that you are
reusing often can reside in the caching hierarchy.
You don't have to go to the system memory,
or even worse, go through the coherent link to the system memory
on the other side to access those.
So things that you are going to access more frequently,
you could take advantage of it through hints on the link,
and the processing element there would try to keep it
in its caching hierarchy
because you give the hint
that you're going to have locality of reference
as far as those addresses are concerned.
And effectively, that has made the transition for,
for example, networking fairly smooth,
because we could store all the descriptors locally
in the last level cache,
and then the cores can access them
without having to go to the memory.
And that has enabled us to deliver
10 gigabit networking as well as 40
and all of that very smoothly, right?
Way back when people were looking into TCP offload on Ethernet,
the thinking there was that there is going to be so much processing involved,
there is going to be so much of going back and forth
to the system memory involved that
it's better to do all of that processing
outside. With
all of these changes, we haven't seen that
becoming the bottleneck. We have been able to
deliver the line rate on PCI Express.
Improved
power management becomes an interesting concept
in this time frame because what has happened is during this compute evolution, handhelds have become very popular. And with PCI Express being a PC-based technology, we always had low power states. But our low power states were in the milliwatts range. And if you have your system connected to a power supply,
milliwatts is a fairly good low-power state to be in.
But once you are on your battery-charged cell phone
or whatever, milliwatts doesn't quite cut it.
You need to be in the microwatts range,
and that's where MIPI was doing a pretty good job.
And at some point, we had even toyed with the notion
of doing the PCI Express protocol on MIPI M-PHY.
M-PCIe was the concept that got introduced.
However, that did not become very popular
because PCI Express, we went ahead
and fixed the power saving state
and introduced a very aggressive L1 substate notion
that took our power consumption down
in the idle system
to microwatts range.
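For readers who want to see where this shows up on a running system, here is an illustrative aside (not from the talk): recent Linux kernels expose the ASPM policy that governs L1 and the L1 substates in sysfs, though the exact path and policy names depend on the kernel build and platform firmware.

```python
# Illustrative only: peek at the Linux ASPM policy that governs L0s/L1/L1-substate use.
# The file and available policies vary by kernel configuration and firmware settings.
from pathlib import Path

policy_file = Path("/sys/module/pcie_aspm/parameters/policy")
if policy_file.exists():
    # The active policy is shown in brackets,
    # e.g. "default performance [powersave] powersupersave"
    print("ASPM policy:", policy_file.read_text().strip())
else:
    print("ASPM sysfs knob not present (ASPM disabled or kernel built without it)")
```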
With all of these changes, you will see that
we are always
not just trying to double the data
rate, but looking into the type of applications
that are coming up, and we are making the right
set of changes in the IO technology to be able to match with those needs. So those are the things
that are coming up. And then, in addition to power enhancements, we have form factors
and usage models. We'll talk about that. So fundamentally
we double the bandwidth every three to four years, and we make all the
changes that we need to get the ecosystem
going.
PCI Express is a layered architecture,
and that helps us make these transitions go through, right?
So you'll see that, you know,
the software, we try to preserve it.
There are enhancements, but we try to preserve that.
The transaction layer is a split-transaction, packet-based protocol, a very modern concept introduced since the PCI Express Gen 1 days, with credit-based
flow control and virtual channels to guarantee quality of service. The data link
layer is responsible for reliable data transport services through
CRC, retry, ACK/NAK, all of those things.
So these are very high-reliability links.
And if you have done a failures-in-time analysis,
you will see that pretty much nothing gets through this link.
Your FIT is going to be significantly lower than 1,
let's put it that way.
Orders of magnitude lower than 1, right?
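To make the credit-based flow control mentioned a moment ago concrete, here is a toy model (mine, not the spec's actual DLLP mechanics): the receiver advertises buffer credits, the transmitter sends only when it holds enough credits, and credits are returned as the receiver drains its buffers.

```python
# Toy model of credit-based flow control (conceptual only; real PCIe tracks separate
# header/data credits per traffic type and returns them via UpdateFC DLLPs).
class CreditedLink:
    def __init__(self, credits):
        self.credits = credits          # credits advertised by the receiver at init

    def send(self, packet_size):
        """Transmit only if enough credits are available; otherwise stall."""
        if packet_size > self.credits:
            return False                # transmitter waits - no packet is ever dropped
        self.credits -= packet_size
        return True

    def credit_return(self, freed):
        """Receiver frees buffer space and returns credits to the transmitter."""
        self.credits += freed

link = CreditedLink(credits=8)
print(link.send(4))    # True  - 4 credits remain
print(link.send(6))    # False - transmitter stalls instead of overflowing the receiver
link.credit_return(4)  # receiver drained a packet
print(link.send(6))    # True
```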
The logical PHY and electrical PHY
are there for physical information exchange,
interface initialization, and maintenance.
And because these are very specific pieces of functionality,
whenever we do a data rate change,
we just change that part; the rest of them remain the same.
Whenever we make a protocol enhancement, we just make the changes there;
the rest can remain the same.
So you can make the changes in different parts without having to move everything at the same time.
And then the mechanical layer is the market segment-specific one, the form factor.
You don't really expect your form factors to be the same:
your form factor in a smartphone is very different than a form factor in a 2U chassis, for example.
So there are different types of form factors, which PCI Express allows, but all of them work with the same silicon. You
can take the silicon that came out in PCI Express
Gen 1, single lane, and it's guaranteed to
interoperate with a PCI Express Gen 5 device,
which is 16 lanes. Of course, it will operate
at the least common denominator, which means it
will work as Gen 1 and x1, but you've got
interoperability across the board.
And that's a guarantee by the way the
specification has been done.
The PCI-SIG recently formed a liaison with the SD
Association, and we are very excited about it.
It has got a huge volume in the IoT and handheld segment,
and SD is looking to start using PCIe 3.0
to deliver higher data rates. That's called SD Express,
and it's going to take advantage of our compliance program,
which we will talk about.
It's a big, big volume, right?
So we have an MOU and we are doing this jointly.
It'll run on PCI Express technology.
PCI Express delivers power-efficient performance,
and from a scalability point of view,
you've got variable widths.
You've got x1, x2, x4, x8, and x16.
Technically there is an x12, and technically there is an x32,
but those don't exist from a real market point of view,
so you can just ignore those.
So width-wise you can scale, right?
Single lane, two lanes, four lanes, eight lanes, 16 lanes.
Frequency-wise, we have got five generations like we talked about.
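To put rough numbers on that width-and-frequency scaling, here is an illustrative calculation (my own approximation, not a table from the talk; per-lane rates and encodings are the well-known values, and packet-level overheads are ignored):

```python
# Approximate per-direction payload bandwidth = transfer rate * encoding efficiency * lanes / 8.
GENS = {1: (2.5, 8 / 10), 2: (5.0, 8 / 10),
        3: (8.0, 128 / 130), 4: (16.0, 128 / 130), 5: (32.0, 128 / 130)}

def gb_per_s(gen: int, lanes: int) -> float:
    rate_gt, eff = GENS[gen]
    return rate_gt * eff * lanes / 8      # GT/s per lane -> GB/s per direction

for lanes in (1, 2, 4, 8, 16):
    print(f"Gen 4 x{lanes:<2}: ~{gb_per_s(4, lanes):4.1f} GB/s each way")
# x1 ~2.0, x4 ~7.9, x16 ~31.5 GB/s per direction, before packet overheads
```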
And low power: as we were talking about earlier, we have a very rich
set of link and device states. On the link
side, we've got different link substates, and I
was talking about L1 substates and how they reduce the
idle power consumption to microwatts. From a device
power state we have got a set of rich device
power states, so that way not only is the link idle, but also you can put the device
into a lower power state. And there are platform level optimizations and hooks, like dynamically
you can allocate power. When you want to give more power to one device and less to the other
device, you could do that dynamically in the platform. There are optimized buffer flush/fill kind of mechanisms.
The basic idea there is you want all of your
I.O. as well as processing to happen within a certain window
and then have the rest of the platform be at rest or in the idle state.
That way this is trying to coordinate when multiple devices are being active.
That way the platform can save more power.
It doesn't help you if you have got 10 devices in the system,
and at any given point of time, somebody is using the link,
and the other nine of them are in a low-power state.
So this is trying to coordinate that activity.
And latency tolerance reporting goes along with that.
Every device reports how much latency it can tolerate,
so that the platform can orchestrate the power savings across the entire platform.
We have very low active power, too.
Idle or standby power is, like I said, in the microwatts.
Active power is around 5 picojoules per bit.
It's the best in the industry.
Absolutely the best amongst any competing standard that can exist,
right? And this is what you get with 700-plus companies: with an IO that becomes ubiquitous,
people will innovate. And when you've got more and more people innovating,
you're going to get the best numbers. That helps, right? That's a good feedback
loop that's going on. And there's a vibrant ecosystem with IP providers,
so that way you can go and buy world-class IP
from different vendors.
And a lot of the IHVs tend to just focus
on what they do the best.
PCI Express, because it has got
all of this ecosystem support,
you can get IPs, you can get validation infrastructure,
both pre-silicon and post-silicon.
So SIG has got a good compliance program,
so that part of it is taken care of.
That way it helps people focus on the other things
where they can add the value.
PCI Express offers a rich set of RAS features,
reliability, availability, and serviceability.
All the transactions are protected by 32-bit CRC.
Practically nothing passes through that.
With link-level retry,
it can even cover dropped packets,
and it has been that way since the Gen 1 days.
There is hierarchical timeout support,
and by that what we mean is
there are different levels of timeout
in different parts of the hierarchy.
So not all of them will be timing out at the same time;
you don't want multiple places where timeouts are happening,
because then you don't know what happened where, right?
It's hierarchical in its nature:
the one at the lowest level of the hierarchy
will time out first,
then the next level will time out,
if there is a bigger timer there.
It's all a timer-based mechanism.
A very well-defined algorithm for different error scenarios.
We are very careful about not having things like what we call error pollution
and all of those kind of things, which means that if there is an error,
there is only one type of error that will get reported.
Because if there are multiple errors
that happen for the same thing
then you just don't know what happened when.
And not only that, you will also report
exactly what happened if it is a transaction type of an error.
You've got the header that caused the error to be logged.
So there is a very elaborate set of advanced error reporting mechanisms,
with logging and all of that information.
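As an aside, recent Linux kernels expose per-device AER counters in sysfs; here is a minimal illustrative sketch (the device address is a placeholder, and the aer_dev_* files only exist on kernels and devices that provide them):

```python
# Illustrative: read the advanced error reporting counters Linux exposes per device.
# Requires a kernel with AER sysfs support; the BDF below is only an example.
from pathlib import Path

dev = Path("/sys/bus/pci/devices/0000:01:00.0")   # placeholder device address
for kind in ("aer_dev_correctable", "aer_dev_nonfatal", "aer_dev_fatal"):
    f = dev / kind
    if f.exists():
        print(f"{kind}:\n{f.read_text()}")
    else:
        print(f"{kind}: not exposed on this kernel/device")
```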
Everything from the physical layer to the link layer to the transaction layer
has its own error reporting mechanism. Whenever a lane fails more often, you can go for
a degraded link width, because we support all of those
widths in a mandatory manner. If you have an x16 link and
one lane is failing often, you will go down to an x8.
If you've got something failing again,
you can go down to an x4, to an x2, to an x1.
So you've got all of that support,
and then support for hot plug.
You can do either planned hot plug,
or you can do surprise hot plug,
and the spec supports both.
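One practical way to spot degraded width or speed on a Linux host is to compare the negotiated link parameters against the device's maximums in sysfs; here is an illustrative sketch (the device address is a placeholder):

```python
# Illustrative: compare negotiated vs. maximum link speed/width for one device.
# A current_link_width below max_link_width can indicate the degraded-width
# operation described above (or simply a narrower slot).
from pathlib import Path

dev = Path("/sys/bus/pci/devices/0000:01:00.0")   # placeholder device address

def read(attr: str) -> str:
    p = dev / attr
    return p.read_text().strip() if p.exists() else "n/a"

print("link speed:", read("current_link_speed"), "(max:", read("max_link_speed") + ")")
print("link width: x" + read("current_link_width"), "(max: x" + read("max_link_width") + ")")
```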
With the storage devices moving to PCI Express,
we introduced the notion of downstream port containment and enhanced downstream port containment.
And the basic idea there is,
if you have a root complex and you've got a PCI hierarchy,
and each of them can be directly hooked to the root complex
or go through switches,
the error or failure
in one of the SSDs, you don't want them
to bring down the other SSDs, and hence
the notion of the downstream port containment.
So in these kinds of usage models,
asynchronous removal of an SSD is fairly common.
So what happens to the transactions
that are outstanding to that SSD?
If I have reads, for example,
that are outstanding to an SSD
and they start timing out in my root port,
then what happens is that,
if we didn't have all of this,
I'd have no idea which device it was targeting,
so I'd bring down the entire hierarchy and my storage would become inaccessible. So this basically provided
the enhancements, and it mandates that you keep track of things on a per-device
kind of basis, at a very high level. That's the idea here. So it defines
that mechanism, and it tries to prevent the potential spread of data corruption while trying to bring the link back up.
IO virtualization.
So this is the other aspect, which is we all know that we need virtualization to reduce system cost and power,
to improve the efficiency of the infrastructure and make sure that our infrastructure gets used more often.
You don't want to have idle infrastructure sitting in your data center or even in your desktop or laptop or whatever, or handheld for that matter.
Single-root IO virtualization happened in 2007, and this allowed multiple VMs, each of them like an independent OS, to coexist in the same system.
All of them get orchestrated through the IOMMU. And you are taking a device, the PCIe device, and this is a cartoon diagram of that: there's a physical function with multiple virtual functions, and this one gets
assigned to that VM, that one gets assigned to that VM, and so on;
effectively you're time-slicing a device.
Or, if it is storage, think
of it in terms of allocating different units
of storage to the VMs. So you are time- or
space-slicing the device, and all of them get
orchestrated through the virtual
machine monitor, which controls the config accesses, but the rest of the DMA accesses happen directly.
What PCI Express provides is this notion of physical
function and virtual function, and it created the notion of the IOMMU.
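As an aside for readers experimenting with SR-IOV on a Linux host, virtual functions are typically enabled through the physical function's sysfs entries; here is a hedged sketch (the device address is a placeholder, and writing sriov_numvfs requires root and an SR-IOV-capable device and driver):

```python
# Illustrative: enable virtual functions on an SR-IOV-capable physical function.
# Requires root, an SR-IOV-capable device/driver, and the right BDF for your system.
from pathlib import Path

pf = Path("/sys/bus/pci/devices/0000:3b:00.0")    # placeholder physical function
total = int((pf / "sriov_totalvfs").read_text())  # how many VFs the PF can expose
print(f"PF supports up to {total} VFs")

# Enabling VFs: write the desired count (write 0 first if VFs are already enabled).
(pf / "sriov_numvfs").write_text(str(min(4, total)))
# Each VF now appears as its own PCI device that can be assigned to a VM (e.g. via VFIO).
```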
If you want to cache the translations locally in the device, it allowed that.
So it allowed for the notion of what is a host physical address and a guest physical address;
you can ask for the translations from the root port and locally cache them, and if the root port wanted to purge a particular TLB entry,
you would get that command,
and then you would respond, right?
So that's the address translation services, ATS, that it provided.
It also provided what is known as access control services.
Imagine you have a switch hierarchy and a bunch of devices underneath it: you
don't want the devices to be talking directly to each other, especially if they're allotted
to different VMs. If the translation happens in the root complex, effectively
everything goes back up and then comes down. That way you get the real physical translation
happening in the root complex,
and then you're taking it back down.
So it defined all of those
do's and don'ts
and codified in the base specification
how things are supposed to work
once you declare that you are capable of doing
ATS, which is the address translation services,
or ACS, which is the access control services.
We also introduced the notion of the page request interface,
and recently we introduced the notion of what we call PASID,
the process address space ID, to support direct assignment of IO to user space.
That way, you don't have to go through an indirect mechanism
or go through the driver to access the IO device.
You could directly have the user space
be exposed to the IO, and IO can directly talk to the user space. So all of those
mechanisms have been built into the spec. And these are more recent than the 2007 over there.
So over time, these kinds of things keep moving on to enable the emerging usages
that we see in the broader ecosystem.
Switching gears now to the form factors. We have a range of form factors, right? So you
got the low-power NVMe, which is the M.2 and U.2 kind of form factor.
These are mostly for your client kind of devices.
You've got the server performance NVMe, both as an add-in card form factor as well as U.2.
You also have the server performance NVMe with low-profile but taller form factors,
like the x8 add-in card.
And the power goes from low to high, naturally.
And you know, so is the amount of bandwidth that you get
as well as the capacity that you get.
And there's the EDSFF family of form factors,
and people are continuing to innovate in this area,
so you are going to have petabytes and petabytes of storage
with your servers. Some of them are from the
PCI-SIG, some of them come from different other places,
but they all run on PCI Express technology.
And this is where, again, the ecosystem will innovate based on the usage
model, and we are happy to have that.
Again, switching gears now, PCI-SIG has a very good compliance program.
And SIG delivers the compliance program
directly. So there are workshops that happen throughout the year,
multiple workshops in multiple geos across the world. People
come there with their devices. And the compliance program is fairly
extensive. You test everything: electrically, whether your transmitters have the right set of
voltage and timing parameters, and whether your receivers
can tolerate a stressed eye and all of that;
the physical layer, whether you link up properly under error conditions;
the link layer, where it will inject errors in the link
and make sure you can recover from that;
and the transaction layer, all the way to the software stack, right?
It's a fairly extensive set of programs.
So think of it as you got the specs.
From the specs, the SIG takes that
and does compliance and interoperability test spec.
These are formal specs, right, that come out.
And there is a little bit of lag, clearly: your base spec cannot come out
and have the C&I test spec be there right away. One to two quarters later, they will
come out. Then you've got your test hardware and software that will come from that. And
from there, you're going to run the entire compliance program, like I said, at each and
every one of those layers. There is an extensive set of compliance tests. We will test it across different speeds,
feeds, all of that.
And there is clearly a pass or a fail.
If you fail, not the end of the world.
You go back, fix it, come back.
A lot of the people that I know
just use the SIG compliance program
to do the testing,
because if you're a small company,
you don't want to have the infrastructure,
you just take it to the compliance program,
figure out where all you failed, you know,
try to debug it there, come back,
turn the silicon and go back and do the testing again.
So.
So the basic goal is that we want a predictable path
for design compliance.
And again, just because you passed compliance doesn't necessarily mean that you will interoperate,
but your chances of interoperability go up significantly.
That's the whole idea here.
And given that if you have an ecosystem of 750 companies,
this is something that you need to do
in order to make sure that people understand
what it takes for something to interoperate.
Otherwise, it's very hard to get different people innovating
at the same time to interoperate, right?
If you have an open slot, which is PCI Express,
this is the thing that you need to do
in order to make sure that people have
a good customer experience, right?
So in conclusion,
from an IO perspective,
PCI Express is the ubiquitous IO
that goes across the entire compute continuum.
And if you are on the CPU,
you will have, of course, different CPUs
targeted for the different segments,
but they all have PCI Express coming out of them.
It's a single standard.
We do not have different standards for handheld,
for desktop, for server.
Same standard, same silicon.
Different form factors, yes,
but one standard, right, across the board. And that helps us focus our
attention to deliver the best-in-class, predominant direct interconnect. And again, scalable bandwidth,
both in terms of frequency and width. Low power, definitely; that's ingrained into
us. We need to be low power not just for handheld, but
also even for the server. The server
demands low power, because if you
consume a lot of power
on the IO, you have less power
left on the compute. People want to have
more power on the compute because that's where
you are getting a
lot of the compute done. Everybody
is being squeezed to give more power back.
So it's a common theme, right, across the board.
High performance, of course,
and predictable performance growth,
spanning five generations
with a very robust and mature
compliance program, interoperability program,
and the devices are everywhere, right? You can make
a PCI device and connect anywhere and figure out whether it works or not. So that helps
in terms of the development. Not to mention the availability of a wide range of IPs and
infrastructure, like testing infrastructure and all of that. All right.
Thank you very much.
Thanks for listening.
If you have questions about the material presented in this podcast,
be sure and join our developers mailing list by sending an email to
developers-subscribe at snia.org. Here you can ask questions
and discuss this topic further with your peers in the
Storage Developer Community. For additional information
about the Storage Developer Conference, visit