SemiWiki.com - Podcast EP281: A Master Class in the Evolving Ethernet Standard with Jon Ames of Synopsys

Episode Date: April 4, 2025

Dan is joined by Jon Ames, principal product manager for the Synopsys Ethernet IP portfolio. Jon has been working in the communications industry since 1988 and has led engineering and marketing activities from the early days of switched Ethernet to the latest data center and high-performance computing Ethernet technologies.

Transcript
Starting point is 00:00:00 Hello, my name is Daniel Nenni, founder of SemiWiki, the open forum for semiconductor professionals. Welcome to the Semiconductor Insiders podcast series. My guest today is Jon Ames, principal product manager for the Synopsys Ethernet IP portfolio. Jon has been working in the communications industry since 1988 and has led engineering and marketing activities from the early days of switched Ethernet to the latest data center and high-performance computing Ethernet technologies. Welcome to the podcast, Jon. Hi, yeah, thanks. It's good to be here. Can you tell us how you first got started in
Starting point is 00:00:41 semiconductors, Jon? Sure, just a little bit of background, I guess. Back in 1988, I worked on my first intern project with a technology called FDDI, and back then there was an expectation that it might replace Ethernet. For that project I built a diagnostics card using a chipset from AMD. After I graduated, I started out as a software engineer working at 3Com, focused on network management for Ethernet systems. And then I moved into a product management role where I led the introduction of gigabit Ethernet over twisted pair, as well as layer 3 Ethernet switching. So obviously, working with these products at 3Com, I got quite a lot of exposure to the
Starting point is 00:01:26 underlying semiconductors. And I thought it might be quite interesting to dip my toes into this space. So I joined a company called PMC-Sierra to manage marketing of Ethernet switching silicon. Now this took me on a slightly new path where I led the introduction of Ethernet over SONET, and for a while my focus was on the telecom world. Yeah, interesting. And what brought you to Synopsys? So I joined Synopsys about two years ago, or just over two years ago,
Starting point is 00:01:54 and I was keen to get back into Ethernet technology. So I moved into this role to look after the family of Ethernet controllers. And these are kind of interesting; it's a very broad portfolio. These products are used in many applications, from automotive all the way up to high-performance computing. Great. And can you provide a brief overview
Starting point is 00:02:14 of the latest advancements in Ethernet standards and what sets them apart from previous generations? Sure. So, I mean, Ethernet has been evolving and growing now for 40 years or so. You know, originally it was 10 megabits per second over a shared cable, and then we went to twisted pair cabling, which simplified the physical side of the network and made it more reliable just because of the way the cables and connectors worked.
Starting point is 00:02:39 It was much more reliable than the shared coaxial cable. After that, we added switching. This saw each workstation having its own 10 megabit Ethernet connection. Shortly after that, with these Ethernet switches, we started to see faster connections on the servers. We had 10 megabit links from the desktop, and then 100 megabits to the servers. Then over time, we saw
Starting point is 00:03:04 100 megabits coming to every desktop, and so the network started to grow. So, in addition to getting faster LAN, or local area network, connections and overall more capacity, we also saw Ethernet starting to push into the metro and the long haul networks. For relatively modest reaches, I'd say Ethernet was running over fiber optic
Starting point is 00:03:28 and that was kind of sufficient. But then for the real longer metro reaches and for long haul, we saw Ethernet being mapped to SONET/SDH and then later to OTN to really traverse these telecom networks. Now in between, we had the mechanisms for Ethernet over ATM. And that was kind of viewed as a higher speed technology at the time, but more importantly,
Starting point is 00:03:52 it offered improved quality of service. And that was the kind of thing that was being adopted by service providers. But very soon, this went away once gigabit Ethernet became more mainstream, and 10 gigabit Ethernet switching came as well, together with a whole bunch of advancements in quality of service for Ethernet. So Ethernet, the technology, became suited to all kinds of things, from voice traffic
Starting point is 00:04:16 all the way up to industrial control networks. Along the way, there was another interesting development called virtual LAN technology, or VLANs. This enabled a degree of security, or at least traffic separation, allowing multiple IP networks to share a single physical infrastructure. In short, then, we've seen many developments: increasing link speeds and switch capacity, together with these dalliances with alternative technologies. But the long and the short of it is that we've been sending the same kinds of packets over the network. That hasn't changed. So, you know, end to end, you've had interoperability
Starting point is 00:04:54 of Ethernet packets, regardless of the speeds, and as speeds have increased, that same packet mechanism has been retained. So, for example, with Ethernet today, you could connect, say, a printer from the 90s that might have a 10 megabit Ethernet interface on it, and you could connect that to your super high performance computer with, say, a 10 gigabit Ethernet connection. And they would connect via a simple switch, and you don't need any protocol converters there. I mean, you can even use the same kind of cable and the same connectors. I mean, you could even have these two endpoints, you know, the printer and the computer, they could be separated by a huge capacity
Starting point is 00:05:36 backbone, like a 1.6 terabit backbone. But again, no protocol conversion is required from end to end. And if you compare that to the alternative, going back to the printer, in the same time frame we used to have printers connected on RS-232, then we had USB and FireWire and so on. And I'm sure we've all got a box of various cables sitting in the cupboard or sitting down
Starting point is 00:05:59 in the basement with all these different technologies. And that's very different from what we've seen with Ethernet. So where are we today with this, then? Well, we're making some updates in Ethernet to support high-performance computing and AI workloads. So we're taking the latest high-speed Ethernet technology and sharpening it to increase AI processing performance and to maximize the output of an AI cluster
Starting point is 00:06:26 and ultimately bring down the power and cost of the infrastructure. So what we're talking about then with Ultra Ethernet is a mechanism for making better use of physical links and reducing the time that AI processors might otherwise be idle, waiting for a packet to be received from what is inherently
Starting point is 00:06:45 a lossy network. Okay, so what is Ultra Ethernet? Can you talk a little bit more about that, and how does it differ from traditional Ethernet in terms of performance and capabilities? Sure, there is actually a lot more to Ultra Ethernet than just Ethernet. So Ultra Ethernet itself is an entire stack, and it includes a networking layer, that's
Starting point is 00:07:09 IP, together with a new transport layer. So in reality, we have Ultra Ethernet Transport over Ethernet. There are, however, some important additions down at the lower Ethernet layers as well, and I can talk more about those a little later. First, I should explain why Ultra Ethernet has been developed. And it's really to enable the massive scale out of AI infrastructure,
Starting point is 00:07:31 integrating many, many AI accelerator processors like GPUs, so they can work with huge data sets. And in fact, Ultra Ethernet is being developed and specified to be able to support up to a million nodes. So it really is quite the massive scale out for these systems. So basically, any processor can access the memory of any other processor.
Starting point is 00:07:56 So previously, we had remote DMA over Converged Ethernet, or RoCE. And that provided similar memory access capabilities. But one really important attribute now with Ultra Ethernet is that it allows data to be received in an order that isn't necessarily the order in which it was transmitted. So it doesn't stick to the old must-be-in-order mechanisms that we saw with past protocols. So if we look at the common transport layer today,
Starting point is 00:08:25 so TCP, this not only requires packets to arrive in order, but if a packet is lost, then a request is made to resend the lost packet, and then all subsequent packets that had previously been successfully transferred get retransmitted as well. Now, the reason that the packet was determined to be lost in the first place was due to a timeout expiring.
Starting point is 00:08:47 So not only does it take a while to detect that a packet has been lost, but then also the retransmission takes additional time. So we don't only stall the processor waiting for this lost packet, but also we load up the network with traffic that has to be transferred again. Now, this time taken due to the timeout and retransmission, this leads to a long latency for this specific packet.
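To make that concrete, here is a minimal, hypothetical Python sketch (an editorial illustration, not something from the podcast or from any Ultra Ethernet material; the delivery time, timeout, and loss rate are all invented) showing how a rare timeout-plus-resend path produces a latency outlier far above the median:

```python
# Hypothetical illustration of timeout-driven retransmission latency.
# All numbers are invented; the point is the shape of the distribution,
# not the absolute values.
import random

NORMAL_DELIVERY_US = 10     # typical delivery time in microseconds (assumed)
RETRANS_TIMEOUT_US = 1000   # timeout that must expire before a resend (assumed)
LOSS_PROBABILITY = 0.01     # one packet in a hundred is dropped (assumed)

def deliver() -> int:
    """Latency of one packet, including any timeout-plus-resend cycles."""
    latency = NORMAL_DELIVERY_US
    while random.random() < LOSS_PROBABILITY:
        # The sender only notices the loss once the timeout expires,
        # then pays the delivery time again for the retransmission.
        latency += RETRANS_TIMEOUT_US + NORMAL_DELIVERY_US
    return latency

latencies = sorted(deliver() for _ in range(100_000))
median = latencies[len(latencies) // 2]
p999 = latencies[int(len(latencies) * 0.999)]
print(f"median: {median} us, 99.9th percentile: {p999} us")
# The high percentiles are dominated by the rare timeout-and-resend path,
# and that outlier is what leaves a processor stalled waiting for its data.
```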
Starting point is 00:09:13 And this outlier latency, that's what's known as tail latency. That term comes up a lot when talking about Ultra Ethernet. So reducing this tail latency helps to maximize the time spent by the accelerators actually doing the job that they're meant to do. It basically enables you to get more done with a set number of accelerators. So another aspect of Ultra Ethernet, and that's something that's implemented at the transport layer, is increased utilization of the physical network connections.
Starting point is 00:09:44 So we've had multipath in Ethernet now for some time, but in Ethernet the multipathing relies on specific flows being directed over distinct network paths. So in simple terms, for example, traffic from a 100 gig port can be split over 10 links of 10 gig each. The problem here is that to ensure packets arrive for a given application in the correct order, in the order that they were sent, specific flows are mapped to each of the links. So
Starting point is 00:10:15 although this kind of balances the performance over these different links, the likely scenario is that some links will be oversubscribed, whereas other links will be underutilized. And note that if a link is oversubscribed, the likelihood is that packets will be dropped. Ultra Ethernet, on the other hand, will use packet spraying, which is where packets are sent over the links according to available capacity, nothing to do with specific end
Starting point is 00:10:41 to end flows. So as the transport layer accepts out-of-order packets, it's okay if the packets don't arrive in the order in which they were sent, which of course can happen if packets are sent over different network paths. So if you consider what we're trying to do, we're actually trying to move data from one GPU's memory to that of another. It doesn't actually matter the order in which the memory is copied, but it certainly is advantageous if the copying process takes the shortest possible time so that the GPU can get on with its job. So, you know, we've spoken about reducing processing stalls in the
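As a rough illustration of the contrast Jon draws, here is a small, hypothetical Python sketch (not Synopsys code and not from the Ultra Ethernet specification; the flow sizes and link count are made up) comparing per-flow hashing with per-packet spraying:

```python
# Hypothetical sketch contrasting per-flow hashing with per-packet spraying.
# Flow sizes, flow count, and link count are all invented for illustration.
import random
import zlib
from collections import defaultdict

NUM_LINKS = 10

# A mix of small and large flows ("mice" and "elephants"), sizes in packets.
flows = {f"flow{i}": random.choice([10, 100, 5000]) for i in range(20)}

# 1) Per-flow hashing: every packet of a flow is pinned to one link so that
#    ordering is preserved; the CRC stands in for a switch's header hash.
per_flow_load = defaultdict(int)
for flow_id, packets in flows.items():
    link = zlib.crc32(flow_id.encode()) % NUM_LINKS
    per_flow_load[link] += packets

# 2) Packet spraying: each packet simply goes to the least-loaded link,
#    relying on the receiver tolerating out-of-order arrival.
spray_load = [0] * NUM_LINKS
for packets in flows.values():
    for _ in range(packets):
        least = min(range(NUM_LINKS), key=lambda l: spray_load[l])
        spray_load[least] += 1

print("per-flow hashing, packets per link:",
      [per_flow_load[l] for l in range(NUM_LINKS)])
print("packet spraying, packets per link: ", spray_load)
# Hashing typically leaves some links oversubscribed while others sit idle;
# spraying keeps every link close to evenly filled.
```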
Starting point is 00:11:18 GPUs by minimizing tail latency, as well as maximizing the use of individual network connections with multipath and packet spraying. So these are two key benefits. Basically, getting more done with the available GPUs means that more processing can be completed with a set number of GPUs. There's an obvious cost efficiency there. And then secondly, better utilization of the network infrastructure has both cost and power savings. So, you know, underused network connections are still consuming power. And so the aim is to always have useful data passing over these network links, you know,
Starting point is 00:11:55 all of the time, and then you get the most efficient use of your infrastructure with the least power penalties. Okay. So how will the adoption of these new Ethernet standards impact existing network infrastructure, and what should organizations consider when upgrading? So firstly, Ultra Ethernet is backwards compatible with today's Ethernet infrastructure, although some capabilities will be lost if you're using a regular switch from today.
Starting point is 00:12:30 So we haven't spoken about this yet, but Ultra Ethernet can support two functions that are implemented through changes down at the Ethernet MAC and PCS layers. These are link level retry and a credit-based flow control mechanism. So credit-based flow control will aid in preventing network overloading by using in-band signaling on the individual network links.
Starting point is 00:12:55 So this enables very short reaction times to changing usage and capacity on individual links, and that can prevent packets from being sent unless there's actually available capacity for the packet to get through the network. Link level retry is a little different, and that'll rapidly resend the packet
Starting point is 00:13:17 if it's not acknowledged by its link partner. So this removes the need to rely on protocol timeouts before another request is sent across the network. So we're speeding things up and preventing the need for additional networking requests. Anyway, these two mechanisms would not be available across links that don't have the full implementation of Ultra Ethernet.
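To show the basic shape of the credit mechanism, here is a simplified, hypothetical Python model (an illustration only; the actual Ultra Ethernet link-layer protocol is more involved than this) in which a sender may only transmit while it holds credits and the receiver returns credits as it drains its buffer:

```python
# Simplified, hypothetical model of credit-based flow control on one link.
# The real Ultra Ethernet link-layer signaling is more involved; this only
# shows the basic idea that a sender may not transmit without credits.
from collections import deque

class Receiver:
    def __init__(self, buffer_slots: int):
        self.buffer = deque()
        self.free_slots = buffer_slots

    def accept(self, packet) -> None:
        assert self.free_slots > 0, "sender exceeded its credit allowance"
        self.buffer.append(packet)
        self.free_slots -= 1

    def drain(self) -> int:
        """Process everything buffered and return that many credits."""
        returned = len(self.buffer)
        self.buffer.clear()
        self.free_slots += returned
        return returned

class Sender:
    def __init__(self, initial_credits: int):
        self.credits = initial_credits

    def try_send(self, packet, receiver: Receiver) -> bool:
        if self.credits == 0:
            return False        # hold the packet rather than overload the link
        receiver.accept(packet)
        self.credits -= 1
        return True

    def return_credits(self, count: int) -> None:
        self.credits += count

rx = Receiver(buffer_slots=4)
tx = Sender(initial_credits=4)
print([tx.try_send(p, rx) for p in range(6)])  # [True, True, True, True, False, False]
tx.return_credits(rx.drain())                  # in-band credit return frees the sender
print(tx.try_send(6, rx))                      # True
```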
Starting point is 00:13:39 However, the other Ultra Ethernet transport mechanisms, like out-of-order delivery and packet spraying and so on, all work just fine with legacy Ethernet switches. The long and the short of it, then, is that yes, you can implement Ultra Ethernet with today's Ethernet switches, and then deploying newer switches that support these link layer mechanisms will give you some additional benefits in terms of
Starting point is 00:14:02 both utilization and reduction in tail latency across the network. What are some of the key use cases and applications that will benefit the most from these new Ethernet standards, particularly Ultra Ethernet? Well, the principal target application is memory transfers across GPUs, and this enables just the massive scaling of AI compute clusters. That's the key thing.
Starting point is 00:14:26 I mean, a good starting point is to make use of the fastest networking technology that's available today. So in other words, that's 800 gigabit Ethernet or 1.6 terabit Ethernet, utilizing the likes of 212 gigabit electrical signaling. So this gives huge throughput and low latency. So once we have the very fastest
Starting point is 00:14:45 infrastructure, we need to make sure we're getting the most effective use out of that infrastructure. So with Ultra Ethernet, we'll achieve the best possible GPU processing throughput with maximized network speeds, lowest end-to-end transfer latencies, and reduction or elimination of tail latency. And looking ahead, how do you see Ethernet standards evolving in the next five to ten years, and what innovations are on the horizon? Well, I mean, first of all, Ethernet was not replaced by FDDI back in the late 80s. So, you know, we worked on that technology then, but it didn't really go anywhere, not in a major way. Also, it wasn't replaced by ATM in the mid 90s.
Starting point is 00:15:32 Instead, you know, Ethernet, being a pervasive technology, is continuing to evolve. So, for example, today Ethernet's very much taking over in the automotive and industrial wiring spaces. Mechanisms to improve predictability and reliability, together with smart traffic management, continue to enable Ethernet to carry delay sensitive and mission critical data. In addition, there are new cabling and electrical standards that reduce the cost and weight of in-car cabling. And this is very important as we continue to try to make personal transport more efficient. And of course,
Starting point is 00:16:12 I'm expecting Ethernet to continue to increase in speeds. There are already activities looking into the next electrical signaling speeds. So currently we're, you know, roughly at 200 gigabits per second, and we're looking at the next electrical speeds up from that. And then with continued use of multi-lane Ethernet, we'd expect to see the general Ethernet rates go up beyond 1.6 terabits per second. The only difference is, whereas in the earlier days,
Starting point is 00:16:41 at each step we saw speeds increase like tenfold or so. Today we have to be happy when we see the line rates double. So, you know, we're currently at 1.6 terabit Ethernet, and we could expect in the coming years to see 3.2 terabit Ethernet, and so on and so forth. As we continue to see the infrastructure build out, we'll continue to see the maximization of the use of that infrastructure, and then we'll also continue to see the speeds increasing to drive essentially the biggest data centers that are possible. Great, thank you, Jon. Enjoyed the conversation. I hope to have you back on the podcast sometime soon.
Starting point is 00:17:26 Okay, thanks very much. It's good to talk to you. And yeah, I'll be happy to continue the conversation anytime. Cheers then. That concludes our podcast. Thank you all for listening and have a great day.
