SemiWiki.com - Podcast EP281: A Master Class in the Evolving Ethernet Standard with Jon Ames of Synopsys
Episode Date: April 4, 2025. Dan is joined by Jon Ames, principal product manager for the Synopsys Ethernet IP portfolio. Jon has been working in the communications industry since 1988 and has led engineering and marketing activities from the early days of switched Ethernet to the latest data center and high-performance computing Ethernet technologies.
Transcript
Hello, my name is Daniel Nenni, founder of SemiWiki, the open forum for semiconductor
professionals.
Welcome to the Semiconductor Insiders podcast series.
My guest today is Jon Ames, principal product manager for the Synopsys Ethernet IP portfolio.
Jon has been working in the communications industry since 1988 and has led engineering and marketing activities from the early days of switched Ethernet to the latest data center and high-performance computing Ethernet technologies. Welcome to the podcast, Jon.
Hi, yeah, thanks. It's good to be here.
Can you tell us how you first got started in semiconductors, Jon?
Sure, just a little bit of background, I guess. Back in 1988, I worked on my first intern project with a technology called FDDI. And back then, there was an expectation that it might replace Ethernet. So I built this project, a diagnostics card using a chipset from AMD. After I graduated, I started out as a software engineer working at 3Com,
focused on network management for Ethernet systems. And then I moved into a product management
role where I led the introduction of gigabit Ethernet over twisted pair, as well as layer 3 Ethernet switching. So obviously, working with these products at 3Com, I became quite exposed to the underlying semiconductors. And I thought it might be quite interesting to dip my toes into this space. So I joined a company called PMC-Sierra to manage marketing of Ethernet switching silicon. Now this took me on a slightly new path where I led the introduction of Ethernet over SONET. And for a while, my focus was on the telecom world.
Yeah, interesting.
And what brought you to Synopsys?
So I joined Synopsys about two years ago,
or just over two years ago,
and I was keen to get back into Ethernet technology.
So I moved into this role
to look after the family of Ethernet controllers.
And these are kind of interesting; it's a very broad family.
And these products are used in many applications
from automotive all the way up to high-performance computing.
Great.
And can you provide a brief overview
of the latest advancements in Ethernet standards
and what sets them apart from previous generations?
Sure.
So, I mean, Ethernet has been evolving
and growing now for like 40 years or so.
You know, originally it was 10 megabits per second over a shared cable, and then we went through twisted pair cabling, which simplified the physical side of the network and made it more reliable, just because of the way the cables and connectors worked. It was simply more dependable than the shared coaxial cable. After that, we added switching. This meant each workstation had its own 10 megabit Ethernet connection.
Shortly after that, with these Ethernet switches,
we started to see faster connections on the servers.
We had like 10 meg from the desktop, and then 100 megabits to the servers. Then over time, we saw 100 megabits coming to every desktop, and so the network started to grow.
So, in addition to getting faster LAN
or local area network connections
and overall more capacity, we also saw Ethernet
starting to push into the metro and the long haul networks.
So, for relatively modest reaches,
I'd say Ethernet was running over fiber optic
and that was kind of sufficient.
But then for like the real longer metro reaches
and for long haul, we saw Ethernet being mapped
to SONET/SDH and then later to OTN
to really traverse these telecom networks.
Now in between, we had the mechanisms for Ethernet over ATM.
And that was kind of viewed as a higher speed technology
at the time, but more importantly,
it offered improved quality of service.
And that was the kind of thing that was being adopted
by service providers.
But very soon, this went away once gigabit Ethernet
became more mainstream,
and 10 gigabit Ethernet switching came as well, together with a whole bunch of advancements in quality of service for Ethernet.
So Ethernet, the technology, became suited to all kinds of things from voice traffic
all the way up to industrial control networks.
Along the way, there was another interesting development called virtual LAN technology
or VLANs.
This enabled a degree of security or at least traffic separation, allowing multiple IP networks
to share a single physical infrastructure.
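As an aside, the separation VLANs provide comes down to a 4-byte tag in each frame. Here's a minimal illustrative sketch of the standard IEEE 802.1Q tag insertion (the function itself is hypothetical, just to show the frame layout):

```python
# Illustrative sketch: inserting an IEEE 802.1Q VLAN tag into an
# Ethernet frame. The tag (TPID 0x8100 plus priority and a 12-bit
# VLAN ID) sits right after the source MAC address, letting switches
# keep traffic from different virtual LANs separate on shared links.
import struct

def add_vlan_tag(frame: bytes, vlan_id: int, priority: int = 0) -> bytes:
    assert 0 <= vlan_id < 4096             # VLAN ID is a 12-bit field
    tci = (priority << 13) | vlan_id       # 3-bit priority, DEI=0, VID
    tag = struct.pack("!HH", 0x8100, tci)  # TPID + tag control info
    return frame[:12] + tag + frame[12:]   # bytes 0-11 are the two MACs

# Example: tag a minimal frame for VLAN 42.
frame = bytes(12) + b"\x08\x00" + b"payload"
tagged = add_vlan_tag(frame, vlan_id=42)
```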
In short then, we've seen many developments, increasing link speeds and switch capacity
together with these dalliances with alternative technologies. But the long and the short of it is that we've been sending the same kinds of packets
over the network. That hasn't changed. So, you know, end to end, you've had interoperability
of Ethernet packets, regardless of the speeds and as speeds have increased, still that same
packet mechanism has been retained. So, for example, with Ethernet today, you could connect, say, a printer from the 90s that
might have a 10 megabit Ethernet interface on it, and you could connect that to your
super high performance computer with, say, a 10 gigabit Ethernet connection.
And they would connect via a simple switch, and you don't need any protocol converters
there. I mean, you can even use the same kind of cable and the same connectors.
I mean, you could even have these two endpoints,
you know, the printer and the computer, they could be separated by a huge capacity
backbone, like a 1.6 terabit backbone.
But again, no protocol conversion is required from end to end.
And if you compare that to the alternative, to stay with the printer example: in the same time frame, we used to have printers connected on RS-232, then we had USB and FireWire and so on.
And I'm sure we've all got a box of various cables
sitting in the cupboard or sitting down
in the basement with all these different technologies.
And that's very different from what we've seen with Ethernet.
So where are we today with this, then?
Well, we're making some updates in Ethernet
to support high-performance computing and AI workloads.
So we're taking the latest high-speed Ethernet technology
and sharpening it to increase AI processing performance
and to maximize the output of an AI cluster
and ultimately bring down the power
and cost of the infrastructure.
So what we're talking about then with Ultra Ethernet is a mechanism for making better use of physical links and reducing the time that AI processors might otherwise be idle, waiting for a packet to be received from what is inherently a lossy network.
Okay, so what is Ultra Ethernet? Can you talk a little bit more about that, and how does it differ from traditional Ethernet in terms of performance and capabilities?
Sure, there is actually a lot more to Ultra Ethernet than just Ethernet. So Ultra Ethernet itself is an entire stack, and it includes a networking layer, that's IP, together with a new transport layer. So in reality, we have Ultra Ethernet Transport over Ethernet.
There are, however, some important additions down at the lower Ethernet layers as well,
and I can talk more about those a little later.
First, I should explain why Ultra Ethernet has been developed. And it's really to enable the massive scale-out of AI infrastructure, integrating many, many AI accelerator processors like GPUs, so they can work with huge data sets. And in fact, Ultra Ethernet is being developed and specified to be able to support up to like a million nodes. So it really is quite the massive scale-out for these systems. So basically, any processor can access the memory of any other processor. So previously, we had remote DMA over converged Ethernet, or RoCE. And that provided similar memory access capabilities.
But one really important attribute now with Ultra Ethernet is that it allows data to be received in an order that isn't necessarily the order in which it was transmitted. So it doesn't stick to the old must-be-in-order mechanisms that we saw with past protocols.
So if we look at the common transport layer today,
so TCP, this not only requires packets to arrive in order,
but if a packet is lost,
then a request is made to resend the lost packet,
and then all subsequent packets
that had previously been successfully transferred
get retransmitted as well.
Now, the reason that the packet was determined
to be lost in the first place was due to a timeout expiring.
So not only does it take a while to detect
that a packet has been lost, but then also the retransmission
takes additional time.
So we don't only stall the processor waiting
for this lost packet, but also we load up
the network with traffic that has to be transferred again.
Now, this time taken due to the timeout and retransmission,
this leads to a long latency for this specific packet.
And this outlier latency, that's what's known as tail latency.
That term comes up a lot when talking about Ultra Ethernet.
So reducing this tail latency, that helps to maximize the time
spent by the accelerators actually doing the job that they're meant to do.
It basically enables you to get more done with a set number of accelerators.
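To make that concrete, here's a minimal sketch with purely illustrative numbers (the timeout and per-packet delay are assumptions, not figures from the episode), contrasting a TCP-style in-order transport with one that accepts out-of-order delivery and only resends the missing packet:

```python
# Illustrative sketch only: how one lost packet inflates tail latency
# and network load under an in-order transport. All numbers assumed.

LINK_DELAY_US = 1.0    # assumed per-packet delivery time, microseconds
TIMEOUT_US = 500.0     # assumed retransmission timeout

def in_order_transfer(num_packets: int, lost: int):
    """Loss is detected by a timeout; the lost packet and every packet
    after it are retransmitted, as described for TCP above."""
    resent = num_packets - lost  # the whole tail of the stream is resent
    finish = lost * LINK_DELAY_US + TIMEOUT_US + resent * LINK_DELAY_US
    return finish, resent

def out_of_order_transfer(num_packets: int, lost: int):
    """Out-of-order acceptance: everything else streams through, and
    only the single lost packet is retransmitted."""
    finish = max(num_packets * LINK_DELAY_US, TIMEOUT_US + LINK_DELAY_US)
    return finish, 1

for transfer in (in_order_transfer, out_of_order_transfer):
    finish, resent = transfer(1000, lost=10)
    print(f"{transfer.__name__}: done at {finish:.0f} us, {resent} packets resent")
```

With these assumed numbers, the in-order transfer finishes around 1500 microseconds and resends 990 packets, while the out-of-order transfer finishes at 1000 microseconds and resends just one packet, which is the tail latency and retransmission load difference described here.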
So another aspect of Ultra Ethernet, and that's something that's implemented at the transport layer, is increased utilization of the physical network connections. So we've had multipath in Ethernet now for some time, but in Ethernet the multipathing
relies on specific flows being directed over distinct network paths.
So in simple terms, for example, traffic from a 100 gig port can be split over 10 links of 10 gig each.
The problem here is that to
ensure packets arrive for a given application in the correct order, in the
order that they were sent, then specific flows are mapped to each of the links. So
although this kind of balances the performance over these different
links, the likely scenario is that some links will be oversubscribed, whereas other links will be underutilized. And note that if a link is oversubscribed, the likelihood is packets will be dropped. Ultra Ethernet, on the other hand, will use packet spraying,
which is where packets are sent over the links according
to available capacity, regardless of specific end-to-end flows.
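A rough sketch of that difference, using hypothetical flow sizes and a least-loaded-link rule as a stand-in for capacity-aware spraying (real switches do this in hardware; all names here are illustrative):

```python
# Illustrative sketch: flow-hash multipath pins each flow to one link,
# so heavy flows oversubscribe some links while others sit idle;
# packet spraying places each packet wherever capacity is available.
import random
from collections import Counter

LINKS = 10

def flow_hash_balance(flows):
    """ECMP-style multipath: every packet of a flow takes one link."""
    load = Counter()
    for flow_id, packets in flows:
        load[hash(flow_id) % LINKS] += packets
    return load

def packet_spray_balance(flows):
    """Packet spraying: each packet goes to the least-loaded link,
    regardless of which flow it belongs to."""
    load = {link: 0 for link in range(LINKS)}
    for _, packets in flows:
        for _ in range(packets):
            least = min(load, key=load.get)
            load[least] += 1
    return load

random.seed(1)
flows = [(f"flow-{i}", random.choice([10, 1000])) for i in range(20)]
print(sorted(flow_hash_balance(flows).values()))    # lumpy: some links overloaded
print(sorted(packet_spray_balance(flows).values())) # essentially even
```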
So as the transport layer accepts out-of-order packets, it's okay if the packets don't arrive in the order in which they were sent, which of course can happen if packets are sent over different network paths. So if you consider what we're trying to do, we're actually trying to move data from one GPU's memory to that of another. It doesn't actually matter what order the memory is copied in, but it certainly is advantageous if the copying process takes the shortest possible time so that the GPU can get on with its job. So, you know, we've spoken about reducing processing stalls in the
GPUs by minimizing tail latency, as well as maximizing the use of individual network connections with multi-path
and packet spraying. So these are two key benefits. Basically, more processing can be completed with a set number of GPUs. There's an obvious
cost efficiency there. And then secondly, better utilization of the network infrastructure
has both cost and power
savings.
So, you know, underused network connections are still consuming power.
And so the aim is to always have useful data passing over these network links, and then you get the most efficient use of your infrastructure with the least power penalties.
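As a small sanity check on the out-of-order copy described above, here's an illustrative sketch assuming each packet carries its destination offset, which is all the receiver needs to place data correctly whatever the arrival order:

```python
# Illustrative sketch: an out-of-order memory copy still completes
# correctly when each packet carries its destination offset.
import random

CHUNK = 4
src = bytes(range(64))  # stand-in for a region of one GPU's memory
packets = [(off, src[off:off + CHUNK]) for off in range(0, len(src), CHUNK)]
random.shuffle(packets)  # sprayed packets arrive in arbitrary order

dst = bytearray(len(src))
for offset, payload in packets:
    dst[offset:offset + len(payload)] = payload  # place chunk by offset

assert bytes(dst) == src  # the copy is complete regardless of order
```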
Okay.
So how will the adoption of these new Ethernet standards impact existing network infrastructure
and what should organizations consider when upgrading?
So firstly, Ultra Ethernet is backwards compatible with today's Ethernet infrastructure, although some capabilities will be lost if you're using a regular switch from today. So we haven't spoken about this yet, but Ultra Ethernet can support two functions that are implemented through changes down at the Ethernet MAC and PCS layers.
So these are link level retry
and a credit-based flow control mechanism.
So credit-based flow control will aid in preventing network overloading by using in-band signaling
on the individual network links.
So this enables very short reaction times to situations changing in terms of usage and
capacity on individual links.
So very short reaction times, and that
can prevent packets from being sent
unless there's actually available capacity for the packet
to get through the network.
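A minimal sketch of that credit idea, assuming one credit per free receive buffer (the class and method names are hypothetical, not from the Ultra Ethernet specification):

```python
# Illustrative sketch: credit-based flow control on a single link.
class CreditedSender:
    """May transmit only while it holds credits, so the receiver's
    buffers can never be overrun; packets wait rather than drop."""

    def __init__(self, initial_credits: int):
        self.credits = initial_credits  # one credit per receive buffer
        self.queue = []                 # packets held back, not dropped

    def send(self, packet):
        if self.credits > 0:
            self.credits -= 1           # each packet consumes a credit
            self.transmit(packet)
        else:
            self.queue.append(packet)   # no capacity downstream: wait

    def on_credit(self, returned: int):
        """In-band signal from the receiver that buffers were freed."""
        self.credits += returned
        while self.credits > 0 and self.queue:
            self.send(self.queue.pop(0))

    def transmit(self, packet):
        pass  # stand-in for putting the packet on the wire
```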
Link level retry is a little different,
and that'll rapidly resend the packet
if it's not acknowledged by its link partner.
So this removes the need to rely on protocol timeouts before another request is sent across the network.
So we're speeding things up and preventing the need
for additional networking requests.
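And a comparable sketch of link-level retry, again with hypothetical names: the sender keeps a copy of each frame until the link partner acknowledges it, and a NACK triggers an immediate local resend with no protocol timeout involved:

```python
# Illustrative sketch: link-level retry between two link partners.
class RetryLink:
    def __init__(self):
        self.replay_buffer = {}          # seq -> frame awaiting an ACK

    def send(self, seq: int, frame):
        self.replay_buffer[seq] = frame  # retain for possible replay
        self.transmit(seq, frame)

    def on_ack(self, seq: int):
        self.replay_buffer.pop(seq, None)  # delivered; free the copy

    def on_nack(self, seq: int):
        self.transmit(seq, self.replay_buffer[seq])  # resend at once

    def transmit(self, seq: int, frame):
        pass  # stand-in for the physical link
```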
Anyway, these two mechanisms would not be available across links that don't have the full implementation of Ultra Ethernet. However, the other Ultra Ethernet transport mechanisms, like out-of-order delivery and packet spraying and so on, all work just fine with legacy Ethernet switches.
Long and the short of it then: yes, you can implement Ultra Ethernet with today's Ethernet switches, and then deploying newer switches that support these link layer mechanisms will give you some additional benefits in terms of both utilization and reduction in tail latency across the network.
What are some of the key use cases and applications
that will benefit the most from these new Ethernet standards, particularly Ultra Ethernet?
Well, the principal target application is memory transfers
across GPUs and this enables just the massive scaling
of AI compute clusters.
That's the key thing.
I mean, a good starting point is to make
use of the fastest networking technology that's
available today.
So in other words, that's 800 gigabit Ethernet or 1.6 terabit Ethernet, utilizing the likes of 212 gigabit electrical signaling.
So this gives huge throughput and low latency.
So once we have the very fastest
infrastructure, we need to make sure we're getting the most effective use out of that
infrastructure. So with Ultra Ethernet, we'll achieve the best possible GPU processing throughput
with maximized network speeds, lowest end-to-end transfer latencies, and reduction or elimination of tail latency.
And looking ahead, how do you see Ethernet standards evolving in the next five to ten years,
and what innovations are on the horizon?
Well, I mean, first of all, Ethernet was not replaced by FDDI back in the late 80s. So, you know, we worked on that technology then, but it didn't really go anywhere, not in a major way. Also, it wasn't replaced by ATM in the mid 90s. Instead, Ethernet, being a pervasive technology, is continuing to evolve.
So, for example, today Ethernet's very much taking over
in the automotive and industrial wiring spaces. So things like mechanisms to improve predictability and reliability, together with smart traffic management, continue to enable Ethernet to carry delay-sensitive and mission-critical data. In addition, there are new cabling and electrical standards that reduce the cost and weight of in-car cabling. And this is very important as we continue to try to make personal transport more efficient. And of course,
I'm expecting Ethernet to continue to increase in speeds. So there's already activity looking into the next electrical signaling speeds. Currently we're, you know, roughly at 200 gigabits per second, and we're looking at the next electrical speeds up from that.
And then with continued use of multi-lane Ethernet, we'd expect to see the general Ethernet rates go up beyond 1.6 terabits per second. The only difference is, whereas in the earlier days we saw speeds increase like tenfold at each step, today we have to be happy when we see the line rates double. So, you know, we're currently at 1.6 terabit Ethernet, and we could expect in the coming years to see 3.2 terabit Ethernet, and so on and so forth.
and so on and so forth. As we continue to see the infrastructure build out, we continue to see the maximization
of the use of that infrastructure and then we'll also continue to see the speeds increasing
to just drive essentially the biggest data centers that are possible.
Great, thank you, Jon. Enjoyed the conversation, I hope to have you back on the podcast sometime soon.
Okay, thanks very much.
It's good to talk to you.
And yeah, I'll be happy to continue the conversation anytime.
Cheers then.
That concludes our podcast.
Thank you all for listening and have a great day.