In The Arena by TechArena - Connectivity Innovation for Scalable Data Movement in the AI Era with Credo’s Don Barnetson
Episode Date: April 25, 2024
TechArena host Allyson Klein chats with Credo VP Don Barnetson about how his company is delivering innovative connectivity solutions that address the AI era’s requirements for scalable data movement... in the data center and beyond from OCP Lisbon 2024.
Transcript
Welcome to the Tech Arena, featuring authentic discussions between tech's leading innovators and our host, Allyson Klein. Now, let's step into the arena.
Welcome to the Tech Arena. My name is Allyson Klein, and we are recording this week at OCP Lisbon. I'm so glad to be joined by Don Barnetson, VP of Product at Credo. Welcome to the program, Don.
Thank you so much for having me.
So this is the first time that Credo has been on the show. So why don't we just start with an introduction of the company and your role?
Sure.
So Credo is a pure-play interconnect company.
We're one of a very small number of companies who develop what's called SerDes: serializers and deserializers.
And then we do that in a variety of ways.
We license those out to folks who include them in larger semiconductors.
We build many of our own ICs, for optical communication as well as copper communication.
And then we build them into end products. And what I'm actually responsible for at Credo is a product line called our active electrical cables, which are copper cables that have retiming that allows the copper to be thinner, allows us to go longer distances, and then add some advanced functionality.
So when you look at the cabling challenges in the data center, I don't think that they're getting any easier. Why don't you describe the current landscape today and how you're seeing operators challenged by wanting to go faster and having KPIs and limitations.
Yeah, it's really interesting. The infrastructure of the data center was often defined a long time ago, maybe more than a decade ago. So our large customers have standardized on 600 millimeter racks, which have a pretty small area allocated for cable management. But over the last decade, and I would say in particular in the last two years with AI, the amount of interconnect that they're trying to fit into a rack has started to grow exponentially. And so initially our customers were using passive connectivity, what are called direct attach cables or DACs, but they found as they went to the 100 and 200 gigabit generation that they simply ran out of space for these passive cables. And so the active electrical cables that we created reduce the amount of volume that they take by about 75%. But because they're active, we can also add quite a bit of advanced functionality into them to suit each hyperscaler's needs, which has been a lot of the development we've done.
So talk to me about that advanced functionality.
What are you able to do with an active cable?
So one of the things that we started doing: a customer, Microsoft, came to us and had a reliability issue. They had a single top-of-rack switch per rack of their general compute, and if that switch had an outage or failed, it would take down that entire rack. And that turned out to be a substantial reliability issue. There have been a lot of approaches to going to multiple top-of-rack switches in the past, but those didn't work well in the Microsoft ecosystem. So they asked us to build them an active cable that would manage the failover process directly. And so if one of the top-of-rack switches goes out, the cable is able to near-instantaneously fail over to the second switch, and the whole network can converge on the order of, say, 50 milliseconds.
What that allowed them to do was to add this level of redundancy and substantially improve their reliability without having to impact any of the compute customers at all, because the server doesn't have to participate in this. So for their infrastructure-as-a-service customers, they can add this capability. It turned out it gave them about a 30x improvement in reliability, and they can do that entirely seamlessly to their customers. And so that product, which Microsoft publicly calls the Y cable, is the foundation of their general compute today.
That is fascinating, and I would have never thought a cable could do so much for a top-of-rack switch.
And that reliability gain is incredible.
Now, you're here giving a course.
Can you tell me about what you're teaching?
Yeah.
So I'm talking about coherent optics, which is a new type of long-range optics, some of the challenges that our customers have run into, and then a new product that we put together to help them address those issues.
When you look at data centers and you look at networking technologies, every advancement of speed brings about a conversation of, is this the moment that we pivot completely to optical? And yet copper continues to survive. How do you view the connection between copper and optics today, and where do you think we're going with that?
Yeah, yeah, it's amazing. You always think copper's dead, and then it turns out it isn't. The thing about copper is it's really, really low-power, it's low-cost, and it's extremely reliable. And so if our customers can use copper, they almost always prefer to. So it used to be that a whole data center was built with copper. And then as we went to the 10 gigabit generation, that sort of shrunk down to maybe the row of servers, so maybe 20 meters of servers. And then as we went to the 100 gigabit generation, that kind of shrunk down to the rack of servers. And that's where we're at today: most racks of servers and AI devices are interconnected with copper between the appliance or the server and the top-of-rack switch. And then we're using optical where we have to, which is where the reach is typically more than about three meters.
Now, I've heard the term coherent associated with computing for the last 20 to 30 years, but never with optics.
What do you mean by coherent optics and why is this an important thing?
So the optics that are used inside data centers are called rayon optics, and they work well up to about two kilometers in reach, which really means, when you take into account the right events, it could be anywhere inside the building. But that's only part of the problem. If you want to go outside the building, if you want to go between buildings, or if you want to go around a metro area, you have to modulate the light in a different way. And that's what we call coherent optics.
So coherent optics used to be deployed in a transponder structure, so basically a great big box. But now they're available in a pluggable form factor. So they actually have the same form factor as the optics that we use in the data center and reuse a lot of the pieces that we've built. But it's created a lot of new challenges for our customers. They're having some difficulties and some growing pains with it.
Can you talk a little bit about
why folks are choosing to connect
buildings to buildings and across a city?
And what are the kind of use cases that they're solving?
Sure, sure. So the classic one would be a service provider. If you look at how you get your home internet or how your business gets its internet, there's a service provider who's typically running a large metro ring around the city and connecting many thousands or tens of thousands of folks together, over tens or hundreds of kilometers, in order to provide and provision internet services. So classic internet service provisioning is a perfect example.
But in the greatest cases, these very large data centers that we're here at the conference talking about, they're interconnected to each other at extremely large bandwidth, so that you can move workloads back and forth, or indeed, sometimes the data centers can even interoperate in the same way. And so those require links that are well in excess of two kilometers. And so coherent optics are becoming required.
When you look at that high-capacity data center, there are data centers that I think are multiple football fields long. They've got thousands of servers in them. What are the challenges today in terms of the interconnection between all of those boxes?
And how does cabling fit into the challenge?
So the challenge is, as soon as you leave the building, you leave your realm of security. So at first order, you have to encrypt all of that traffic, which most of our customers do really at the entrance to the building. So these are products based on MACsec that Credo helps develop, and they encrypt all of that traffic. But then they have to modulate it in this fashion to be able to send it over five or maybe 100 or more kilometers without having to repeat or amplify it. So those coherent optics are what enable you to do that. They consume, as you might imagine, a lot more power than an optic that only goes a couple of kilometers. And they're a lot more complex, because they have to look at a lot of parameters of the fiber that you don't care so much about inside the data center. So they're a lot, lot more difficult to read. And I think those are really two of the challenges that have stalled the adoption of these coherent optics.
Why is OCP so critical in terms of the development of technology in this arena?
And why is this show something that you wanted to prioritize?
So OCP is really the only place where all of the users and developers of this technology come together and try to solve those problems. So a lot of standards bodies exist, and they create a standard. You can sort of think of it as a set of ingredients that you can choose. The IEEE would be an example of those, or various multi-source agreements are an example. But at OCP, we say, okay, how does this actually work together so that we can give a solution to the industry that is easy to deploy, easy to maintain, and provides the necessary cost and reliability? So we're really looking at the entirety of the problem rather than just perhaps a few technical aspects.
And when you look at the market and how it's shaping up in 2024, where do you see the opportunity primarily across your portfolio?
Yeah, well, you can't talk about 2024 without talking about artificial intelligence. And so AI/ML is driving interconnect to levels that we've never seen before. Because these models are much too large to run on even a rack or a row of computers, they're running at data center scale. And so the interconnect that's necessary inside the data center to support that is just [unclear].
Where do you see the industry in terms of being poised for the demand?
And is there technology that is going to be discussed at OCP in Lisbon this week that
will help shape the future?
Yeah, I think the industry is well poised to support the demand, at least for the next 18 months, and we'll see what challenges come beyond that. It's always hard to get bored in this industry. I think at OCP in Lisbon, we've got a lot of end users talking about the challenges that they face. And a lot of the technology is developed for the large U.S. hyperscalers. So what's interesting in Lisbon is talking about how you can take that technology and make it accessible to the broader base of users that exist in the European market, and also in North America and in Asia.
That is fascinating, Don.
I think this is a really interesting area of the ecosystem.
I'm so glad that you spent some time with us.
I'm sure my listeners are interested in learning more about Credo.
Where would you send them for more information to engage your team?
So you can obviously visit our website, www.credosemi.com, or you can navigate to the events page, and you can watch the seminar that we're doing today.
We're kind of released to you.
Thanks so much for being here today and spending a bit of your OCP time with me.
Thank you so much.
Thanks for joining the Tech Arena.
Subscribe and engage at our website, thetecharena.net.
All content is copyright by the Tech Arena.