In The Arena by TechArena - Connectivity Innovation for Scalable Data Movement in the AI Era with Credo’s Don Barnetson

Episode Date: April 25, 2024

TechArena host Allyson Klein chats with Credo VP Don Barnetson about how his company is delivering innovative connectivity solutions that address the AI era’s requirements for scalable data movement... in the data center and beyond from OCP Lisbon 2024.

Transcript
Starting point is 00:00:00 Welcome to the Tech Arena, featuring authentic discussions between tech's leading innovators and our host, Allyson Klein. Now, let's step into the arena. Welcome to the Tech Arena. My name is Allyson Klein, and we are recording this week at OCP Lisbon. I'm so glad to be joined by Don Barnetson, VP of Product at Credo. Welcome to the program, Don. Thank you so much for having me. So this is the first time that Credo has been on the show. So why don't we just start with an introduction of the company and your role? Sure. So Credo is a pure-play interconnect company. We're one of a very small number of companies who develop what's called SerDes, serializers and deserializers.
Starting point is 00:00:57 And then we do that in a variety of ways. We license those out to folks who include them in larger semiconductors. We build many of our own ICs, ICs for optical communication as well as copper communication. And then we build them into end products. And what I'm actually responsible for at Credo is a product line called our active electrical cables, which are copper cables that have retimers in them, which allows the copper to be thinner, allows us to go longer distances, and then add some advanced functionality. So when you look at the cabling challenges in the data center,
Starting point is 00:01:32 I don't think that they're getting any easier. Why don't you describe the current landscape today and how you're seeing operators challenged by wanting to go faster and having KPIs and limitations. Yeah, it's really interesting. The infrastructure of the data center was often defined a long time ago, maybe more than a decade ago. So our large customers have standardized on 600 millimeter racks, which have a pretty small area allocated for cable management.
Starting point is 00:02:01 But over the last decade, and I would say in particular in the last two years with AI, the amount of interconnect that they're trying to fit into a rack has started to grow exponentially. And so initially, our customers were using passive connectivity, what are called direct attach cables, or DACs, but they found as they went to the 100 and 200 gigabit generation that they simply ran out of space for these passive cables. And so the active electrical cables that we created reduce the amount of volume that they take by about 75%. But because they're active, we can also add quite a bit of advanced functionality into
Starting point is 00:02:34 them to suit each hyperscaler's needs, which has been a lot of the development we've done. So talk to me about that advanced functionality. What are you able to do with an active cable? So one of the things that we started doing, a customer, Microsoft, came to us and had a reliability issue. They had a single top-of-rack switch per rack of their general compute. And if that switch had an outage or failed, it would take down that entire rack. And that turned out to be a substantial reliability issue. There have been a lot of approaches to going to multiple top-of-rack switches in the past,
Starting point is 00:03:08 but those didn't work well in the Microsoft ecosystem. So they asked us to build them an active cable that would manage the failover process directly. And so if one of the top-of-rack switches goes out, the cable is able to near instantaneously fail over to the second switch, and the whole network can converge on the order of, say, 50 milliseconds. What that allowed them to do was to add this level of redundancy and substantially improve their reliability without having to impact any of the compute customers at all, because the server doesn't have to participate in this.
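The failover behavior described above can be illustrated with a small sketch. This is not Credo's or Microsoft's implementation; the class name `YCable`, the port names `tor_a` and `tor_b`, and the `report_link` method are all hypothetical, chosen only to show the idea that the cable itself picks the healthy switch while the server side stays unaware.

```python
from dataclasses import dataclass, field


@dataclass
class YCable:
    """Hypothetical model: one server-side end, two switch-side ends."""

    # Link state of each top-of-rack switch as seen by the cable (assumed names).
    link_up: dict = field(default_factory=lambda: {"tor_a": True, "tor_b": True})
    # The switch currently carrying the server's traffic.
    active: str = "tor_a"

    def report_link(self, tor: str, up: bool) -> str:
        """Record a link change and fail over if the active path is down.

        Returns the name of the switch that is active afterwards.
        """
        if tor not in self.link_up:
            raise ValueError(f"unknown switch port: {tor}")
        self.link_up[tor] = up
        if not self.link_up[self.active]:
            standby = "tor_b" if self.active == "tor_a" else "tor_a"
            # Only switch over when the standby path is actually healthy.
            if self.link_up[standby]:
                self.active = standby
        return self.active


cable = YCable()
print(cable.report_link("tor_a", False))  # active path lost, cable moves to tor_b
print(cable.report_link("tor_a", True))   # tor_a recovers, cable stays on tor_b
```

In this sketch the cable does not fail back automatically when the original switch recovers; that is a design choice made here for simplicity, not something stated in the conversation.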
Starting point is 00:03:39 So for their infrastructure-as-a-service customers, they can add this capability. It turned out it gave them about a 30x improvement in reliability, and they can do that entirely seamlessly to their customers. And so that product, which Microsoft publicly calls the Y cable, is the foundation of their general compute today. That is fascinating, and I would have never thought a cable could do so much for a top-of-rack switch. And that reliability gain is incredible. Now, you're here giving a course. Can you tell me about what you're teaching? Yeah.
Starting point is 00:04:10 So I'm talking about coherent optics, which is a new type of long-range optics. And some of the challenges that our customers have run into with them, and then a new product that we put together to help them address those issues. When you look at data centers and you look at networking technologies, every advancement of speed brings about a conversation of, is this the moment that we pivot completely to optical? And yet copper continues to survive. How do you view the connection between copper and optics today, and where do you
Starting point is 00:04:45 think we're going with that? Yeah, yeah, it's amazing. You always think copper's dead, and then it turns out it isn't. The thing about copper is it's really, really low-power, it's low-cost, and it's extremely reliable. And so if our customers can use copper, they almost always prefer to. So it used to be that a whole data center was built with copper. And then as we went to the 10 gigabit generation, that sort of shrunk down to maybe the row of servers, so maybe 20 meters of servers. And then as we went to the 100 gigabit generation,
Starting point is 00:05:15 that kind of shrunk down to the rack of servers. And that's where we're at today, is most racks of servers and AI devices are interconnected with copper between the appliance or the server and the top-of-rack switch. And then we're using optical where we have to, which is where the reach is typically more than about three meters. Now, I've heard the term coherent associated with computing for the last 20 to 30 years, but never with optics. What do you mean by coherent optics and why is this an important thing? So the optics that are used inside data centers are called rayon
Starting point is 00:05:49 optics, and they work well up to about two kilometers in reach, which really means when you take into account the right events, it could be anywhere inside the building. But that's only part of the problem. If you want to go outside the building, if you want to go between buildings, or if you want to go around a metro area, you have to modulate the light in a different way.
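The rough reach tiers mentioned in the conversation (copper inside the rack up to about three meters, data center optics up to about two kilometers, a different modulation beyond that) can be written down as a simple lookup. This is only an illustrative sketch: the function name `choose_interconnect` and the constant names are hypothetical, and the thresholds are the approximate figures quoted by the speaker, not engineering limits.

```python
# Approximate reach figures quoted in the conversation (assumed here as hard cutoffs).
COPPER_MAX_M = 3.0          # "more than about three meters" is where optical takes over
DC_OPTICS_MAX_M = 2_000.0   # data center optics "work well up to about two kilometers"


def choose_interconnect(reach_m: float) -> str:
    """Pick an interconnect tier for a link of the given length in meters."""
    if reach_m <= 0:
        raise ValueError("reach must be positive")
    if reach_m <= COPPER_MAX_M:
        return "copper"
    if reach_m <= DC_OPTICS_MAX_M:
        return "data center optics"
    # Between buildings or around a metro area.
    return "coherent optics"


for length in (2, 150, 40_000):
    print(length, "m ->", choose_interconnect(length))
```

Real deployments weigh power, cost, and reliability as well as distance, so a single length threshold is a simplification.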
Starting point is 00:06:08 And that's what we call coherent optics. So coherent optics used to be deployed in a transponder structure, so basically the great big box that you find. But now they're available in a pluggable form factor. So they actually have the same form factor as the optics that we use in the data center and reuse a lot of the pieces that we've built. But it's created a lot of new challenges for our customers. They're having some difficulties
Starting point is 00:06:34 and some growing pains with it. Can you talk a little bit about why folks are choosing to connect buildings to buildings and across a city? And what are the kind of use cases that they're solving? Sure, sure. So the classic one would be a service provider. If you look at how you get your home internet or how your business gets its internet, there's
Starting point is 00:06:54 a service provider who's typically running a large metro ring around the city and connecting many thousands or tens of thousands of folks together over tens or hundreds of kilometers in order to provide and provision internet services. So classic internet service provisioning is a perfect example. But increasingly, these very large data centers that we're here at the conference talking about, they're interconnected to each other at extremely large bandwidth,
Starting point is 00:07:20 so that you can move workloads back and forth, or indeed, sometimes the data centers can even interoperate in the same way. And so those require links that are well in excess of two kilometers. And so coherent optics are becoming important. When you look at that high-capacity data center, there's data centers that I think are multiple football fields long. They've got thousands of servers in them. What are the challenges today in terms of the interconnection between all of those boxes? And how does cabling fit into the challenge? So the challenge is, as soon as you leave the building, you've kind of left your realm of security.
Starting point is 00:07:56 So at first order, you have to encrypt all of that traffic, which most of our customers do really at the entrance to the building. So these are products based on MACsec that Credo helps develop. And they encrypt all of that traffic. But then they have to modulate it in this fashion to be able to send it over five or maybe 100 or more kilometers without having to repeat or amplify it. So the coherent optics that enable you to do that consume, as you might imagine, a lot more power than an optic that only goes a couple of kilometers. And they're a lot more complex because they have to look at a lot of parameters of the fiber that you don't care so much about inside the data center.
Starting point is 00:08:35 So they're a lot, lot more difficult to read. And I think those are really two of the challenges that have stalled the adoption of these coherent optics. Why is OCP so critical in terms of the development of technology in this arena? And why was this show something that you wanted to prioritize? So OCP is really the only place where all of the users and developers of this technology come together and try to solve those problems. So a lot of standards bodies exist, and they create a standard. You can sort of think of it as a set of ingredients that you can choose. IEEE would be an example
Starting point is 00:09:10 of those, or the various multi-source agreements are an example. But at OCP, we say, okay, how does this actually work together so that we can give a solution to the industry that is easy to deploy, easy to maintain, and provides the necessary cost and reliability. So we're really looking at the entirety of the problem rather than just perhaps a few technical aspects. And when you look at the market and how it's shaping up in 2024, where do you see the opportunity primarily across your portfolio? Yeah, well, you can't talk about 2024 without talking about AI/ML. And so AI/ML is driving interconnect densities that we've never seen before.
Starting point is 00:09:48 Because these models are much too large to run on even a rack or a row of computers, they're running at data center scale. And so the interconnect that's necessary inside the data center to support that is just...
Starting point is 00:10:05 Where do you see the industry in terms of being poised for the demand? And is there technology that is going to be discussed at OCP in Lisbon this week that will help shape the future? Yeah, I think the industry is well poised to support the demand, at least for the next 18 months, and we'll see what challenges come beyond that. It's always hard to get bored in this industry. I think at OCP in Lisbon, we've got a lot of end users talking about the challenges that they face. And a lot of the technology is developed for the large U.S. hyperscalers. So what's interesting in Lisbon is talking about how can you take that technology and make it accessible to the broader base of users that exist in the European market, also in North America and in Asia.
Starting point is 00:10:50 That is fascinating, Don. I think this is a really interesting area of the ecosystem. I'm so glad that you spent some time with us. I'm sure my listeners are interested in learning more about Credo. Where would you send them for more information to engage your team? So you can obviously visit our website, www.credosemi.com, or you can navigate to the Open Compute Project events page, and you can watch the seminar that we're doing today once it's released.
Starting point is 00:11:18 Thanks so much for being here today and spending a bit of your OCP time with me. Thank you so much. Thanks for joining the Tech Arena. Subscribe and engage at our website, thetecharena.net. All content is copyright by the Tech Arena.
