In The Arena by TechArena - Cerebras Scales Real-Time AI with Wafer-Scale Innovation

Episode Date: October 1, 2025

At AI Infra Summit, CTO Sean Lie shares how Cerebras is delivering instant inference, scaling cloud and on-prem systems, and pushing reasoning models into the open-source community....

Transcript
Starting point is 00:00:00 Welcome to Tech Arena, featuring authentic discussions between tech's leading innovators and our host, Allison Klein. Now, let's step into the arena. Welcome in the arena. My name is Allison Klein. We're coming to you live from the AI Infra Summit in Santa Clara, California. And I am so delighted to have Sean Lie, CTO of Cerebras, back with us for the second year in a row from AI Infra. Welcome, Sean. Thank you, Allison. Cerebras is performing incredibly well in the market. You've got incredible traction.
Starting point is 00:00:38 Every time I open my LinkedIn, I'm seeing another post from one of your executives talking about the advancements that you're making. I know that many of our listeners have heard about Cerebras and what you're doing, but why don't you just give us a general background on the company and what you're delivering in terms of AI core capability? Yeah, if you're not familiar with the details of Cerebras, we're the company that's known for building the really big chip. And with this wafer-scale chip, which is the only one in the world, the largest chip in the world, we're able to offer orders of magnitude higher performance
Starting point is 00:01:11 than even the fastest GPUs. And that's been the primary value that we've been bringing to our customers: being able to serve inference at instant speeds versus having to wait many seconds or even minutes. And what we're seeing here is, as the industry is shifting more and more into more sophisticated inference applications,
Starting point is 00:01:33 the need for low latency is only increasing. And we're here to essentially enable that, and enable our customers and our partners to be able to do things they couldn't do before. That's awesome. Now, it's been a year since we last talked. When we talked in 2024, you were starting to make some early advancements with deployments.
Starting point is 00:01:52 I have seen so much news that I want to unpack with you. Why don't you give me a state for where we are with adoption and what you're seeing in the market more broadly that makes Cerever's solutions so compelling. Well, one of the main focus items for us in 2024 when we spoke last night was really just building the core technology to be able to provide this performance. But since we spoke, one of the main focus items that we had was we want to bring this inference capability to average.
Starting point is 00:02:23 And so this year, our primary focus has been really scaling out our infrastructure. And it's perfect since we're here at the Infra Summit. And so we've been building more data centers, we've been building more systems. And we've been building out our service so that we can reach our customers where they want to operate. So we still sell systems on-prem, and you can buy the actual systems. But now a big part of our business
Starting point is 00:02:47 is actually the public consuming tokens just off the cloud. That's amazing. I think that something that I wanted to delve into with you is just how you've leaned into software, and leaned in particular into open source. You recently made an announcement called K2-Think. And instead of me paraphrasing what I think it is, why don't you tell me what it is and why this is so important? Yeah. So one of the trends in the industry over the last year or so is to create models that can think and reason. And this allows these models to provide the next level of intelligence. And one of the phenomena that happened
Starting point is 00:03:24 with the staff for instance that these models do all of a sudden do remarkable things. Many of us may have even experienced some of this and to the point now where every single state of our model is a wizard. But one of the things that we felt was really important, this reasoning is not being restricted to the largest biggest models out there. So this week with our partner and BZUAI, we launched a modest size model, it's 32 billion family model called a two thing. And it's a deep reasoning model. And what we showed is that even if it's a 32 billion
Starting point is 00:03:56 programmative model, you can actually get state-of-the-art math reasoning just using this reasoning and thinking. And so we just put that out just this week, and we're a firm believer in open source, we put it at the model in the open source community so that we can get that access to everybody. And we're serving in on our service so you can get it at crazy fast speeds,
Starting point is 00:04:18 over 2000s every second. That's amazing. The thing that I have been really impressed with you about is just the model diversity that you support all around the world, different languages, different geo-specific models, obviously large models, small models, et cetera, the services, and now that open source center for support, it's such a wonderful story. One question that I have for you, you said something about customers are seeking out cloud services from you delivering the poor capabilities when they need them.
Starting point is 00:04:48 How do you disambiguate the companies that are talking to you about on-prem deployments and what they need versus those where the cloud might be the best option? And are there underlying workload or underlying business challenges that differentiate them in terms of the path that they're going to take? Absolutely. I think the entire community is still trying to figure this out in a lot of ways. And we are working with all of our customers to navigate this entire landscape. And today what we see is that cloud is the fastest, the easiest, the most flexible way of consuming.
Starting point is 00:05:24 We can get our customers onto our service, consuming billions of tokens, in an hour, without having to build a data center and pull in the power and the cooling and all of that. And so that has generally been the primary way that all of our customers start engaging with us. And perhaps this is not unique to Cerebras, right? This is why cloud is so popular. However, we still believe very strongly in selling on-prem, the actual hardware, in part because we don't believe that there's a one-size-fits-all paradigm here. Many customers need the hardware themselves for a variety of reasons.
Starting point is 00:06:02 Some are things like regulation or privacy or they can't share the data as well with the internet. Some of our customers want to build their old models. They want to use their own techniques to add a digital. value on top of the base hardware. And it's not easy to do that if you just have an API. So we're seeing this full spectrum as the entire community of trying to figure this out together. We're here basically to support our customers and the users.
Starting point is 00:06:29 You talked about APIs, so I'll bring it up. You launched a new certification partner program around APIs. Why was the Sim Fort Root and why no? I think one of the things that we're seeing is that has for the entire industry matures. It's not just about being able to provide a particular model, Amal Maverick or GPTOSS. That is at the heart of the AI, that's the LLM.
Starting point is 00:06:55 However, as the entire industry matures, what we're seeing is that more and more, these models are being consumed through an API abstraction. Sure. And this is very natural. Any maturing industry will have this. And one of the most important things,
Starting point is 00:07:13 as an infrastructure builder, is to ensure that you provide access to your infrastructure to all the users in the ways that they want to consume it. Right now, that is essentially that API layer that you just mentioned. So the OpenRouters of the world, the Hugging Faces of the world: this layer is how most of the next generation of AI adoption will be done. And so we are trying to get ahead of it and making sure that we're right there with them, not only supporting it, but actually pushing it forward together. That's awesome. Sean, I know that you talked at AI Infra today.
Starting point is 00:07:51 What were the key messages that you wanted to bring to this audience? And what do they paint in terms of the path for Cerebras ahead? I think one of the main messages I had today in my keynote is that AI is still improving. Two years ago, we saw ChatGPT take over the world. And since then, there have been enhancements across the board: reasoning models, agentic models. I believe we're just at the beginning. And what's amazing is that every single one of these advancements puts more stress on the infrastructure and our ability to be able to deliver lower latency and higher performance.
Starting point is 00:08:32 For example, it might take ChatGPT maybe a few seconds to respond. Here on Cerebras, that can be about a tenth of a second, so it becomes instant. But a reasoning model might take 20 or 30 seconds or a minute; on Cerebras, that now becomes a second. And then now we're starting to see the emerging agentic workloads. We have customers that have agents that run for 45 minutes, and that might be okay in some very specialized use cases. But if you want to use these applications in any kind of real-time interactive environment,
Starting point is 00:09:07 it's just no longer possible. And so on Cerebras, we're seeing them those kind of times down to a minute. That's amazing. To get real kind of interaction. So one of the main messages that we're putting out there now, obviously, what we're doing from our customers is that with this speed and with this performers,
Starting point is 00:09:24 not only is it a better user change, where neighboring brand new capability that would have been achieved that one. That's incredible. One final question for you before you go. Where can folks find out more about what we talked about today? and engage with you. So the best way to do that is check on our website
Starting point is 00:09:41 at Svreberst.a.I. And it's not just the website. In fact, you can sign it for a free account and you can actually try out the fastest inference in the world and you can experience this new generation of instant and real-time AI for yourself. Thank you so much for the time.
Starting point is 00:09:59 I know that you're a busy guy. We appreciate you being in an arena. It's always a pleasure. Thank you. Thanks for joining Tech Arena. Subscribe and engage at our website, Techarena.aI. All content is copyright by Tech Arena.
