Screaming in the Cloud - Building Computers for the Cloud with Steve Tuck
Episode Date: September 21, 2023
Steve Tuck, Co-Founder & CEO of Oxide Computer Company, joins Corey on Screaming in the Cloud to discuss his work to make modern computers cloud-friendly. Steve describes what it was like... going through early investment rounds, and the difficult but important decision he and his co-founder made to build their own switch. Corey and Steve discuss the demand for on-prem computers that are built for cloud capability, and Steve reveals how Oxide approaches their product builds to ensure the masses can adopt their technology wherever they are.
About Steve
Steve is the Co-founder & CEO of Oxide Computer Company. He previously was President & COO of Joyent, a cloud computing company acquired by Samsung. Before that, he spent 10 years at Dell in a number of different roles.
Links Referenced:
Oxide Computer Company: https://oxide.computer/
On The Metal Podcast: https://oxide.computer/podcasts/on-the-metal
Transcript
Hello, and welcome to Screaming in the Cloud, with your host, Chief Cloud Economist at the
Duckbill Group, Corey Quinn.
This weekly show features conversations with people doing interesting work in the world
of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles
for which Corey refuses to apologize.
This is Screaming in the Cloud.
This episode is brought to us in part by our friends at Red Hat.
Increasingly, enterprises are embracing automation to make their IT teams more efficient, cut costs, and gain faster ROI.
Welcome to Screaming in the Cloud.
I'm Corey Quinn. You know, I often say it, but not usually on the show,
that Screaming in the Cloud is a podcast about the business of cloud, which is intentionally
overbroad so that I can talk about basically whatever the hell I want to with whoever the
hell I'd like. Today's guest is, to some ways of thinking, about as far in the opposite direction from cloud as it's possible to go and still be involved in the digital world.
Steve Tuck is the CEO at Oxide Computer Company.
You know, computers, the things we all pretend aren't underpinning those clouds out there that we all use and pay for by the hour, gigabyte-second, month, pound, or whatever it works out to.
Steve, thank you for
agreeing to come back on the show after a couple of years and once again, suffer my slings and arrows.
Much appreciated. Great to be here. It has been a while. I was looking back, I think,
three years. It was like pre-pandemic, pre-interest rates, pre-Twitter going totally sideways.
And I have to ask to start with that.
It feels on some level like toward the start of the pandemic
when everything was flying high
and we'd had low interest rates for a decade,
that there was a lot of, well, lunacy
lurking around in the industry.
My own business saw it too.
It turns out that not giving a shit about the AWS bill
is in fact a zero interest rate phenomenon.
And with all that money or constipated capital sloshing around, people decided to do ridiculous things with it.
I would have thought, on some level, that "we're going to start a computer company in the Bay Area making computers" would have been one of those.
But given that we are a year into the correction and things seem to be heading up and to the right for you folks, that take was wrong.
How'd I get it wrong?
Well, I mean, first of all, you got part of it right, which is there were just a litany of ridiculous companies and projects and money being thrown in all directions.
An NFT of a computer. We're going to have one of those. That's what you were selling, right?
Then you had to actually hard pivot to making the real thing.
That's it. So we might as well cut right to it. You know, we went through the crypto phase, but when we started the company, it was, yes, a computer company. It's on the tin. It's definitely kind of the foundation of what we're building, but we think about what a modern computer looks like
through the lens of cloud.
I was at a cloud computing company for 10 years
prior to us founding Oxide.
So was Bryan Cantrill, CTO and co-founder.
And we are huge, huge fans of cloud computing,
which was an interesting kind of dichotomy
and set of conversations when we were raising for Oxide,
because, of course, Sand Hill is terrified of hardware.
And when we think about what modern computers need to look like, they need to be in support of the characteristics of cloud.
And cloud computing being not that you're renting someone else's computers, but that you have fully programmable infrastructure
that allows you to slice and dice compute and storage and networking
however software needs.
And so what we set out to go build
was a way to give the companies
that are running on-premises infrastructure,
which, by the way, is almost everyone,
and will continue to be so for a very long time,
access to the benefits of cloud computing.
And to do that, you need to build a different kind
of computing infrastructure and architecture,
and you need to plumb the whole thing with software.
There are a number of different ways to view cloud computing.
And I think that a lot of the, shall we say, incumbent vendors
over in the computer manufacturing world
tend to sound kind of like dinosaurs on some level, where they're always talking in terms of,
you're a giant company and you already have a whole bunch of data centers out there.
But one of the magical pieces of cloud is you can have a ridiculous idea at nine o'clock tonight,
and by morning you'll have a prototype if you're of that bent.
And if it turns out it doesn't work, you're out, you know, 27 cents. And if it does work,
you can keep going and not have to stop and rebuild on something enterprise grade.
So for the small scale stuff and rapid iteration, cloud providers are terrific. Conversely,
when you wind up in the giant fleets of millions of computers, in some cases, there begin to be economic factors that weigh in.
And for some workloads, yes, I know it's true.
Going to a data center is the economical choice.
But my question is, is starting a new company in the direction of building these things, is it purely about economics or is there a capability story tied in there somewhere too?
Yeah, actually, economics ends up being a distant third or fourth on the list of needs and priorities from the companies that we're working with. And just to be clear, our demographic, the part of the market that we are focused on, is large enterprises, folks that are spending half a billion to a billion dollars a year on IT infrastructure.
They, over the last five years, have moved a lot of the use cases that are great for public cloud
out to the public cloud, and they still have this very, very large need,
be it for latency reasons or cost reasons, security reasons, regulatory
reasons, where they need on-premises infrastructure in their own data centers and colo facilities,
et cetera. And it is for those workloads and that part of their infrastructure that they are forced
to live with enterprise technologies that are 10, 20, 30 years old, that haven't evolved much since I left Dell
in 2009. And when you think about what are the capabilities that are so compelling about cloud
computing, one of them is, yes, what you mentioned, which is you have an idea at nine o'clock at night,
swipe a credit card, and you're off and running. And that is not the case for an idea that someone
has who has got to use the on-premises
infrastructure of their company. And this is where you get shadow IT and 16 digits to freedom and all
the like. Yeah, everyone with a corporate credit card winds up being a shadow IT source in many
cases. If your processes as a company don't make it easier to proceed rather than doing it the
wrong way, people are going to be fighting against you every step of the way. Sometimes the only stick you've got is that of regulation, which in some industries,
great. But in other cases, no, you get to play whack-a-mole. I've talked to too many companies
that have specific scanners built in to their mail system every month, looking for things that
look like AWS invoices. Right. Exactly. And so, you know, but if you flip it around and you say,
well, what if the experience for all of my infrastructure
that I am running or that I want to provide to my software development teams,
be it rented through AWS, GCP, Azure, or
owned for economic reasons or latency reasons,
had a similar set of characteristics where my development team could hit an API endpoint and provision instances in a matter of seconds when they
had an idea and only pay for what they use back to kind of corporate IT.
And what if they were able to use the same kind of developer tools they've become accustomed
to using, be it Terraform scripts and the kinds of access that they're used to?
How do you make those developers just as productive across the business instead of just through public cloud infrastructure?
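To make that concrete, here is a minimal sketch of what "hit an API endpoint and provision instances in a matter of seconds" can look like from the developer's seat. The endpoint path, project name, and request fields are hypothetical stand-ins rather than Oxide's actual API; it assumes the serde (with derive) and reqwest (with the "blocking" and "json" features) crates.

```rust
// Hypothetical sketch: asking a cloud-style API on owned infrastructure for a
// new instance. The endpoint path, query parameter, and field names are
// invented for illustration; they are not Oxide's real API surface.
use serde::Serialize;

#[derive(Serialize)]
struct InstanceCreate {
    name: String,
    ncpus: u32,
    memory_gib: u32,
    image: String,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let req = InstanceCreate {
        name: "dev-scratch-01".into(),
        ncpus: 4,
        memory_gib: 16,
        image: "ubuntu-22.04".into(),
    };

    // Requires reqwest with the "blocking" and "json" features enabled.
    let client = reqwest::blocking::Client::new();
    let resp = client
        .post("https://rack.internal.example/v1/instances?project=web-team")
        .bearer_auth(std::env::var("API_TOKEN")?) // a token instead of a ticket queue
        .json(&req)
        .send()?;

    println!("provisioning request returned: {}", resp.status());
    Ok(())
}
```

A Terraform provider or CLI would be wrapping the same kind of request; the point is that the developer experience is an API call and a token, whether the capacity underneath is rented or owned.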
At that point, then you are in a much stronger position where you can say,
for a portion of things that are, as you point out, more unpredictable,
and where I want to leverage a bunch of additional services that a particular cloud provider has, I can rent that.
And where I've got more persistent workloads
or where I want a different economic profile
or I need to have something in a very low latency manner
to another set of services, I can own it.
And that's where I think the real chasm is.
Because today, we take for granted
the basic plumbing of cloud computing.
Elastic compute, elastic storage, networking and security services, and we in the cloud industry
end up wanting to talk a lot more about exotic services and higher-up-the-stack capabilities.
None of that basic plumbing is accessible on-prem. I also am curious as to where exactly Oxide lives in
the stack, because I used to build computers for myself in 2000, and it seems like having gone down
that path a bit recently, yeah, the process hasn't really improved all that much. The same
off-the-shelf components still exist, and that's great. We always used to disparagingly refer to spinning hard drives as spinning rust in racks. You name
the company Oxide. You're talking an awful lot about the rust programming language in public
a fair bit of the time, and I'm starting to wonder if maybe words don't mean what I thought they
meant anymore. Where do you folks start and stop exactly? Yeah, it's a good question. And
when we started, we sort of thought the scope
of what we were going to do and then what we were going to leverage was smaller than it has turned
out to be. And by that, I mean, man, over the last three years, we have hit a bunch of forks in the
road where we had questions about, do we take something off the shelf or do we build it ourselves?
And we did not try to build everything ourselves.
So to give you a sense of kind of where the dotted line is
around the Oxide product, what we're delivering
to customers is a rack-level computer.
So the minimum size comes in rack form.
And I think your listeners are probably pretty familiar
with this, but a rack is... You would be surprised.
It's basically, what are they, about seven feet tall?
Yeah, about eight feet tall.
Yeah, seven, eight feet. Weighs a couple thousand pounds.
You know, make an insulting joke about NBA players here.
Yeah, all kinds of these things.
Yeah, and big hunk of metal.
And in the case of on-premises infrastructure, it's kind of a big hunk of metal, a hull,
and then a bunch of 1U and 2U boxes crammed
into it. What the hyperscalers have done is something very different. They started looking
at the rack level. How can you get much more dense, power-efficient designs doing things like
using a DC bus bar down the back instead of having 64 power supplies with cables hanging
all over the place in a rack, which I'm sure is what you're more familiar with. Tremendous amount of weight as well,
because you have the metal chassis for all of those 1U things, which in some cases you wind
up with, what, 46U in a rack, assuming you can even handle the cooling needs of all that.
That's right. You have so much duplication and so much of the weight is just metal separating
one thing from the next thing down below it. And there are opportunities for massive improvement,
but you need to be at a certain point of scale to get there.
You do. You do.
And you also have to be taking on the entire problem.
You can't pick at parts of these things.
And that's really what we found.
So we started at the rack level
as sort of the design principle for the product itself
and found that that gave us the ability to get to the right geometry,
to get as much CPU horsepower and storage and throughput
and networking into that kind of chassis
for the least amount of wattage required,
kind of the most power-efficient design possible.
So it ships at the rack level,
and it ships complete with both our server sled systems
and a pair of Oxide switches.
When I talk about design decisions,
"do we build our own switch?"
was a big, big, big question early on.
We were fortunate, even though we were leaning towards thinking
we needed to go do that,
we had this prospective early investor who was early at AWS.
He had asked a very tough question that none of our other investors
had asked to this point, which is,
what are you going to do about the switch? And we knew that the right answer to
an investor is like, no, we're already taking on too much. We're redesigning a server from scratch
in the mold of what some of the hyperscalers have learned, doing our own root of trust. We're
doing our own operating system, hypervisor, control plane, etc. Taking on the switch could
be seen as too much. But we told them, we think that to be able to pull through
all of the value of the security benefits
and the performance and the observability benefits,
we can't then have this obscure third-party switch
rammed into this rack.
It's one of those things that people don't think about,
but it's the magic of cloud.
In AWS's network, for example, it's magic.
You can get line rate or damn near it
between any two points sustained.
That's right.
Try that in a data center.
You run into massive congestion with top of rack switches
where, okay, we're going to parallelize this stuff out
over, you know, two dozen racks.
And we're all going to have them seamlessly
transfer information between each other at line rate.
It's like, no, you're not.
Because those top of rack switches will melt
and become side of rack switches and then bottom puddle of rack switches. It doesn't work that way.
That's right.
And you have to put a lot of thought and planning into it. That is something that I've not heard a
traditional networking vendor addressing because everyone loves to hand wave over it.
Well, so this particular prospective investor, we told him, we think we have to go build our own
switch. And he said, great. And we said, you know, we think we're going to lose you as an investor as a result, but this is what we're
doing. And he said, if you're building your own switch, I want to invest. And his comment really
stuck with us, which is AWS did not stand on their own two feet until they threw out their
proprietary switch vendor and built their own. And that really unlocked, like you've just mentioned,
their ability, both in hardware and software, to tune and optimize to
deliver that kind of line rate capability. And that is one of the big findings for us,
is that we got into it. Yes, it was really, really hard. But based on a couple of design decisions,
P4 being the programming language that we are using as the surround for our silicon,
tons of opportunities opened up for us to be able to do similar kind of optimization
and observability, and that has been a big, big win.
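P4 is its own domain-specific language for the switch's packet pipeline, so nothing below is P4 or Oxide's dataplane code. As a rough, host-runnable illustration of the match-action model that P4 programs express, and of why a programmable pipeline gives you observability essentially for free, here is a toy table in Rust:

```rust
// Toy match-action table: keys matched against packet headers map to small
// actions, and hit/miss counters fall out naturally. Real switch silicon
// evaluates tables like this in hardware at line rate.
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct FlowKey {
    dst_ip: [u8; 4],
}

#[derive(Clone, Copy, Debug)]
enum Action {
    Forward { port: u16 },
    Drop,
}

struct MatchActionTable {
    entries: HashMap<FlowKey, Action>,
    hits: u64,
    misses: u64,
}

impl MatchActionTable {
    fn new() -> Self {
        Self { entries: HashMap::new(), hits: 0, misses: 0 }
    }

    fn insert(&mut self, key: FlowKey, action: Action) {
        self.entries.insert(key, action);
    }

    // Look up the packet's key; fall back to a default action on a miss.
    fn apply(&mut self, key: FlowKey) -> Action {
        match self.entries.get(&key) {
            Some(action) => { self.hits += 1; *action }
            None => { self.misses += 1; Action::Drop }
        }
    }
}

fn main() {
    let mut table = MatchActionTable::new();
    table.insert(FlowKey { dst_ip: [10, 0, 0, 1] }, Action::Forward { port: 7 });

    let a = table.apply(FlowKey { dst_ip: [10, 0, 0, 1] });
    let b = table.apply(FlowKey { dst_ip: [192, 168, 1, 1] });
    println!("{a:?} {b:?} hits={} misses={}", table.hits, table.misses);
}
```

The difference with a fixed-function switch is that these tables, actions, and counters are defined by the operator's own program rather than baked in by the vendor, which is the tuning and observability opportunity being described here.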
But to your question of where does it stop?
So we are delivering this complete
with a baked-in operating system,
hypervisor, control plane.
And so the endpoint of the system,
where the customer meets it,
is either hitting an API or a CLI or a console
that delivers and gives you the ability to spin up projects. And if one is familiar with EC2 and
EBS and VPC, that VM-level abstraction is where we stop. That, I think, is a fair way of thinking
about it. And a lot of cloud folks are going to poo-poo it as far as saying, oh, well, just virtual machines.
That's old cloud.
That just treats the cloud like a data center.
In many cases, yes, it does.
Because there are ways to build modern architectures
that are event-driven on top of things like Lambda
and API Gateway and the rest.
But you take a look at what most customers are doing
and what drives the spend.
It is invariably virtual machines
that are largely persistent. Sometimes they scale up, sometimes they scale
down, but there's always a baseline level of load that people like to hand wave away the fact that
what they're fundamentally doing in a lot of these cases is paying the cloud provider
to handle the care and feeding of those systems, which can be expensive, yes, but also delivers significant
innovation beyond what almost any company is going to be able to deliver in-house. There is no way
around it. AWS is better than you are, whoever you happen to be, at replacing failed hard drives.
That is a simple fact. They have teams of people who are the best in the world at replacing failed
hard drives. You generally do not. They are going to be better at
that than you. But that's not the only axis. There's not one calculus that leads to, is cloud
a scam or is cloud a great value proposition for us? The answer is always a deeply nuanced,
it depends. Yeah. I mean, I think cloud is a great value proposition for most and a growing
amount of software that's being developed and deployed
and operated. And I think one of the myths that is out there is, hey, turn over your IT to AWS,
or to a cloud provider, because we have such higher-caliber personnel that are really
good at swapping hard drives and dealing with networks and operationally keeping this thing running in a highly available manner that delivers good performance.
That is certainly true, but a lot of the operational value in an AWS has been delivered via software, via automation, via observability, and not actual people putting hands on things.
And it's an important point because that's been a big part
of what we're building into the product.
Just because you're running infrastructure in your own data center,
it does not mean that you should have to spend
1,000 hours a month across a big team to maintain and operate it.
And so part of that cloud hyperscaler innovation
that we're baking into this product is so that it is easier to operate with much, much, much lower overhead in a highly available, resilient manner.
So I've worked in a number of data center facilities, but the companies I was working with were always at a scale where these were co-locations, where they would, in some cases, rent out a rack or two.
In other cases, they'd rent out a cage and fill it with their own racks.
They didn't own the facilities themselves.
Those were always handled by other companies.
So my question for you is, if I want to get a pile of oxide racks
into my environment in a data center, what has to change?
What are the expectations?
I mean, yes, there's obviously going to be power and other requirements
that the data center co-location provider is very conversant with.
But Open Compute, for example, had very specific requirements, to my understanding, around things like the airflow
construction of the environment that they're placed within. How prescriptive is what you've built
in terms of, do we need a building retrofit to start using you folks? Yeah, definitely not. And this
was one of the tensions that we had to balance as we were
designing the product. For all of the benefits of hyperscaler computing, some of the design
center for the kinds of racks that run in Google and Amazon and elsewhere are hyperscaler-focused,
which is unlimited power. In some cases, data centers designed around the equipment itself. And where we were headed,
which was basically making hyperscaler infrastructure available to the masses,
the rest of the market, these folks don't have unlimited power and they aren't going to be able
to go redesign data centers. And so, no, the experience should be, with exceptions for folks maybe that have very, very limited access to power, that you roll this rack in to your existing data center.
It's on a standard floor tile. You give it power, you give it networking, and you go. And a lot of the work has gone into how we can operate in the wide-ranging environmental characteristics
that are commonplace in data centers that folks own themselves,
colo facilities, and the like.
So that's really on us so that the customer is not having to go do much work at all
to kind of prepare and be ready for it.
One of the challenges I have is how to think about what you've done
because you are rack-sized.
What that means is that my own experimentation at home recently with on-prem stuff or smart home stuff involves a bunch of Raspberry Pis and a NUC.
But I tend to more or less categorize you the same way that I do AWS Outposts, as well as mythical creatures like unicorns or giraffes,
where I don't believe that all these things actually exist because I haven't seen them. And in fact, to get them in my house,
all four of those things would theoretically require a loading dock if they existed.
And that's a hard thing to fake on a demo signup form,
as it turns out.
How vaporware is what you've built?
Is this all on paper and you're telling amazing stories
or do they exist in the wild?
So last time we were on, it was all vaporware.
It was a couple of napkin drawings and a seed round of funding. I do recall you not using
that description at the time for what it's worth. Good job. Yeah. Well, at least we were transparent
when we were going through the race. We had some napkin drawings. We had some good ideas,
we thought. You formalized those and that's called Microsoft PowerPoint. That's it. 100%.
The next generative AI play is to take the scrunched-up, stained napkin drawing,
take a picture of it, and convert it to a slide.
Google Docs, you know, one of those.
But no, it's got a lot of scars from the build and it is real.
In fact, next week, we are going to be shipping our first commercial systems.
So we have got a line of racks out in our manufacturing facility
in lovely Rochester,
Minnesota.
Fun fact: Rochester, Minnesota is where the IBM AS/400s were built.
I used to work in that market, of all things.
Selling tape drives in the AS/400 market.
I mean, I still maintain there's no real mainframe-migration-to-cloud play because there's no
AWS/400, a joke that tends to sail over an awful lot of people's heads because, you know,
most people aren't as miserable in their career choices as I am.
Okay. That reminds me. So when we were originally pitching Oxide and we were fundraising,
in a particular investor meeting, they asked, what would be a good comp? How should we think about what you are doing? And fortunately, we had about 20 investor meetings to go through. So
burning one on this was probably okay. But we may have used the AS/400 as a comp,
talking about how mainframe systems did such a good job of building hardware and software together.
And as you can imagine, there were some blank stares in that room. But there are some good analogs to historically in the computing industry
when the industry, the major players in the industry,
were thinking about how to deliver holistic systems to support end customers.
And we see this in what Apple has done with the iPhone.
And you're seeing this as a lot of stuff in the automotive industry is being pulled in-house.
I was listening to a good podcast.
Jim Farley from Ford was talking about how the automotive industry historically outsourced all of the software that controls cars.
So Bosch would write the software for the controls for your seats.
And they had all these suppliers that were writing the software.
And what it meant was that innovation was not possible
because you'd have to go out to suppliers to get software changes
for any little change you wanted to make.
And in the computing industry in the 80s,
you saw this blow apart, where firmware got outsourced.
In the IBM-and-the-clones race,
everyone started outsourcing firmware
and outsourcing software.
Microsoft started taking over operating systems,
and then VMware emerged
and was doing the virtualization layer.
And this kind of fragmented ecosystem
is the landscape today
that every single on-premises infrastructure operator has to struggle with.
It's a kit car. And so pulling it back together, designing things in a vertically integrated manner
is what the hyperscalers have done. And so you mentioned Outposts. It's a good example of,
I mean, the most public cloud of public cloud companies created a way for folks
to get their system on-prem. I mean, if you need anything to underscore the draw and the demand
for cloud computing-like infrastructure on-prem, just the fact that that emerged at all
tells you that there is this big need. Because you've got, I don't know,
a trillion dollars worth of IT infrastructure out there
and you have maybe 10% of it in the public cloud.
And that's up from 5% when Jassy was on stage in '21
talking about 95% of stuff living outside of AWS.
But there's going to be a giant market of customers
that need to own and operate infrastructure. And again,
things have not improved much in the last 10 or 20 years for them.
They have taken a tone on stage about how, oh, those workloads that aren't in the cloud yet.
Yeah, those people are legacy idiots. And I don't buy that for a second because believe it or not,
I know this cuts against what people commonly believe in public, but company execs are generally not morons and they make decisions with context and
constraints that we don't see. Things are the way that they are for a reason. And I promise that
90% of corporate IT workloads that still live on-prem are not being managed or run by people
who've never heard of the cloud. There was a
decision made when some other things were migrating of, do we move this thing to the cloud or don't
we? And the answer at the time was, no, we're going to keep this thing on-prem where it is now
for a variety of reasons of varying validity. But I don't view that as a bug. I also, frankly,
don't want to live in a world where all the computers are basically run by three different companies.
You're spot on, which is like it does a total disservice
to these smart and forward-thinking teams
in every one of the Fortune 1000 plus companies
who are taking the constraints that they have.
And some of those constraints are not monetary
or entirely workload-based.
If you want to flip it around, we were talking to a large
cloud SaaS company, and their
reason for wanting to extend beyond the public
cloud is because they want to improve
latency for their e-commerce platform.
And navigating their way through the complex layers
of the networking stack at GCP to get to where the customer assets are that are in colo facilities
adds lag time on the platform that can cost them hundreds of millions of dollars.
And so we need to think beyond this notion of like, oh, well, the dark ages are for software
that can't run in the cloud and that's on-prem.
And it's just a matter of time
until everything moves to the cloud.
In the forward-thinking models of public cloud,
it should be both.
I mean, you should have a consistent experience
from a certain level of the stack down everywhere.
And then it's like, do I want to rent
or do I want to own for this particular
use case in my vast set of infrastructure needs? Do I want this to run in a data center that Amazon
runs? Or do I want this to run in a facility that is close to this other provider of mine?
And I think that's best for all. Then it's not this kind of false dichotomy of quality infrastructure or ownership.
I find that there are also workloads where people will come to me and say, well,
we don't think this is going to be economical in the cloud. Because again, I focus on AWS bills.
That is the lens I view things through. And the AWS sales rep says it will be. What do you think?
I look at what they're doing, and especially if it involves high volumes of data transfer, I laugh a good hearty laugh and say, yeah,
keep that thing in the data center where it is right now. You will thank me for it later.
It's, well, can we run this in an economical way in AWS? As long as you're okay with economical,
meaning six times what you're paying a year right now for the same thing, yeah, you can.
Wouldn't recommend it. And the numbers sort of speak for themselves, but it's not just an economic play. There's also the story of, does
this increase their capability? Does it let them move faster toward their business goals? And in a
lot of cases, the answer to that is no, it doesn't. It's one of those business process things that has
to exist for a variety of reasons. You don't get to reimagine it for funsies.
And even if you did, it doesn't advance the company and what they're trying to do any.
So focus on something that differentiates as opposed to this thing that you're stuck on.
That's right.
And what we see today is it is easy to be in that mindset that running things on-premises is kind of backwards-facing, because the experience of it is
today still very, very difficult. I mean, folks we talk to are sharing with us that it takes
100 days from the time all the different boxes land in their warehouse to actually having usable
infrastructure that developers can use. And our goal and what we intend to go hit with Oxide
is you can roll in this complete rack-level system,
plug it in within an hour.
You have developers that are accessing cloud-like services
out of the infrastructure.
And we've got countless stories of firmware bugs
that would send all the fans in the data center nonlinear
and soak up 100 kW of power.
Oh, God. And the problems that you had with the out-of-band management systems. For a long time,
I thought DRAC stood for "Dell, RMA Another Computer." It was awful having to deal with
those things. There was so much room for innovation in that space, which no one really grabbed onto.
There's a really, really interesting talk at DEF CON that we just stumbled upon yesterday.
The NVIDIA folks are giving a talk on BMC exploits
and a very, very serious BMC exploit.
And again, it's what most people don't know.
First of all, the BMC, the Baseboard Management Controller,
is like the brainstem of the computer.
It has access to... It's a backdoor into all of your infrastructure.
It's a computer inside a computer.
And it's got software and hardware
that your server OEM didn't build
and doesn't understand very well.
And firmware is even worse
because firmware written by an American Megatrends or other
is a big blob of software that gets loaded into these systems that is very hard to audit
and very hard to ascertain what's happening.
And it's no surprise when back when we were running all the data centers at a cloud computing
company, that you'd run into these issues, and you'd go to the server OEM and they'd
kind of throw their hands up. Well, first they gaslight you and say, we've never seen this
problem before. But when you thought you'd root-caused something down to firmware, it was anyone's
guess. And this is kind of the current condition today. And back to the journey to get here,
we kind of realized that you had to blow away that old extant firmware layer.
And we rewrote our own firmware in Rust.
Yes, done a lot in Rust.
Now, it wasn't in Rust, but on some level, that's what Nitro is,
as best I can tell, on the AWS side.
But it turns out that you don't tend to have the same resources as a one-and-a-quarter, at the moment, trillion-dollar company.
That keeps varying.
At one point, they lost a comma, and that was sad and broke all my logic for that. And I haven't fixed it since. Unfortunate
stuff. Totally. I think that was another kind of question early on from certainly a lot of
investors was like, hey, how are you going to pull this off with a smaller team? And there's a lot of
surface area here. It's certainly a reasonable question. Definitely was hard. The one advantage, among others, is when you are designing something in a vertical,
holistic manner, those design integration points are narrowed down to just your equipment.
When someone's writing firmware, when AMI is writing firmware, they're trying to do it to cover
hundreds and hundreds of components across dozens and dozens of vendors.
And we have the advantage of having this purpose-built system
kind of end-to-end from the lowest level,
from first boot instruction all the way up through the control plane
and from rack to switch to server.
That definitely helped narrow the scope.
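As a purely illustrative aside on why Rust keeps coming up for this layer of the stack: hardware state can be modeled as types the compiler enforces rather than raw magic numbers poked into registers. The register layout and fan controller below are invented for the example and are not Oxide's firmware; it is just a host-runnable sketch of the idea.

```rust
// Fan speed expressed as a bounded type rather than a bare integer, so an
// out-of-range duty cycle is unrepresentable past this constructor.
#[derive(Debug, Clone, Copy)]
struct DutyCycle(u8); // 0..=100

impl DutyCycle {
    fn new(percent: u8) -> Option<Self> {
        if percent <= 100 { Some(Self(percent)) } else { None }
    }
}

// A made-up control register: bit 0 enables the fan, bits 1..=7 carry the
// duty cycle. Keeping the encoding in one place keeps bit-twiddling out of
// every caller.
fn encode_fan_register(enabled: bool, duty: DutyCycle) -> u8 {
    (enabled as u8) | (duty.0 << 1)
}

fn main() {
    let duty = DutyCycle::new(60).expect("valid duty cycle");
    let reg = encode_fan_register(true, duty);
    println!("would write 0x{reg:02x} to the (hypothetical) fan register");

    // The invalid case is rejected at the type boundary, not in the data center.
    assert!(DutyCycle::new(150).is_none());
}
```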
This episode has been fake-sponsored by our friends at AWS
with the following message.
Graviton, Graviton, Graviton, Graviton, Graviton, Graviton, Graviton, Graviton, Graviton. Thank you for your
lack of support for this show. Now, AWS has been talking about Graviton an awful lot, which is
their custom in-house ARM processor. Apple moved over to ARM, and instead of talking about benchmarks
they won't publish and marketing campaigns with words that don't mean anything, they've let the
results speak for themselves. In time, I found that almost all of my workloads
have moved over to ARM architecture for a variety of reasons, and my laptop now gets 15 hours of
battery life when all is said and done. You're building these things on top of x86. What is the
deal there? I do not accept that you hadn't heard of ARM until just now because, as mentioned, Graviton, Graviton, Graviton.
That's right. Well, so why x86 to start? And I say "to start" because we have just launched our first-generation products, and our second-generation products that we are now underway working on are going to be launching with a Genoa sled.
But when you're thinking about what silicon to use, obviously, there's a bunch of parts that
go into the decision. You're looking at the applicability to workload, performance,
power management, for sure. And if you carve up what you are trying to achieve, x86 is still a terrific fit for the broadest set of workloads that our customers are trying to solve for.
And choosing which x86 architecture was certainly an easier choice come 2019.
At this point, AMD had made a bunch of improvements in performance and energy efficiency in the chip itself.
We've looked at other architectures, and I think as we are incorporating those in the future roadmap, it's just going to be a question of what are you trying to solve for?
You mentioned power management, and low-power systems have commonly been where folks have gone beyond x86. We're looking forward to hardware acceleration
products, and future products will certainly look beyond x86, but x86 has a long, long road to go.
It still is kind of the foundation for what, again, is a general-purpose
cloud infrastructure for being able to slice and dice for a variety of workloads.
True. I have to look around my environment and realize that Intel's not going anywhere.
And that's not just an insult to their lack of progress on committed roadmaps
that they consistently miss. But enough on that particular
topic, because we want to keep this polite.
Intel has definitely had some struggles, for sure.
They're very public ones.
I think we were really excited and continue to be very excited about their
Tofino silicon line.
And this came by way of the Barefoot Networks acquisition.
I don't know how much you had paid attention to Tofino,
but what was really,
really compelling about Tofino
is the focus on both hardware and software
and programmability.
So great chip, and P4 is the programming language
that surrounds that.
And we have gone very, very deep on P4.
And that is some of the best tech to come out of Intel lately.
But from a core silicon perspective for the rack,
we went with AMD.
And again, that was a pretty straightforward decision
at the time.
And we're planning on having this anchored
around AMD silicon for a while now.
One last question I have
before we wind up calling it an episode.
It seems that at least as of this recording, it's still embargoed,
but we're not releasing this until that winds up changing.
You folks have just raised another round,
which means that your napkin doodles have apparently drawn more folks in.
And now that you're shipping, you're also not just bringing in customers,
but also additional investor money.
Tell me about that.
Yes, we just completed our Series A. So when we last spoke three years ago, we had just raised
our seed and had raised $20 million at the time. And we had expected that it was going to take
about that to be able to build the team and build the product and be able to get to market.
And I think tons of technical risk along the way. I mean, there
was technical risk up and down the stack around this de novo server design, this switch design,
and software is still the kind of disproportionate majority of what this product is,
from hypervisor up through kind of control plane, the cloud services, etc.
So we just view it as software with a really, really confusing hardware dongle.
Yeah.
Super heavy. We're talking enterprise and government grade here.
That's right. There's a lot of software to write.
We had a bunch of milestones that, as we got through them,
one of the big ones was getting Milan silicon booting on our firmware.
It was funny.
This was the thing that, clearly, the industry was most suspicious
of: us doing our own firmware.
And you could see it when we demonstrated booting this like a year and a half
ago and AMD all of a sudden just lit up from kind of arm's length to like,
how can we help? This is amazing. You know?
And they could start to see the benefits of when you can tie low level silicon
intelligence up through a hypervisor. No, I love the existing firmware I have. It looks like it was written in 1984
and winds up having terrible user ergonomics and hasn't been updated at all. And every time
something comes through, it's a 50-50 shot as to whether it fries the box or not. Yeah, no, I want
that. That's right. And you look in these hyperscaler data centers and it's like, no.
I mean, you've got intelligence from that first boot instruction
through a root of trust
up through the software of the hyperscaler
and up into the user level.
And so as we were going through
and kind of knocking down
each one of these layers of the stack,
doing our own firmware,
doing our own hardware root of trust,
getting that all the way plumbed up
into the hypervisor and the control plane.
Number one, on the customer side,
folks moved from, this is really interesting.
We need to figure out how we can bring cloud capabilities
to our data centers.
Talk to us when you have something.
To, okay, we actually,
back to your earlier question on vaporware,
it was great having customers out here to Emeryville
where they can put their hands on the rack,
and, well, you can't put your hands on software, but they were able to look at real running software and that end cloud experience.
And that led to getting our first couple commercial contracts.
So we've got some great first customers, including a large department of the federal government and a leading firm on Wall Street that we're going to be shipping
systems to in a matter of weeks. And as you can imagine, along with that, that drew a bunch of
renewed interest from the investor community. Certainly a different climate today than it was
back in 2019. But what was great to see is you still have great investors that understand
the importance of making bets in the hard tech
space and in companies that are looking to reinvent certain industries. And so
our existing investors all participated, and we added a bunch of terrific new investors,
both strategic and institutional. And this capital is going to be super important
now that we are headed into market and we are beginning to scale up the business and make sure
that we have a long road to go. And of course, maybe as importantly, this was a real confidence
boost for our customers. They're excited to see that Oxide is going to be around for a long time
and that they can invest in this technology as an important part of their infrastructure strategy.
I really want to thank you for taking the time to speak with me about, well, how far you've come in
a few years. If people want to learn more and have the requisite loading dock, where should they go
to find you? So we try to put everything up on the site. So oxidecomputer.com or oxide.computer.
We also, if you remember, did On the Metal. So we had a Tales from the Hardware/Software
Interface podcast that we did when we started. We have shifted that to Oxide and Friends,
which the shift there is we're spending a little bit more time talking about
the guts of what we built and why.
So if folks are interested in, like,
why the heck did you build a switch, and what does it look like to build a switch,
we actually go into depth on that, and, you know,
what does bring-up on a new server motherboard look like?
And we've got some episodes out there that might be worth checking out.
We will definitely include a link to that in the show notes.
Thank you so much for your time.
I really appreciate it.
Yeah, Corey, thanks for having me on.
Steve Tuck, CEO at Oxide Computer Company.
I'm cloud economist Corey Quinn,
and this is Screaming in the Cloud.
If you've enjoyed this podcast,
please leave a five-star review
on your podcast platform of choice.
Whereas if you've hated this episode,
please leave a five-star review on your podcast platform of choice, along with an angry, ranting
comment because you are in fact a zoology major and you're telling me that some animals do in
fact exist. But I'm pretty sure of the two of them, it's the unicorn.
If your AWS bill keeps rising and your blood pressure is doing the same, then you need the Duckbill Group.
We help companies fix their AWS bill by making it smaller and less horrifying.
The Duckbill Group works for you, not AWS.
We tailor recommendations to your business and we get to the point.
Visit duckbillgroup.com to get started.