Screaming in the Cloud - Building Computers for the Cloud with Steve Tuck
Episode Date: September 21, 2023
Steve Tuck, Co-Founder & CEO of Oxide Computer Company, joins Corey on Screaming in the Cloud to discuss his work to make modern computers cloud-friendly. Steve describes what it was like... going through early investment rounds, and the difficult but important decision he and his co-founder made to build their own switch. Corey and Steve discuss the demand for on-prem computers that are built for cloud capability, and Steve reveals how Oxide approaches their product builds to ensure the masses can adopt their technology wherever they are.
About Steve
Steve is the Co-founder & CEO of Oxide Computer Company. He previously was President & COO of Joyent, a cloud computing company acquired by Samsung. Before that, he spent 10 years at Dell in a number of different roles.
Links Referenced:
Oxide Computer Company: https://oxide.computer/
On The Metal Podcast: https://oxide.computer/podcasts/on-the-metal
Transcript
Hello, and welcome to Screaming in the Cloud, with your host, Chief Cloud Economist at the
Duckbill Group, Corey Quinn.
This weekly show features conversations with people doing interesting work in the world
of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles
for which Corey refuses to apologize.
This is Screaming in the Cloud.
This episode is brought to us in part by our friends at Red Hat.
Increasingly, enterprises are embracing automation to make their IT teams more efficient, cut costs, and gain faster ROI.
Welcome to Screaming in the Cloud.
I'm Corey Quinn. You know, I often say it, but not usually on the show,
that Screaming in the Cloud is a podcast about the business of cloud, which is intentionally
overbroad so that I can talk about basically whatever the hell I want to with whoever the
hell I'd like. Today's guest is, to some ways of thinking, about as far in the opposite direction from cloud as it's possible to go and still be involved in the digital world.
Steve Tuck is the CEO at Oxide Computer Company.
You know, computers, the things we all pretend aren't underpinning those clouds out there that we all use and pay for by the hour, gigabyte-second, month, pound, or whatever it works out to.
Steve, thank you for
agreeing to come back on the show after a couple of years and once again, suffer my slings and arrows.
Much appreciated. Great to be here. It has been a while. I was looking back, I think,
three years. It was like pre-pandemic, pre-interest rates, pre-Twitter going totally sideways.
And I have to ask to start with that.
It feels on some level like toward the start of the pandemic
when everything was flying high
and we'd had low interest rates for a decade,
that there was a lot of, well, lunacy
lurking around in the industry.
My own business saw it too.
It turns out that not giving a shit about the AWS bill
is in fact a zero interest rate phenomenon.
And with all that money or constipated capital sloshing around, people decided to do ridiculous things with it.
I would have thought, on some level, that "we're going to start a computer company in the Bay Area making computers" would have been one of those.
But given that we are a year into the correction and things seem to be heading up and to the right for you folks, that take was wrong.
How'd I get it wrong?
Well, I mean, first of all, you got part of it right, which is there were just a litany of ridiculous companies and projects and money being thrown in all directions.
An NFT of a computer. We're going to have one of those. That's what you were selling, right?
Then you had to actually hard pivot to making the real thing.
That's it. So we might as well cut right to it. You know, we went through the crypto phase, but when we started the company, it was, yes, a computer company. It's on the tin. It's definitely kind of the foundation of what we're building, but we think about what a modern computer looks like
through the lens of cloud.
I was at a cloud computing company for 10 years
prior to us founding Oxide.
So was Bryan Cantrill, CTO and co-founder.
And we are huge, huge fans of cloud computing,
which was an interesting kind of dichotomy
and set of conversations when we were raising for Oxide,
because, of course, Sand Hill is terrified of hardware.
And when we think about what modern computers need to look like, they need to be in support of the characteristics of cloud.
And cloud computing being not that you're renting someone else's computers, but that you have fully programmable infrastructure
that allows you to slice and dice compute and storage and networking
however software needs.
And so what we set out to go build
was a way to give the companies
that are running on-premises infrastructure,
which, by the way, is almost everyone,
and will continue to be so for a very long time,
access to the benefits of cloud computing.
And to do that, you need to build a different kind
of computing infrastructure and architecture,
and you need to plumb the whole thing with software.
There are a number of different ways to view cloud computing.
And I think that a lot of the, shall we say, incumbent vendors
over in the computer manufacturing world
tend to sound kind of like dinosaurs on some level, where they're always talking in terms of,
you're a giant company and you already have a whole bunch of data centers out there.
But one of the magical pieces of cloud is you can have a ridiculous idea at nine o'clock tonight,
and by morning you'll have a prototype if you're of that bent.
And if it turns out it doesn't work, you're out, you know, 27 cents. And if it does work,
you can keep going and not have to stop and rebuild on something enterprise grade.
So for the small scale stuff and rapid iteration, cloud providers are terrific. Conversely,
when you wind up in the giant fleets of millions of computers, in some cases, there begin to be economic factors that weigh in.
And for some workloads, yes, I know it's true.
Going to a data center is the economical choice.
But my question is, is starting a new company in the direction of building these things, is it purely about economics or is there a capability story tied in there somewhere too?
Yeah, actually, economics ends up being a distant third or fourth on the list of needs and priorities from the companies that we're working with. And just to be clear, our demographic, the part of the market that we are focused on, is large enterprises, folks that are spending half a billion to a billion dollars a year on IT infrastructure.
They, over the last five years, have moved a lot of the use cases that are great for public cloud
out to the public cloud, and they still have this very, very large need,
be it for latency reasons or cost reasons, security reasons, regulatory
reasons, where they need on-premises infrastructure in their own data centers and colo facilities,
et cetera. And it is for those workloads and that part of their infrastructure that they are forced
to live with enterprise technologies that are 10, 20, 30 years old, that haven't evolved much since I left Dell
in 2009. And when you think about what are the capabilities that are so compelling about cloud
computing, one of them is, yes, what you mentioned, which is you have an idea at nine o'clock at night,
swipe a credit card, and you're off and running. And that is not the case for an idea that someone
has who has got to use the on-premises
infrastructure of their company. And this is where you get shadow IT and 16 digits to freedom and all
the like. Yeah, everyone with a corporate credit card winds up being a shadow IT source in many
cases. If your processes as a company don't make it easier to proceed rather than doing it the
wrong way, people are going to be fighting against you every step of the way. Sometimes the only stick you've got is that of regulation, which in some industries,
great. But in other cases, no, you get to play whack-a-mole. I've talked to too many companies
that have specific scanners built in to their mail system every month, looking for things that
look like AWS invoices. Right. Exactly. And so, you know, but if you flip it around and you say,
well, what if the experience for all of my infrastructure
that I am running or that I want to provide to my software development teams,
be it rented through AWS, GCP, Azure, or
owned for economic reasons or latency reasons,
had a similar set of characteristics where my development team could hit an API endpoint and provision instances in a matter of seconds when they
had an idea and only pay for what they use back to kind of corporate IT.
And what if they were able to use the same kind of developer tools they've become accustomed
to using, be it Terraform scripts and the kinds of access that they're used to?
How do you make those developers just as productive across the business instead of just through public cloud infrastructure?
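To make that concrete, here is a minimal sketch of what "hit an API endpoint and provision instances in a matter of seconds" can look like from the developer's seat. The endpoint path, project name, and request fields are hypothetical stand-ins rather than Oxide's actual API; it assumes the serde (with derive) and reqwest (with the "blocking" and "json" features) crates.

```rust
// Hypothetical sketch: asking a cloud-style API on owned infrastructure for a
// new instance. The endpoint path, query parameter, and field names are
// invented for illustration; they are not Oxide's real API surface.
use serde::Serialize;

#[derive(Serialize)]
struct InstanceCreate {
    name: String,
    ncpus: u32,
    memory_gib: u32,
    image: String,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let req = InstanceCreate {
        name: "dev-scratch-01".into(),
        ncpus: 4,
        memory_gib: 16,
        image: "ubuntu-22.04".into(),
    };

    // Requires reqwest with the "blocking" and "json" features enabled.
    let client = reqwest::blocking::Client::new();
    let resp = client
        .post("https://rack.internal.example/v1/instances?project=web-team")
        .bearer_auth(std::env::var("API_TOKEN")?) // a token instead of a ticket queue
        .json(&req)
        .send()?;

    println!("provisioning request returned: {}", resp.status());
    Ok(())
}
```

A Terraform provider or CLI would be wrapping the same kind of request; the point is that the developer experience is an API call and a token, whether the capacity underneath is rented or owned.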
At that point, then you are in a much stronger position where you can say,
for a portion of things that are, as you point out, more unpredictable,
and where I want to leverage a bunch of additional services that a particular cloud provider has, I can rent that.
And where I've got more persistent workloads
or where I want a different economic profile
or I need to have something in a very low latency manner
to another set of services, I can own it.
And that's where I think the real chasm is.
Because today, we take for granted
the basic plumbing of cloud computing.
Elastic compute, elastic storage, networking and security services, and we in the cloud industry
end up wanting to talk a lot more about exotic services and higher-up-the-stack capabilities.
None of that basic plumbing is accessible on-prem. I also am curious as to where exactly Oxide lives in
the stack, because I used to build computers for myself in 2000, and it seems like having gone down
that path a bit recently, yeah, the process hasn't really improved all that much. The same
off-the-shelf components still exist, and that's great. We always used to disparagingly refer to spinning hard drives as spinning rust in racks. You name
the company Oxide. You're talking an awful lot about the rust programming language in public
a fair bit of the time, and I'm starting to wonder if maybe words don't mean what I thought they
meant anymore. Where do you folks start and stop exactly? Yeah, it's a good question. And
when we started, we sort of thought the scope
of what we were going to do and then what we were going to leverage was smaller than it has turned
out to be. And by that, I mean, man, over the last three years, we have hit a bunch of forks in the
road where we had questions about, do we take something off the shelf or do we build it ourselves?
And we did not try to build everything ourselves.
So to give you a sense of kind of where the dotted line is
around the Oxide product, what we're delivering
to customers is a rack-level computer.
So the minimum size comes in rack form.
And I think your listeners are probably pretty familiar
with this, but a rack is... You would be surprised.
It's basically, what are they, about seven feet tall?
Yeah, about eight feet tall.
Yeah, seven, eight feet. Weighs a couple thousand pounds.
You know, make an insulting joke about NBA players here.
Yeah, all kinds of these things.
Yeah, and big hunk of metal.
And in the case of on-premises infrastructure, it's kind of a big hunk of metal, a hull,
and then a bunch of 1U and 2U boxes crammed
into it. What the hyperscalers have done is something very different. They started looking
at the rack level. How can you get much more dense, power-efficient designs doing things like
using a DC bus bar down the back instead of having 64 power supplies with cables hanging
all over the place in a rack, which I'm sure is what you're more familiar with. Tremendous amount of weight as well,
because you have the metal chassis for all of those 1U things, which in some cases you wind
up with, what, 46U in a rack, assuming you can even handle the cooling needs of all that.
That's right. You have so much duplication and so much of the weight is just metal separating
one thing from the next thing down below it. And there are opportunities for massive improvement,
but you need to be at a certain point of scale to get there.
You do. You do.
And you also have to be taking on the entire problem.
You can't pick at parts of these things.
And that's really what we found.
So we started at the rack level
as sort of the design principle for the product itself
and found that that gave us the ability to get to the right geometry,
to get as much CPU horsepower and storage and throughput
and networking into that kind of chassis
for the least amount of wattage required,
kind of the most power-efficient design possible.
So it ships at the rack level,
and it ships complete with both our server sled systems
and a pair of Oxide switches.
When I talk about design decisions,
"do we build our own switch?"
was a big, big, big question early on.
We were fortunate, even though we were leaning towards thinking
we needed to go do that,
we had this prospective early investor who was early at AWS.
He had asked a very tough question that none of our other investors
had asked to this point, which is,
what are you going to do about the switch? And we knew that the right answer to
an investor is like, no, we're already taking on too much. We're redesigning a server from scratch
in the mold of what some of the hyperscalers have learned, doing our own root of trust. We're
doing our own operating system, hypervisor, control plane, etc. Taking on the switch could
be seen as too much. But we told them, we think that to be able to pull through
all of the value of the security benefits
and the performance and the observability benefits,
we can't then have this obscure third-party switch
rammed into this rack.
It's one of those things that people don't think about,
but it's the magic of cloud.
In AWS's network, for example, it's magic.
You can get line rate or damn near it
between any two points sustained.
That's right.
Try that in a data center.
You run into massive congestion with top of rack switches
where, okay, we're going to parallelize this stuff out
over, you know, two dozen racks.
And we're all going to have them seamlessly
transfer information between each other at line rate.
It's like, no, you're not.
Because those top of rack switches will melt
and become side of rack switches and then bottom puddle of rack switches. It doesn't work that way.
That's right.
And you have to put a lot of thought and planning into it. That is something that I've not heard a
traditional networking vendor addressing because everyone loves to hand wave over it.
Well, so this particular prospective investor, we told him, we think we have to go build our own
switch. And he said, great. And we said, you know, we think we're going to lose you as an investor as a result, but this is what we're
doing. And he said, if you're building your own switch, I want to invest. And his comment really
stuck with us, which is AWS did not stand on their own two feet until they threw out their
proprietary switch vendor and built their own. And that really unlocked, like you've just mentioned,
their ability, both in hardware and software, to tune and optimize to
deliver that kind of line rate capability. And that is one of the big findings for us,
is that we got into it. Yes, it was really, really hard. But based on a couple of design decisions,
P4 being the programming language that we are using as the surround for our silicon,
tons of opportunities opened up for us to be able to do similar kind of optimization
and observability, and that has been a big, big win.
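P4 is its own domain-specific language for the switch's packet pipeline, so nothing below is P4 or Oxide's dataplane code. As a rough, host-runnable illustration of the match-action model that P4 programs express, and of why a programmable pipeline gives you observability essentially for free, here is a toy table in Rust:

```rust
// Toy match-action table: keys matched against packet headers map to small
// actions, and hit/miss counters fall out naturally. Real switch silicon
// evaluates tables like this in hardware at line rate.
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct FlowKey {
    dst_ip: [u8; 4],
}

#[derive(Clone, Copy, Debug)]
enum Action {
    Forward { port: u16 },
    Drop,
}

struct MatchActionTable {
    entries: HashMap<FlowKey, Action>,
    hits: u64,
    misses: u64,
}

impl MatchActionTable {
    fn new() -> Self {
        Self { entries: HashMap::new(), hits: 0, misses: 0 }
    }

    fn insert(&mut self, key: FlowKey, action: Action) {
        self.entries.insert(key, action);
    }

    // Look up the packet's key; fall back to a default action on a miss.
    fn apply(&mut self, key: FlowKey) -> Action {
        match self.entries.get(&key) {
            Some(action) => { self.hits += 1; *action }
            None => { self.misses += 1; Action::Drop }
        }
    }
}

fn main() {
    let mut table = MatchActionTable::new();
    table.insert(FlowKey { dst_ip: [10, 0, 0, 1] }, Action::Forward { port: 7 });

    let a = table.apply(FlowKey { dst_ip: [10, 0, 0, 1] });
    let b = table.apply(FlowKey { dst_ip: [192, 168, 1, 1] });
    println!("{a:?} {b:?} hits={} misses={}", table.hits, table.misses);
}
```

The difference with a fixed-function switch is that these tables, actions, and counters are defined by the operator's own program rather than baked in by the vendor, which is the tuning and observability opportunity being described here.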
But to your question of where does it stop?
So we are delivering this complete
with a baked-in operating system,
hypervisor, control plane.
And so the endpoint of the system,
where the customer meets it,
is either hitting an API or a CLI or a console
that delivers and gives you the ability to spin up projects. And if one is familiar with EC2 and
EBS and VPC, that VM-level abstraction is where we stop. That, I think, is a fair way of thinking
about it. And a lot of cloud folks are going to poo-poo it as far as saying, oh, well, just virtual machines.
That's old cloud.
That just treats the cloud like a data center.
In many cases, yes, it does.
Because there are ways to build modern architectures
that are event-driven on top of things like Lambda
and API Gateway and the rest.
But you take a look at what most customers are doing
and what drives the spend.
It is invariably virtual machines
that are largely persistent. Sometimes they scale up, sometimes they scale
down, but there's always a baseline level of load that people like to hand wave away the fact that
what they're fundamentally doing in a lot of these cases is paying the cloud provider
to handle the care and feeding of those systems, which can be expensive, yes, but also delivers significant
innovation beyond what almost any company is going to be able to deliver in-house. There is no way
around it. AWS is better than you are, whoever you happen to be, at replacing failed hard drives.
That is a simple fact. They have teams of people who are the best in the world at replacing failed
hard drives. You generally do not. They are going to be better at
that than you. But that's not the only axis. There's not one calculus that leads to, is cloud
a scam or is cloud a great value proposition for us? The answer is always a deeply nuanced,
it depends. Yeah. I mean, I think cloud is a great value proposition for most and a growing
amount of software that's being developed and deployed
and operated. And I think one of the myths that is out there is, hey, turn over your IT to AWS,
or to a cloud provider, because we have such higher-caliber personnel that are really
good at swapping hard drives and dealing with networks and operationally keeping this thing running in a highly available manner that delivers good performance.
That is certainly true, but a lot of the operational value in an AWS has been delivered via software, via automation, via observability, and not actual people putting hands on things.
And it's an important point because that's been a big part
of what we're building into the product.
Just because you're running infrastructure in your own data center,
it does not mean that you should have to spend
1,000 hours a month across a big team to maintain and operate it.
And so part of that cloud hyperscaler innovation
that we're baking into this product is so that it is easier to operate with much, much, much lower overhead in a highly available, resilient manner.
So I've worked in a number of data center facilities, but the companies I was working with were always at a scale where these were co-locations, where they would, in some cases, rent out a rack or two.
In other cases, they'd rent out a cage and fill it with their own racks.
They didn't own the facilities themselves.
Those were always handled by other companies.
So my question for you is, if I want to get a pile of oxide racks
into my environment in a data center, what has to change?
What are the expectations?
I mean, yes, there's obviously going to be power and other requirements
that the data center co-location provider is very conversant with.
But Open Compute, for example, had very specific requirements, to my understanding, around things like the airflow
construction of the environment that they're placed within. How prescriptive is what you've built
in terms of, do we need a building retrofit to start using you folks? Yeah, definitely not. And this
was one of the tensions that we had to balance as we were
designing the product. For all of the benefits of hyperscaler computing, some of the design
center for the kinds of racks that run in Google and Amazon and elsewhere are hyperscaler-focused,
which is unlimited power. In some cases, data centers designed around the equipment itself. And where we were headed,
which was basically making hyperscaler infrastructure available to the masses,
the rest of the market, these folks don't have unlimited power and they aren't going to be able
to go redesign data centers. And so, no, the experience should be, with exceptions for folks maybe that have very, very limited access to power, that you roll this rack in to your existing data center.
It's on a standard floor tile. You give it power, you give it networking, and you go. And a lot of the work has gone into how we can operate in the wide-ranging environmental characteristics
that are commonplace in data centers that folks own themselves,
colo facilities, and the like.
So that's really on us so that the customer is not having to go do much work at all
to kind of prepare and be ready for it.
One of the challenges I have is how to think about what you've done
because you are rack-sized.
What that means is that my own experimentation at home recently with on-prem stuff or smart home stuff involves a bunch of Raspberry Pis and a NUC.
But I tend to more or less categorize you the same way that I do AWS Outposts, as well as mythical creatures like unicorns or giraffes,
where I don't believe that all these things actually exist because I haven't seen them. And in fact, to get them in my house,
all four of those things would theoretically require a loading dock if they existed.
And that's a hard thing to fake on a demo signup form,
as it turns out.
How vaporware is what you've built?
Is this all on paper and you're telling amazing stories
or do they exist in the wild?
So last time we were on, it was all vaporware.
It was a couple of napkin drawings and a seed round of funding. I do recall you not using
that description at the time for what it's worth. Good job. Yeah. Well, at least we were transparent
when we were going through the race. We had some napkin drawings. We had some good ideas,
we thought. You formalized those and that's called Microsoft PowerPoint. That's it. 100%.
The next generative AI play is to take the scrunched-up, stained napkin drawing,
take a picture of it, and convert it to a slide.
Google Docs, you know, one of those.
But no, it's got a lot of scars from the build and it is real.
In fact, next week, we are going to be shipping our first commercial systems.
So we have got a line of racks out in our manufacturing facility
in lovely Rochester,
Minnesota.
Fun fact: Rochester, Minnesota is where the IBM AS/400s were built.
I used to work in that market, of all things.
Selling tape drives in the AS/400 market.
I mean, I still maintain there's no real mainframe-migration-to-cloud play because there's no
AWS/400, a joke that tends to sail over an awful lot of people's heads because, you know,
most people aren't as miserable in their career choices as I am.
Okay. That reminds me. So when we were originally pitching Oxide and we were fundraising,
in a particular investor meeting, they asked, what would be a good comp? How should we think about what you are doing? And fortunately, we had about 20 investor meetings to go through. So
burning one on this was probably okay. But we may have used the AS/400 as a comp,
talking about how mainframe systems did such a good job of building hardware and software together.
And as you can imagine, there were some blank stares in that room. But there are some good analogs to historically in the computing industry
when the industry, the major players in the industry,
were thinking about how to deliver holistic systems to support end customers.
And we see this in what Apple has done with the iPhone.
And you're seeing this as a lot of stuff in the automotive industry is being pulled in-house.
I was listening to a good podcast.
Jim Farley from Ford was talking about how the automotive industry historically outsourced all of the software that controls cars.
So Bosch would write the software for the controls for your seats.
And they had all these suppliers that were writing the software.
And what it meant was that innovation was not possible
because you'd have to go out to suppliers to get software changes
for any little change you wanted to make.
And in the computing industry in the 80s,
you saw this blow apart, where firmware got outsourced.
In the IBM-and-the-clones race,
everyone started outsourcing firmware
and outsourcing software.
Microsoft started taking over operating systems,
and then VMware emerged
and was doing the virtualization layer.
And this kind of fragmented ecosystem
is the landscape today
that every single on-premises infrastructure operator has to struggle with.
It's a kit car. And so pulling it back together, designing things in a vertically integrated manner
is what the hyperscalers have done. And so you mentioned Outposts. It's a good example of,
I mean, the most public cloud of public cloud companies created a way for folks
to get their system on-prem. I mean, if you need anything to underscore the draw and the demand
for cloud computing-like infrastructure on-prem, just the fact that that emerged at all
tells you that there is this big need. Because you've got, I don't know,
a trillion dollars worth of IT infrastructure out there
and you have maybe 10% of it in the public cloud.
And that's up from 5% when Jassy was on stage in '21
talking about 95% of stuff living outside of AWS.
But there's going to be a giant market of customers
that need to own and operate infrastructure. And again,
things have not improved much in the last 10 or 20 years for them.
They have taken a tone on stage about how, oh, those workloads that aren't in the cloud yet.
Yeah, those people are legacy idiots. And I don't buy that for a second because believe it or not,
I know this cuts against what people commonly believe in public, but company execs are generally not morons and they make decisions with context and
constraints that we don't see. Things are the way that they are for a reason. And I promise that
90% of corporate IT workloads that still live on-prem are not being managed or run by people
who've never heard of the cloud. There was a
decision made when some other things were migrating of, do we move this thing to the cloud or don't
we? And the answer at the time was, no, we're going to keep this thing on-prem where it is now
for a variety of reasons of varying validity. But I don't view that as a bug. I also, frankly,
don't want to live in a world where all the computers are basically run by three different companies.
You're spot on, which is like it does a total disservice
to these smart and forward-thinking teams
in every one of the Fortune 1000 plus companies
who are taking the constraints that they have.
And some of those constraints are not monetary
or entirely workload-based.
If you want to flip it around, we were talking to a large
cloud SaaS company, and their
reason for wanting to extend beyond the public
cloud is because they want to improve
latency for their e-commerce platform.
And navigating their way through the complex layers
of the networking stack at GCP to get to where the customer assets are that are in colo facilities
adds lag time on the platform that can cost them hundreds of millions of dollars.
And so we need to think beyond this notion of like, oh, well, the dark ages are for software
that can't run in the cloud and that's on-prem.
And it's just a matter of time
until everything moves to the cloud.
In the forward-thinking models of public cloud,
it should be both.
I mean, you should have a consistent experience
from a certain level of the stack down everywhere.
And then it's like, do I want to rent
or do I want to own for this particular
use case in my vast set of infrastructure needs? Do I want this to run in a data center that Amazon
runs? Or do I want this to run in a facility that is close to this other provider of mine?
And I think that's best for all. Then it's not this kind of false dichotomy of quality infrastructure or ownership.
I find that there are also workloads where people will come to me and say, well,
we don't think this is going to be economical in the cloud. Because again, I focus on AWS bills.
That is the lens I view things through. And the AWS sales rep says it will be. What do you think?
I look at what they're doing, and especially if it involves high volumes of data transfer, I laugh a good hearty laugh and say, yeah,
keep that thing in the data center where it is right now. You will thank me for it later.
It's, well, can we run this in an economical way in AWS? As long as you're okay with economical,
meaning six times what you're paying a year right now for the same thing, yeah, you can.
Wouldn't recommend it. And the numbers sort of speak for themselves, but it's not just an economic play. There's also the story of, does
this increase their capability? Does it let them move faster toward their business goals? And in a
lot of cases, the answer to that is no, it doesn't. It's one of those business process things that has
to exist for a variety of reasons. You don't get to reimagine it for funsies.
And even if you did, it doesn't advance the company and what they're trying to do any.
So focus on something that differentiates as opposed to this thing that you're stuck on.
That's right.
And what we see today is it is easy to be in that mindset that running things on-premises is kind of backwards-facing, because the experience of it is
today still very, very difficult. I mean, folks we talk to are sharing with us that it takes
100 days from the time all the different boxes land in their warehouse to actually having usable
infrastructure that developers can use. And our goal and what we intend to go hit with Oxide
is you can roll in this complete rack-level system,
plug it in within an hour.
You have developers that are accessing cloud-like services
out of the infrastructure.
And we've got countless stories of firmware bugs
that would send all the fans in the data center nonlinear
and soak up 100 kW of power.
Oh, God. And the problems that you had with the out-of-band management systems. For a long time,
I thought DRAC stood for "Dell, RMA Another Computer." It was awful having to deal with
those things. There was so much room for innovation in that space, which no one really grabbed onto.
There's a really, really interesting talk at DEF CON that we just stumbled upon yesterday.
The NVIDIA folks are giving a talk on BMC exploits
and a very, very serious BMC exploit.
And again, it's what most people don't know.
First of all, the BMC, the Baseboard Management Controller,
is like the brainstem of the computer.
It has access to... It's a backdoor into all of your infrastructure.
It's a computer inside a computer.
And it's got software and hardware
that your server OEM didn't build
and doesn't understand very well.
And firmware is even worse
because firmware written by an American Megatrends or other
is a big blob of software that gets loaded into these systems that is very hard to audit
and very hard to ascertain what's happening.
And it's no surprise when back when we were running all the data centers at a cloud computing
company, that you'd run into these issues, and you'd go to the server OEM and they'd
kind of throw their hands up. Well, first they gaslight you and say, we've never seen this
problem before. But when you thought you'd root-caused something down to firmware, it was anyone's
guess. And this is kind of the current condition today. And back to the journey to get here,
we kind of realized that you had to blow away that old extant firmware layer.
And we rewrote our own firmware in Rust.
Yes, done a lot in Rust.
Now, it wasn't in Rust, but on some level, that's what Nitro is,
as best I can tell, on the AWS side.
But it turns out that you don't tend to have the same resources as a one-and-a-quarter, at the moment, trillion-dollar company.
That keeps varying.
At one point, they lost a comma, and that was sad and broke all my logic for that. And I haven't fixed it since. Unfortunate
stuff. Totally. I think that was another kind of question early on from certainly a lot of
investors was like, hey, how are you going to pull this off with a smaller team? And there's a lot of
surface area here. It's certainly a reasonable question. Definitely was hard. The one advantage, among others, is when you are designing something in a vertical,
holistic manner, those design integration points are narrowed down to just your equipment.
When someone's writing firmware, when AMI is writing firmware, they're trying to do it to cover
hundreds and hundreds of components across dozens and dozens of vendors.
And we have the advantage of having this purpose-built system
kind of end-to-end from the lowest level,
from first boot instruction all the way up through the control plane
and from rack to switch to server.
That definitely helped narrow the scope.
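As a purely illustrative aside on why Rust keeps coming up for this layer of the stack: hardware state can be modeled as types the compiler enforces rather than raw magic numbers poked into registers. The register layout and fan controller below are invented for the example and are not Oxide's firmware; it is just a host-runnable sketch of the idea.

```rust
// Fan speed expressed as a bounded type rather than a bare integer, so an
// out-of-range duty cycle is unrepresentable past this constructor.
#[derive(Debug, Clone, Copy)]
struct DutyCycle(u8); // 0..=100

impl DutyCycle {
    fn new(percent: u8) -> Option<Self> {
        if percent <= 100 { Some(Self(percent)) } else { None }
    }
}

// A made-up control register: bit 0 enables the fan, bits 1..=7 carry the
// duty cycle. Keeping the encoding in one place keeps bit-twiddling out of
// every caller.
fn encode_fan_register(enabled: bool, duty: DutyCycle) -> u8 {
    (enabled as u8) | (duty.0 << 1)
}

fn main() {
    let duty = DutyCycle::new(60).expect("valid duty cycle");
    let reg = encode_fan_register(true, duty);
    println!("would write 0x{reg:02x} to the (hypothetical) fan register");

    // The invalid case is rejected at the type boundary, not in the data center.
    assert!(DutyCycle::new(150).is_none());
}
```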
This episode has been fake-sponsored by our friends at AWS
with the following message.
Graviton, Graviton, Graviton, Graviton, Graviton, Graviton, Graviton, Graviton, Graviton. Thank you for your
lack of support for this show. Now, AWS has been talking about Graviton an awful lot, which is
their custom in-house ARM processor. Apple moved over to ARM, and instead of talking about benchmarks
they won't publish and marketing campaigns with words that don't mean anything, they've let the
results speak for themselves. In time, I found that almost all of my workloads
have moved over to ARM architecture for a variety of reasons, and my laptop now gets 15 hours of
battery life when all is said and done. You're building these things on top of x86. What is the
deal there? I do not accept that you hadn't heard of ARM until just now because, as mentioned, Graviton, Graviton, Graviton.
That's right. Well, so why x86 to start? And I say "to start" because we have just launched our first-generation products, and our second-generation products that we are now underway working on are going to be launching with a Genoa sled.
But when you're thinking about what silicon to use, obviously, there's a bunch of parts that
go into the decision. You're looking at the applicability to workload, performance,
power management, for sure. And if you carve up what you are trying to achieve, x86 is still a terrific fit for the broadest set of workloads that our customers are trying to solve for.
And choosing which x86 architecture was certainly an easier choice come 2019.
At this point, AMD had made a bunch of improvements in performance and energy efficiency in the chip itself.
We've looked at other architectures, and I think as we are incorporating those in the future roadmap, it's just going to be a question of what are you trying to solve for?
You mentioned power management, and low-power systems have commonly been where folks have gone beyond x86. We're looking forward to hardware acceleration
products, and future products will certainly look beyond x86, but x86 has a long, long road to go.
It still is kind of the foundation for what, again, is a general-purpose
cloud infrastructure for being able to slice and dice for a variety of workloads.
True. I have to look around my environment and realize that Intel's not going anywhere.
And that's not just an insult to their lack of progress on committed roadmaps
that they consistently miss. But enough on that particular
topic, because we want to keep this polite.
Intel has definitely had some struggles, for sure.
They're very public ones.
I think we were really excited and continue to be very excited about their
Tofino silicon line.
And this came by way of the Barefoot Networks acquisition.
I don't know how much you had paid attention to Tofino,
but what was really,
really compelling about Tofino
is the focus on both hardware and software
and programmability.
So great chip, and P4 is the programming language
that surrounds that.
And we have gone very, very deep on P4.
And that is some of the best tech to come out of Intel lately.
But from a core silicon perspective for the rack,
we went with AMD.
And again, that was a pretty straightforward decision
at the time.
And we're planning on having this anchored
around AMD silicon for a while now.
One last question I have
before we wind up calling it an episode.
It seems that at least as of this recording, it's still embargoed,
but we're not releasing this until that winds up changing.
You folks have just raised another round,
which means that your napkin doodles have apparently drawn more folks in.
And now that you're shipping, you're also not just bringing in customers,
but also additional investor money.
Tell me about that.
Yes, we just completed our Series A. So when we last spoke three years ago, we had just raised
our seed and had raised $20 million at the time. And we had expected that it was going to take
about that to be able to build the team and build the product and be able to get to market.
And I think tons of technical risk along the way. I mean, there
was technical risk up and down the stack around this de novo server design, this switch design,
and software is still the kind of disproportionate majority of what this product is,
from hypervisor up through kind of control plane, the cloud services, etc.
So we just view it as software with a really, really confusing hardware dongle.
Yeah.
Super heavy. We're talking enterprise and government grade here.
That's right. There's a lot of software to write.
We had a bunch of milestones that, as we got through them,
one of the big ones was getting Milan silicon booting on our firmware.
It was funny.
This was the thing that, clearly, the industry was most suspicious
of: us doing our own firmware.
And you could see it when we demonstrated booting this like a year and a half
ago and AMD all of a sudden just lit up from kind of arm's length to like,
how can we help? This is amazing. You know?
And they could start to see the benefits of when you can tie low level silicon
intelligence up through a hypervisor. No, I love the existing firmware I have. It looks like it was written in 1984
and winds up having terrible user ergonomics and hasn't been updated at all. And every time
something comes through, it's a 50-50 shot as to whether it fries the box or not. Yeah, no, I want
that. That's right. And you look in these hyperscaler data centers and it's like, no.
I mean, you've got intelligence from that first boot instruction
through a root of trust
up through the software of the hyperscaler
and up into the user level.
And so as we were going through
and kind of knocking down
each one of these layers of the stack,
doing our own firmware,
doing our own hardware root of trust,
getting that all the way plumbed up
into the hypervisor and the control plane.
Number one, on the customer side,
folks moved from, this is really interesting.
We need to figure out how we can bring cloud capabilities
to our data centers.
Talk to us when you have something.
To, okay, we actually,
back to your earlier question on vaporware,
it was great having customers out here to Emeryville
where they can put their hands on the rack,
and, well, you can't put your hands on software, but they were able to look at real running software and that end cloud experience.
And that led to getting our first couple commercial contracts.
So we've got some great first customers, including a large department of the federal government and a leading firm on Wall Street that we're going to be shipping
systems to in a matter of weeks. And as you can imagine, along with that, that drew a bunch of
renewed interest from the investor community. Certainly a different climate today than it was
back in 2019. But what was great to see is you still have great investors that understand
the importance of making bets in the hard tech
space and in companies that are looking to reinvent certain industries. And so
our existing investors all participated, and we added a bunch of terrific new investors,
both strategic and institutional. And this capital is going to be super important
now that we are headed into market and we are beginning to scale up the business and make sure
that we have a long road to go. And of course, maybe as importantly, this was a real confidence
boost for our customers. They're excited to see that Oxide is going to be around for a long time
and that they can invest in this technology as an important part of their infrastructure strategy.
I really want to thank you for taking the time to speak with me about, well, how far you've come in
a few years. If people want to learn more and have the requisite loading dock, where should they go
to find you? So we try to put everything up on the site. So oxidecomputer.com or oxide.computer.
We also, if you remember, did On the Metal. So we had a Tales from the Hardware/Software
Interface podcast that we did when we started. We have shifted that to Oxide and Friends,
which the shift there is we're spending a little bit more time talking about
the guts of what we built and why.
So if folks are interested in, like,
why the heck did you build a switch, and what does it look like to build a switch,
we actually go into depth on that, and, you know,
what does bring-up on a new server motherboard look like?
And we've got some episodes out there that might be worth checking out.
We will definitely include a link to that in the show notes.
Thank you so much for your time.
I really appreciate it.
Yeah, Corey, thanks for having me on.
Steve Tuck, CEO at Oxide Computer Company.
I'm cloud economist Corey Quinn,
and this is Screaming in the Cloud.
If you've enjoyed this podcast,
please leave a five-star review
on your podcast platform of choice.
Whereas if you've hated this episode,
please leave a five-star review on your podcast platform of choice, along with an angry, ranting
comment because you are in fact a zoology major and you're telling me that some animals do in
fact exist. But I'm pretty sure of the two of them, it's the unicorn.
If your AWS bill keeps rising and your blood pressure is doing the same, then you need the Duckbill Group.
We help companies fix their AWS bill by making it smaller and less horrifying.
The Duckbill Group works for you, not AWS.
We tailor recommendations to your business and we get to the point.
Visit duckbillgroup.com to get started.