Embedded - 342: That Girl's Brain

Episode Date: August 27, 2020

Jess Frazelle (@jessfraz) of Oxide Computer (@oxidecomputer) spoke with us about hyperscalers (large companies that make their own datacenter server hardware) and podcasts. Jess wrote an article about the power efficiency measurements of datacenter servers: Power to the People (ACM Queue, August 2020). The Oxide podcast is available on oxide.computer/podcast as well as your usual podcast apps. Jess particularly recommended the episode with Jonathan Blow. Oxide is working to make hyperscaler-style hardware available to everyone. Their goal is to open source all their hardware and software: github.com/oxidecomputer. They use the Rust language for much of their development. Jess has a blog: blog.jessfraz.com

Transcript
Starting point is 00:00:00 Welcome to Embedded. I am Elecia White, alongside Christopher White. Our guest this week is Jessie Frazelle, and we're going to talk about computers, computers, maybe Rust, and computers. Hi, Jessie. Thanks for putting up with the technical difficulties. Thanks for having me. I'm excited to be here. Could you tell us about yourself? Yeah. So I've been a computer nerd for a long time.
Starting point is 00:00:36 I've worked on a lot of different projects, whether it's like the Go programming language, Docker. And now, most recently, we started a computer company. And that's Oxide? Yes, Oxide Computer Company. What do you make? I mean, you must make computers because you introduced it as a computer company, but I think not laptops.
Starting point is 00:00:59 Yeah, I mean, I would love to one day do that. But right now, we're making rack-scale servers. So it's not even individual servers we're making; we're actually selling racks of servers themselves. So you get a bunch of servers already packaged into a rack. Okay, we're going to talk about why that's interesting in a few minutes. But first, I want to do lightning round. We'll ask you short questions.
Starting point is 00:01:23 We want short answers. Are you ready? Yes. How much malt powder is required to change a milkshake into a malt? Okay, so, I mean, this is a contentious question, especially for me. You must have gotten this question from someone internal at Oxide, because I have an obsession with malt powder. I like double what most normal people want. So, I mean, that might be like eight scoops or something ridiculous. Do you have a favorite vintage computer? Oh, that's hard. I really like the old Macs, because they're fun to play with, and they have some of my favorite games just from my own past history. But I also love the 486; they have a bunch of old software that my grandpa wrote, so there's a sentimental aspect there. When you say old Macs, do you mean Apple IIe, or... Wait, wait, wait, wait. Stop. Apple IIs were not Macs. Yeah, this is like the Macintosh SE. You can't say that. Yeah. I'm sorry. I'm so ignorant. Favorite processor? Ooh.
Starting point is 00:02:38 So currently it's the AMDs, because they're just the most powerful that's on the market today. I mean, ARM has a lot of promise, but it's just not there yet from what I've seen. Yeah, it's hard. I also love RISC-V, as far as instruction sets go, because it's open source. And I think that there's a huge amount of potential there when it comes to what the future will look like in 20 years or something, as to what people are running. What's your least favorite programming language? Like, I really like all of them. I would say any language that is not compiled, because, I mean, I love to, you know, program by compile-time errors, which sounds terrible. But I like to refactor a lot, and the easiest way to catch mistakes is that way. So, yeah, any language... like, JavaScript now with TypeScript is great, because you can kind of get those type errors. But I don't think I have a least favorite one, honestly. It sounds like your least favorite might be Python. Yeah, I mean, I do love Python on occasion, though. I mean, it depends. But yeah, Python is hard to refactor well.
Starting point is 00:04:22 like touches my heart um in a in a place and Betamax from Big Hero 6 because he's just like, he's like this caregiver and then they kind of turn him evil, but then he goes back to being like the nice guy again, which is cute. Okay. On the website, it says you make hyperscaler infrastructure. How is that related to large racks of computers? Yeah. So basically what hyperscalers like Facebook, Microsoft, Google, all of those huge companies did is they were pretty much sick of using like any proprietary vendors hardware for their computers, because like mostly what you're doing is just like stacking a bunch
Starting point is 00:05:06 of boxes on top of each other. And it kind of makes no sense when you're running at like this super large scale, because like on those boards, you get a lot of things that like you don't need in a data center. Like you don't need like a mouse, you don't need like VGA, you don't need a lot of things. So they kind of redesigned everything, getting rid of all the old like existent desktop components of this hardware um and then they wrote all the software on top of it so what you get is actually like way more dense racks so you can pack a lot of compute in there um and you also own the entire stack so when it comes to like bugs no longer are like uh these vendors that you're kind of like slap sticking together to build like this data center they aren't like
Starting point is 00:05:50 pointing fingers at each other like sure you have like um you know a cpu vendor and and and vendors for like storage and stuff like that but you get rid of a lot of the like stuff on top like software vendors and stuff like that because they wrote all of the stack um so uh you you get like this really nice way to deploy it like speeds up productivity um and and you get a lot more compute for the like power that you're using and and uh yeah it really just cleaned things up throughout so so we're basically doing that for everyone else, everyone who doesn't have a team to build out a whole hardware infrastructure for their business because it's not what they do as a business. So we allow all the rest of Fortune 500 companies or whatever
Starting point is 00:06:38 to have what these hyperscalers have internally without all the work. Okay, I'm going to go back to the word hyperscalers have internally without all the all the work okay i'm gonna go back to the word hyperscaler i didn't realize that that meant just the big companies the companies that are big enough to do this sort of thing yeah i mean i i think that it means more than that it also means like like um the companies who run at this massive scale but but it is like even companies that run at this massive scale, but, but it is like, even companies that run at massive scales, they don't necessarily have, have the skillset or, or the right people internally to do it. It totally makes sense. I mean, this seems like something if you're,
Starting point is 00:07:17 if you're not a software company that you should just buy. I mean, this is, this is something that is kind of boring. It's almost a commodity. You're not a software company or not a hardware company? Totally. I meant not a software company. Not a hardware company, too. I mean, you'd have to become a hardware company partially to do this,
Starting point is 00:07:39 but it's the Facebook, Google, as Jesse mentioned. Okay, so the reason they do it is because it is more cost effective for them in the end. Yeah. How come big companies aren't doing this? Dell or Apple or any of the computer manufacturers? So they're trying um uh like like you can get a dell a set of dell boxes with like vmware on top and like they're kind of trying to make this thing that is like all put together but like it's still it's two different companies within dell so that you really aren't getting like um this really integrated software and hardware experience. And also like you aren't getting on top,
Starting point is 00:08:27 like really a modern, uh, interface and API for deploying. Um, there's a lot of just, you know, um, uh,
Starting point is 00:08:35 like old thoughts there. Um, but yeah, like it's, it's very different than what you would see inside an actual hyperscaler. And it's, it's different, definitely different definitely different than what we're doing because it is still the old hardware at the bottom. You still have all the desktop stuff that you don't need in a data center, which makes no sense. So we're definitely getting rid of all of that.
Starting point is 00:09:02 We even went so far as to like not even have a bmc on the servers we have like a a very traditional service processor instead so that the bmc like has uh less uh control now like i mean the bmc should have never had so much privilege in the first place um but people just kept adding features there because it was where they needed to add features but um now like we've taken out a lot lot of that stuff that doesn't need to be there. And actually, the BMC just does what the BMC should do. Well, the service processor in our case. And that's like boot the machine.
Starting point is 00:09:36 What does BMC stand for? Baseboard Management Controller. Okay. So what do these look like architecturally from the hardware standpoint so i worked on rack scale routers a long time ago and it was one thing it had a central central control processor plane and it had a bunch of line cards and things but it was one unit when i think about data centers i think of a rack full of individual blade servers. Is this a rack full of custom designed individual blade servers, or is it a rack full of
Starting point is 00:10:10 one kind of purpose built thing? Yeah. So it's a bunch of like what we're calling sleds, because they're more the like open compute project form factor where like, so in a rack, you have like the width of the rack. So each sled is actually half the width of the rack. And then it's like, like one OU or two OU like height. These are the open like units for in the open compute project. That's how they're defined. But yeah, so we have just a bunch of those sleds in there. And that's what makes up the rack. When I worked at HP in their net server division 9 million years ago, we had monitors to make sure that the server was doing what it should.
Starting point is 00:11:13 Do you have a, does each rack have a monitor yeah so so um each rack has like uh rack controllers which internally does all the software for like distributing compute and like managing like networking for the rack like it does all the kind of like smart things in between i mean we have um also a top of rack switch which might be in the middle of the rack. It might be a middle of rack switch. We have two of those for making sure if one goes down, we have another. But yeah, so these rack controllers, if you have multiple racks, all of them are going to talk together. And then in this kind of like one pane of glass, you have this user interface where you can see all the racks at a given point of time and then everything that's going on.
Starting point is 00:11:50 And you can drill into individual racks and then see like, oh, like this rack, this disk needs to be replaced or, you know, just anything like the latency on this like network cable is bad. You can drill into any any finite amount of detail. Do you still use SNMP? That I do not know.
Starting point is 00:12:16 It's a very old protocol, although it is still in use, that describes the health of units monitoring like that. And I just wondered what the latest technology, I mean, what is this view built upon? So this is going to be built on all of our software that we're writing, even from the firmware up. So I think that largely we won't use something like that, but I actually have no idea. I'd have to ask one of our engineers. Fair enough. How did you decide that this was a good idea? How did you say this was a space that was worth doing? Uh, so I think that like between the three of us, like me, Brian and Steve, like we all had a different process for going about this, but mine was mostly talking to people who are already running on premises.
Starting point is 00:13:10 And I talked to like a bunch of different people, just like cold reached out to people too, who like, I knew that they had to have like some sort of on premises, like, uh, existing hardware. And then I wanted to see like what problems they had. And I was, cause I wanted to make sure like we were actually solving a problem for people. And it turns out like it's a huge problem for people running on premises. Like, like there's problems all over the stack. Like it could be in the firmware where you have like a huge outage and then you get like two vendors basically pointing fingers at each other and your bug doesn't actually get fixed. Like your, your business is actually on the line because
Starting point is 00:13:43 like these two vendors are pointing their fingers at each other. Or it's bugs higher up in the stack because you have like some software layer on top that was then talking to the hardware. And then those layers got basically pointing fingers at each other. So almost by removing all this, like by putting this together
Starting point is 00:14:04 as like almost one unit uh then then you remove a lot of that like we don't know what's happening between these two layers also by open sourcing a lot of it you you get um visibility into the layers of the stack as well and a lot of these like the folks that i talked to um they don't have like on-premises necessarily like like nice apis that you would get in a cloud uh like it's not it's not like an ec2 it's not like gcp where you're deploying bms like it's not kind of the same api like very developer friendly experience um so it seemed like there was a lot that we could do there as well and so now they can just point the finger at you yeah which is fine i'm fine with having a finger pointed at me uh open source how much of what you're doing is open source yeah anything that
Starting point is 00:14:58 we are writing software wise is going to be open source even like we're going to open source the hardware as well but um uh we're definitely going to open all of it the things that might not be open source because of um like vendors where we're wrapping like say amds you know proprietary firmware for their silicon like uh that we can't open source that but but we're working with a lot of vendors to get these down to like the lowest, lowest bits so that we open source as much as possible. Do you have anything open sourced already or is it all future? Yeah, there's a few things on our GitHub. It's github.com slash oxide computer. I think there's just a few things with regard to the API, but yeah, we're just open sourcing random stuff as we go. You talked about open compute. That's where Microsoft, Google,
Starting point is 00:15:53 Facebook got together and said, we want our computers to look like this so that we stop having to deal with all you people. We just want them to be the same. Yeah. Okay. Why are you doing that? I mean, of course, if you do eventually want to sell into the bigger data centers, the hyperscalers, you would need to conform to that. But if you're not selling to them, why not do your own? So yeah, no, it's super true. And that's a really good point. What we actually are doing is a little bit different than what they're doing because so there's the Open Compute Project and then there's this other one called Open 19. And Open 19 was like some like LinkedIn kind of took their design for hardware and put it into a different foundation, basically. But there's good parts of both. So like Open19, you get this really nice cabling on the back of the rack for networking, and
Starting point is 00:16:55 you can just basically like pop it right into the network slot versus having individual cables that you have to unplug. So we're going to take that. And then we're also going to take the power bus bar from the Open Compute Project. We're basically taking the best parts of both and putting them together, which is not what people in the Open Compute Project typically do. And I don't want to ask how much is a rack because that seems odd. But if I wanted to make a CPU or if I wanted to make my own chip, I know that I shouldn't even consider it if I don't have $2 million. Two.
Starting point is 00:17:38 Well, that's a small chip. What scale of business are you looking for? Yeah, we're looking for pretty large scale businesses. Like sadly, this won't be the type of server that's just running in someone's garage. I mean, unless they have the money to do so, that would be really fun. I mean, I would love that. But yeah, we're really targeting like Fortune 500 companies,
Starting point is 00:18:01 large enterprises. Cool. Speaking of running it in your garage, if I wanted one, what kind of power would I have to install in my house? Yeah, so here's where things get interesting. Because a lot of enterprises and traditional companies, they either are running them in their own data centers or they're running them in colos. It's a little bit different than like hyperscalers kind of can draw as much power as they want to.
Starting point is 00:18:30 Like they get up to like a huge amount of power on the draw. But like for us, it was actually hard because like we need to like conform to basically all these colos where like getting 16 watts or whatever, it's the maximum that you can get, whereas a hyperscaler, you would go well above that. So we have that as a restraint. And so some of these racks might not be fully populated based on the power that people can draw from them, but then we can also start filling it in as they realize how much power they're actually drawing. So say start with half a rack, and then we'll give them servers once they realize that they can handle the capacity for that.
Starting point is 00:19:16 I seem to recall the telcos used to want DC power, huge high amperage DC power. Is that still the case? We haven't actually talked to that many telcos. I mean, that's been true for some data centers as well, that they want DC power, big DC power, lots of DC power. Which actually brings me to, you recently wrote an article about data centers and carbon footprints. Could you summarize? Yeah.
Starting point is 00:19:53 So I started out looking up, well, actually, okay. So it started as one of our hardware engineers. He wrote this really nice thing internally for how power works. And then I got kind of nerd sniped by it. And I was like, whoa, this is really interesting. and i didn't know like anything honestly before writing this article and i like super dug into it because i was like super interested in where basically we could help people uh on their their carbon footprint because it seems like there's a lot that we could do to get like the power usage better and so i looked looked into what all the hyperscalers basically are doing about their carbon footprints and how they're looking at that.
Starting point is 00:20:33 And then I really dug into just how this works in general, what power usage efficiency is, and how each hyperscaler does power, because actually Microsoft has a different way of doing it than like Google and Facebook, which is interesting. In this week of ash raining over the San Francisco Bay Area and lightning storms predicted for more fires, why does carbon footprint matter? Yeah, so's super matters. Like sadly, like the ash is due to like a natural disaster of the fires, but like with, with data centers, like that's us, like, like we are the enemy there. Like we're the ones that causing it. So, so if there's any way that
Starting point is 00:21:19 we can make it more neutral, then it's like a huge deal. So what are some of the things that are different between the different hyperscalers? Yeah. So Microsoft didn't go the route of using a power bus bar, which is like just this huge like metal thing on the back of the rack that serves power. They actually do individual cables. And when I first like started looking into this, I was very naive, but I was like, whoa, like I wonder, whoa, I wonder if by making this decision, Microsoft is actually causing them to use a lot more power where they don't need to. And it actually turns out it's pretty negligible. There's a lot of nuance here when it comes to the right thing for the job. But as long as you have cables that are very very very high quality which i assume they are they do then then you're getting the same um you're getting the same
Starting point is 00:22:11 power draw as as as with a power bus bar um and i i don't know exactly why they did it that way um there's there's a couple talks on it the first time that way and it worked yeah it's true there's, there's a couple of talks on it the first time that way. And it worked. Yeah, it's true. There's a couple of talks on it, but you, but you, you don't get the like, uh, serviceability gains that you get from, from having the bus bar. Because like when you, when you have to service one of these things, you have to like go unplug the cable. Whereas like with, um, with the bus bar, you just like pop the server in and out from it. There's no cable.
Starting point is 00:22:43 It's like super nice. And in the article, you said that Google sets its data centers at 80 degrees Fahrenheit, 20C, instead of the usual 70F, 21C. Does that matter that much? So it depends. I've been looking into this, and I would actually be curious, honestly, Google's take on this because it seems like there's a few articles that say the original studies, which say like you should get to a max of like 77 degrees before your hardware starts, like it's damaging to the hardware. Those studies are pretty old. So like a
Starting point is 00:23:26 lot of people are like, you know, Google doing this, maybe it's not that damaging, but I would actually be interested in Google's take on it because, uh, since, since they've been the ones doing this for the longest amount of time, they would probably have the best take, but it does seem like the industry is more coming towards, you know, maybe we should retake a look at like how CPUs get damaged in heat or something like that. There's, there's like a opportunity for improvement. That's a huge part of the whole power draw too, because if folks haven't been in a data center, the ones I've been in are, you have to put earplugs in before you go in because the, the sound of the air
Starting point is 00:24:05 conditioners and the fans and the racks are so loud that it's damaging to human hearing. It's just so much cooling going on. It must draw just a huge, huge portion of the total power. Totally. So power usage efficiency, P-U-E, does that measure the cooling? Does that measure the processing? How do, what is that? How is it measured? So that's the total energy required to power. And that includes like lights, cooling, anything that's like within the building and drawing power. And then that's divided by the energy used for servers. Okay, so that's the overhead. It measures the overhead versus what the server is doing.
Starting point is 00:24:58 But is the server doing computing or is it doing monitoring? Or is it just sitting there doing nothing and drawing power for no reason? I mean, it depends, actually. Like, PUE is actually like this actual point of contention, because when I started asking people for feedback on this article, it was funny because a lot of them were like, hey, you can't really take PUE seriously. Like, especially like numbers people give out. A lot of people who give out their PUE numbers, you know, they might not be taking into even like environment from the outside, especially if like a data center is located in a hot and humid place. Like there's a bunch of stuff that isn't included.
Starting point is 00:25:32 So like the workloads being used, definitely like not included. Okay, so PUE doesn't measure everything and may be a confusing measure? Yeah, I mean, it can be done well and it can be done poorly, basically. measure everything and maybe a confusing measure? Yeah, I mean, it can be done well and it can be done poorly, basically. People don't give a lot of information as to like what exactly they put into their PUE numbers. But if they're transparent about it, then it's way better. Sounds like all benchmarks. Yeah.
Starting point is 00:26:10 This article was posted in ACm q what is that oh yeah so q um is this magazine put out by acm and it's it's it's more oriented towards practitioners so like ieee has spectrum which like i absolutely love um it is super nerdy like they talk about like robots and like all this crazy stuff like drones but q is more oriented towards like what people uh like engineers actually do day-to-day in their jobs um and problems they encounter and it it's more it's more oriented towards practitioners than academics um yeah that's useful is it mostly computer stuff or is it more mathematics um there are a lot of mathematic uh articles um it depends on who writes them honestly so mine are mine are typically like i just dive into something random um there's another one that's more like an ask alice or whatever but um you're
Starting point is 00:27:00 like getting software help from one of the authors which is hilarious because he's very blunt um it's a it's a mix uh you also have a podcast yes tell me about our podcast is pretty cool uh we started it before we started the company um so so then we like had a series of episodes that we could release and it was mostly people who we had talked to like before starting the company that we had learned things from, and that had like interesting stories about things between the software and hardware interface. So it's, it's mostly stories. There, there is a romantic aspect of it. Some episodes are more romantic than the other. We like to talk about old computers, people's love of computing. And you wrapped up a season in February,, it's been really hard to do that.
Starting point is 00:28:06 We do have like a bunch of we have a few episodes that are already recorded. Then we had a bunch of people who we had lined up to be on it. And then things kind of got derailed. So we're hoping to get started again soon. But we'll see. Yeah. You claim it's the nerdiest podcast on the planet. Yeah. but we'll see yeah you claim it's the nerdiest podcast on the planet yeah um i i feel like i've learned a lot from the podcast it's funny because i'll still listen to episodes and in doing show
Starting point is 00:28:33 notes like i even learn stuff from going back and doing the show notes it's hard actually um when recording the podcast i i've found that i miss a lot of things uh just because i'm like trying to keep the conversation going or trying to catch it all and then like um it on the re-listens uh i get a lot more out of it so i don't know i think it's like the thing that keeps on giving that seems very familiar yes it does um what made you start a podcast oh we thought it would be fun um and a way to like you know break up just like working day-to- fun. And a way to like, you know, break up just like working day to day. And a way to just like get folks opinions and their own experiences from working on hardware. I mean, I think what I love about it is that a lot of folks that we saw during our raise and
Starting point is 00:29:20 stuff like they, a lot of people think that like this this like lower level programming and lower level engineering and like really hard tech like that it's like a dying breed like people don't like really see it anymore because there's so many sass companies it's like actually no like it exists and so like this is our way of being like you know hello there's people down here like everyone has stuff down here and it's like a huge thing and we're trying to kind of like unravel that at the same time you mostly do talk about computers as opposed to embedded devices is that right yeah um yeah there's not much embedded we we could like we could we could adventure in that area it's's definitely, it's definitely on topic.
Starting point is 00:30:05 It's, I think that we didn't include that. Just like we didn't purposefully not include that. It was just more like a, it just happened. Oh yeah. I mean, you haven't done that many,
Starting point is 00:30:18 so you still have a lot of things you can explore before you get to origami, glass blowing, all of the other things we have kind of so we started on some of those early so that's true do you have any uh i don't like asking people for their favorite episodes but but if you if somebody wanted to check out the podcast would you recommend like oh start with this one so i guess one of my funniest stories would be like we had jonathan blow on the podcast
Starting point is 00:30:46 and um he wrote the the witness like the video game and i played that game like i was super stoked for him to come on it because i played the game but i didn't want to make it weird so like because i can make it really weird really easily and so i was like okay don't be like this weird fangirl um but so i had all these questions for him about like these videos that are like embedded into the game. And I didn't ask any of them. And then like, basically he left and then like, we're going back over it. And I was like, okay, but like,
Starting point is 00:31:16 I was trying to explain to like Brian and Steve, like these videos that are embedded into the game. Like they drove me nuts for like days. I was trying to figure out like the meaning and like all this stuff. And like, I mean, I'm a crazy person when it comes to video games and i finish them in like two days and like won't shower won't eat like i just like have to get it done and so like this was all the crazy that i was trying to hide from him and brian and steve are just like oh my god
Starting point is 00:31:36 like like i don't even know what to do with all this information that you're giving us but so in in the recap episode i basically go over all of this. And then I was just hoping that Jonathan Blow would listen to it and be like, what the hell was going on in that girl's brain during this episode? But I don't think he did. Do you have a lot of interaction with listeners? Oh, what do you mean? I mean, do they email you and tell you you're wrong like ours do? Oh, so yeah, no, we've had a few of those and they're pretty funny, honestly. I mostly forward them to Brian.
Starting point is 00:32:11 But yeah, no, we get a lot of, and it comes in over our catch-all email address, which I mean, we get a lot of fun stuff in there. And it's nice to get the complimentary emails, although they don't stick as long in my head as some of the others. Totally. I want to go to some of our listener questions, one of which is your co-worker, Rick Arthur. That doesn't seem right. No, it really doesn't, does it?
Starting point is 00:32:37 Why open source firmware? Yeah. Wait a minute. Let's actually go back to just firmware. When I say firmware, it's what's running on my processor. It's not really software because it doesn't really interface with people. And it's not hardware because it's typey typey C, C++. But when you say firmware, you mean something else, don't you so no i think that we have the same thoughts there i mean it is it we we also are going to have like firmware running on a bunch of microcontrollers and stuff like that um i think it's it's basically the same definition okay i i thought you i thought it was the part the bios part oh yeah no we definitely have that i i would consider the bios what
Starting point is 00:33:25 runs on the cvu no or is that not you would consider the bios firmware yeah it was when i worked on servers a second time uh for park and they were calling it firmware and i was like what do you mean this isn't firmware this is a giant. You don't run firmware on this sort of thing. But yes, it was where I encountered BIOS's firmware. I mean, that has long been one of those things that nobody wants to give you. It's very expensive to get BIOS code if you want to be able to modify it. But now you're going gonna open source it yeah i mean mostly this comes from i mean when i was talking to a bunch of people about like their pain with running on premises like a ton comes from the firmware and it comes from the like
Starting point is 00:34:19 the lack of visibility there it's like you don't know when things go wrong, why they're going wrong, which drives me nuts. And, and like you, you get the vendors pointing their fingers at each other. And then you also like talk to like, uh, members of this team. And then you get like routed to like a bajillion different teams. And it just seems like no one knows how this thing actually works, which is nuts. And like, by making it open source one, like when we fix bugs, it's very visible, like where the bug was, like where it came from. Like that just helps like me personally sleep at night because I actually know like, oh, that thing got fixed. Like look at those lines of code that changed or whatever. And then you actually know what the bug actually
Starting point is 00:34:58 was, which you can never get a straight answer from a vendor on. And then also when it comes to like security, um, the, these security um these like a bunch of the stuff running in the lower levels of the stack like all of this has like way too many privileges like they uh they they have full power basically over the computer and um it's also the code that we know the least about which is super messed up um and so you get a bunch of vulnerabilities. We saw this with like Rick found the vulnerability in the VMCs. I mean, there's just a bunch of vulnerabilities there. The Bloomberg kind of like huge expose where they thought that like the,
Starting point is 00:35:42 the supply chain had been modified. Like that was, that was interesting, but also like why even go through the supply chain had been modified. Like, that was interesting. But also, like, why even go through the supply chain when you could just walk through the firmware? Because, like, there's so much in there. Intel has, like, huge web servers in there that no one knew about for a long time. And it's just, like, all this stuff just needs to be opened up
Starting point is 00:35:59 so that we have more eyes on it. There can be more audits. And people can know what's actually happening in their computers. It seems like people don't even know what's in there, despite not even how does it work. But, oh, look, there's a little port you can connect to and mess with Intel's firmware. It's like, why is this here?
Starting point is 00:36:18 It's a mystery, yeah. Well, to be fair, and having seen some of this code, it's built on layers. I mean, it's huge layers of layers of layers because they have to support a whole bunch of different interfaces. And I mean, all the mice, all the keyboards, all the processors running this speed and that speed and monitoring for this exception and that exception. And it started out one code base 25 years ago, and that code is still running. Yeah. And that's like also why we have this opportunity to clean it up is that like, we don't need all those drivers for mice. We don't need all the drivers for keyboards. No, like our customers will never actually interact with the firmware because like, it makes sense, honestly, why,
Starting point is 00:37:09 why like the BMC even got so packed with a lot of code. It's because like when that's all you expose to people, that's where you have to put features. But, but most features shouldn't have that level of privilege. And so we actually get a chance to clean it up. So the level of privilege, it seems like the firmware does need a high level because it needs to wake everything up. It needs to be able to talk to the internet. Well, I don't know if it needs to talk to the internet. That's different. But it needs to talk to the memory and the CPU and the hard drive.
Starting point is 00:37:47 What privileges are you taking away? So, so it's still going to be very, very privileged. What we're taking away is like these feature sets where, you know, it does have to talk over the internet or it does have to like interface with a bajillion different keyboards and mice and all this, like, um, all these vendors for various things. Like we don't, we don't need all that. We, we only have certain vendors that we're working with and stuff like that. So we don't have to have like crazy interfaces for doing whatever we need. Like actually all we need to do is boot and like interact with what's on the board. Yeah. Sometimes starting over is easier than maintaining. Oh, it's yeah. It's the basis for a million startups. You do a lot of automation in your work at Oxide. Uh, what kind of automation do you do? Yeah. So I love automating things. I mean, I do this at home too. I all automate anything that can be automated. So, um, for the company, like we,
Starting point is 00:38:47 we're still super small, but I, uh, with adding every single person that we hired, I was like, okay, like this can be automated. Like adding, uh, people to G suite can be automated. Adding people to a zoom account can be automated. Um, adding people to air table, like all the kind of like internal tools that we use, the accounts all get set up automatically. I have a bunch of scripts that make short URLs because having worked at Google, I love the like go slash thing. And you just like go to this like page and like, it's very nice. And it's an easier way to remember things. And we have this RFD process request for discussion internally. And now we have over a hundred requests for discussion. So it's hard to like actually link out to these things because
Starting point is 00:39:30 they live out on different Git branches. So I made a short URLs for those. So you can just like go to like a 100.RFD.Aux.computer and get routed to the right branch. Um, stuff like that, where it's just very, it makes everything a lot easier. Um, And then I automate between like a lot of these tools. So like Airtable to GitHub or whatever, like just weird statistics and monitoring and stuff like that. So it's a lot of random stuff, but it was fun to build because now kind of,
Starting point is 00:39:58 I joke that like the role of CIO or like chief infrastructure officer is, it's a robot. Your title is chief product officer. What does that mean? So I think it means a different thing, different places, but for me, it means like talking to people who have problems with their current infrastructure or talking to people that don't have problems, but just have interesting experiences in infrastructure that we can learn from um and and then uh taking that back and kind
Starting point is 00:40:29 of like putting all the conversations together into like how we want to build our product to make it better than than everything else and and make people's lives easier okay you write some of these tools in rust is that right yeah i wrote i wrote a lot of them in go at first um because it was just what i was fluent in uh but as a company like we're we're really like writing everything in rust because it's great for embedded and it's great at a lot of things um and so i was like okay like I have to just like knock this off. I got to just like do it in Rust, which Rust isn't like it wasn't, it didn't start out as being great for like Rust APIs, which is what a lot of this is.
Starting point is 00:41:17 It's just interactions between Rust APIs, but it's actually like super getting there. But I ran into a lot of pain points there, but now it's a lot better. Like with async await is like way better. But yeah, no, it was a fun exercise to write it in Rust. We open source a lot of the libraries that are on our GitHub page as well.
Starting point is 00:41:36 But they were mostly like, they're all purpose-built. So like, it's not the entire API. It's just like the parts of the API that I needed. And yeah, it was a fun exercise. I now am, I would say, maybe still a little bit more fluent in Go because I find myself writing Rust like it's Go, but I'm getting there.
Starting point is 00:41:55 We have some really good Rust folks on the team and getting their feedback on code is great. Why would I choose Rust over Go? I mean, would you have if it wasn't that the rest of the team was Rust centric? Yeah. So when I changed like the code for the bots from Go to Rust, I mean, even the Rust folks on our team were like, why did you do that? Like, I mean, Go is actually better for what for what I was trying to do with concurrency of pushing things out to various APIs. I did it mostly as a learning experience. Where you actually want Rust is memory management, stuff like that.
Starting point is 00:42:37 Actually, for Docker, I think that had Rust been a thing at the time and had it been at the place where it is today, I think that it would have been a better language for the job because we got into a lot of problems when it came to embedding C and Go, which is hard. I mean, there's really load-bearing parts of the Docker code base, which is now in Run-C, where it's all C and very few people actually know what's happening. And so with Rust, you can avoid that because you can actually get all, you can get the level of granularity that you need. I see a lot of people seem interested in Rust for deeply embedded stuff and for even micros,
Starting point is 00:43:21 but it seems almost more suited to this kind of thing like we were talking about with Docker. Like, shouldn't have been doing stuff in C probably for a utility of that level, right? Yeah, no, like, it would have been great to have Rust then. And I think Rust is, like, perfect for firmware and what we're using it for. Firmware being your firmware
Starting point is 00:43:43 or firmware being microcontrollers uh both like we're using it for microcontroller kernels um like stuff like that uh yeah we're using it for everything honestly the the control plane for deploying vms is going to be in rust did you choose rust before the name of your company? Yeah. Brian wanted to use rust for sure before that. So the, so the name is very much so a hat tip to rest as well. Could have been any oxide though. No, it has a lot of like, it has a, you know,
Starting point is 00:44:20 ties back into computers and has ties back into a lot of things. So it's almost perfect well except rust the language was named after rust the fungus pathogen not what oxidation oh that's interesting i didn't know that if you're gonna bash a language as much as i do you got to learn as much as you can about it that's not true i don't bash rust that much i just i don't know when or why i would choose it over something i know very well you you chose it because steve chose it and he cared about it enough to build your company around it but i don't know well it depends on what you're trying to accomplish, right? If you're finding limitations in C, if you're finding a lot of that, the memory safety and things are becoming an issue.
Starting point is 00:45:12 I mean, I don't know by that logic, you know, why, why did anybody switch from fourth in the seventies? Yeah. I think like writing, writing safe C it C, it's a very rare talent to find anymore. So Rust allows you to kind of do that in a better way. Let's see. I have questions from Philip Johnston about what lessons can all firmware developers learn from your investigations of proprietary and open source BMC firmware? Yeah, I mean, there's a lot there. Open BMC, you know, you have to give them a lot of props because they were really the thing that like started the whole
Starting point is 00:46:00 kind of open source firmware ecosystem from what I can tell. Like they were really like the first open source firmware out there. And so with a lot of like open source projects, what it becomes is like largely an interface for dealing with a lot of like kind of sub modules. Like it has to deal with a lot of like proprietary things. So like OpenBMC became this basically communicator over the system D bus of various sub-modules that interact with various vendors things. And so for ours, we don't necessarily need all of that
Starting point is 00:46:39 because we know exactly what vendors we're working with. We wanted the BMC have um a lot less features uh we don't we just wanted it to actually just do what do like what a bmc or a traditional service processor which is what we're calling it does which is boot the machine and then interact with a few things on the board um so we kind of took out a lot of the complexity, but like OpenBOC is great when it comes to like being able to work with a large variety of vendor components. Philip also asked, what are the low hanging fruit for security and secure boot that most teams miss
Starting point is 00:47:24 out on? Yeah. So, I mean, a lot of the vendors have like their own kind of secure boot built into their products. So like Intel has their own, AMD has their own, ARM has their own. And like what we're actually doing is very much in line with Apple has T2, which is their hardware root of trust. And Google has Titan and Amazon has their own thing as well. We're doing our own root of trust. And it seems like a lot of kind of the big companies, they don't wind up using a lot of those like features from the vendors. I mean, they're mostly all proprietary. It's hard to get visibility into what it's actually doing.
Starting point is 00:48:09 So by doing our own and open sourcing it, then we have a really like firm level of attestation going on in the machine, which is really, really nice. So, I mean, I think like what I would love is like maybe, you know, there were ways that either the vendors could like open source their things so that people knew how they worked or maybe just like not waste time on stuff like that. But I could see why people want it. But it seems like if you're really serious about like writing secure software, you likely aren't using the proprietary vendors's feature. There's a lot of benefit from starting over. I think you're seeing a lot of that.
Starting point is 00:48:53 Do you worry if you succeed and make it and you're still building these in five or ten years, you're going to look back and say, why are we still supporting that? Yeah. I mean, that's like these machines stay around forever. I mean, if, if, if we are successful, um, uh, there is an analog and like, Brian's going to laugh because I actually am using this as an analog, but the AS400, those machines have been around forever. Like my mom was actually a recruiter for people to work on the AS400. And like just the fact that like they're still around and you still need people to work on the AS400
Starting point is 00:49:34 goes to show how long these machines survived out in the wild. So like we're gonna have ways to easily update our software on the machines, but like this hardware, if we're successful, will be around for a really long time and we're gonna to have ways to easily update our software on the machines. But like this hardware, if we're successful, will be around for a really long time. And we're going to have to support it for a really long time. And I think there are ways to make it easier and make it easier on us.
Starting point is 00:49:57 And also ways to make it easier on potential customers. Because like if they turn on, on say automatic updates and everything goes swimmingly then they will always have like the latest software and it won't be like an android where like uh you know a lot of the android ecosystem is still on the like oldest version of android whereas with an iphone um everyone basically runs the new software like we're trying to go towards that model i would say say. But yeah, I mean, we're going to have to support it for forever, honestly. I mean, even Windows in their kernel,
Starting point is 00:50:31 like they have like this custom code for like SimCity. So we're probably going to have stuff like that. Like, I mean, not SimCity as an example, but you know, the things that stay around forever, we're eventually going to become the thing that we hated in the first place, I'm sure. But think of the size of the city you could have in your server. Yeah.
Starting point is 00:50:52 Actually, are you planning ahead for that sort of thing? It's hard because most of the code ends up being corner cases, things that don't quite work together, timings that work on some series of chips and not others. But how are you going to avoid that? default is a way to make sure, one, that, like, people get the best experience that they can get because, like, we're fixing a lot of bugs at the same time. But also then, like, we can get rid of things that don't need to be there. Christopher, how do you feel about automated software updates right now? Terrible. Well, it wasn't automated. I allowed it to go. If we sound different, it's because logic updated.
Starting point is 00:51:55 It shouldn't sound any different. Yeah, I mean, that's the thing with automatic updates is like you have to have this level of trust where people trust that bad things won't happen. And that comes from like not messing it up consistently, which is hard. Like it is very hard to get that right. Yeah. And it's really hard with things that are mission critical, right? There's probably a lot of, probably a lot of reluctance within the customer base to that kind of thing still. It's like, wait, I want to decide when to do this
Starting point is 00:52:26 and whether it's safe. I want to see the other companies have run this patch for six months before I apply it, that kind of thing. Totally. How do you battle against that IT managers who are reluctant? I mean, it's hard, honestly, because like the hyperscalers can do it
Starting point is 00:52:49 because I mean, it's their internal team updating. And like, we do want to give that functionality to people where it's like, oh, like you can automatically update jobs, migrate to different servers. Those servers get updated and then you do like a slow rollout. And I think like, honestly,
Starting point is 00:53:05 the ways to battle that are just like transparency, being fully open about how this works and, and, and having hopefully like the first set of, of like potential customers that we get be very open-minded. Um, and, and as long as we can nail it consistently, then we don't become like the, Oh, this windows update just broke my whole thing thing and now i've been like scarred for life um you want to avoid that and like we tried to do that with docker on the upgrades and i would say it was it was not great in the beginning but then you eventually get to a place where it's better and better and better and better and i think it's just it's just time would you use dockers on Windows now? Me? Yeah.
Starting point is 00:53:47 I actually have. I've gotten burned enough times that I just say no to Dockers. But unless it's in Linux. So I just wondered as somebody who probably has a lot more experience with it than I do. It's all right on Windows. To be honest, it's not the same thing. Because on Windows, you're getting a VM and you're not getting a container, which like what's nice about containers is like you can actually like share things like you can share the network, you can share like various file paths, you can share like the PID, like you can share like your
Starting point is 00:54:20 process space with the container, which you can't do that on Windows. You can share files on Windows, which is fine. Network, I actually don't know if you can do that. But it's a different experience. You can share the process list. I mean, it's like they're actually VMs at the end of the day. They're slower. They got it pretty fast.
Starting point is 00:54:42 Like, actually, it's pretty comparable now. But it is just a different thing. It is cool. And it's cool that like windows it people are now like coming into this like container space and getting like super more modern, modern and like updated. And that's, that's like super awesome. They're having like easier ways to deploy, but it's still, it's also windows running in a container. So like, like Linux, you can automate a bunch of things very easily and like get things up and running easily and just get like a process started. But on windows, that's all, that's a whole different experience. Jess, it's been really good to talk to you. I need to go buy some turnips from Daisy Mae. Do you have any thoughts you'd like to leave us with? I would say we're hiring if anybody is interested in joining, but also we're
Starting point is 00:55:35 going to hopefully have another season of the podcast out. I hope that people actually learn something from this. And if I got got anything wrong please feel free to email our catch-all email address and i will actually read it uh and what are you hiring for uh we're hiring for hardware or software systems engineers anything in that mix cool remote or local to? Most of us are remote now. So yeah, we're open. Cool. Our guest has been Jesse Frizzell, co-founder and chief product officer at Oxide Computing.
Starting point is 00:56:16 Thanks, Jesse. Thank you. Thank you for having me. Thank you to Christopher for producing and co-hosting. Thank you to Philip Johnston for recommending Jesse. And thank you for listening. You can always contact us at show at embedded.fm
Starting point is 00:56:30 or hit the contact link on embedded.fm. We are still doing transcripts. Soon they will be open to all of you. For now, they're only open for Patreon supporters. But a quote to leave you with
Starting point is 00:56:42 from Vladimir Navikov from Lolita. And the rest is rust and stardust. Logical Elegance, an embedded software consulting company in California. If there are advertisements in the show, we did not put them there and do not receive money from them. At this time, our sponsors are Logical Elegance and listeners like you.
