Utilizing Tech - Season 7: AI Data Infrastructure Presented by Solidigm - 05x11: Building Resilient Infrastructure at the Edge with Craig Nunes of Nebulon

Episode Date: July 10, 2023

Edge infrastructure is susceptible to many of the same security risks as datacenter and cloud, but is often run in less protected environments. This episode of Utilizing Edge features Craig Nunes, Co-Founder and COO of Nebulon, talking to Brian Chambers and Stephen Foskett about the provision of reliable infrastructure services at the edge. Nebulon's product presents storage to servers in a managed way, monitoring and protecting storage in real time. Edge servers must have a known-good system image to ensure that they are secure, yet this is difficult to achieve in remote devices. Hosts: Stephen Foskett: https://www.twitter.com/SFoskett Brian Chambers: https://www.twitter.com/BriChamb Guest: Craig Nunes, COO and Co-founder, Nebulon: https://www.linkedin.com/in/craig-nun... Follow Gestalt IT Website: https://www.GestaltIT.com/ Twitter: https://www.twitter.com/GestaltIT LinkedIn: https://www.linkedin.com/company/Gestalt-IT Tags: #UtilizingEdge, #EdgeComputing, #ResilientInfrastructure, #Security, #EdgeSecurity

Transcript
Starting point is 00:00:00 Welcome to Utilizing Tech, the podcast about emerging technology from Gestalt IT. This season of Utilizing Tech focuses on edge computing, which demands a new approach to compute, storage, networking, and more. I'm your host, Stephen Foskett, organizer of Tech Field Day and publisher of Gestalt IT. Joining me today as my co-host is Mr. Brian Chambers. Thanks for joining me today, Brian. Hey, Stephen. Good to see you. I'm Brian Chambers. I'm the chief architect at Chick-fil-A, where we do a lot of things with the edge. And you can also find me at brianchambers.substack.com, where I write the Chamber of Tech Secrets. So as we've talked about in previous episodes, the edge is an interesting area because it's not like we're doing anything completely novel there, but everything that we've done in the data center and the cloud needs to be rethought when we've got
Starting point is 00:00:49 remote, maybe not secured, many, many, many locations. And one of the things that we need to rethink, I think, is basically the security aspect and specifically making sure that the servers that are located out there are actually booting up correctly, booting up the right software, running the right things and dealing with issues that are going to arise, which may include ransomware in the future. Yeah. Security is one of those things that I don't think we've actually spent a lot of time talking about so far on our episodes this season, but definitely an absolutely critical feature of any edge stack that anybody's going to run to get business value. And as you mentioned,
Starting point is 00:01:33 the ransomware thing is a hot topic lately. One thing that I think is interesting about the edge is with cloud and data center, we generally got to a point where we could take physical security for granted. You could kind of assume you're in a safe facility with good security controls. And if there was some sort of intrusion or something, you know, it would get handled pretty quickly. Edge, maybe not so much, might be a little bit of a different environment out there. And the physical side, you know, in addition to the, you know, the many different copies, many different points of potential ingress creates a whole very large, very interesting, you know, and possibly attractive attack surface for the bad guys. So it's an interesting topic to dig into today. Yeah, absolutely. I would think that, you know, I mean, we've heard a little bit,
Starting point is 00:02:20 I think, about ransomware hitting industrial IoT and so on. I think it's only a matter of time before it's going to come down hard at retail, manufacturing. It's just such, like you say, such an attack surface. And it's just so much possibility to compromise these systems. And of course, when we think about security, we have to think about confidentiality, integrity, and availability. Availability is one, I think, that is really going to be a challenge as well, because, you know, in many cases, we're dealing with systems that may not be homogenous. They may be running, well, whatever they're running out there. And, you know, you really need a better understanding of the server. And so that's why today on the podcast, we are joined by an old friend of mine, Mr. Craig Nunes, co-founder and COO at Nebulon.
Starting point is 00:03:10 Thanks for joining us. My pleasure to be here, Stephen. Super fun. Yeah, look forward to it. So, Craig, tell us a little bit, I guess, what is Nebulon? And then we're going to get into sort of how this fits into the edge and what this means for security. Yeah. Yeah. Nebulon is a company that basically we turn your favorite industry standard server into efficient, cyber resilient application infrastructure. And we do that by mirroring an approach that we've observed by the hyperscalers, effectively taking that approach to offload a lot of infrastructure services off of your server CPUs and run it on this controller in your server. And in doing that, you gain great efficiency. It gives you some wonderful capabilities around security. You get quite a bit of flexibility in terms of the
Starting point is 00:04:28 operating environment you're running. And it also lends itself well to cloud-based management in a lot of ways, both security, operational efficiency, and the like. So we've kind of taken this different approach, which, until we came to the party, was really only something you do in the cloud. We believe there's great value for that architecture for enterprise and MSP data centers. So you've got this product that basically presents storage to servers. And how does that connect with this whole idea of resilience and security? It's just a SAN, right? Yeah.
Starting point is 00:05:20 No, SAN, I don't know if I've said that word in many years. You know, for the edge especially, you know, presenting data services, but we're presenting cyber services. We're providing remote control of the server itself. We're really providing a whole suite of infrastructure services. And all those infrastructure services are running on this card in your server that you would use instead of your favorite storage controller. So this goes into the server, connects to your SSDs, and is controlled remotely by the cloud. What this lends itself to do in the context of security is, you know, we are deduplicating, compressing, encrypting all data, you know, as it goes through. We're in a perfect position to ascertain whether or not application data is somehow being encrypted in real time as it's happening. And we've got capability both with the card and in the cloud, you know, to kind of keep tabs volume by volume.
Starting point is 00:06:58 What's normal and what is, you know, a big change as it relates to encryption? And we're really on the lookout for cryptographic ransomware. And if we spot a pattern of that after a couple of minutes, we'll alert someone to that fact. With technology that also is associated with this secure zone in the server anchored by this card, we will recover you back to before ransomware hit in just four minutes. So it's a very cool story you can only really get to if you take a certain approach. And so we've taken it for that reason. That's really interesting. So maybe for those who are newer to the edge or maybe not quite as steeped in the security realms, maybe a weird question, but like, can you scare us a little bit? Like
Starting point is 00:07:59 what could happen if you don't have this sort of solution or if you've got stories about what you have seen happen, but maybe put a little fear into the people who are new to thinking about this or who maybe think it's an edge environment. Maybe we don't allow ingress into our environment. What could go wrong and how could it go wrong? Yeah. First of all, if you have a network, if ransomware cracks in somehow, it can go anywhere, right? Anywhere that is attached to a network. And, you know, if you believe the, you know, the statistics out there that are, you know, sometimes they're a little more headline-oriented, but if you believe your favorite security analyst,
Starting point is 00:08:51 three out of every four businesses over the next two years will experience a ransomware attack, and one out of 10 of those will be effective, will get inside. And if it gets inside, the average time to recover is on the order of a few weeks. Okay. We talk to folks every day about this topic. And, you know, I'm blown away by the number of people who've told me we never got that data back. We gave up trying after three months. And so, you know, the issue is, you know, really not if you will experience an attack, but when. And when it comes to all those stats,
Starting point is 00:09:36 you know, what do they say about statistics? There's lies, there's damn lies, and then there's statistics. They are all meant to be a guide to your actions. You can change what those mean to you, right? Technology is all about changing your fate. And so the point is, given we're in a world where ransomware threatens your business operation in more creative ways every day, we need to think in bigger ways about how we protect the server and storage infrastructure that's serving our applications. And there's a lot done at the network level, for sure. Some really powerful stuff. But when you think about what's
Starting point is 00:10:27 inside my server right now that will protect me, there's not a lot out there today. But, you know, we think with, you know, some of the technologies that, you know, to be honest, are running in the public cloud right now, there's a lot more that can be done around detection and recovering. Yeah, I think that's the real key here is that it's hard to know any time exactly what's really running on a server. And it's especially hard to know when those servers are not, you know, local or not something that you can see, not something that you have seen. You know, what's to say, who's to say that the operating system image wasn't compromised? Who's to say that there isn't something running, you know, kind of, I don't know, below the
Starting point is 00:11:15 virtualization layer or, you know, another virtual machine that's stealing credit card numbers or something like that. I don't want to be all scaremongering, but you really kind of don't know what's running on these servers. You only know what you can touch remotely because they're remote. And yet, you know, they could be running almost anything. I mean, honestly, it could be something innocuous. Somebody could be browsing the web on the server that's sitting in the restaurant, or it could be something a lot more serious. Maybe they browsed to the wrong website and now they've got a keylogger running on that infrastructure and they don't know about it, or ransomware, as you said. Or maybe it's
Starting point is 00:11:56 just corrupted. Maybe it's just somebody kicked it real hard, poured something nasty on it, and it's just a little flaky. That wouldn't be all that weird either in retail environments. I know a lot of people basically just say, oh, it's not working, I'm gonna unplug it, plug it back in again. That's not a great way to end up with a known good server. So I think that that to me is really what you're talking about here is that you want to have these things running a known good image. How do you know it's good, except by controlling the hardware that's showing the image to the server that's letting the server, you know,
Starting point is 00:12:30 and so to the layman, I think what the product is, is kind of like a super duper cloud connected hard drive that you just know is right, right? 100%. Yeah, in fact, so a lot of us are familiar with, you know, machine images that we will take and spin up in our cloud-based instances. That is your known good certified image, your operating system or hypervisor, cluster software, applications, the patch set that you've tested and lock that down and put that into your secure enclave so that whenever you reboot that server, instead of rebooting what was running, which may have an errant patch, you know, applied or might have dormant ransomware that somehow made its way in.
Starting point is 00:13:37 The next time you reboot, you'll reboot from that known good image. And that having that across your server fleet, should something bad happen, and maybe it's not ransomware, maybe it's just a bad patch that's taken out a ton of your infrastructure, it's as simple as reboot to recover that infrastructure to what has been tested and validated, you know, by your engineering team centrally. It's, you know, it's all about, you know, good hygiene and keeping that, you know, the thing that's running your server infrastructure at every edge location.
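The reboot-to-recover idea described above can be illustrated with a short sketch. This is not Nebulon's implementation. It is a minimal Python illustration under stated assumptions: the known good image and its digest are held somewhere the running operating system cannot modify, and the function names (`sha256_hex`, `plan_reboot`) and the returned dictionary keys are hypothetical, chosen only for this example.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of an image as a hex string."""
    return hashlib.sha256(data).hexdigest()


def plan_reboot(running_image: bytes, golden_image: bytes, golden_digest: str) -> dict:
    """Decide what to boot on the next restart (hypothetical helper).

    Assumption: golden_image and golden_digest come from a protected
    location that the application domain cannot write to.
    """
    # Refuse to proceed if the stored golden image no longer matches
    # the digest recorded when it was tested and locked down.
    if not hmac.compare_digest(sha256_hex(golden_image), golden_digest):
        raise ValueError("golden image does not match its recorded digest")

    # Drift means the running image differs from what was validated,
    # for example a bad patch or dormant malware.
    drift = not hmac.compare_digest(sha256_hex(running_image), golden_digest)

    # Always boot from the golden image; drift is only reported.
    return {"boot_from": "golden", "drift_detected": drift}


if __name__ == "__main__":
    golden = b"os-v1 + tested patch set"
    digest = sha256_hex(golden)
    print(plan_reboot(b"os-v1 + untested patch", golden, digest))
```

The design choice worth noting is that the sketch never boots the running image, even when it matches. Drift is reported so an operator can investigate, but recovery does not depend on detecting it.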
Starting point is 00:14:19 Yeah, that's got to be kind of an interesting, you know, kind of a compelling idea that, you know, it's one more way to know that things are right. And we've heard some companies talk about hardware root of trust and trusted platform modules and things like that. How is this any different from that? Why is that not good enough? It's, yeah, I think the thing that I've learned about security technology is, you know, it's an and, you know, that it's that and the other, it's, it's never this or that. And, you know, so for example, you guys were talking about physical security at the edge, you know, you know, having everything always encrypted. So, you know, when, you know, a drive fails and is taken away by a subcontractor for disposal, but somehow that winds up on eBay with all your data on it,
Starting point is 00:15:14 it is entirely encrypted. No one is ever going to get into it. The controller that anchors this secure enclave, it's really critical that controller has a, you know, hardware root of trust is going to prevent them from being able to access and provide the secrets necessary to crack in. And so it's, you know, it's, it's, and so it's, you know, encryption and hardware root of trust, and, you know, this immutable boot image and, you know, detection and recovery in case, you know, all of this stuff, you know, goes sideways and somebody does get inside. Brian, I want to ask you a question. I mean, we're talking about this. You run a massive edge environment. You know, is, I assume that ransomware is on your radar. I assume that integrity of the system image is on your
Starting point is 00:16:26 radar. You know, is it? And what's your thought on it? Yeah, I mean, it's definitely both when we think about, you know, cloud environments, you know, as well as the edge. It's something that, you know, I think we definitely have top of mind. I think hopefully everybody does these days. Yeah, it's an interesting set of challenges. I think we talked about security being an and. Another way to say that is like layers, right? Like you've got your software supply chain side of things. You've got the images that we just talked about,
Starting point is 00:17:06 all the factors that we just went through here, definitely all come into play, you know, together in a solution. Another one, like we think about layers too, like, I guess it's not a security layer, but like not allowing anything inbound to our stores would be one thing that we, we get a lot of benefit of. Now you've got physical attack vectors and things like that. You'd have cloud resources that are compromised that create vectors. So no matter what you do, you're going to run infrastructure. You're going to have risk of security breaches and ransomware incidents and things like that.
Starting point is 00:17:38 So absolutely a top of mind thing. Um, and I think we think about it by putting those layers, uh, in place and, um, and doing as many as we can reasonably do, you know, um, uh, be able to manage, um, got to lean on partners in a lot of cases to help us with some of those kinds of things. Um, but yeah, try and try and build a stack of layers that ultimately gives us a good, um, you know, hopefully a secure surface or at least at least, you know, makes it pretty challenging and time consuming to get any value from it should you find a way to compromise it. You know, what you're describing too, Craig, I mean, you're talking hardware here, right? Isn't this going to add to the cost of the system? Is this something, because as Brian's saying, it's all about making
Starting point is 00:18:26 informed compromises because you have to make sure that the environment makes sense. You have to make sure that the bill of materials makes sense and that you're provisioning things. Sounds like a big complicated piece of hardware or something. I mean, isn't it yet another thing that you need to buy everywhere? So, you know, the real magic is all about software, right? And the whole point of this, you know, when it spun up in the cloud was about efficiency. And the, you know, the notion that you can move processing of something from, you know, expensive x86 processors to low-cost ARM processors is, you know, always considered to be a good thing. And the knock-on benefits of, you know, some of the security attributes are a big deal. The way to think about the approach that we've taken is, look, every server you've got
Starting point is 00:19:31 is gonna have some kind of storage controller in there. And so this is just a replacement for that. It's not an additional card. It's a replacement for it. And the software that you're effectively running, you're going to use that instead of, you know, your hyperconverged path. So you're going to, you know, avoid the expense of hyperconverged infrastructure. You're going to get all of your server cores back
Starting point is 00:20:05 to do what you need to do and right-size your edge accordingly. And you're going to get these capabilities that are just built right into the application infrastructure. You're not going to have to write another check to another company for a layer of security. It's something that, you know, all server infrastructure should just, you know, should just come wired for. And that's kind of the idea is, you know,
Starting point is 00:20:33 they're in larger companies, there's, you know, there's two people at the party, security ops and IT. And what we're trying to do is sort of equip the IT team who, you know, sadly, in most cases are stuck with the recovery, right? And equip them with, you know, some of the tools that, you know, typically have been sourced on the security ops side, so that they're bringing to the party equipment that is, you know, better protected, so they don't wind up having to own a nasty recovery across, you know, a whole bunch of sites. Which is the other thing, you know, we talk about at some point is how do you enable recovery, you know, push buttons simply across a bunch of sites, but clearly that's, you know, IT owns that. And if you can enable them to avoid that, that's a good day's work. I'd love to come back to the recovery thing next, because I think that's a great topic. Real quick, if somebody is thinking about using a solution like yours or doing something like this in their edge fleet, when is the right time to be thinking
Starting point is 00:21:45 about that? So we already have an existing fleet that's across all of our stores. A lot of other organizations do as well. Is this something that you see as a potential retrofit? Is this something people need to think about when they get to their next refresh? When's the right time to be thinking about this in your design overall. Yeah, for sure. I mean, we talk to companies who roll through their infrastructure in different ways. For sure, when you're thinking about, you know, modernizing, optimizing your edge sites, that's clearly a time and place for this. But let's say you're between that. When ransomware hits an organization, it, ransomware changes sort of data protection
Starting point is 00:22:40 as we know it because data protection has really been about the stuff above the operating system, your user data. It's never really been about, you know, allowing you to roll back your operating system. Why would you back up your operating system, right? So the typical approach with backup software isn't as useful as we would like it. And your recovery cluster, which is standing by in case something bad happened to your data: when ransomware happens, not only is data encrypted, but your operating systems are all encrypted, including the operating systems in your recovery cluster. And so, you know, for those with a large edge environment, they've probably got, you know,
Starting point is 00:23:39 staged recovery clusters in different locations, probably under the control of IT, that is a great place to, you know, to start because getting your recovery cluster back up fast when you've lost an edge to ransomware is critical. And if, you know, across the network, that cluster has been encrypted, you will spend a fair bit of time rebuilding those servers to get back DHCP, DNS, if you're, you know, Linux environment, PXE boot, whatever, and your backups, days to then start recovering what, you know, is producing revenue for you, your production infrastructure. Start there, because you can do that at any point along the way. And then, of course, as you are thinking about, you know, more broadly, how you modernize your edge to handle these modern threats of
Starting point is 00:24:38 ransomware, absolutely is a good time to look at, you know, the alternatives out there. Yeah. And I guess what are the alternatives? What other ways would people have if they don't want to do a, you know, like a hardware software, you know, solution? What are they going to do to protect themselves from, you know, and to ensure that they have a known good infrastructure? So there's, you know, there's, I think, a lot of criteria that, you know, you'd want to think about that most is widely available, but, you know, you got to check it. So let's take the example of HCI, common use case at the edge. The, and, you know, just about all hyperconverged infrastructure provides for encryption at rest, right? You should do that. One of the problems is sometimes people will
Starting point is 00:25:36 turn it off because it might have a bit of an overhead on those systems. You don't know necessarily how it's been set up and provisioned. So you wanna think about, how can I maintain encryption that's always on? Do I have hardware root of trust? Do I have any way to handle that complete encryption of my infrastructure? And how can I separate my applications from
Starting point is 00:26:07 the infrastructure services that are actually going to provide recovery and investigate those. The challenge with hyperconverged, and hyperconverged, I'll tell you right now, hyperconverged is great for a lot of things, but it isn't the, you know, the do-all, be-all for every challenge that affects IT. And one of the challenges for hyperconverged as it relates to this world of ransomware we live in today is the application domain, your operating system, software application binaries, your tools, that lives in the same domain as your data services, your network services, your lights out management. And so when that stuff gets encrypted, so does everything else. And so if you can find a way to create two separate domains, application from infrastructure, that's going to ultimately
Starting point is 00:27:06 protect you because application is the target. That's what, you know, that's what the hackers are after. And they're going after, you know, whether it's application, the operating system, server firmware, or whatever. If they get in, you have a, you know, protected environment, your infrastructure domain, your secure enclave, where you can now reach out and roll that back, you know, pre-ransomware attack. And, you know, that kind of technology is possible today with, you know, the advent of modern DPU technology, which I won't get too into, but that gives you a way to inject, you know, a secure enclave like you find in many consumer devices today, Intel SGX, Nitro enclaves, you know, they're out there. You know, think about how you incorporate that in your
Starting point is 00:27:58 next infrastructure design. It'll pay off. And that's really what it's all about, right? Is you're trying to take the idea of how cloud servers, real hyperscalers, you mentioned Nitro and Amazon AWS, you're taking that technology and basically moving it out of the cloud and into the hands of the world in a way. Yeah. For the software that you know and love today, for the data that has to stay on-prem, you know, here is an approach proven by, you know, the folks who run the most efficient and secure data centers on the planet now available, you know, for your data center, for your edge sites. And that's, you know, that's the approach we're taking. So this has been a really, you know, it's been an interesting and thought provoking conversation,
Starting point is 00:28:48 because we really haven't seen any way to basically ensure that the operating systems, the software that's running on these edge platforms is known good and secure until now. I guess, Craig, what's your message to somebody listening to this who is an architect or, you know, working at the edge who maybe hadn't considered this technology or this problem? What's your message? Yeah. So the, I mean, we know that the edge presents probably the most brutal IT frontier on the planet. And traditionally, the architects of modern edge deployments are thinking about cost. They're thinking about how to avoid the operational headaches that can occur. You know, the seed I would plant with every one of those folks is, you know, put at the apex security. Because, you know, if not this year, you know, sometime in your future, your organization is going to feel an attack.
Starting point is 00:30:05 And the more you have built these layers of security from the server and storage on up, the safer your organization is going to be. And the more likely you're going to be able to navigate through that. And even if you have all those things in place, think about the recovery. You know, think about the steps, practice the steps, rehearse it with the team. So when, you know, when that time comes, you can recover your site infrastructure in just a few minutes.
Starting point is 00:30:43 It's possible today. Make sure you've got that technology at your disposal going forward. That's really good. I'm going to cheat and I have two. I thought it was really interesting to think about the security side as it relates to knowing what's actually running on any given machine. It sounds so simple, like what's running on there, but a problem that I think many people don't have a great solution for, especially when you think about a very distributed footprint that's often remote sites, poor connections, et cetera. So I thought that was an interesting insight.
Starting point is 00:31:20 And then overall, I think what I find interesting is we've talked to many different organizations over the course of the season here is just this continuing convergence of capabilities that have maybe been born in the cloud, making their way into some sort of solution that can exist at the edge. And we've seen it in all kinds of places, containerization or WebAssembly or whatever. We've seen it in just a host of different places. So it's interesting to see something that maybe is more on the sophisticated side of security also starting to be a possibility in edge environments as well, which I just think speaks to the fact that this is probably a computing paradigm that's here to stay for a while. And it's getting more robust and more mature, you know, really by the month.
Starting point is 00:32:08 So it's been interesting to learn a bit more about that and to see how that's happening in the industry. How about you, Stephen? Yeah, I'll take it from the other direction, Brian. Think about portable devices. Think about mobile devices. You know, all of these things, as Craig mentioned, they all have a secure enclave. They all have known good operating system images. They all have the ability to basically alternate, recover the image in the event of a bad upgrade, protect the image from security and intrusion.
Starting point is 00:32:40 It's the same nowadays more and more with laptops, for example. Both Windows and Mac have trusted platforms and secure modules in them that try to authenticate the operating system. This technology is going from the other direction as well. It seems almost like the only place it doesn't exist is, well, I guess in the enterprise and at the edge. And so it's really not a surprise that the same kind of challenges that you'd face with mobile devices. I mean, think about it. You're a manufacturer of mobile devices. You know, you got to have that thing. It's got to be bootable to a known good image or else you're just not going to, you're going to have a bunch of duds, a bunch of doorstops out there all over the place. It's kind of the same thing with edge because, you know,
Starting point is 00:33:24 you can't get out there. The system just has to work and it makes sense to have this stuff. So yeah, from the cloud coming in and from the mobile device coming in, it makes sense to bring it to the edge as well. Well, thank you so much, Craig, for joining us today on Utilizing Tech.
Starting point is 00:33:42 As we wrap up, where can people connect with you and where can they continue this conversation and learn more? Sure. Yeah, for sure. Head on over to www.nebulon.com. We've got a lot of ways to interact. You don't have to have a conversation with us right away. You can test drive the UI and get a little tour of things. Plenty of resources there. And when you're ready, just let us know. We'd be happy to hook you up with one of our technologists and tell you more. Great.
Starting point is 00:34:15 Brian, what's new with you? Oh, man, all kinds of stuff. I had a wedding a couple weekends ago, so that's a big deal. That's new. So excited about that. And then a lot of the same. So people can still find me in all the same places. I still have the Chamber of Tech Secrets blogs going on a weekly basis at brianchambers.substack.com and out there on the socials, LinkedIn, Twitter, et cetera. So those are the places to find me.
Starting point is 00:34:44 Great. Yeah, I do recommend checking out the Chamber of Tech Secrets. Great blog, very thoughtful. I have to say every week there's a new thought. I love it. And as for me, you'll find me on Gestalt IT where I'm writing. You'll also find me every Wednesday on our tech news show, the Gestalt IT Rundown. You'll find that in your podcast app or on YouTube if you look for it. And of course, every week here on Utilizing Tech. So thanks for listening to Utilizing Edge, part of the Utilizing Tech podcast series.
Starting point is 00:35:16 If you enjoyed the discussion, please do subscribe. We would love to hear from you as well. Give us a rating, give us a review. This podcast is brought to you by gestaltit.com, your home for IT coverage from across the enterprise. For show notes and more episodes, head over to our dedicated website, go to utilizingtech.com
Starting point is 00:35:33 or find us on Twitter and Mastodon at Utilizing Tech. Thanks for listening and we will see you next week.
