Utilizing Tech - Season 5: Utilizing Edge - 05x04: Designing a Scalable Edge Infrastructure with Scale Computing
Episode Date: May 22, 2023. One of the main differentiators for edge computing is developing a scalable architecture that works everywhere, from deployment to support to updates. This episode of Utilizing Edge welcomes Dave Demlow of Scale Computing discussing the need for scalable architecture at the edge. Scale Computing discussed Zero-Touch Provisioning and Disposable Units of Compute at their Edge Field Day presentation, and we kick off the discussion with these concepts. We also consider the undifferentiated heavy lifting of cloud infrastructure and the tools for infrastructure as code and patch management in this different environment. Ultimately the differentiator is scale, and the key challenge for designing infrastructure for the edge is making sure it can be deployed and supported at hundreds or thousands of sites. Hosts: Stephen Foskett: https://www.twitter.com/SFoskett Brian Chambers: https://www.twitter.com/BriChamb Scale Computing Representative: Dave Demlow: https://www.linkedin.com/in/ddemlow/ Follow Gestalt IT and Utilizing Tech - Website: https://www.UtilizingTech.com/ Website: https://www.GestaltIT.com/ Twitter: https://www.twitter.com/GestaltIT LinkedIn: https://www.linkedin.com/company/gestalt-it/ Tags: #UtilizingEdge, #Edge, #EdgeComputing, #EdgeTechnology, #ScalableEdge, #EdgeInfrastructure, @UtilizingTech, @GestaltIT, @AvassaSystems
Transcript
Welcome to Utilizing Tech, the podcast about emerging technology from Gestalt IT.
This season of Utilizing Tech focuses on edge computing, which demands a new approach to compute, storage, networking, and more.
I'm your host, Stephen Foskett, organizer of Tech Field Day and publisher of Gestalt IT.
Joining me today as my co-host is Brian Chambers.
Hey everyone, I'm Brian Chambers. I lead enterprise architecture
at Chick-fil-A and I write about tech at brianchambers.substack.com with the Chamber
of Tech Secrets. And you can also find me on Twitter at B-R-I-C-H-A-M-B.
So Brian, one of the defining characteristics of Edge is scale, and scale means many things. Tell us first, you know, what do you think are sort of the constraints and the requirements for deploying at scale? Yeah, well, there's lots of
constraints when we think about deploying things at the edge, right? We've got limited human
technicians, most likely, there's not a lot of people who can help us troubleshoot problems or
maybe even install sophisticated equipment and stand-up architecture. So that's one thing.
We've obviously got unreliable connections potentially due to being in a remote scenario
or just not being able to invest in something resilient. So all kinds of constraints, right?
That's just a few. So when I think about scaling at the edge, it's not necessarily the
way that we would talk about scaling in the cloud where we think about like horizontal auto scaling
or scaling up a machine size or something like that. I think what we're actually talking about
is the ability to deploy and manage and operate lots and lots of copies and lots and lots of
footprints of whatever your thing looks like. And being able to do that, you know, 100 times,
1000 times, you know, 10,000 times, or even more, brings a whole new set of challenges. And so I
think it's gonna be fun to explore those today. Yeah, that's one of the things that, coming from a background in data center IT, is unlike anything we ever had to deal with before.
I mean, if we had multiple locations, it was two or three
or maybe 10, right? If you're a huge company, not 10,000. And that's one of the craziest things.
And because of this, it really demands a completely different architecture, a completely
different approach, as you've learned at Chick-fil-A and also as we heard from company after company at
Edge Field Day. And so that's why we invited on to join us today for this episode, Dave Demlow
from Scale Computing. It's right there in the name. Dave, welcome. Thanks for being part of this.
Thanks very much for having me.
So when I say scale, and I don't mean the name of the company, when I say
scale in terms of scaling at the edge, what do you think? I mean, how would you answer the same
question? Yeah. I mean, it's all the things that Brian said. I mean, in many cases, it is scaling
across multiple locations, different connectivity options. But we really want to encourage also
designing for scale of the new
applications that we all know are coming to the edge. So not just building and buying a box or
a software stack to deploy one thing. Maybe it's this hot new computer vision app, but what are
those other things? What are the data requirements going to be and how would you address those
without a new rip and replace architecture, without having to roll a truck every time you need to scale your applications and so forth? So I think it means a lot of things,
but yeah, a lot of sites is way different than the data center in just about every way imaginable.
And those are the kinds of problems that we're working really hard to solve.
So Dave, one of the things we talked about a second ago with constraints was thinking about
some of the limitations with edge sites, not having humans there.
And there are maybe some things that are buzzwords to people, phrases like zero-touch provisioning.
What do you think about that concept?
And what does that mean to you when you think about zero touch at the edge?
What does that solve for and how does that work?
Sure.
Yeah.
I mean, obviously zero-touch provisioning is a big part of our overall solution.
And we look at it as not just for that initial, like, how do you get the initial set of hardware infrastructure out?
How do you get the initial applications out there?
But as I was talking earlier, how do you scale that out?
You know, when now you need GPU resources, or you need additional storage. Again, designing from day one how you can zero-touch deploy and provision new types of resources, kind of the same way you would in the cloud, but it's obviously not the cloud.
Some hardware eventually has to show up.
Some hardware is eventually going to break, and hopefully you've designed the system to be resilient, to handle that failure, so that you're not emergency flying a helicopter out to the oil rig or those kinds of scenarios.
And also quickly provisioning and reacting to patching and security. Those are essential across the edge with a zero-day type of exploit: making sure that you can zero-touch, rapidly deploy those fixes out across a large fleet. Again, completely different
than the kinds of things you see in the data center, even with a lot of VM instances. That's
a much easier problem to solve than disconnected sites, different, you know, in many cases,
you know, different hardware configurations, running different applications, depending on,
you know, the different kinds of sites, things like that.
So a lot of different things to consider there.
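To make the zero-touch provisioning idea concrete, here is a minimal sketch of what a device-side bootstrap might look like: on first boot a node identifies itself, asks a central fleet manager what it has been assigned, and applies that configuration. The fleet manager URL, endpoints, and payload fields are invented for this illustration and are not Scale Computing's actual API.

```python
"""Hypothetical zero-touch provisioning bootstrap (illustration only).

On first boot, a node identifies itself by serial number, asks a central
fleet manager which site/role it has been assigned, and applies it.
All URLs and fields are invented for this sketch.
"""
import json
import urllib.request

FLEET_MANAGER = "https://fleet.example.com/api/v1"  # hypothetical endpoint


def read_serial() -> str:
    # Hardware serial number as exposed by the SMBIOS/DMI tables on Linux.
    with open("/sys/class/dmi/id/product_serial") as f:
        return f.read().strip()


def fetch_assigned_config(serial: str) -> dict:
    # Ask the fleet manager which site/role this serial number belongs to.
    req = urllib.request.Request(f"{FLEET_MANAGER}/nodes/{serial}/config")
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def apply_config(config: dict) -> None:
    # A real system would configure networking, join a cluster, and deploy
    # the workloads assigned to the site; here we just print the plan.
    print(f"joining site {config['site_id']} as role {config['role']}")
    for app in config.get("applications", []):
        print(f"deploying application: {app}")


if __name__ == "__main__":
    apply_config(fetch_assigned_config(read_serial()))
```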
One of the things at Edge Field Day during your presentation that I think stuck with a number of people was a phrase came up, and it was related to this zero-touch provisioning idea.
And it was a term, disposable units of compute.
Can you tell us a little bit more about how that came up in the discussion and what that means?
Sure. Yeah. I mean, it's kind of a design goal of our infrastructure, our Scale Computing software and hardware infrastructure, from day one that things are going to fail.
So plan for anything to fail and provide the appropriate resiliency in an automated fashion, without relying on a cloud brain, much less a human, to interact with that.
So everything within a scale platform is designed to be resilient, disposable.
Now, we do have some applications where it's just a single node and you don't necessarily have the requirements for a full clustered type of solution,
which we obviously offer and it is kind of our flagship. But even there, you know, you can accommodate that, make it disposable: all right, we're not going to have an on-prem infrastructure failure and a cloud failure at the same time, we hope. I mean, there are cases where you can make that assumption. And so we'll fail back to the cloud there, or we'll have maybe a cloud-centric application where we're getting some benefits of local processing for latency, and we can tolerate a little longer in an emergency, or vice versa.
You know, failing back and forth
could be another way of basically making the,
designing your applications,
designing your infrastructure to be disposable,
which is really key across these sites,
this many sites when you're talking a large wide deployment.
I think one of the other aspects
of this whole disposable units of compute,
which it sounds kind of like a product from some company, but I can't quite put my finger on it.
Yeah, yeah.
One of the things that also kind of goes hand in hand with that is that when you're deploying so much infrastructure at so many locations, every choice you make that makes it just a little more expensive is magnified. And so you have to really use small, lower cost,
maybe lower reliability systems
because they have to be disposable.
And it's almost like there's a tipping point
where in the enterprise,
you're basically deploying things
that are supposed to be really, really solid.
I mean, essentially, we're deploying little mini mainframes that are supposed to have
all sorts of redundancy features and all sorts of high availability features, and we'll pay
huge amounts of money for reliable storage and redundant this and redundant that and
so on.
At the edge, you have to ask yourself the question, does it make
sense to pay for this stuff? Or should we just use the cheapest, disposablest stuff, because we know
we're going to burn it up and chuck it at the end. And so again, I think that that's another aspect
here that is very different at the edge than in the data center. Yeah. Yeah, I would agree. And
you know, you have a lot of choices there. You know, we have this discussion a lot of times with customers who acknowledge, hey, my application needs some resiliency, some data persistence resiliency, some, you know, failover or something like that. And they might
come to us saying, well, the easiest way I can think of to do that is I'm just going to buy two
of everything, you know, one active, one passive, if the one goes away, you know, we'll just fail over the other one.
And without realizing, well, OK, you know, you're probably going to buy pretty expensive boxes to do that.
You're going to set up, you know, RAID arrays and things like that.
What if you could distribute that across three smaller disposable things or four or five or six smaller disposable things?
And then you can bolt on more later. And how do you scale out from a two-node failover cluster? You know, it's really hard to go to that third node, or now you're doing pairs of all these things and they proliferate.
So we have a lot of those kinds of discussions of,
you know, do you want to get locked into,
this is the way our stores all have to be.
They all have to look like this, you know,
this kind of hardware footprint,
this kind of management stack,
or do you want to design for something
that can kind of plug and play, you know, different resources? So, you know, we, in our solution, let you mix and match nodes, things that are storage heavy,
things that are compute heavy, things that are GPU resident, things like that, and so that you
can know that over time, you can stay within our entire management
framework, even zero touch deploy, whatever kind of resources you need, aggregate those
into each edge site, and you're not locked in. You don't have to make all those decisions up front. But you also don't have to pay for stuff that you think you might need two years down the line,
which is another thing we often see customers kind of having to do. We think we're going to
scale to this. So we're just going to buy more CPU than we need. We're going to buy more RAM than we
need and cross our fingers that, you know, A, we don't waste it, and B, we didn't estimate or guess wrong
and just blow right through it.
But each of those choices
is going to have a lot of implications.
Yes, exactly.
You know, oh, well, we're going to do, you know,
32 gigs instead of 16 gigs.
Well, you just spent a lot more money.
Yes, exactly.
Another thing that I think gets multiplied
by the number of footprints with the edge
is the amount of work it takes to do any given thing manually, right?
Like no matter what it is that you might have thought manual was possible in a data center environment or even in the cloud where we still try and default to automation, the edge is just to a whole nother level, right?
Like being able to respond to incidents that occur manually, super challenging,
the idea of like deploying things manually. We've talked a little bit about supporting things
manually. So automation just becomes a really critical concept and a critical factor. How do
you see that, Dave, when you think about being able to effectively scale the edge, but being able to
lay in appropriate automation to make that happen? Yeah, yeah. We've really from day one,
exposed all of our functions via API
that we use ourselves in testing.
So they're well battle tested.
We've really put a lot of effort into both automation
that happens under the hood that nobody sees,
but is super valuable.
Like when there's obviously a drive failure,
a node failure, a network outage, just things like that.
But even things that are probably less visible, you don't think about like, hey, we're upgrading, you know, a Linux kernel here.
We also probably need to upgrade the BIOS on this physical device, or the firmware on this network card, things like that that can be automated.
And especially, again, when you're dealing across a large set of fleets, you really want to have consistency.
And if you've got different drivers, different BIOS versions, things like that, all of a sudden, why is this one site always a little weirder than the other sites?
It's like, so automating from the deployment, from what your infrastructure footprint looks like to automating deployment of your applications, to patching, to operating systems, to, you know, how do you deploy your containers and all that?
And then how do you, you know, again, react to these time critical events like the zero
day security patch, we need to get this out or there's, you know, a critical fix that
we need to get out to a thousand sites, you know, quickly.
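The fleet-wide patch scenario Dave describes is commonly handled with a staged, wave-based rollout, so a bad update can't take down every site at once. A rough sketch of that pattern, with placeholder functions standing in for whatever vendor API actually applies the update and checks health:

```python
"""Hypothetical wave-based patch rollout across many edge sites (sketch).

Pushes an update in small batches and halts if a wave's failure count is
too high, so a bad patch cannot brick the whole fleet at once.
update_site/site_is_healthy stand in for a vendor API.
"""
import time
from typing import Iterable, List


def update_site(site_id: str, version: str) -> None:
    # Placeholder: trigger the update on one site's local controller.
    print(f"[{site_id}] applying {version}")


def site_is_healthy(site_id: str) -> bool:
    # Placeholder: poll the site's health endpoint after the update.
    return True


def rollout(sites: Iterable[str], version: str,
            wave_size: int = 25, max_failures: int = 2) -> None:
    pending: List[str] = list(sites)
    while pending:
        wave, pending = pending[:wave_size], pending[wave_size:]
        for site in wave:
            update_site(site, version)
        time.sleep(5)  # give sites time to converge (tune for real fleets)
        failures = [s for s in wave if not site_is_healthy(s)]
        if len(failures) > max_failures:
            raise RuntimeError(f"halting rollout, unhealthy sites: {failures}")
        print(f"wave ok ({len(wave)} sites), {len(pending)} remaining")


if __name__ == "__main__":
    rollout([f"store-{n:04d}" for n in range(1, 101)], version="2024.1.3")
```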
So, yeah, I was getting a vibe, Dave, that was kind of very similar to what we hear from
some of the cloud providers who talk about undifferentiated heavy lifting. And maybe
that's the question really is when you think about the way that the edge works, is it possible to
enable people who want to put things at the edge to be able to focus on just their app or, you know,
just the core services that they want? Or does the nature of the edge, you know, does it require
that you get a lot closer to exactly what infrastructure you're running on? Like, is there
a model where you can be more focused and you don't have to see the whole picture? What are
your thoughts on that? Well, I mean, if I'm understanding your question correctly, I mean, from the scale perspective, that's a lot of what we want to do is let, you know, let the DevOps team, let the application developers team not have to care about the details of, you know, the BIOS update that just came out for these, you know, NUC11 or these CPU generations or things like that.
And handle that for them using automation. A, because it's important and it's got to be done quickly, B, because it's got to be done right, you know,
and so, you know, employing humans to do it
and do it, you know, rapidly is generally a recipe
for disaster.
So those are definitely the kinds of things
that we try to just take care of and hide to a degree.
And, you know, in the cloud,
you don't even know what box you're on
or what the drivers are.
I mean, you assume that those are all being handled for you.
We want to provide that kind of experience where it's desired, which I think is in most cases at the distributed edge.
If people didn't have to worry about hardware, they'd rather not.
You know, it's like, but it's a necessary evil.
And we talked to a lot of software developers as well that are building these apps that go on top of, you know, they go out to all these locations.
They don't want to have to deal with the hardware.
And so, you know, we're good at that.
We've built a lot of IP around how do you monitor specific hardware?
How do you test?
How do you qualify?
How do you update, you know, an entire software stack on top of very specific fingerprinted hardware editions in order to achieve reliability? I mean, in our case, there is no customer that's running a configuration, from hardware to firmware to BIOS to software stack, you know, right up to their applications, which is what they care about, that we don't have in our lab exactly. It's not a guess. It's not an HCL where you check all these pieces yourself and hope that you're close enough; we ensure that in our solution. And that definitely goes a long way across these large numbers of sites toward ensuring consistency. I can say, coming from a company that had to think about the whole thing ourselves because
we couldn't find at the time any options that gave us sort of the building blocks that we wanted.
I don't think we knew that scale actually did any of these things when we got started at Chick-fil-A. But it is a tremendous
amount of work. And a lot of organizations probably don't appreciate how much engineering
goes into doing all that hardware, thinking about images and all of the things you just mentioned,
BIOS and what operating system we're going to use, and how are you going to keep it supported and patched over time? You're basically in the data center business if you're going to do all this stuff, with thousands and thousands and thousands of data centers, and it's just a tremendous amount of work to do it well. And again, when things happen and there's issues, you've just got lots of copies. Not to mention the brickability factor of not wanting to destroy all the hardware that exists by doing something that makes it go offline, and now it all has to be replaced.
So there's just a ton of work that goes into that, I think.
And being able to find ways to offload as much of the, quote, undifferentiated heavy lifting as possible seems like a really good strategy for people who are more focused on
presenting an application to someone that does a thing for them that adds value as opposed to
being really, really focused themselves on all of the infrastructure. And one of the things that
occurs to me, though, is that a lot of the tools that you might use to do that sort of hardware
management, whether it's on the cloud side, a lot of the infrastructure as code work that's been done. And then in the enterprise side, there's obviously a lot of sort of out-of-band management,
IPMI kind of stuff. And then on the desktop side, there's a lot of patch management,
things like that. A lot of these tools are designed for an environment that is not at all
like the environment we're talking about, and maybe not at all like the hardware that we're talking about.
And in terms of practical experience, I guess for both of you, does it work?
Is this stuff good? Is this stuff what you need?
Or do you need kind of different tools, different technologies,
because it just isn't applicable to the edge?
I guess I'll go first.
From our perspective, I think a lot of the tools
can be used. For example, Ansible. We put a lot of effort into taking our API, coming up with a
fully declarative, idempotent Ansible collection that is actively maintained, is actively expanded,
is certified with Red Hat. And the good thing about Ansible is that not only can it configure and control and declaratively manage the infrastructure, the Scale Computing infrastructure in our case, it can also extend to the applications themselves.
There's plugins for everything, even the network infrastructure.
So I think that's an example of an existing tool that's very, very widely supported, a large ecosystem of vendors that can be used across an edge location.
You do have to use it a little bit differently. And that's one of the things that we did.
For example, you can use our fleet manager console as well as our on-premises system as your source of truth.
So when you're writing a playbook, you don't have to have an inventory of every VM across all the sites. You can orchestrate using us as the source of truth for your fleet, for all
your sites, and then for the system locally as well. And then we can actually go and do things
like launch those playbooks. And those are things we'll be doing and adding more over time of
basically managing those kinds of jobs, those tasks, those playbook runs for things like your applications and even doing things like custom application monitoring,
you know, using tools like Ansible or other, you know, monitoring to say, hey, not only is our infrastructure up, but my application is performing,
it's performing acceptably and be able to kind of gather that telemetry and that observability data for applications that may not have it built in.
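Scale Computing's certified Ansible collection is the supported route here, and its module names are best taken from its documentation. But the "fleet manager as source of truth" idea maps onto Ansible's standard dynamic inventory mechanism, sketched below against a hypothetical REST endpoint; the real API and response shape will differ.

```python
#!/usr/bin/env python3
"""Hypothetical Ansible dynamic inventory backed by a fleet manager (sketch).

Emits inventory JSON in the format Ansible expects from a dynamic
inventory script, grouping edge sites by a group tag. The fleet manager
URL and response shape are invented for illustration.
"""
import json
import sys
import urllib.request

FLEET_MANAGER = "https://fleet.example.com/api/v1/sites"  # hypothetical


def fetch_sites() -> list:
    # Expected (invented) shape: [{"name": "store-0001", "address": "...", "group": "retail"}]
    with urllib.request.urlopen(FLEET_MANAGER, timeout=30) as resp:
        return json.load(resp)


def build_inventory() -> dict:
    inventory = {"_meta": {"hostvars": {}}}
    for site in fetch_sites():
        group = site.get("group", "ungrouped")
        inventory.setdefault(group, {"hosts": []})["hosts"].append(site["name"])
        inventory["_meta"]["hostvars"][site["name"]] = {
            "ansible_host": site["address"],
        }
    return inventory


if __name__ == "__main__":
    if len(sys.argv) > 1 and sys.argv[1] == "--host":
        # Per-host vars are already returned via _meta, so this can be empty.
        print(json.dumps({}))
    else:  # --list (default)
        print(json.dumps(build_inventory(), indent=2))
```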
If you're using Kubernetes or things like that, you've probably got good tooling to do that. If you're using some of the legacy
stuff that we still see in manufacturing sites and retail sites, there's some need to have a
standardized way to get some of that data. Yeah, I tend to agree. I think there's a good deal of
convergence in a lot of the tooling, but at the same time, we have to probably acknowledge that
the constraints
that are different mean that certain paradigms are going to be different. Like we already talked
about scale meaning a different thing. It doesn't mean horizontal auto scaling. It means, you know,
number of footprints and a number of copies. Another thing that, you know, is just different
is a lot of environments. Ours, for example, we couldn't do anything like PXE booting, and we didn't have near the network control that we even have in the cloud through APIs and things
like that. So I think there's a lot of convergence, but I think there's going to be
a tension with getting to full convergence because of some of the constraint challenges
that are just unique to those edge environments. Is that something that you guys have seen and
consistent with what you think as
well, Dave? Yeah, yeah. And, you know, we do see more on the monitoring side, you know, customers
that are using their existing monitoring tools. So, you know, we're feeding, you know, providing
data both, you know, from our infrastructure, from the site level, and in some cases, even the
applications into, you know, some even fairly old school kind of monitoring stuff. But they have their NOCs and they're trying to extend the edge
locations out into some of those environments in many cases and doing it fairly successfully
because, I guess, they're not, you know, having to deal with very specific hardware monitoring,
things like, you know, a drive failure, we're going to handle that. We're going to correct that,
give them maybe one event that says, hey, you need to send a new drive out to the site next time you go.
It's not a fire drill.
You don't need to roll a truck today.
We've taken care of the immediate crisis.
Your applications, your data are fine.
But somebody needs to know that and work that into almost their ticketing process, their service desk, things like that.
We see some integration starting to happen there even.
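The integration Dave describes, where an already-remediated hardware failure becomes a routine service-desk task rather than a page, amounts to an alert-to-ticket bridge. A rough sketch; the payload fields and the ticketing endpoint are assumptions for illustration, not a specific product's API:

```python
"""Hypothetical alert-to-ticket bridge for self-healed hardware events (sketch).

If the platform has already remediated the failure, the event becomes a
low-priority task (ship a spare with the next scheduled visit) instead of
a fire drill. Field names and the ticketing endpoint are invented.
"""
import json
import urllib.request

TICKET_API = "https://servicedesk.example.com/api/tickets"  # hypothetical


def to_ticket(event: dict) -> dict:
    remediated = event.get("auto_remediated", False)
    return {
        "summary": f"{event['site']}: {event['component']} failure",
        "priority": "low" if remediated else "high",
        "description": (
            "Platform reports the failure was handled automatically; "
            "ship a replacement with the next scheduled visit."
            if remediated else "Manual intervention required."
        ),
    }


def file_ticket(ticket: dict) -> None:
    # POST the ticket to the (hypothetical) service desk API.
    req = urllib.request.Request(
        TICKET_API,
        data=json.dumps(ticket).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        print("ticket created:", resp.status)


if __name__ == "__main__":
    example_event = {"site": "vessel-17", "component": "drive-3",
                     "auto_remediated": True}
    print(json.dumps(to_ticket(example_event), indent=2))
```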
And to what extent do you even have somebody remotely?
I know that this is
another thing we've talked quite a lot about. So you're talking about ticketing, you're talking
about truck rolls and so on, but you don't really have anybody there to do that.
No, I'm referring to it all being central. I mean, telemetry coming from the sites, coming from the ship, like,
hey, we're in the middle of the ocean, the drive failed, the system took care of itself,
the applications are fine. You need to request the hard drive go to this dock, this port that they're showing up to in two weeks, those kinds of things.
Or, hey, we need to send whatever, an extra node.
We're having a capacity warning.
We need to send an extra node, configure it for zero-touch provisioning so that when that ship gets in the port, they plug it in and it joins the cluster and resolves that resource need.
Yeah, that's an interesting differentiator because, you know, in the case of like retail or restaurant or something,
you know, somebody can get an overnight package and plug it in theoretically, probably, maybe, hopefully.
You know, but you mentioned, you know, there's a lot of edge environments that are not like that,
you know, where they absolutely cannot get replacement hardware and so on.
And I imagine that there it might make sense to overbuy, overbuild, overprovision, because you just don't know if you're going to be able to have a replacement node.
You might as well deploy four or five, six nodes now.
Yeah, there's definitely some math you can do to calculate your expected cost of sending a helicopter out versus, you know, yeah, I'm just going to have a, you know, a spare node.
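That math is essentially an expected-value comparison: the up-front cost of a spare node on site versus the probability-weighted cost of an emergency dispatch. A toy calculation with made-up numbers:

```python
"""Toy expected-cost comparison: spare node on site vs emergency dispatch.

All numbers are made up for illustration; substitute your own hardware
cost, failure probability, and dispatch cost (truck, boat, or helicopter).
"""

def expected_cost(spare_on_site: bool,
                  node_cost: float = 3_000.0,        # extra node bought up front
                  failure_prob: float = 0.15,        # chance of needing it per year
                  dispatch_cost: float = 25_000.0):  # emergency helicopter/boat visit
    if spare_on_site:
        # The spare is already there (and can even serve load), so a failure
        # costs nothing extra beyond the hardware you already bought.
        return node_cost
    return failure_prob * dispatch_cost


if __name__ == "__main__":
    print(f"with spare node on site: ${expected_cost(True):,.0f} one-time")
    print(f"without spare node:      ${expected_cost(False):,.0f} expected per year")
```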
And with us, it can actually be part of the cluster being consumed.
So you're still getting benefits from that from kind of load balancing.
But, you know, you're kind of designing for survivability, making sure you have enough resources to, you know, handle the storage, handle the compute, you know, even potentially with multiple failures. Well, and I would think too that in that case,
you might want to actually deploy maybe more reliable,
more high-end, more highly available hardware
in some of those locations or not.
I mean, I think that that's part of the math too.
You might look at it and be like,
okay, so we could deploy like an HA system
with redundant power supplies and RAID
and all this kind of stuff,
or maybe we just have an extra cheap node out there, you know?
Exactly. Yeah. Yeah. And we tend to see more of the latter.
It's like, but yeah.
Yeah. That's a paradigm difference. I think for sure,
which is you're going to think about your problems a little bit differently
because of not just their cost profile,
but actually the complexity to manage and the more pieces you have and the more sophisticated you get about what you put at the edge,
the more complex the management is going to be.
And so you either need something that's really simple or you need, I think, really great,
really great primitives you build on top of, and really great automation, to make sure that if any of these weird things happen and there's failures, you've kind of architected and designed for what the degraded state is going to look like. You know, if you have a RAID device or something, and it just blows up and is bricked, like, are you completely down? Was that a big thing, and are you highly dependent on it? So I think it's interesting to weigh the trade-offs of a really lightweight, simple, fairly cheap, low-complexity architecture that employs these ideas of zero touch and
disposability and things like that versus the, we're going to make it a little bit more like
it's a small data center. We're going to buy more enterprise grade components. We're going to put
more resiliency into it and try and manage towards success that way.
They're both the edge.
They're just two different ways of thinking about it.
And I think it's interesting.
And it probably really depends on your use case.
If you're out in the ocean or if you're in a restaurant in the city, you may be doing different things or maybe sometimes the same.
So it's very interesting.
Yeah.
And the other thing, and it's not always perfectly correlated, but a lot of these cases where, yeah, you might make those tradeoffs of, yeah, I need this extra level of resiliency.
A lot of them also have space constraints. So form factor and power and cooling all of a sudden either aren't possible, or it's like, wow, if I can get three or four or five of these little tiny Intel NUC units or similar devices and cluster them together for resiliency. And then even on the environment side, you can ruggedize those things really easily. So there's almost a different class of hardware that comes into play, or at least into consideration, in a lot of these environments due to space constraints and power and cooling
and whatnot.
But yeah, you hit on something important, I think, which is the environmentals that
I'm not sure that we've talked about before, but they're just different at the edge.
And it depends what kind of edge we're talking about.
I mean, there's data center-like edges,
you know, in the CDNs and the regional data centers
and things like that.
But when you get closer to the user, closer to the business, the really remote edge, the more that looks different, and the more you're likely to have somebody unplug something in the office by accident. Not that that's ever happened in our environment before. Or spill coffee on it, or it gets wet from a storm, or whatever the case may be. Heating and cooling could be another one, I suppose. You know, moisture in the air or other things like that. It all is not clean and regulated and perfect at the edge.
So it definitely makes you have a new set of concerns.
And that informs the way that you think about some of it.
Would you agree with that in anything that you'd add?
Yeah, no, I would definitely agree.
I mean, we see some very unique environments and, you know, some of our edge customers
who were early on with kind of obvious stuff.
I mean, they're bringing us these new use cases like, hey, can we have a cluster on
a crane that moves around?
No, there's no way we can connect Ethernet to that.
Of course, it's wireless.
So, you know, factory floors where there are unique systems for kind of different manufacturing pods, because it's better to just keep the data local there versus even sending it across a shop network, you know, during the day, during traffic.
So, yeah, you run into a lot of unique environmental stuff. The retail stores are, you know, very interesting, you know,
where they, you know, tend to hoist up the little micro data centers,
you know, somewhere in the stock room or, you know, things like that.
And, you know, you've got physical security, which we haven't really touched on here. But, you know, being able to secure those devices, it's a lot easier to secure small form factor devices than data center class devices in a lot of cases. But it's important that you consider the physical security as well.
Yeah, I was talking to one company and they were talking as well about a very specific thing that
just hadn't occurred to me. DC power. In retail or restaurant or something, AC, of course,
you can plug into the wall, you got a power strip, whatever. But they were talking about industrial IoT and specifically about ships. And they were
talking about, yeah, it's all DC power. You know, they absolutely cannot use anything with an AC
power adapter. Apparently, there's this Anderson power pole, which is like the special connector
that they're using. You know, they'll have, you know,
basically a power system that they'll use with battery backup, that'll power like a whole cluster together and things like that. But it's all DC. And I just hadn't thought
about things like that. And stuff like that, just, again, you know, you kind of look at it,
and you're like, well, nothing in the data center. I mean, I guess there's a DC movement in the data
center. But that's a topic for a different day. But nothing in the data center that comes off
the rack is really ready for an environment where you need to plug it into a 19-volt DC dedicated
line or something like that. I mean, that's just not a thing. And again, the cool thing about these
little small form factor devices is that most of them actually do have external power bricks.
And you can just chuck that and rig up DC wiring and so on.
Or in some cases, power over Ethernet as well.
Yeah, exactly.
That's an interesting aspect.
I mean, I guess, you know, because we talked as well about Wi-Fi.
I guess the flip side of that is sort of like all-Ethernet-power, Power over Ethernet kind of environments. You know, something in the ceiling, something where you don't have power in the room. Yeah, just so many things to think about. But when you're designing this stuff,
of course, I guess there's also a trade-off between standardization and specialization
as well, because, you know, maybe,
maybe you're going to do some extraordinary measure to make this system work in
this particular environment because all the other systems work that way and you
want to make sure you're deploying everything everywhere. I guess, you know,
how do you deal with situations like that where you don't have a uniform place
to deploy it?
So, yeah, we definitely see environments
that have different needs,
whether it be an application that is running on legacy hardware that maybe they're ready to replace.
And oftentimes that's where we're just lifting and shifting.
We're going in and taking an old Windows virtual machine
or old Linux box, or some cases appliance
and moving that into a virtual machine
on resilient, well-managed infrastructure.
The other thing we do see is, I think I mentioned earlier, our clusters do let you mix and match
node types. We have all different kinds of capacities, performance, CPU-heavy, storage-heavy.
So while we talk about standardizing, you can have different kinds of sites. We see this commonly in retail.
It's like, well, this store also has a pharmacy. So we have this extra application. We have this extra resource we
need. Oh, this site is one of our, you know, customer experience stores. So we've got all
sorts of GPUs and digital signage, and we have nodes that are tailored for those kinds of workloads, those kinds of applications, but they still can benefit from all the same management,
the same zero touch provisioning, the same fleet management with our console.
And so that's kind of what we try to do, and the same automation. So you're going to have automation blueprints that, you know, templatize your pharmacy stores and the different sets of applications and resources that they need, to kind of plan for and handle and design for those kinds of differences as well.
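The blueprint idea, one template per store type rather than a hand-crafted configuration per site, can be expressed as simple declarative site templates. A sketch in plain Python data structures; the field names are invented for illustration and are not a Scale Computing schema:

```python
"""Hypothetical site blueprints: one template per store type (sketch).

The point from the discussion: standardize on a small set of templates
(standard store, pharmacy store, customer-experience store) rather than
hand-crafting each site, and let provisioning pick the right one.
"""
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SiteBlueprint:
    node_types: List[str]                 # e.g. storage-heavy, GPU node
    applications: List[str]               # workloads deployed at this site type
    extra_config: Dict[str, str] = field(default_factory=dict)


BLUEPRINTS: Dict[str, SiteBlueprint] = {
    "standard_store": SiteBlueprint(
        node_types=["compute-small"] * 3,
        applications=["pos", "inventory"],
    ),
    "pharmacy_store": SiteBlueprint(
        node_types=["compute-small"] * 3 + ["storage-heavy"],
        applications=["pos", "inventory", "pharmacy"],
    ),
    "experience_store": SiteBlueprint(
        node_types=["compute-small"] * 3 + ["gpu-node"],
        applications=["pos", "inventory", "digital-signage", "vision-analytics"],
    ),
}


def plan_site(store_type: str) -> SiteBlueprint:
    # Provisioning/automation would consume this to order hardware and
    # generate the zero-touch configuration for a new site.
    return BLUEPRINTS[store_type]


if __name__ == "__main__":
    bp = plan_site("pharmacy_store")
    print(bp.node_types, bp.applications)
```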
Well, this has been really great.
I think we've hit on a number of really interesting
challenges and interesting solutions
to thinking about how we scale the edge.
Maybe to wrap up with one more question,
and I'll ask it to both of you guys, you can both answer.
When you think about people,
companies that are looking to deploy things to the edge
for whatever reason,
what do you think is the biggest challenge
that they're gonna face?
And what advice would you give them?
I guess I'll jump in first.
I mean, I think the biggest challenge is,
you know, planning for not just the one proof of concept,
you know, what you can do in a lab,
but planning from day one
on how you're gonna get this out to however many stores,
however many different, you know,
kind of infrastructure platforms that you need as part of your POC. And so we, you know, we want customers who are looking at our stuff,
for example, to experience our fleet management, to go through zero touch provision, to use
automation in their POC. Don't just say, oh, we're going to go do it manually once and then
go back and figure out the automation later. I mean, we think that's really an important thing
to think about, to design for, to understand.
And so, you know, we want and we're proud of our tools that we provide to do that.
So we want to help people try that experience and simulate, you know, a large scale deployment, even if that's, you know, your POC.
But that's probably my recommendation is just, you know, try to make it as real world as possible.
And, you know, doing one local POC is great, but you definitely need to go the next step and, you know, think about the fleet level deployment.
Yeah.
And that really goes, I think, to the topic of this whole discussion is that scale really changes everything.
And, you know, it's one of those things where just because it worked here doesn't mean that
it's going to work everywhere and doesn't mean that it's going to work forever.
And so you really have to make sure that you're going to build something that will actually
work hands off, that it's got to be automated.
You know, we talked about that as well in previous discussions.
It just seems like that's the big thing, the big mistake people are going to make is they're going to build something that works on my desk and say, okay, this is what we're going to go for and then not be able to support it long term out there.
Brian, I'm going to throw it at you to tell us what your experience is.
Yeah, I think this is really good.
I think thinking about how you're going to scale your solution from day one is probably critical to success.
I mean, it's all kinds of things.
It's how are you going to get it to the sites?
How are you going to manage it?
Who's going to do the things if there's any human tasks?
You probably shouldn't do that.
You should probably lean on automation. But really thinking about how's this all going to work when
it's in all of its end places, not just thinking about how do we get it to work in one copy.
Some of the biggest challenges that you're going to find are in managing the constraints
of your environment and then multiplying that by X, whatever your number of copies is.
So I think designing and starting
with a really good foundation
is obviously going to set companies
that are thinking about doing this up for success.
So I think that would be my recommendation
is think about scaling now,
even before you're sure that you're ever going to have to.
I think we said the word scale more times than ever. Do we charge for that? Do we pay for it?
Yeah. I don't know. This isn't sponsored, man. I should.
Listeners, send us in a dollar for every time we said scale. That'd be great.
Thank you so much. It's been a great conversation. And again, scale is the
differentiating factor here when it comes to the edge, at least for this conversation. Dave, where can we connect with you? Where can we continue this conversation with you? And do you have anything going on that you want to pitch? David Demlow on Twitter. Feel free to reach out and you can also find me on LinkedIn. But yeah, I mean, we're definitely moving forward with the Edge development, our recently
announced Fleet Manager product. We definitely want everybody to take a look at that,
become a development partner. I mean, we work with a lot of different companies,
a lot of different use cases. So we actively understand that there are differences in these environments. Come work with us. And if there's something that's a little bit different, or a different hardware platform that you need,
we can, you know, design an edge solution
that is designed to scale with your environment
and your applications and your needs in mind.
Great. And Brian, how about you?
What's new?
Yeah, I continue to write about the edge,
among other things, at my Substack.
It's the Chamber of Tech Secrets. Fun name,
brianchambers.substack.com. And you can kind of just follow me and see what's going on on Twitter
as well, B-R-I-C-H-A-M-B. Usually getting into some interesting tech discussions. We've been
talking a lot about the role of the architect lately, which has been a fun one as well. So
go check that stuff out. And yeah, I appreciate it. Excellent. Yeah. And I got to say
the Substack is a lot of fun because it's not just sort of here's, you know, I'm writing about this,
I'm writing about that. You kind of have a message in each of these posts, which I really enjoy.
And you've got kind of a thing going on there. So it's working. It's working for me.
Thank you so much. Yeah. And thank you, Brian. Thank you, Dave. As for me, of course, you can find me here every Monday on Utilizing Tech.
You'll also find me every Tuesday or most Tuesdays in the On-Premise IT podcast, which
is available in a podcatcher near you, as well as every Wednesday in the Gestalt IT
Rundown, which is our news show.
And of course, you can find me on the socials at S Foskett on Twitter and Mastodon
and more. Thank you so much for listening to Utilizing Edge, part of the Utilizing Tech
podcast series. If you enjoyed this discussion, please do subscribe. We would love to hear from
you as well. Give us a rating, give us a review wherever you can. This podcast is brought to you by gestaltit.com,
your home for IT coverage from across the enterprise.
For show notes and more episodes, though,
head to our special dedicated site, utilizingtech.com,
or find us on the social media sites,
Twitter, Mastodon, at Utilizing Tech.
Thanks for listening, and we'll see you next week.