Utilizing Tech - Season 7: AI Data Infrastructure Presented by Solidigm - 08: What Does AI Mean For the Future of Networking with @ChrisGrundemann

Episode Date: October 13, 2020

Stephen Foskett and Andy Thurai are joined by Chris Grundemann to discuss how AI is used in enterprise networking and how it is changing the industry. They begin with a discussion of AI in enterprise networking, connecting it with software-defined networking and other trends. What network management tasks can be improved through the use of AI? Grundemann looks to using the technology in root cause analysis and fault correlation as well as prediction of network events. The discussion then turns to the ways that AI workloads will change the demands placed on networking. AI systems demand data, throughput, and low latency, and networks must adapt to support these workloads. This episode features: Stephen Foskett, publisher of Gestalt IT and organizer of Tech Field Day. Find Stephen's writing at GestaltIT.com and on Twitter at @SFoskett. Andy Thurai, technology influencer and thought leader. Find Andy's content at theFieldCTO.com and on Twitter at @AndyThurai. Chris Grundemann, a GigaOm analyst and VP of Client Success at Myriad360. Connect with Chris on ChrisGrundemann.com and on Twitter at @ChrisGrundemann. Date: 10/13/2020 Tags: @SFoskett, @AndyThurai, ChrisGrundemann

Transcript
Starting point is 00:00:00 Welcome to Utilizing AI, the podcast about enterprise applications for machine learning, deep learning, and other artificial intelligence topics. Each episode brings experts in enterprise infrastructure together to discuss applications of AI in today's data center. Today, we're discussing the implications for networking of AI-based applications and also the implications of AI for networking. First, let's meet our guest today. Hi there. My name is Chris Grundemann. I work at Myriad360, where we focus on cybersecurity and data center solutions.
Starting point is 00:00:41 I also write as an analyst for GigaOm. And you can find me on Twitter at @ChrisGrundemann. I'm Stephen Foskett, organizer of Tech Field Day and publisher of Gestalt IT. You can find me on Twitter at @SFoskett. And I am Andy Thurai, the founder and principal of thefieldcto.com, home of Unbiased Emerging Tech Advisory Services. You can find me on Twitter at @AndyThurai. Thanks, Andy and Chris. It's great to have you here. So it's interesting.
Starting point is 00:01:11 I mean, it seems that networking and AI really are a match made in heaven because, on the one hand, one of the earliest and most useful applications of AI in the enterprise is in network monitoring and network management. And then, on the flip side, of course, networking technology is one of the things that's enabling AI. So let's start with that first part, Chris. As somebody who's seen a lot of what's going on in
Starting point is 00:01:36 networking, you know, you've been to a lot of Networking Field Day events, for example, and I know that you write about this for GigaOm and so on. How have you seen this happening in enterprise networking? How have you seen AI coming home? Absolutely. Yeah, I think there are really two pieces: automation and telemetry are the two big pieces. I like to talk about visibility and control.
Starting point is 00:02:01 And what's interesting to me is that these were initially the promise of SDN and what we were talking about in software-defined networking. And I think AI and different ML techniques really take this to the next level, where you can look at both understanding what's going on in the network and also making things happen in the network in an autonomic way. And so AI seems to be almost the secret sauce that will close that loop of translating a human's intent, activating that in the network, and then assuring that that's actually what's going on. So I think it's really interesting as we see artificial intelligence
Starting point is 00:02:37 and all the various techniques within it applied to networking in order to realize this promise that we've been talking about for almost a decade now. So I've got a question. Networking has been around for what, 30 years, 40 years, right? It started extending to the edges, and then the cloud came, and most of the innovation in the networking space has been around fitting the cloud model. But now that we're moving to AI and ML, with training in the core and moving the model to the edges and the whole nine yards, how do you see networking change
Starting point is 00:03:19 to adapt to these newer technologies? Yeah, it's interesting. The way I look at it, if you go all the way back to the beginnings of TCP/IP, I think the first 30 years were about figuring out if this could work, and the next 30 years were about making it work. So the most recent 30 years
Starting point is 00:03:38 has really been about making it work. And at this point, Wi-Fi works on a plane going 600 miles an hour at 30,000 feet. So we've pretty much figured out how to get it to work. That doesn't mean it always works all the time, but we've pretty much figured that part out. And so the next phase of networking, I think, is really about making sure it works properly. It's all about user experience and application performance: really tuning the dials and making sure that the network works the way you need it to. Not just that the lights are green, but that your performance is exactly what
Starting point is 00:04:07 you want. And I think that is further complicated, but also enhanced, by cloud technologies. The way it's complicated is that your corporate network now extends well beyond the four walls where your office may have been, across the public internet, into public cloud, and across different SaaS companies' offerings. So you've got this really complicated network to deal with, and it's not in your control anymore. On the flip side, you've also got these new technologies, microservices, disaggregation, and a lot of things that
Starting point is 00:04:38 have come out of the hyperscalers and the move toward cloud that I think enhance our ability to run networks. So it's a really interesting kind of mutual thing, coming at it from two sides, where the complexity has grown but you've also been given new tools. And I think AI fits really well into that, because as you expand these networks out into places where you don't control them and don't have direct visibility, being able to understand and infer what's going on there is really important. That's where artificial intelligence can definitely play a role: taking in the massive amount of data that a network throws off and really predicting and understanding what it means, especially in environments where you don't have direct visibility or control.
Starting point is 00:05:15 So as someone who's been involved in network monitoring and network management for a long time, what are the key tasks that worry you? What are the things that keep you up at night that you feel AI can do? I mean specifically things like monitoring uptime, monitoring performance, intrusion, network performance monitoring, that kind of thing. Yeah, that's a great question. I think the basics of that we've got down pretty well, right? I mean, just basic fault tolerance, red, green: we've figured that out pretty well. I think where AI really comes into play is, again, that next level, which is one of the things that has perplexed network operators for a very
Starting point is 00:05:55 long time. I've had many sleepless nights trying to troubleshoot an issue, trying to figure out the root cause, right? I see this cascading failure, there are all these things down, there are red lights and blinking alarms all over the place, but why? That's a place where I think AI can really move us to the next level: speeding up root cause identification and correlation of events. That's a big area that I think is really interesting. The other area that's interesting specifically is prediction. Obviously, maybe it's
Starting point is 00:06:23 not easy, but it's definitely a concrete thing to understand after a failure has occurred, or after you've run out of capacity and you're starting to drop packets on the floor. But what about looking at the network in a way where you can predict: this user is moving through these different access points and they're about to leave the network, so what's going to happen when they transition to the cellular network? Can we predict that, and can we head it off at the application layer somehow and create an easier transition?
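To make the fault-correlation idea concrete, here is a minimal, hypothetical sketch in Python. It is not a method described in the episode or any vendor's product, and it uses simple rules rather than machine learning: given the set of devices currently alarming and an assumed map of which upstream device each one depends on, an alarm is reported as a probable root cause only if nothing upstream of it is also alarming. The device names and the single-parent, loop-free topology are illustrative assumptions.

```python
from typing import Dict, Optional, Set

# Hypothetical topology: each device maps to the upstream device it depends on
# (None at the top). Real networks are graphs with redundant paths; this
# single-parent, loop-free model is a simplifying assumption.
UPSTREAM: Dict[str, Optional[str]] = {
    "core-1": None,
    "dist-1": "core-1",
    "dist-2": "core-1",
    "access-1": "dist-1",
    "access-2": "dist-1",
    "access-3": "dist-2",
}


def has_alarming_ancestor(
    device: str, alarming: Set[str], upstream: Dict[str, Optional[str]]
) -> bool:
    """Walk up the dependency chain and report whether any ancestor is also alarming."""
    parent = upstream.get(device)
    while parent is not None:
        if parent in alarming:
            return True
        parent = upstream.get(parent)
    return False


def probable_root_causes(
    alarming: Set[str], upstream: Dict[str, Optional[str]]
) -> Set[str]:
    """Keep only alarms that cannot be explained by a failure further upstream."""
    return {d for d in alarming if not has_alarming_ancestor(d, alarming, upstream)}


if __name__ == "__main__":
    # A cascading failure: dist-1 fails, so both access switches behind it alarm too.
    # access-3 is alarming independently (its upstream path is healthy).
    alarms = {"dist-1", "access-1", "access-2", "access-3"}
    print(sorted(probable_root_causes(alarms, UPSTREAM)))  # ['access-3', 'dist-1']
```

Four alarms collapse to two probable causes. An ML-based system might learn such relationships from historical alarm and telemetry data instead of reading them from a hand-written map, but the filtering idea is the same.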
Starting point is 00:06:49 Or on a big WAN, I see trends where the capacity is getting to a point where we may need to add additional circuits or additional bandwidth. Seeing that coming takes a lot of human time, whereas a sleepless AI bot can do it much more efficiently. So I think predictive analysis of user experience and application performance, capacity planning, and root cause analysis are really the three key
Starting point is 00:07:16 areas where we'll see AI cause major advancements, I think. When it comes to application-level manageability, monitoring, and uptime, in other words APM, application performance monitoring, those tools do full-stack monitoring and observability. Now, when we throw AI into the mix, with training in the core and moving to the edges, do any of those observability and monitoring concepts, particularly for the network, change, or are they the same as before? No, I think they're definitely enhanced, right? I don't know that we fundamentally change the way we use those tools, but I do think the way that those tools
Starting point is 00:08:04 work and the way we work with them may fundamentally change. And this plays into one of the areas that I think has the most promise for artificial intelligence, which is really augmented intelligence, right? For me, especially as a network operator and network engineer, I don't necessarily want,
Starting point is 00:08:19 and maybe this is something I've got to ease into, but I think this is a long-term thing, I don't necessarily want a black-box AI system covertly making decisions where I don't understand why or how or when. But I do want it to provide me that context and say, hey, this happened and I think this is why, and let me validate that. And then have the machine say, I also think this is the right remediation path, and me be able to say, yes, go forward with that. Then maybe over time, as I get more comfortable with the decisions it's making, for certain things I can just say, yes, always do that without checking, but other things I'm probably going to want to be involved in, and be
Starting point is 00:08:54 in the loop. So I think the human in the loop is a big piece of this. Because of that, I don't see it changing drastically, but I do think it will enhance that interaction. I'll be getting richer data and richer information from those observability tools that will allow me to make better decisions quicker. Eventually maybe we'll get to a fully autonomous network, but I really do think the human in the loop is going to be a necessary step before we get there. Yeah, in a way it seems like there's sort of a two-way street happening here, and I guess this is what often happens with technology: AI allows you to process more data, but then it also demands more data.
Starting point is 00:09:35 It demands more metrics and more units. And so it's sort of building on itself. And like you said, it seems to me that the fundamental goal of all of this should be to help humans do their job better, not necessarily to reach that ultimate nirvana of a fully autonomous network, right? Or am I off base with that? Well, I think we'll get there over time. And I don't know that we ever go fully autonomous, right? Maybe we will. I think it's interesting to look at the analogy of cars. Again, I see AI as part of this ongoing trend along the spectrum toward more automation and more autonomous networks, and I see similarities in vehicles. What I mean is, with the first cars,
Starting point is 00:10:21 if you owned a car, you really had to be a mechanic. There was no way to get around in a Model A or a Model T without having some wrenches, or at least hiring a driver who had some wrenches and could work on the car and keep it running. Over time, we've moved to the point where many cars are drive-by-wire: the steering wheel and the gas pedal aren't actually connected to anything other than computers, and for a backyard mechanic to crack in there and change anything is almost impossible. So we moved from having to be a mechanic to it being almost impossible to be a mechanic without a lot of
Starting point is 00:10:51 training and a lot of skills. And now we're on the path to more and more driver assistance and maybe eventually autonomous cars. I think networking is on a similar path. At first, the people who stood up networks really had to be strong network engineers and architects and understand the ins and outs of them. We continue to add layers of abstraction, so that in one way it becomes easier to operate a network, but on the other hand it's harder to dig into the nuts and bolts and understand what's going on. I think we'll continue to see that trend, and AI is part of it, which is exactly like you
Starting point is 00:11:22 said: it's giving us tools that abstract some of the complexity away from us, but that also makes it harder sometimes if something goes really wrong. Right, one of the problems with general script-based automation is that you can fail at scale. And I think AI is similar. If the AI goes rogue in some way or acts in a way that you don't predict, or if the network and the physical things that are going on act in a way that the AI doesn't predict, you can have some really interesting consequences that may be hard to troubleshoot. So yes, I think it is both a boon and an eventual challenge.
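The graduated trust Chris describes (approve each proposed fix at first, later let some actions run unattended) and the risk of failing at scale can be sketched as a simple approval gate. This is a hypothetical Python illustration, not a feature of any particular product: the action names, the auto-approve set, and the device-count limit are all assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Remediation:
    """A fix proposed by an assumed AI system, plus the context shown to the operator."""
    action: str          # hypothetical action name, e.g. "restart_interface"
    devices: List[str]   # devices the change would touch
    reason: str          # why the system thinks this is the right fix


@dataclass
class ApprovalGate:
    # Action types the operator has said to "always do without checking".
    auto_approved: Set[str] = field(default_factory=set)
    # Blast-radius limit: even trusted actions go to a human above this many devices.
    max_auto_devices: int = 1

    def decide(self, r: Remediation) -> str:
        """Return "apply" for trusted, small-scope changes and "ask_human" otherwise."""
        if r.action in self.auto_approved and len(r.devices) <= self.max_auto_devices:
            return "apply"
        return "ask_human"

    def trust(self, action: str) -> None:
        """Operator promotes an action type once comfortable with its track record."""
        self.auto_approved.add(action)


if __name__ == "__main__":
    gate = ApprovalGate(max_auto_devices=2)
    fix = Remediation("restart_interface", ["access-3"], "CRC errors rising on uplink")
    print(gate.decide(fix))   # ask_human: action not yet trusted
    gate.trust("restart_interface")
    print(gate.decide(fix))   # apply
    wide = Remediation("restart_interface", ["a1", "a2", "a3"], "same fault on three switches")
    print(gate.decide(wide))  # ask_human: exceeds the blast-radius limit
```

The device-count check is one simple way to keep a trusted-but-wrong action from failing at scale; the human stays in the loop for anything broad or unfamiliar.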
Starting point is 00:11:53 So that brings up an interesting question. Let me ask it; it's kind of a catchphrase question. Obviously there is AI for the network, AI helping the network improve things, and then there is the network we build for AI. So which one has made more progress, and which one is struggling? Obviously AI, in the last five or 10 years, with GPUs, the way of building things, and computers providing unlimited power, HPC, has improved tremendously. And networking has its own way of innovating: SD-WAN, private networks, autonomous networks, and the whole line. So it has also improved
Starting point is 00:12:33 tremendously. So let me ask you: which one has progressed further, and which one is lagging? Is it AI for the network or the network for AI that's far more advanced and can better handle what's thrown at it by the other one? Yeah, that's interesting. I don't know that I have a clear answer for that. I do think it is an interesting distinction that both of those things are happening, as you just mentioned and as Stephen said at the beginning. One aspect is AI helping us run networks.
Starting point is 00:13:02 And there have definitely been some advancements there. I think there's also been some AI washing in that space, and there are a lot of people talking about how AI will help us there in the future and almost presenting that as if it's happening now. There are some companies that are doing some interesting things with machine learning
Starting point is 00:13:15 and some really interesting algorithms and expert systems today, using AI-type or even true AI systems to help run the network. So I do see that as being fairly advanced, but I also see huge promise in the idea of using data from the network to train AI systems that are more broadly looking
Starting point is 00:13:34 at the IT stack as a whole or the enterprise architecture as a whole, right? I think that the network is definitely a place where we can, I mean, obviously every application sends every packet over the network, right? And so looking at that and understanding that and using that as a data source for AI that does other things, I think is a really interesting area. I think it's less explored, personally, but that may just be my bias of coming at this from a networking
Starting point is 00:13:56 perspective. One of the interesting things that comes out of this, too, is that throughout this we've been talking about how AI helps networks, but of course there's the flip side: how networks can help AI. I think that's another interesting aspect here. Basically, as you said, we've figured out how to move packets quickly and reliably, with low latency and high throughput, and all of that is feeding this AI monster as well. So how are networks supporting AI workloads? I guess that's a question for both of you. Chris, maybe you want to start out, but Andy,
Starting point is 00:14:35 I'd be interested in your perspective on this as well. Yeah, definitely. From my perspective, I've seen some companies doing some really interesting things using massive packet capture systems to really understand what was going on in the network and do some of these AIOps-type things for networking. And I've also seen at least one of those companies now start to pivot away from looking specifically at the network and start looking at, in their case, security challenges. They're again using this massive amount of data they can pull off the network, seeing user behavior, seeing application behavior, and seeing deviations from the norm, and then processing that
Starting point is 00:15:08 in the context of security threats. I think we'll continue to see that trend, with more and more companies using data from the network directly to get an unbiased view of what the applications are doing, what the people are doing, or even what other devices are doing in the IoT world, and using that data as an engine to understand what's going on in the broader context of the application or the enterprise. So yes, I think that's definitely happening already, and we'll continue to see it move forward. A couple of thoughts on that one. When a network attack happens, and especially with the cloud, all the services are available all the time, right? 24 by 7 by 365, for the bad guys and the bad nations to attack. So both the attack timeframe and the threat vectors and attack surface have increased.
Starting point is 00:16:02 As good as the advancements in the network and the cloud have been, there are still a lot of unknowns. As I say, the unknown unknowns are still there, right? So how do you figure out which areas to fix? This is one of my pet peeves. Some companies have done a really good job of threat identification and vulnerability identification, particularly when they're
Starting point is 00:16:26 in the security field, and they try to better themselves. But I asked a vendor, who shall remain unnamed, this specific question: when you find out this information, would you be willing to share it across the platform with other vendors? Their answer was no. I don't know if it's because they think it's proprietary information, their secret sauce, but honestly, I think they should be sharing it and making things better for the common good. So that's one view. Again, it's helping out a lot, particularly in the security threat areas, and there's a lot of work being done, but it's in isolated islands. The second one is an interesting thought
Starting point is 00:17:08 that came up in a conversation with a thought leader I was speaking with. When attacks happen, the networks are so advanced and autonomous that they can reconfigure themselves. When an attack happens in a certain area, the networks can figure out where the attack is coming in from and isolate the entire traffic
Starting point is 00:17:34 from that region so the other areas won't get affected. So you drop the offending area, at least for a while until you figure out what you want to do with it, so others won't get affected. If a DDoS attack is coming in from a certain region, rather than letting it keep attacking the service and take the service down, you just take that network, that subnet, out completely, which means your service will still be available for other users. Those are my viewpoints.
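Andy's idea of isolating the region an attack comes from, combined with Chris's point about spotting deviations from the norm in network data, can be sketched as follows. This is a hypothetical Python illustration using a simple z-score against each source prefix's own traffic history (a basic statistical technique, not a trained ML model). The /24 aggregation, the thresholds, and the sample addresses (taken from the RFC 5737 documentation ranges) are assumptions. The sketch only decides which prefixes to isolate; actually enforcing a block, for example with an ACL or a null route, is left out.

```python
import ipaddress
import statistics
from collections import Counter
from typing import Dict, Iterable, List, Set


def prefix_of(ip: str, prefix_len: int = 24) -> str:
    """Aggregate a source IPv4 address to its enclosing prefix (the 'region')."""
    return str(ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False))


def prefixes_to_quarantine(
    history: Dict[str, List[int]],   # per-prefix request counts from past intervals
    current_sources: Iterable[str],  # source IPs seen in the current interval
    z_threshold: float = 4.0,
    min_count: int = 100,
) -> Set[str]:
    """Return prefixes whose traffic this interval deviates sharply from their baseline."""
    current = Counter(prefix_of(ip) for ip in current_sources)
    flagged: Set[str] = set()
    for prefix, count in current.items():
        if count < min_count:
            continue  # ignore low-volume sources entirely
        past = history.get(prefix, [])
        if len(past) < 2:
            flagged.add(prefix)  # heavy traffic from a prefix with no usable baseline
            continue
        mean = statistics.mean(past)
        std = statistics.pstdev(past) or 1.0  # flat history: avoid dividing by zero
        if (count - mean) / std > z_threshold:
            flagged.add(prefix)
    return flagged


if __name__ == "__main__":
    # Hypothetical per-interval request counts for two prefixes.
    history = {
        "198.51.100.0/24": [10, 12, 9, 11],
        "203.0.113.0/24": [200, 210, 190, 205],
    }
    # Current interval: a burst from one prefix, normal load from the other.
    sources = ["198.51.100.7"] * 500 + ["203.0.113.5"] * 205
    print(prefixes_to_quarantine(history, sources))  # {'198.51.100.0/24'}
```

This reflects the trade-off Andy describes: quarantining a whole prefix also drops any legitimate users inside it, which is why the isolation is framed as lasting "at least for a while" rather than permanently.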
Starting point is 00:18:03 Yeah, absolutely. I have to agree with you on both those points. On the first one, I think a call for more data sharing is absolutely warranted. This is something that's been going on for a long time in the security world, where a lot of vendors have their own team doing threat analysis, and some of them share more data than others.
Starting point is 00:18:28 That's one of the reasons we've seen the CERTs pop up, right? They're there specifically to share this data. And this is a problem that's bigger than networking, and bigger than cybersecurity and IT, in the AI space: having the data sets available is super important. It's really hard for anyone to make innovations in the AI space without having the training data available, so sharing that data, and maybe even some of the analysis of it, is going to be huge in moving us forward as a community and as an industry. So yes, plus one on that for
Starting point is 00:18:59 sure. And then the second piece: I do think this is one area where being fully autonomous in the reaction may be a win for networks, and that's the security side. I think that there are so many stories of folks there and make some decisions, at least alert us and tell us that something's going on, or even better, in something like a DDoS attack, maybe because seconds matter, having the machine actually make the decision and stop the traffic immediately itself. Those are definitely areas where I think we'll see some handing over of the keys, so to speak, to AI in network administration pretty quickly, because those seconds matter. Where time matters that much, maybe having a human in the loop doesn't make sense. Yeah, not only that, but that's key, right? The human in the loop is always the weakest link when it comes to computers and AI networking, right? Because you have SD-WAN capabilities, self-defining networks,
Starting point is 00:19:59 the network can reconfigure and redefine itself, saying, you know what? I'm sensing an attack. I'm taking myself out. Or I create a different route. Or maybe I set up a honeypot service, so that when an attack is coming in, I route it to this honeypot service and capture all the information about the attacker. They don't know that it's not an actual service, and I collect all the information about them for my later forensic analysis. You know what I mean? Absolutely, yeah.
Starting point is 00:20:26 And we've seen applications that do that in a more manual fashion for years, and it's interesting to think of how, you know, an AI system could benefit that. And then, of course, there's just the nuts and bolts aspect of this too, which is we're talking about a lot of data. And AI systems are going to demand a lot of data. They're going to need a lot of throughput. They're going to need low latency. They're going to need to get that stuff moved from here to there. And so from a perspective of storage and servers and networks, just nuts and bolts stuff, we need better, faster, easier networks to support the AI workload, just like we need
Starting point is 00:21:08 the AI workload to support these networks. So thank you very much for joining us today, Chris and Andy. It's been great having you here, and it's been really enjoyable to have this discussion. Chris, where can people connect with you and follow your thoughts on networking, enterprise AI, and other topics? Yeah, thank you. Definitely. ChrisGrundemann.com is a website where I link out to everything else, or you can find me on Twitter at @ChrisGrundemann.
Starting point is 00:21:33 I'm happy to reply to direct messages or anything there. Andy, as well, I know that you've been doing a lot of work in this space. Where can we learn more about AI from you? Yeah, sure. At my website, thefieldcto.com, or on Twitter at @AndyThurai, T-H-U-R-A-I. Thanks a lot. And you can find my writing at gestaltit.com and on Twitter at @SFoskett. Thank you all for listening to the Utilizing AI podcast. If you enjoyed this discussion, please remember to subscribe, rate, and review the show on iTunes, now that we're listed on iTunes,
Starting point is 00:22:10 since that really does help our visibility. And please do share this show with your friends. This podcast is brought to you by gestaltit.com and thefieldcto.com. For show notes and more episodes, please go to utilizing-ai.com or find us on Twitter at utilizing underscore AI. Thanks, and we'll see you next week.
