Utilizing Tech - Season 7: AI Data Infrastructure Presented by Solidigm - 09x05: The Role of Data Infrastructure in Enterprise AI with Ingo Fuchs of NetApp

Episode Date: October 27, 2025

As customers try to figure out how to present data to Agentic AI applications, many of them are realizing that it's time for the storage infrastructure team to step up and take a seat at the table. In this episode of Utilizing Tech, recorded live at NetApp Insight in Las Vegas, hosts Stephen Foskett and Guy Currier from The Futurum Group sit down with Ingo Fuchs, Chief Technologist for AI at NetApp, to explore the critical role of data infrastructure in supporting enterprise AI and agentic AI applications. As organizations move AI workloads into production, traditional infrastructures—especially storage teams—must take a more active role in enabling performance, efficiency, and governance. Ingo emphasizes the emerging needs for data quality, control, compliance, and currency, particularly as AI agents begin making decisions and interacting with sensitive enterprise data. The conversation highlights how NetApp's capabilities, such as AI Data Engine and native infrastructure integrations, enable real-time data pipeline management, enforce guardrails, and ensure consistent and secure data delivery. This shift represents a transformative intersection of storage, infrastructure, and AI operations, paving the way for scalable and reliable enterprise AI solutions.

Guest: Ingo Fuchs, Chief Technologist for AI at NetApp

Hosts: Stephen Foskett, President of the Tech Field Day Business Unit and Organizer of the Tech Field Day Event Series; Frederic Van Haren, Founder and CTO of HighFens, Inc.; Guy Currier, Chief Analyst at Visible Impact, The Futurum Group.

For more episodes of Utilizing Tech, head to the dedicated website and follow the show on X/Twitter, on Bluesky, and on Mastodon.

Transcript
Starting point is 00:00:00 As customers try to figure out how to present data to agentic AI applications, many of them are realizing that it's time for the storage team and the storage infrastructure to step up and take a seat at the table. That's the topic of this conversation here at NetApp Insight on this episode of Utilizing Tech. Welcome to Utilizing Tech, the podcast about emerging technology from Tech Field Day, part of the Futurum Group. This season focuses on practical applications for Agentic AI and other innovations in artificial intelligence. I'm your host Stephen Foskett, organizer of AI Field Day and other Tech Field Day events.
Starting point is 00:00:39 And joining me this week as my co-host here, live in person at NetApp Insight, is Mr. Guy Currier. Welcome, Guy. Thanks, Stephen. It's good to be here. It's good to be at NetApp Insight. It's good to be here on the pod again, talking about Agentic AI. I think the best thing about NetApp Insight is that this is a real practitioner-oriented kind of crowd. I always say that when I go to an industry event, my favorite thing is talking to the customers that are there and asking them for sort of a reality check. What are you doing? What do you think of this technology? What do you think of these announcements? Will you kind of get
Starting point is 00:01:18 on board with the direction the industry is facing? Yeah, I think that's true. I think at every conference the highlight is to be able to speak to practitioners, also to strategists and architects who are a different form of practitioner, and to see and hear their reactions, usually, frankly, at a vendor conference, pretty enthusiastic about what the vendor has to offer, but then just getting a little deeper and seeing how it may apply to whatever it is they're trying to do now, across industries, in certain specific scenarios where the rubber hits the proverbial road, because that enthusiasm needs to translate into something valuable, into value in areas where they can innovate in their particular industries,
Starting point is 00:02:03 maybe not have to worry about the tech or maybe geek out into the tech in order to get that value out of it. And getting all that color on it, I think, is really helpful for us as we absorb all the different stories, messages, products, and stuff out there. Yeah, and when it comes to Agentic AI, you know, many companies are just starting on their AI journey. And so it's hard to know how far along we are, what roadblocks they've found so far, what they've found that works, what they've found that doesn't work. And so this week on the podcast, we've invited on to sort of speak to that,
Starting point is 00:02:39 somebody who focuses on this for NetApp. So Ingo Fuchs, welcome to the show. Thank you very much. Thanks for having me here today. Well, my name is Ingo Fuchs, as you said. I'm the chief technologist for AI here at NetApp, and so I spend a lot of time talking to customers and partners and analysts and people like yourselves and just really trying to help our customers cut through all the noise and all the hype and figure out how they can really derive value out of AI and what the right technology bits and pieces are to make that happen. I want to start, though, with one thing that I just thought of as you were talking about this moment ago, Stephen, which is a lot of people are just starting out on their AI journey. Some are starting out on their agentic AI journey
Starting point is 00:03:20 before they've really gotten very far on their AI journey. The general sense I get is that there's so many things that are different, or let's say more extreme, about this particular technology revolution. And one of them, I think, is that customers, adopters, users are just all over the place. There's more variety right now. There's some who have barely touched it, other than, of course, employees using ChatGPT or what have you. There are others who are really well advanced. There are many who have been doing AI for years and years.
Starting point is 00:03:54 Are you getting that same sense from the customers that you work with, that it's just such a wide variety of scenarios? Yeah, I think I agree with your observation. I think there are really two things that stand out to me. One is that we see a very, very large trend towards productizing AI, actually moving AI workloads into production. So you're absolutely right, that people are in very, very different places
Starting point is 00:04:19 in terms of experimenting with AI, having small production workloads even. Some companies very successfully use generative AI to produce videos and images and blog posts and maybe making their email processes more effective. But the opportunity with agentic AI and building AI factories, that is really where the rubber hits the road. As you said earlier, that is where money can be made,
Starting point is 00:04:45 and that is where your competitiveness really comes to fruition. That's where you out-compete your competition. And so with that move to production come new requirements. And I'm sure we'll go deeper into that. But the other trend that I also want to talk about here as we are at this conference with storage and infrastructure practitioners is that the storage team has a massive role to play
Starting point is 00:05:11 as companies are looking into deriving value from AI, deriving value from their data and moving AI workloads into production. Just literally half an hour ago, I was meeting with a customer, and it was the storage team and the infrastructure architecture team, and they started the conversation with, well, we don't care about AI because the AI team told us that we don't have a role to play. And I was like, well, hold on a second. Let me kind of explain to you what the role of a storage and infrastructure team is for
Starting point is 00:05:40 production AI workloads and how much you can help your AI group and your AI practitioners in your line of business to really be way more efficient and make a much, much bigger impact. Because there's so much that a storage person and infrastructure person just knows and so many techniques that they're very familiar with that are just routine for them. They don't even recognize what it can do for production AI. Yeah, and I think it's interesting that the storage team has that idea about themselves. I think that, you know, in many cases, storage has been sort of marginalized, and, you know, they're just over there. They care about, like, the deepest part of infrastructure, you know, it's, yeah, we don't need to involve those. But to be honest, I think that a company that's not engaging their storage team and their storage products in their journey toward AI is really going to miss out because it's going to be so inefficient.
Starting point is 00:06:40 Whatever they come up with is going to be expensive and challenging, and, you know, there's going to be, you know, potential problems where they could head that off by just getting the storage team involved. Can we just break out of this idea? It just seems ancient at this point. The demands that the repositories, the resource pools, the applications, the demands being put on, you know, the entire infrastructure, are extreme in comparison to, you know, even five years ago.
Starting point is 00:07:15 So this idea that there's any corner of that infrastructure that does not have something to contribute to, especially to the latest AI revolution or the latest revolution of AI, I mean that's, I think we should just put that to rest. There is nobody we talk to in any part of IT who can't apply what it is they're doing day to day, and maybe even doing day to day for 10 years.
Starting point is 00:07:37 They can't contribute, no? Yeah, no, I agree. I think what we're seeing is that AI is becoming just another enterprise application in a lot of ways. Now, every enterprise application has unique requirements. They need certain things, they need certain functionalities and architectures that are suited for them. But AI, especially agentic AI, is becoming an enterprise workload. And so from that perspective, IT needs to be able to provide value when it comes to that. So creating like an outlier, some shadow AI, shadow IT kind of organization with all their own stuff for an experiment, that often works really well.
Starting point is 00:08:17 They can be in the cloud. They can be on premises. But if you're moving into production, it's a little bit like when public cloud first came out, you know, over a decade ago. And people that were early adopters were mostly concerned with, does it work? And then they found, oh, it does work. That's great. At some point in time, they were like, ooh, wait, you're not doing any backups of that data, you're not replicating that
Starting point is 00:08:39 to another availability zone, you're not replicating to another region, you're not doing any compliance checks. Re-learning all the lessons. No DR, right? And so all the IT folks were like, duh, you know, these are very, very basic things that we have done for decades, we know how to do that.
Starting point is 00:08:57 And then when you apply that to AI, think about vector DB bloat, just as one example, right? So when you're looking at a typical AI workload, you have all of your unstructured data, could be petabytes of unstructured data, you need to put structure on top of that, which is vectorization, right? So you're creating these embeddings and you store that in a database. A typical vector DB very often will be about 10 times the size of your original data set.
Starting point is 00:09:23 So if you have a petabyte of unstructured data, you end up with a 10 petabyte vector database. Well, but if you do this right with somebody like NetApp, you don't need that 10x factor because we don't need to store the same copy over and over and over and over again. And guess what? We have done this for decades. We can create clones. We have done snapshots. We have done mirroring.
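To make that 10x figure concrete, here is a rough back-of-the-envelope sketch of how a vector store balloons relative to its source data. Every parameter below (chunk size, overlap, embedding dimension, per-chunk metadata) is an illustrative assumption, not a NetApp or vendor figure:

```python
# Rough estimate of vector store growth relative to the source data.
# All parameters are illustrative assumptions.

def vector_store_footprint(source_bytes: float,
                           chunk_chars: int = 1_000,    # characters per chunk (assumed)
                           overlap: float = 0.2,        # 20% chunk overlap (assumed)
                           embedding_dim: int = 1536,   # a common embedding width
                           bytes_per_float: int = 4,    # float32 vectors
                           metadata_bytes: int = 2_000  # per-chunk text copy plus index overhead (assumed)
                           ) -> float:
    """Approximate size in bytes of the resulting vector store."""
    net_new_text = chunk_chars * (1 - overlap)   # new source text consumed per chunk (~1 byte/char)
    num_chunks = source_bytes / net_new_text
    per_chunk_bytes = embedding_dim * bytes_per_float + metadata_bytes
    return num_chunks * per_chunk_bytes

one_petabyte = 1e15
print(f"~{vector_store_footprint(one_petabyte) / one_petabyte:.1f}x the source data")  # roughly 10x
```

Under those assumptions a petabyte of documents really does produce on the order of ten petabytes of embeddings and metadata, which is why copy-avoiding techniques like clones and snapshots matter.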
Starting point is 00:09:46 We have done all of these things for decades. So it's really, really simple to really apply these principles to these modern workloads. And I'm not even talking about compliance and data guardrails and privacy and all of that other stuff that also needs to be considered. Yeah. And I think that we should zoom in on that a little bit though, because to me, the hallmark of an enterprise application, whether it's an AI application or AI agents or just a conventional application, is the data. It is the enterprise's data. Ultimately, that is what makes it an enterprise application. And that has so
Starting point is 00:10:28 many challenges. So I would say that, you know, generative AI in the form of chat or image generation or something like that, it's not really inherently an enterprise application. Now, I can see that it might be helpful to have a chatbot to help answer questions based on customer data or something like that. I can see that that might be helpful. But agentic, having AI agents that are assisting your business
Starting point is 00:10:56 and doing tasks on behalf of the company and the employees, now that's a whole different ballgame. And that's why I get a lot more excited about where we're going with this, but if you need data, then that opens up a whole can of worms in terms of figuring out which data that is, you know, like you said, optimizing it, configuring it so that the AI application can access it, and then controlling it, because that, I think, is really the biggest challenge. I agree with everything that you said, but another aspect to add to that
Starting point is 00:11:34 if you're thinking about agentic AI, is that agents can make decisions. That's one of the key differences with agentic AI. And so if an agent is making decisions based on outdated or incorrect data, that can be very, very bad. That's going to be the wrong decision. Right? And so even if you roll it back
Starting point is 00:11:52 to a very simple example of a chatbot, even if you just say, like, oh, it's not even an agent, it's just a chatbot. If you have a customer service chatbot that is providing feedback to a customer, and sometimes these are even pretending to be human agents, right? So, okay, you have a customer service chatbot that is using a customer service policy from two years ago,
Starting point is 00:12:13 that is not going to be useful, because that policy is outdated by now. If you are a hospital network and the patient has just come in, the blood test was taken, the blood test was updated in the system, but your AI data pipeline hasn't been updated with the latest data yet, you're not going to get good outcomes. So data currency and accuracy are very important. And so with data currency, that is a benefit for a company like NetApp
Starting point is 00:12:39 because the data sits on our systems. We can detect that something has changed. And we can propagate that change with the correct guardrails through the AI data pipeline, making sure that the customer experience at the end of that pipeline is the right one. With correct and consistent global guardrails, for example, or other factors.
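One hedged sketch of what that kind of data-currency loop can look like at the pipeline level, independent of any particular vendor implementation: a watcher notices that a source document changed and re-embeds only that document, so downstream agents see the latest version. The embed() and upsert() functions are hypothetical placeholders for whatever model and vector store are in use, and hashing the files stands in for the change notification a storage platform could deliver natively.

```python
# Minimal "data currency" loop: detect changed source documents and re-embed
# only those, so the downstream AI pipeline reflects the latest data.
# embed() and upsert() are hypothetical placeholders.

import hashlib
import pathlib
import time

seen: dict[str, str] = {}  # path -> last content hash

def content_hash(path: pathlib.Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()

def embed(text: str) -> list[float]:
    raise NotImplementedError("call your embedding model here")

def upsert(doc_id: str, vector: list[float]) -> None:
    raise NotImplementedError("write to your vector store here")

def refresh(root: str = "/data/corpus") -> None:
    for path in pathlib.Path(root).rglob("*.txt"):
        digest = content_hash(path)
        if seen.get(str(path)) != digest:               # new or changed document
            upsert(str(path), embed(path.read_text()))
            seen[str(path)] = digest

if __name__ == "__main__":
    while True:
        refresh()
        time.sleep(60)  # polling; an event-driven hook from the storage layer is preferable
```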
Starting point is 00:13:05 Absolutely. I want to follow up a little bit later on this question of currency. But what are the different factors? Can you sort of lay out based on the work you've been doing? Guard rails are one factor. I guess currency is another factor. What are these different factors? Because I think it would be really helpful for our audience
Starting point is 00:13:30 to understand the scope of this. I think we all tend to see our part of the elephant. We don't see the whole thing. Absolutely. You know, so from my perspective, it always starts with, like you said, data fuels AI, right? So if you have poor data, you're going to get poor outcomes. Data quality. And so, yeah, start with that data and having the right data. And so that starts with finding the right data, identifying the right data. So that means you need to have a view into a unified data model where you can find all your data, whether it's on premises or in the cloud, and that includes neo-clouds and sovereign clouds.
Starting point is 00:14:06 So first of all, the question is, can you find the right data that you want to feed into the system? The second question, then, is can you prevent data that you don't want in the AI data pipeline at the beginning of that pipeline? So if you have something like credit card numbers, social security numbers, you know, medical conditions, you know, something that you don't want the agent to disclose at the end of that pipeline, the best place to stop that information is at the beginning of that pipeline. So that's where you need data classification
Starting point is 00:14:36 and data curation. Because ultimately, no guardrails that have been discovered yet can actually put a guardrail around AI that will keep it 100% of the time from doing something unpredictable. It's a modern arms race. Yeah, and there'll be others.
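As a toy illustration of stopping sensitive data at the start of the pipeline, here is a sketch of a pre-ingest screen. The regex patterns are deliberately simplistic stand-ins for a real classification and curation service, and this would only ever be one of several layers of defense.

```python
# Illustrative pre-ingest screen: exclude documents with obvious sensitive
# patterns before they are chunked, embedded, and exposed to agents.
# The patterns are simplistic examples, not production-grade detection.

import re

PATTERNS = {
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def flag_sensitive(text: str) -> list[str]:
    """Return the names of any sensitive patterns found in the text."""
    return [name for name, pattern in PATTERNS.items() if pattern.search(text)]

def admit_to_pipeline(doc_id: str, text: str) -> bool:
    hits = flag_sensitive(text)
    if hits:
        print(f"excluding {doc_id}: matched {hits}")  # route to review/redaction instead
        return False
    return True
```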
Starting point is 00:14:55 So the only way to keep the data from leaking is to keep that data from going in in the first place. Yes, exactly. And this way you build like multiple layers. So I'm not saying like this is the only layer of protection that you want, you have multiple layers. Same with like ransomware protection, for example, right? You have multiple layers to protect your core value,
Starting point is 00:15:14 your data, from being attacked. So you do the same thing with guardrails in the AI world, where you need to protect your data. So you have that data that you feed into the pipeline. You need to think about efficiency. You need to think about reliability. If you're using AI in production, if AI is controlling a manufacturing workflow,
Starting point is 00:15:32 you don't want to shut down the factory for a few days because you have a mistake in your data, right? So you gotta have that reliability, you gotta have the availability, all those standard things. But then I also wanna talk about security because the more value that is in your data, and as you're using AI, your data becomes more valuable,
Starting point is 00:15:51 the more important it is to protect your data and that is ransomware attacks, that's exfiltration attacks, that's worrying about things like post-quantum cryptography, which is maybe like a whole other topic for a different day, but thinking about if somebody comes in today and steals data that is very, very valuable to you, it's probably also very valuable to them,
Starting point is 00:16:14 like your customer database. So yeah, maybe they can't decrypt it today, but with quantum computing, maybe they can decrypt it in a few years. And it's probably still valuable to them. Even if your customer database is two or three years old, your competitor is probably still going to find value in that. So security is another really important aspect in this. So a lot of these requirements sound very familiar, don't they?
Starting point is 00:16:38 But even just core infrastructure considerations like multi-tenancy suddenly become important. So now think about you have agentic AI where you might have thousands of teams of agents that are all over your shared infrastructure with all of your other mission-critical applications, all operating in the same environment. With secure multi-tenancy and quality of service enforcement, you can make sure that your mission-critical applications perform as expected, whereas if you don't have that, you might have some rogue agents that are just gonna grab
Starting point is 00:17:13 all of the CPUs and all the network and everything else that they can grab. You wanna prevent that. So you really gotta apply your IT standards, principles, and guardrails on all of these dimensions for AI workloads. They are your next enterprise workload, and they need to be treated the same way.
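To make the multi-tenancy point concrete, here is a small sketch of per-team quality-of-service enforcement for agents sharing infrastructure. The team names and limits are made up; the idea is simply that a runaway team queues behind its own quota instead of starving everything else.

```python
# Sketch of per-tenant QoS for agent teams on shared infrastructure: each team
# gets a bounded number of in-flight requests, so no team can monopolize
# shared compute or I/O. Team names and limits are illustrative.

import asyncio

TEAM_LIMITS = {"sales-agents": 4, "support-agents": 8, "default": 2}
_semaphores = {team: asyncio.Semaphore(n) for team, n in TEAM_LIMITS.items()}

async def run_with_qos(team: str, coro):
    sem = _semaphores.get(team, _semaphores["default"])
    async with sem:                       # waits if the team is already at its limit
        return await coro

async def agent_task(team: str, task_id: int) -> str:
    await asyncio.sleep(0.1)              # stand-in for real agent work
    return f"{team}:{task_id} done"

async def main():
    jobs = [run_with_qos("sales-agents", agent_task("sales-agents", i)) for i in range(20)]
    print(await asyncio.gather(*jobs))

asyncio.run(main())
```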
Starting point is 00:17:33 Let me follow up now on the concurrency and adjacency part of this, data adjacency. I think one of the things that we learned over the course of this season of Utilizing Tech is the ways in which you can think of an agentic AI as, let's call it, a software replacement. In other words, what you might achieve with a piece of desktop software or a piece of enterprise software or a piece of SaaS, a SaaS interface, you now can
Starting point is 00:18:09 achieve with agentic AI, whether it's scheduled, whether it's responsive. I'm thinking also of the health care example that you gave. Here, data concurrency and adjacency is really critical for agentic AI in particular because if you start to think of it as, this is a different way for me to, you know, tabulate or write or be productive in my job or follow a workflow, instead of using a software interface, I'm using an AI agent, then really the material you tend to be working with is proximal, adjacent, local, and concurrent data. The healthcare example really puts it into light because I don't want to be getting recommendations based on blood tests that are three months old. I want it to be on the blood test that I just took. But if I'm just a knowledge
Starting point is 00:18:58 worker, I am making decisions or doing things in the software based on what I'm working on right now, not based on what some AI was trained on over the course of six months of prior data. Yeah. So how do you help to ensure that and solve for that when you're thinking about possibly terabytes of training data, but yet, you know, don't just say RAG at me. Like you were talking about something that may have, especially with agentic, multiple input points, multiple types of requirements and qualifications. And following on to that, I think that one of the interesting things about AI agents is that they need to have consistent data from agent to agent to agent, but they don't often necessarily need exactly the same data. And so in many ways, you need to be able to organize and control and present a time-consistent interface to this agent and then this other agent and this other one. So think about some kind of like a sales workflow or something like that.
Starting point is 00:20:07 You know, if you close the deal or sell one of the widgets in between the first agent and the last agent accessing that data set, then, you know, they might be presented with different sets of data, and that may be what you want, or it may not be what you want. And so you need to be able to have that kind of control. It's like a multi-tenancy almost.
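A toy sketch of that time-consistent control: the workflow pins a snapshot at the start, and every agent in the chain reads through it, so a widget sold mid-workflow does not make the first and last agent disagree. The in-memory copy here merely stands in for a storage-level snapshot or clone.

```python
# Toy point-in-time view for a multi-agent workflow: all agents read the same
# pinned snapshot even if the live data changes underneath them.

import copy

live_inventory = {"widget-a": 100, "widget-b": 25}
_snapshots: dict[str, dict] = {}

def take_snapshot(snapshot_id: str) -> str:
    _snapshots[snapshot_id] = copy.deepcopy(live_inventory)
    return snapshot_id

def read(snapshot_id: str, key: str) -> int:
    return _snapshots[snapshot_id][key]    # every agent sees the same values

snap = take_snapshot("order-workflow-42")  # workflow start: pin the shared view

live_inventory["widget-a"] -= 1            # a sale happens mid-workflow

print(read(snap, "widget-a"))              # first agent sees 100
print(read(snap, "widget-a"))              # last agent still sees 100, consistent
```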
Starting point is 00:20:33 What do you want to disclose at particular stages of a process, for example? And so when you're curating your data sets as part of your AI data pipeline process, and that's something like NetUp API Data Engine, for example, does for you, is that you can define these guardrails, you can define what that curation should look like
Starting point is 00:20:56 and what should be presented for separate workflows. But I wanted to expand on your point a little bit beyond what you said. And also thinking about today, we kind of have this duality of data is either structured or unstructured, or we're using file protocols for unstructured data and block protocols for structured data. We have been following this system for a very long time.
Starting point is 00:21:17 I believe that it's fundamentally changing now. So we're now looking at unstructured data, saying, well, it's unstructured data, but to make sense of unstructured data, we actually gotta put some kind of structure on top of that. We gotta assign some numerical values and store them in a database. So we're putting structure on top of unstructured data,
Starting point is 00:21:33 which is this whole process around vectorization and embeddings and all that kind of stuff that I mentioned earlier. When we do that, what it leads to is that we can have semantic interactions with your infrastructure. You can have a communication with your data. You can talk to your data using Agentic AI, using protocols like MCP.
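As a rough sketch of what "talking to your data" looks like compared to a file or block read, here is a minimal semantic lookup. The embed() function is a hypothetical placeholder for an embedding model, and an agent would typically reach something like ask() through a tool interface such as MCP rather than through a POSIX path or a LUN.

```python
# Toy semantic access to data: nearest-neighbor search over embeddings instead
# of reading files by path. embed() is a hypothetical placeholder.

import math

def embed(text: str) -> list[float]:
    raise NotImplementedError("plug in your embedding model here")

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def ask(question: str, corpus: dict[str, str]) -> str:
    """Return the document whose embedding is closest to the question."""
    q = embed(question)
    best_id = max(corpus, key=lambda doc_id: cosine(q, embed(corpus[doc_id])))
    return corpus[best_id]
```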
Starting point is 00:21:55 So that is gonna be a whole new world of interfacing. So you're not using a file protocol, you're not using a block protocol, you're using something like MCP, an agentic AI protocol, for your agents or chatbot to talk to your data. Well, this is core. Finally, this is core to Agentic AI design:
Starting point is 00:22:16 this sort of malleability or adaptability of connection. It's, I don't know, you know, it's prompt engineering squared. Yeah, I guess so. I think it goes beyond that, right? It's like a fundamental redefinition of how you interact with your infrastructure and what your infrastructure needs to do.
Starting point is 00:22:41 You know, in the past, would you have expected a storage company to understand the content and context, and actually understand the data that's stored in these systems? Typically not. You would look at some higher-level, you know, data company to do that at the end of a process. This is really interesting, because now you're building that into the infrastructure itself. You still might have some, you know, most companies will still have some broader, you know, data-aware application that does specific things for their industry or for the use case.
Starting point is 00:23:17 So there's going to be another layer of abstraction on top of that. But that layer can now semantically interface with the infrastructure and get data, metadata, and context directly out of the infrastructure instead of trying to retrofit it after the fact. That's way more efficient, way more secure, because you can apply guardrails all the way through the entirety of the process. You can prevent data that you never want to be seen from even getting into the pipeline. All of these things require a fundamental rethinking of how you think about data, how to process
Starting point is 00:23:51 data, and what you want to get out of your infrastructure that sits at the heart of your business. I was going to say that this is really interesting because of this kind of melting and merging of functionality up and down the stack. We're seeing it quite generally, but it's really evident in two of the big announcements at NetApp Insight, which were AFX and the AI Data Engine, which are taking on, at what I think I'm sort of inaccurately calling the device level, a great deal of the functionality needed for these ML and AI systems.
Starting point is 00:24:29 Yeah, absolutely. That's in the device. It's in the storage tier. Stuff that was even above the abstraction layer as recently as a year ago. Yeah, and as you saw in our sessions, right? So we talk a lot with partners of ours, like, for instance, Chromatica or Domino Data Labs, and, you know, obviously the hyperscalers with their own AI solutions, where we have these unique integrations through our first-party services.
Starting point is 00:24:54 So we are not displacing those tools at all. We are making these tools better by making sure that they get the right data at the right time, the right quality, the data is current, it's not outdated, so all of these tools become so much more powerful and so much more effective in the work that they do because they get better data from NetApp from that customer. It's the customer's data. It's always the customer's data. But we built the infrastructure to make these tools more impactful, more efficient, to
Starting point is 00:25:26 drive better outcomes for the customers. And at the end of the day, that's all that matters, is that our customers succeed. It's really an interesting topic, and it's something that we've learned a lot about here at NetApp Insight this week. And as Guy points out, this whole season of Utilizing Tech, we're going to be continuing this conversation on AI. We're actually relaunching the original Utilizing Tech seasons as Utilizing AI. We're going to continue Utilizing AI as an ongoing weekly podcast, and you can find that in your favorite podcast application as Utilizing AI. Utilizing Tech
Starting point is 00:26:02 will be back with other topics in IT as a seasonal, serial podcast in the future as well. So before we go, though, let's kind of wrap this one up and talk a little bit about where we can continue this conversation and where folks can learn more. So Ingo, where can people find you? Yeah, the best way to get in touch with me is through LinkedIn, quite frankly. I'm presenting at a lot of conferences, and for our customers, go through your sales team. And where can people learn more about some of these NetApp capabilities that you mentioned? Yeah, check out NetApp.com and our YouTube channel. We have some amazing videos out there. We have demos.
Starting point is 00:26:43 For those of you that want to go really, really deep into the technology, we provide hands-on labs that are available, where you can actually explore production environment systems that are filled with data. You can experience the entirety of the data pipeline from beginning to end. And I'll also point out that here at NetApp Insight, we recorded a number of Tech Field Day sessions, which are deep dives into these products,
Starting point is 00:27:08 and those are also available on the Tech Field Day YouTube channel, as well as the Tech Field Day website. Guy? You can find my work at futurumgroup.com. My writings are there. I'm also pretty active in a stream on LinkedIn.com slash in slash Guy Currier, and on Bluesky at guycurrier. And as for me, you'll find me here at AI Field Day, Tech Field Day, and so on, along with, of course, the weekly Techstrong Gang podcast, you know, the Tech Field Day podcast, the Rundown, all sorts of different ones that we're producing here for the Futurum Group. So if you enjoyed this conversation, you can find more episodes.
Starting point is 00:27:50 Just go to your favorite podcast application and search for Utilizing Tech. This podcast was brought to you by Tech Field Day, which is part of the Futurum Group. For show notes and more episodes, head over to our dedicated website, which is Utilizing Tech, or you can find us on X/Twitter, Bluesky, and Mastodon at Utilizing Tech. Thanks for listening and we will catch you next week.
