Everyday AI Podcast – An AI and ChatGPT Podcast - EP 550: How a Transformative Data Strategy Powers AI Success

Starting point is 00:00:00 This is the Everyday AI Show, the everyday podcast where we simplify AI and bring its power to your fingertips. Listen daily for practical advice to boost your career, business, and everyday life. Meet Firefly AI Assistant, now live in Adobe Firefly, the All-in-One Creative AI Studio. Just describe what you want to create and the assistant handles the rest, orchestrating multi-step workflows across Photoshop, Premiere Express, and more in one conversational interface. You direct the outcome. The assistant accelerates execution. In the rush for AI success, it's really easy to overlook probably one of the more important things.

Starting point is 00:00:52 And that's your data strategy. As generative AI has become more and more accessible to non-technical people, people that don't have, you know, huge data teams or maybe experience with data strategy, it can be pretty easy to overlook what is probably the biggest step. And that's why I'm excited for today's conversation on how a transformative data strategy can power your AI success. All right, thank you for tuning in and welcome to Everyday AI.

Starting point is 00:01:26 What's going on, y'all? My name's Jordan Wilson and I'm the host of Everyday AI, and this is your daily livestream podcast and free daily newsletter helping everyday business leaders like you and me not just keep up with what's happening in the world of AI, but figure out how we can use it to get ahead, to grow our companies and our careers.

Starting point is 00:01:42 So if that sounds like what you're doing, you are in the right place. It starts here with our unedited, unscripted, livestreaming podcast. But where you're actually going to put this into practice is on our website. So please, if you haven't already, go to YourEverydayAI.com and sign up for that free daily newsletter. There, we're going to be recapping the highlights from today's conversation, which I'm excited about. But also in the newsletter, you're going to see everything else that's happening in the world of AI.

Starting point is 00:02:08 It's all put simply so you can know it and take advantage of it, and so you can be the smartest person in AI at your company or in your department. So please make sure to go check that out. The AI news is going to be in there as well. So without further ado, let's go ahead and bring on our guest for today. I'm excited to have him. So, livestream audience, please help me welcome to the show Ashish Verma, the U.S. chief data and analytics officer at Deloitte.

Starting point is 00:02:35 Ashish, thank you so much for joining the Everyday AI Show. Jordan, thank you for having me. All right. Great conversation. Yeah, I'm excited for it. So first, I'm sure everyone, or almost everyone, is aware of Deloitte. But, you know, could you just tell us a little bit about what you do in your role there? Yeah, absolutely. So in my role as chief data and analytics officer, you know, there are a few mandates that I have for our journey into the world of AI and agentic and, of course, GenAI, right? Not in that order, but, you know, whatever the flavor of the day is, as you can imagine, right? Data is sort of the underpinning of all of these experiments that we do, right? Some of them are for ourselves and some of them are for our clients, but nonetheless, right? If you start to look at all of the data that we need to fuel these experiments, you know, we pretty soon began to realize that it wasn't just our data that we needed to do this at scale.

Starting point is 00:03:30 It was our data. It was third-party data. It was business partner data. It was synthetic data, and so on and so forth. As we talked through, you know, the process to procure that data, to standardize it, and to make it available in the data marketplace for people to be able to interact with it (the data concierge function), that entire mandate rolls up to the office of the CDAO.

Starting point is 00:03:50 So my mandate, in essence, is to make sure that if we're going to experiment with AI or agents or algorithms, our ambition is commensurate with our data strategy and we have the right data with the right compute environment to make it happen. You hit all of my favorite keywords there: agents, algorithms, data, strategy. This is going to be a fun conversation. But let's just kind of skip ahead to the end here, and then maybe we'll rewind a little bit, Ashish. Why is data so incredibly important

Starting point is 00:04:23 when it comes to digital and AI transformation? Why does it start there? You know, if you were to look at the underpinnings of sort of like the end outcome, right, of any of these, whether it's an agent or it's an algorithm, right? you would start to realize that, you know, data is what feeds it, right? Data is what drives the outcome, right? Now, whether it's deterministic or probabilistic, you know, we can get into sort of the nuances of,

Starting point is 00:04:48 you know, today's agent-centric coding platforms and reasoning versus how we coded in the past, but nonetheless, right? You have to use data as the underpinning for the attribution of training these models or training these agents or training these algorithms. And pretty soon you realize that you don't have enough of that within the four walls of your organization, right? There is nobody in the world today that can point to their data strategy from, you know, a year ago or two years ago where they said, you know, as long as I got my house in order, my internal data that

Starting point is 00:05:23 met the mandate or what I could do for my business partners, right, whether that business partner is your CFO or your CMO or whoever, right? In essence, they wanted to make sure that you had the hygiene right, and that you could procure for them a compute environment for whatever they intended to do. At best, it was conformed SQL or ad hoc querying or a report or a dashboard or some flavor of that sort. Now, when you extrapolate to where we are today and you start to see what you need, right, you never have enough of what you need within the four walls.

Starting point is 00:05:54 And what you're attempting to do, the reasoning or the algorithm or the agent, is forcing you to interface not just with your data, but with your data and somebody else's data, somebody else being the public domain, depending upon what you're doing, or synthetic data, or a business partner's data, depending upon what you're doing. So the use case determines which path you take. But irrespective of the use case, you pretty soon realize that it's not just your data. It's your data. It's second-party data, which is the data shared between you and your business partners. It's third-party data that you procure. We at Deloitte procure hundreds of millions of dollars

Starting point is 00:06:32 worth of third-party datasets from, you know, just about every data broker you can conceive of in the world. And of course, longitudinal datasets that you have to assemble through the synthetic data route. And I do actually want to get back to synthetic data, because that's something I'm curious about. But it's interesting, because I think the landscape has changed a lot, right? Specifically with the introduction of generative AI over the last five or so years.

Starting point is 00:07:02 But before that, I think certain enterprises could have a moat just in the technology, right? You know, if you had big data teams or AI and ML teams for a couple of decades, like a lot of larger enterprises have, that could be a huge competitive advantage. But now the barrier to entry has gone down significantly. So I'm curious, both from your own firsthand experience and from the worldwide clients that I know Deloitte is working with: how important is data? Is it even more important than the technology? Because anyone can go out and use these agents, and anyone can go out and use the state-of-the-art large language models. Is data actually the differentiator now?

Starting point is 00:07:51 Yeah, it absolutely is. And, you know, for those of you that have done this or are in the middle of this, this is going to resonate, right? When people talk about hallucination, they think something has fundamentally gone wrong. And I tell them it's a feature, right? Because in any probabilistic model, some aspect of getting to the answer is predicting the outcome, right? So the attribution and the labeling of your data set are what make the hygiene and the outcome possible, right? So if you skip the annotation or the labeling, and you don't understand the policy or usage engine around

Starting point is 00:08:32 these data sets, you pretty soon come to the conclusion that your ambition is not commensurate, because your data doesn't support your ambition. And that is where most chief data officers begin to struggle to figure out how to accelerate this. And the acceleration part comes back to where we started this conversation, right? What is your data strategy? What are the key pillars of your data strategy, irrespective of whether we're talking about procurement of the data set or the ambition behind whatever you're attempting to procure? And I love that.

Starting point is 00:09:05 Your data doesn't support your ambition. I think that's an important one for our listeners to hear. But, you know, could you maybe talk a little bit about some common threads that you all have seen at Deloitte when it comes to, you know, companies trying to deliver AI at scale? What are the things on the data side that you keep seeing big companies get right? And what are the things that you see them keep getting wrong? I think the first thing that I think is paramount to sort of getting this is what I call the data marketplace. So we've been running the equivalent of an Amazon marketplace for data for the better part of about 2 and a half years now.

Starting point is 00:09:48 And think of it as a single landing spot, which is basically how you enter the universe to figure out what data we have. We have roughly 520 data feeds at this point in time. Think of those 520 as covering all permutations: public domain, Deloitte internal, synthetic, third party, and so on. And the reason why that data marketplace is so important is that it's where we understand the use case consumption or usage criteria that formulate a procurement strategy, right? If we didn't have the data marketplace, it would be very, very difficult to interact with our business user world. I mean, there are 450,000 people at Deloitte,

Starting point is 00:10:27 455,000, and 178,000 in the US, right? So when 178,000 people come knocking to figure out what data you have, what policy engine applies to that data, what it can feed and what it cannot feed, and what the terms and conditions are, I don't think you can have a human middleware in the equation, concierging that dataset one user at a time. So I think the biggest thing that I get asked about is, you know,

Starting point is 00:10:52 what led to a data marketplace, and how does that data marketplace become contextual to people's ambition, right? So, you know, today we run a data marketplace that is on its way to becoming contextual. So almost like, hey, let me tell you what I have based on you telling me what you need to do. Right. So the data interacts with your behavior and use case to lead you down the path of the right data set with the right policy engine and compute environment, as opposed to deterministic search, which is what the old world was, right? You showed up at the doorstep and you said, look, I want conformed SQL, or I want to pivot this, or I want to build a dashboard. Give me so much of this and so

Starting point is 00:11:31 much of that. And then, you know, off I go, I curate the data pipeline and I build the end result, right? No longer true, right? Because these are multi-source data sets. It's not just your data. It's your data and external data and third-party data and synthetic data. And it's not a single compute environment: depending upon what you're attempting to do, I've got to give you CPUs, I've got to give you GPUs, I've got to give you TPUs, and some tooling on top of the compute for you to get to the answer. So that's where people pretty soon start to realize that the data concierge, the data marketplace, the compute environment, and the ambition all need to correlate to something that is on the roadmap of a CIO or a CDO to put into place, right? Or else you're doing this fairly sporadically.
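The contextual marketplace described here (one catalog of feeds from internal, second-party, third-party, and synthetic sources, each carrying a usage policy and a compute hint) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Deloitte's implementation: the `DatasetEntry` fields, the `recommend` function, and the sample feeds are all hypothetical. A request states what the user needs and how they intend to use it; the policy check runs first, and only the permitted feeds are ranked by how many business tags they share with the request.

```python
from dataclasses import dataclass


@dataclass
class DatasetEntry:
    """Hypothetical catalog record for one data feed in the marketplace."""
    name: str
    source: str              # "internal", "second_party", "third_party", "synthetic", ...
    tags: set                # business-context labels, lowercase
    allowed_uses: set        # usage policy, e.g. {"analytics", "model_training"}
    compute: str = "cpu"     # suggested compute environment: "cpu", "gpu", "tpu"


def recommend(catalog, need_terms, intended_use):
    """Return (name, source, compute) for feeds that match the stated need.

    Feeds whose policy does not permit the intended use are dropped first;
    the rest are ranked by how many business tags overlap the need.
    """
    need = {term.lower() for term in need_terms}
    scored = []
    for entry in catalog:
        if intended_use not in entry.allowed_uses:
            continue  # policy check: this use case may not touch this feed
        overlap = len(need & entry.tags)
        if overlap:
            scored.append((overlap, entry))
    scored.sort(key=lambda pair: (-pair[0], pair[1].name))
    return [(e.name, e.source, e.compute) for _, e in scored]


# Hypothetical sample catalog, for illustration only.
catalog = [
    DatasetEntry("crm_accounts", "internal", {"customer", "revenue"},
                 {"analytics", "model_training"}),
    DatasetEntry("firmographics", "third_party", {"customer", "industry"},
                 {"analytics"}),
    DatasetEntry("synthetic_claims", "synthetic", {"claims", "customer"},
                 {"model_training"}, compute="gpu"),
]

print(recommend(catalog, ["Customer", "Industry"], "analytics"))
# → [('firmographics', 'third_party', 'cpu'), ('crm_accounts', 'internal', 'cpu')]
```

Filtering on policy before ranking reflects the point about removing the human middleware: a feed that a use case may not touch never shows up in its results. Tag overlap is simply the most basic relevance signal and stands in for whatever matching a production marketplace would use.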

Starting point is 00:12:18 And you're reacting to what people need, as opposed to what you need to have for the ambition to be true. One thing I'm always thinking about is that there are obviously different sectors in the business world that naturally have access to more quantifiable data, right? But then there are those that maybe don't have as much, right? They don't have as much structured data, but they have a lot of unstructured information that helps their company move forward. How should those types of organizations be looking at their data?

Starting point is 00:12:54 Like, is there a way they can really corral more of the unstructured data to help propel their transformation forward? Yeah. I mean, like I talk about, you'll also come to another conclusion when you start this journey toward agentic and AI, right? Most of it is unstructured before it really is structured, right? So, you know, documents, PowerPoints, right? The things that you pretty much didn't go mine before are sort of, you know,

Starting point is 00:13:28 the secret sauce for how you lend conformity to your ambition. I'll give you an example, right? In our world, something as simple as staffing people through a resource management function is pretty much making sure that you can tie the role description to the right resume. So when you show up to an engagement, whether we sold an engagement to migrate something to the cloud, or we're building an agent in Salesforce, or we're doing an SAP transformation, you need to have a particular skill set, right? That means you've done this before in a particular industry and you're certified in the technology. That's how a resource manager matches you and your experience to the role. And every resume is either in a Word document or a PowerPoint. There is no humanly possible way for a resource manager to read 45,000 or 150,000 or 177,000 resumes to find you the right role. So what they do is a keyword search, right? Partly because the resume database is not contextualized or indexed for you to be able to do

Starting point is 00:14:29 contextual search, like you're used to when you go to Google: at the UI/UX prompt, you type in English what you need and you see relevant, ranked search results, right? But what actually happened is Google parsed the entire World Wide Web, parked it in a content store, indexed that data set, and gave you contextuality through a query to figure out rank and relevance so you get to the answer. We did the same thing with a resume database, right? We contextualized it.

Starting point is 00:14:56 We indexed it. We gave it a query engine. Now it's as simple as typing a role description into the UI/UX prompt. The matching resources show up in near real time, along with whether they're staffed or not. So my resume information and my staffing information are collated for the answer that you need. That used to take a resource manager, or several resource managers, one resource and one role at a time. And I think that's a great use case, an example that a lot of people can relate to. So I want to ask you a little bit here about agentic AI, but before we do, real quick, a quick break from our sponsors.
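The pipeline Ashish describes (parse the documents, store them, index them, then answer a free-text query with ranked, relevant results collated with staffing status) can be sketched with a classic inverted index and TF-IDF weighting. This is an illustrative sketch, not the system Deloitte built: `ResumeIndex`, the tokenizer, the sample resumes, and the `staffed` mapping are hypothetical, and a production search may well use semantic embeddings rather than keyword statistics.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class ResumeIndex:
    """Hypothetical inverted index over resume text, ranked with TF-IDF."""

    def __init__(self):
        self.postings = defaultdict(dict)  # term -> {resume_id: term frequency}
        self.doc_count = 0

    def add(self, resume_id, text):
        # Parse and store: tokenize the resume and record term frequencies.
        self.doc_count += 1
        for term, count in Counter(tokenize(text)).items():
            self.postings[term][resume_id] = count

    def search(self, role_description, staffed, top_k=3):
        # Score every resume that shares at least one term with the role description.
        scores = defaultdict(float)
        for term in set(tokenize(role_description)):
            docs = self.postings.get(term)
            if not docs:
                continue  # term never seen in any resume
            idf = math.log(1 + self.doc_count / len(docs))  # rarer terms weigh more
            for resume_id, tf in docs.items():
                scores[resume_id] += tf * idf
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
        # Collate the ranking with staffing status.
        return [(rid, round(score, 3), staffed.get(rid, False)) for rid, score in ranked]


# Hypothetical sample resumes, for illustration only.
index = ResumeIndex()
index.add("r1", "SAP S/4HANA transformation consultant, manufacturing industry, certified")
index.add("r2", "Salesforce agent developer, certified Salesforce architect")
index.add("r3", "Cloud migration engineer, AWS certified, retail industry")

results = index.search("Certified Salesforce developer for an agent build", staffed={"r2": True})
for resume_id, score, is_staffed in results:
    print(resume_id, score, "staffed" if is_staffed else "available")
```

Here `r2` ranks first because it matches the rarer terms ("salesforce", "developer", "agent"), while `r1` and `r3` match only the common term "certified" and tie, falling back to ID order. The inverse-document-frequency weight is what turns a plain keyword search into a ranked, relevance-ordered result list.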

Starting point is 00:15:37 This podcast is supported by Google. Hey, everyone, David here, one of the product leads for Google Gemini. Check out Veo 3, our state-of-the-art AI video generation model in the Gemini app, which lets you create high-quality eight-second videos with native audio generation. Try it with a Google AI Pro plan or get the highest access with the Ultra plan. Sign up at Gemini.com to get started and show us what you create. All right. So we talked a lot about the importance of data for helping your digital transformation strategy. But when it comes to agents, that's when I

Starting point is 00:16:21 even start thinking about data a little differently, right? Because even if it's a human operating a large language model-powered system, there's still a human that looks at that data at some point, you hope, and says, yeah, that's correct. But when it comes to agentic AI, and when these systems start using our dynamic data and executing decisions on our behalf, I think it makes correct data even more of a priority. Could you talk a little bit about what you've seen so far in your experience in that regard, when it comes to having your data right specifically for agentic AI? Yeah. I mean, I'll tell you, right? The reasoning aspect of an agent, you know,

Starting point is 00:17:06 is what is very appealing, in that you can have a set of tasks done by an agent on behalf of a human or a machine, right? So think of an agent as something that knows how to reason through a set of complex tasks to arrive at an outcome when you feed it some data. That's where we talk about agents behaving themselves, or an agent registry, or agent orchestration: all the nuances of getting agents to operate. And by the way, the idea that agents are going to arrive in your world in a single fashion is not true, right? When you orchestrate agents, and when you hand off from one agent to the other, you will transcend software, vendors, platforms, and data, right? So what you have to get right, in essence, is that the attribution of the data set that feeds that agent needs to be annotated correctly for you to be able to get that agent to behave within the guardrails or boundaries of what you expect the answer to be.

Starting point is 00:18:12 And the nuances of that are realized when you start to train the agent to do things and you realize that it is doing something that is not deterministic, things that are not expected. And the reason why that is transpiring is that the attribution of the data that feeds that agent is leading it to an unexpected answer, right? That's the best way I can put it: not what you would have expected. So if you start to look at attribution for the purposes of agentic, or at labeling for the purposes of agentic, you'll pretty soon come to the conclusion that it is one of the biggest reasons why agent orchestration, registration, and interoperability become such an important component, which is why open standard protocols for agent-to-agent communication are a big topic of conversation

Starting point is 00:19:07 no matter where you go these days. Yeah. And yeah, talking about these different protocols, maybe you could explain a little bit for our less technical audience what you said there about labeling data for agentic AI. Is it different, right? And how should, you know, especially those larger organizations that do have the resources, how should they be treating their data differently if it's ultimately going to be going

Starting point is 00:19:36 through a large language model-type application with a human operating it versus an agentic operation? What are the main differences, if any, in handling that data for agentic use? Yeah, no, the process of how it goes through curation in one technology versus the other is the nuance of whether you use an LLM or an LLM-centric agent, right? But the nuances of labeling are very evident even in the structured data you work with today, right?

Starting point is 00:20:06 So if you didn't have the right cataloging or business metadata or business glossary, right, usage today is a problem as well. I mean, when I talk to most organizations, they talk about how they haven't conquered their structured data challenges. And they're talking about process-centric software instantiating data that needs to be labeled for usage, right? So if you look at how data is created within the four walls of most organizations today, you run process-centric software. That's SAP, that's ServiceNow, that's Salesforce, and so on and so forth. And the process instantiates data. Once the process instantiates data, somebody needs to annotate or label that data for business context and technical context,

Starting point is 00:20:46 so that the persona that uses it, whether it's a business person who develops a report on the back of it or a data engineer who builds a data pipeline on the back of that data, knows what its intent is and knows the boundaries of usage of the data, right? That is a fundamental challenge irrespective of agent or LLM. The problem is magnified because in tomorrow's world, an agent world, that data is not originating within the four walls of the organization. So guess where the burden of proof lies? It lies with the people that use something that is not happening within their four walls. Now you're talking about labeling, annotation, a business glossary, and a technical catalog to be built for those datasets. Imagine if it was

Starting point is 00:21:29 hard to do for your own data. Imagine how impossible it is to do for something that happens outside of your four walls. Yeah, that's an interesting way to think about it: the data that originates within those four walls, and, you know, how can you even define those walls, especially when we talk about multi-agentic orchestration? If you have different agents going out there and creating new data points on their own, and maybe they're not working directly with a human in that regard, and if it's this multi-agentic setup, how can business leaders even start to think or plan for being able to collect

Starting point is 00:22:12 that data that's way beyond those traditional four walls? Yeah, which is where the registration of agents comes in, along with the guardrails of how they behave against that registration: what invokes an agent, how you register an agent, and how you orchestrate an agent. I think we're still seeing the beginnings of that, right? Anybody that is claiming that they've done this at scale and it works seamlessly, you know, we don't buy it, right? Because we do our experimentation and we realize how hard it is, right? And we're just getting started on multi-agent orchestration. Even before multi-agent, we're just getting started on single agents achieving the intended outcome, before we talk about

Starting point is 00:22:50 agent-to-agent and handing off to other agents, right? That is still something that we need to conquer, right? I don't believe that journey has come to its logical conclusion. I think we're just getting started. Yeah. And, you know, when I think about AI success and the companies that are doing it versus the companies that are maybe further behind, I think Deloitte obviously has been at the forefront, right? Working with some of the largest organizations in the world on their AI strategy. What would you say if we rewind and look at Deloitte as a case study? What are some of the things internally that really helped propel your own AI success as an organization, specifically when it came to your data strategy?

Starting point is 00:23:38 You know, we recognized early on that this was not something where we could wait and watch for it to get to a particular phase or stage, and then turn around and say, that's when we'll dip our toes in the water, right? We figured that this would be done to us if we didn't do it to ourselves, right? There was a level of awareness about what it was doing to the value chain of our clients that required our intervention a lot earlier than we typically would have thought about it, right? So the best way for me to describe it is, if you look at biopharma or biotech, right, the evolution of disease pathology from pharmacology to gene editing, because somebody sequenced 210 proteins and you can tell

Starting point is 00:24:23 what disease structure does to that. So hence, gene editing is the way to treat disease pathology, and not a bunch of biometrics where you go for blood tests and somebody says, oh, you know, your sodium is off or your potassium is off, hence your disease pathology is this or that, right? In reality, if you look at what that does to life sciences, disease pathology is now very different, right? Or is going to be very different.

Starting point is 00:24:49 Drug discovery is going to be very different. Manufacturing, clinical trials, and supply chain are going to be very different. That is why we are on this journey. I mean, we realized the same thing about what AI is doing to the value chain of pharma or healthcare or, you know, autonomous cars, to take an example, or retail. There is no industry or vertical or sector

Starting point is 00:25:11 that it is not going to touch in the short or long term. The question becomes: if we don't participate in this, the portfolio of services that makes us relevant today will make us irrelevant tomorrow, because we didn't arrive at the time that AI arrived in the value chain. So we did it to ourselves, knowing full well that we needed to build that portfolio of services. The best way for me to describe it is the menu when you walk into a hotel or a restaurant of your choice, and, you know, the menu doesn't evolve over a period of time.

Starting point is 00:25:40 You stop going to the restaurant. So our menu needs to evolve in conjunction with what's happening to the industries and sectors and the clients that we serve. And that was the reason for embracing it from the get-go. Yeah. So Ashish, we've covered a lot in today's conversation, but as we wrap up, what would you say is the one most important takeaway for our listeners when it comes to their data strategy powering their AI success? I mean, what I would say is, you know, walk with the end in mind, right? If you understand the outcome that you intend to achieve with your data, that is your North Star, right? Everything else that you do should be in service of that, right? So, for example, if your ambition is to be agentic, or agentic plus, you know, whatever the permutation or choice of tool that you use or

Starting point is 00:26:32 consumption pattern, right, meaning however you intend to consume the data, whether it's for reasoning, for an LLM, for conformed SQL, or for AI/ML, whatever it may be, you pretty much have to build your data strategy anticipating that those are the capabilities you need to have: not when the use case arrives, not when your business partner arrives, but in anticipation of the fact that it is what I call Horizon 2, not even Horizon 3,

Starting point is 00:27:01 and most of these problems, when I classify them in my mind, don't look like Horizon 2. They actually look like today's problems, right? And for us to be relevant to our business partners, we needed a data strategy that would serve the interests and needs of

Starting point is 00:27:17 how we procure data, what data we procure, how we annotate it, and how we get it into a compute environment. This is such great advice, and it really helps us lay the roadmap out, because everyone's worried and wondering about data and their strategies. So, Ashish, thank you so much for taking time out of your day to join the Everyday AI Show. We really appreciate it. Thank you, Jordan.

Starting point is 00:27:41 All right. Now as a reminder, y'all, if you missed something he said there, because there is a lot of great value, don't worry. We're going to be recapping it all in our newsletter. So if you haven't already, make sure to go to YourEverydayAI.com. If this was helpful, tell someone about it. If you're listening on the podcast, please make sure to follow the show and subscribe.

Starting point is 00:27:59 Thank you for tuning in. Hope to see you back tomorrow and every day for more Everyday AI. Thanks, y'all. Meet Firefly AI Assistant, now live in Adobe Firefly, the All-in-One Creative AI Studio. Just describe what you want to create in your own words and the assistant handles the rest, orchestrating multi-step workflows across Adobe Creative Cloud apps, including Photoshop, Premiere Express, and more, in one conversational interface.

Starting point is 00:28:27 You direct the outcome while the assistant accelerates execution. Stay in control with the ability to step in and refine at any time. See it today at firefly.adobe.com. And that's a wrap for today's edition of Everyday AI. Thanks for joining us. If you enjoyed this episode, please subscribe and leave us a rating. It helps keep us going. For a little more AI magic, visit YourEverydayAI.com

Starting point is 00:28:58 and sign up to our daily newsletter so you don't get left behind. Go break some barriers and we'll see you next time.
