Everyday AI Podcast – An AI and ChatGPT Podcast - EP 595: Data First: The Strategic Playbook for AI Success

Starting point is 00:00:00 This is the Everyday AI Show, the Everyday Podcast where we simplify AI and bring its power to your fingertips. Listen daily for practical advice to boost your career, business, and everyday life. Meet Firefly AI Assistant, now live in Adobe Firefly, the All In One Creative AI Studio. Just describe what you want to create and the assistant handles the rest, orchestrating multi-step workflows across Photoshop, Premiere Express, and more in one conversational interface. You direct the outcome. The assistant accelerates execution. In the rush for AI success, it's really easy to overlook probably one of the more important things.

Starting point is 00:00:53 And that's your data strategy. As generative AI has become more and more accessible to non-technical people, people that don't have, you know, huge data teams or maybe experience on data strategy, it can be pretty easy to overlook what is probably the biggest step. And that's why I'm excited for today's conversation on how a transformative data strategy can power your AI success. All right, thank you for tuning in and welcome to Everyday AI.

Starting point is 00:01:26 What's going on, y'all? My name's Jordan Wilson, and I'm the host of Everyday AI. This is your daily live stream podcast and free daily newsletter, helping everyday business leaders like you and me not just keep up with what's happening in the world of AI, but learn how we can use it to get ahead, to grow our companies and our careers.

Starting point is 00:01:42 So if that sounds like what you're doing, you are in the right place. It starts here with our unedited, unscripted, live streaming podcast. But where you actually are going to go and put this into practice is on our website. So please, if you haven't already, go to youreverydayai.com and sign up for that free daily newsletter. There, we're going to be recapping the highlights from today's conversation, which I'm excited about. But also in the newsletter, you're going to see everything else that's happening in the world of AI, put simply, for you to know

Starting point is 00:02:10 and take advantage of and so you can be the smartest person in AI at your company or in your department. So please make sure to go check that out. The AI news is going to be in there as well. So without further ado, let's go ahead and bring on our guest for today. I'm excited to have him. So live stream audience, please help me welcome to the show. We have Ashish Verma, the U.S. chief data and analytics officer at Deloitte. Ashish, thank you so much for joining the Everyday AI show. Jordan, thank you for having me. That's a great conversation.

Starting point is 00:02:42 Yeah, I'm excited for it. So first, I'm sure everyone or almost everyone is aware of Deloitte. But could you just tell us a little bit about what you do in your role there? Yeah, absolutely. So in my role as chief data and analytics officer, you know, there are a few mandates that I have for our journey into sort of the world of AI and agentic and, of course, gen AI, not in that order, but, you know, whatever the flavor of the day. As you can imagine, right, data is sort of the underpinnings of all of these experiments that we do, right?

Starting point is 00:03:14 Some of them are for ourselves and some of them are for our clients, but nonetheless, right? Like if you start to look at sort of all of the data that we need that fuels these experiments, you know, we pretty soon began to realize that, you know, it was not just our data that we needed to sort of do this at scale. It was our data. It was third-party data. It was business partner data. It was synthetic data.

Starting point is 00:03:36 And so on and so forth. As we talked through, you know, the process to procure that data, to standardize data, to make it available in the data marketplace for people to be able to interact with it, the data concierge function, that entire mandate sort of rolls up to the office of the CDO. So my mandate in essence is to make sure that if you're going to experiment with AI or agents or algorithms, that our ambition is commensurate with our data strategy and that we have the right data with the right compute environment to make it happen. You hit all of my favorite keywords there: agents, algorithms, data, strategy. This is going to be a fun conversation. But, you know, let's just kind of skip ahead to the

Starting point is 00:04:16 end here. And then maybe we'll rewind a little bit, Ashish, but, you know, why is data so incredibly important when it comes to digital and AI transformation? Why does it start there? You know, if you look at the underpinnings of sort of like the end outcome, right, of any of these, whether it's an agent or it's an algorithm, right? You would start to realize that, you know, data is what feeds it, right? Data is what drives the outcome, right? Now, whether it's deterministic or probabilistic, we can get into sort of the nuances of

Starting point is 00:04:49 today's agent-centric coding platforms and reasoning versus sort of like, you know, how we coded in the past. But nonetheless, right? You have to use data for the underpinnings of the attribution of sort of training these models or training these agents or training these algorithms. And pretty soon you realize that you don't have enough of that within the four walls of your organization, right?

Starting point is 00:05:11 There is nobody in the world today that can sort of point to their data strategy from, you know, a year ago or two years ago, where they said, like, you know, as long as I got my house in order, my internal data, that met sort of the mandate of what I could do for my business partners, right? Whether your business partner is your CFO or your CMO or whoever, right, in essence they wanted to make sure that you had the hygiene, right? And in essence, you could, you know, procure for them a compute environment for whatever they intended to do. At best, it was conformed SQL or ad hoc querying or a report or a dashboard

Starting point is 00:05:43 or some flavor of that sort. Now, when you extrapolate to where we are today and you start to see sort of what you need, right, you don't, you never have enough of what you need, you know, within the four walls. And, and, you know, what you're attempting to do, the reasoning or the algorithm or the agent is forcing you to sort of not just interface with your data, but also data, your data and somebody else's data and somebody else being public domain, right, depending upon sort of what you're doing or synthetic data depending upon what you're doing, or a business partner's data depending upon what you're doing. So the sort of the use case determines which path you take.

Starting point is 00:06:19 But irrespective of the use case, you pretty soon realize that it's not just your data. It's your data. It's second-party data, which is the data shared between you and your business partners. It's third-party data that you procure. We at Deloitte procure hundreds of millions of dollars' worth of third-party data sets from, you know, every data broker that you can conceive of in the world. And of course, longitudinal data sets that you can sort of assemble, which you have to do through the synthetic data route.

Starting point is 00:06:45 And I do actually want to get back to the synthetic data because that's something I'm curious about. But it's interesting because I think that the landscape has changed a lot, right? Specifically with the kind of introduction of generative AI over the last five or so years. But before that, I think that, you know, certain enterprises, they could have a moat just in the technology, right? You know, if, if you had, you know, big data rooms or, you know, AI and ML teams for,

Starting point is 00:07:15 you know, a couple of decades, like a lot of larger enterprises have, you know, that could be a huge competitive advantage. But now the barrier of entry has gone down significantly. So, you know, I'm curious both, you know, for your own firsthand experiences and with the, you know, worldwide clients that I know Deloitte is working with, how important is data specifically even more important than even the technology? Because anyone can go out and use these agents, anyone can go out and use the state-of-the-art, you know, large language models. Is data actually the differentiator now? Yeah, it absolutely is. And, you know, for those of you that have sort of done this or in the middle of this, you know, this is going to start to resonate, right? Like when you realize that you can't

Starting point is 00:08:02 sort of get, you know, when people talk about hallucination, right, they think, you know, something has fundamentally gone wrong. And I tell them it's a feature set, right, because in any probabilistic model, like some aspect of, you know, getting to the answer is sort of predicting the outcome, right? So the attribution of your data set and the labeling of your data set is what makes the hygiene and/or the outcome possible, right? So if you skip the part of the annotation or the labeling and you sort of don't understand the policy or usage engine around these data sets, you pretty soon come to the conclusion that your ambition is not commensurate because your data doesn't support your ambition. And that is sort of where

Starting point is 00:08:42 most chief data officers begin to struggle to figure out sort of how do they accelerate this. And the acceleration part comes back to sort of where we started this conversation, right? What is your data strategy? What are the key pillars of your data strategy, irrespective of whether we spoke about procurement of the dataset or the ambition of that, you know, data set as a result of whatever you're attempting to procure. And I love that. Your data doesn't support your ambition. I think that's an important one for our listeners to hear. But could you maybe talk a little bit about some common threads that you all have seen at Deloitte when it comes to companies trying to deliver AI at scale? What are the

Starting point is 00:09:25 things on the data side that you keep seeing big companies get right? And what are the things that you see them keep getting wrong? I think the first thing that is paramount to sort of, you know, getting this is what I call the data marketplace, right? So we've been running the equivalent of an Amazon marketplace for data for the better part of about two and a half years now. And think of it as a single landing spot, which basically is how you enter the universe to figure out what data we have.

Starting point is 00:09:56 We have roughly about 520 data feeds at this given point in time. Think of those 520 covering all permutations, public domain, Deloitte internal, synthetic, third party, so on and so forth. And the reason why that data marketplace is very important in essence is that is sort of where we understand the use case consumption criteria or usage criteria that sort of formulates a procurement strategy, right? If we didn't have the data marketplace, it was very, very difficult to interact with our business user world.

Starting point is 00:10:25 I mean, there's 450,000 people at Deloitte, 455,000, and 175,000, 178,000 in the US, right? So when 178,000 people come knocking to figure out what data you have, what policy engine you need on that data, what it can feed and what it cannot feed, what the terms and conditions are, I don't think that you can have human middleware in the equation concierging that data set one user at a time. So I think the biggest thing that I get asked about is, you know, what led to a data marketplace and how does a data marketplace become contextual to people's ambition, right? So today we run a data marketplace that is sort of on its way to becoming contextual.

Starting point is 00:11:05 So almost like, hey, let me tell you what I have based on you telling me what you need to do. So the data interacts with sort of your behavior and use case to sort of lead you down the path of the right data set with the right policy engine and the compute environment, as opposed to deterministic search, which is sort of what the old world was, right? You sort of showed up at the doorstep and you said, look, I want conformed SQL or I want to pivot this or I want to build a dashboard, give me so much of this and so much of that.

Starting point is 00:11:32 And then off I go and I, you know, I curate the data pipeline and I build the end result, right? No longer true, right? Because it's not, it's multi-vetted data sets. It's not just your data. It's your data and external data and third-party data and synthetic data. And it's not a single compute environment. Depending upon what you're attempting to do, I've got to give you CPUs. I've got to give you GPUs.

Starting point is 00:11:53 I've got to give you TPUs and, you know, some tooling on top of it above the compute for you to get to the answer. So people pretty soon start to realize that the data concierge, the data marketplace, the compute environment, and the ambition all need to correlate to something that is sort of on the roadmap of a CIO or a CDO to put into place, right? Or else you're doing this fairly sporadically, and you're reacting to sort of what people need as opposed to what you need to have for the ambition to be true. One thing I'm always thinking about is there's obviously different sectors in the business world that naturally have access to more quantifiable data, right? But then for those that maybe don't have as much, right? They don't have as much structured data, but they have a lot of unstructured information, right, that helps their company move forward. How should those types of organizations be looking at their data? Like, is there a way that they can, you know, really corral maybe more of the unstructured data to really help propel their transformation forward?
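The data marketplace described in the answer above (many feeds of different origins, each with usage terms and a compute environment, matched to what a user says they need) can be sketched roughly as follows. This is a minimal illustration, not Deloitte's system: the `DataFeed` fields, the feed names, and the `find_feeds` function are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Source(Enum):
    INTERNAL = "internal"
    SECOND_PARTY = "second_party"  # shared with business partners
    THIRD_PARTY = "third_party"    # procured from data brokers
    PUBLIC = "public_domain"
    SYNTHETIC = "synthetic"


@dataclass(frozen=True)
class DataFeed:
    # All fields are illustrative assumptions, not a real catalog schema.
    name: str
    source: Source
    tags: frozenset          # topics the feed covers
    allowed_uses: frozenset  # what the terms and conditions permit
    compute: str             # suggested compute environment


def find_feeds(catalog, needed_tags, intended_use):
    """Return feeds that cover a needed topic AND permit the intended use,
    ordered by how many needed topics they cover."""
    matches = []
    for feed in catalog:
        overlap = len(feed.tags & needed_tags)
        if overlap and intended_use in feed.allowed_uses:
            matches.append((overlap, feed))
    matches.sort(key=lambda pair: pair[0], reverse=True)
    return [feed for _, feed in matches]


CATALOG = [
    DataFeed("hr_resumes", Source.INTERNAL,
             frozenset({"skills", "staffing"}),
             frozenset({"search", "reporting"}), "cpu"),
    DataFeed("market_prices", Source.THIRD_PARTY,
             frozenset({"pricing", "retail"}),
             frozenset({"reporting"}), "cpu"),
    DataFeed("synthetic_claims", Source.SYNTHETIC,
             frozenset({"claims", "pricing"}),
             frozenset({"model_training", "reporting"}), "gpu"),
]

if __name__ == "__main__":
    for feed in find_feeds(CATALOG, frozenset({"pricing", "claims"}), "model_training"):
        print(feed.name, feed.source.value, feed.compute)
```

The point of the sketch is that the usage policy is checked at lookup time, so a feed whose terms do not permit the stated use never reaches the user, without a person mediating each request.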

Starting point is 00:13:06 Yeah. I mean, like I talk about, you know, you will also sort of, you know, come to another conclusion when you start this journey for agentic and AI, right? Most of it is unstructured before it really is structured, right? Like, so, you know, documents, PowerPoints, right? Like the things that you pretty much didn't, you know, go mine before are sort of like, you know, the secret sauce for, you know, how you lend it, you know, conformity for your ambition. I'll give you the example, right? In our world, something as simple as, you know, staffing people through a resource management function

Starting point is 00:13:43 is pretty much making sure that you can sort of tie the role description to the right resume. So when you show up to an engagement, right, and whether we sold an engagement to migrate something to the cloud or, you know, we're building an agent in Salesforce or we're doing an SAP transformation, you need to have a particular skill set, right? That means you've done this before in a particular industry. You're certified in the technology.

Starting point is 00:14:04 That's how a resource manager sort of matches you and your experience to the role. And every resume is either in a Word document or a PowerPoint. There is no humanly possible way for a resource manager to read 455,000 or 177,000 resumes to find you the right role. So what they do is they do a keyword search, right? Partly because the resume database is not contextualized or indexed for you to be able to do sort of contextual search, like you are used to when you get into the interface of a Google

Starting point is 00:14:34 and the UI/UX prompt, you type in English what you need and you see relevant ranked search results, right? But what actually happened is Google parsed the entire World Wide Web, parked it in a content store, indexed that data set, and gave you sort of contextuality through query to be able to figure out rank and relevance for you to get to the answer. We did the same thing with a resume database, right?
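The index-then-query idea described here (parse the documents, index them, then rank results against a plain-English role description instead of matching single keywords) can be illustrated with a small sketch. It uses plain TF-IDF with cosine similarity as a stand-in; the transcript does not say which retrieval technique Deloitte or Google actually uses, and the resume texts and function names below are made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """docs maps an id to its text. Returns (tf-idf vectors, idf table)."""
    counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    n_docs = len(docs)
    doc_freq = Counter()
    for term_counts in counts.values():
        doc_freq.update(term_counts.keys())
    # Smoothed inverse document frequency: rarer terms weigh more.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = {
        doc_id: {t: c * idf[t] for t, c in term_counts.items()}
        for doc_id, term_counts in counts.items()
    }
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, vectors, idf, top_k=3):
    """Rank indexed documents against a free-text query, best match first."""
    q_counts = Counter(tokenize(query))
    q_vec = {t: c * idf[t] for t, c in q_counts.items() if t in idf}
    scored = [(cosine(q_vec, vec), doc_id) for doc_id, vec in vectors.items()]
    scored = [pair for pair in scored if pair[0] > 0.0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored[:top_k]]


# Hypothetical resume snippets, invented for this example.
RESUMES = {
    "r1": "SAP finance transformation lead, certified SAP consultant, life sciences clients",
    "r2": "Salesforce agent developer, retail industry, certified Salesforce administrator",
    "r3": "Cloud migration engineer, AWS certified, banking clients",
}
VECTORS, IDF = build_index(RESUMES)

if __name__ == "__main__":
    role = "certified SAP consultant for a life sciences SAP transformation"
    print(search(role, VECTORS, IDF))
```

A production system would more likely use embeddings or a search engine, but the shape is the same: the expensive parsing and indexing happens once, and each role description becomes a cheap ranked query.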

Starting point is 00:14:56 We contextualized it, we indexed it, we gave it a query engine. Now it's as simple as sort of entering, on the UI/UX prompt, like a role description. It shows up in near real time with the resource and whether they're staffed or not staffed. So my resume information and my staffing information

Starting point is 00:15:15 are collated for the answer that you need, that took a resource manager or several resource managers to do one resource, one role at a time. And I think that's a great use case, an example that a lot of people can relate to. So I want to ask you a little bit here about agentic AI, but before we do, real quick, a quick break from our sponsors. This podcast is supported by Google. Hey, everyone, David here, one of the product leads for Google Gemini.

Starting point is 00:15:45 If you dream it and describe it, Veo 3 and Gemini can help you bring it to life as a video. Now with incredible sound effects, background noise, and even dialogue. Try it with a Google AI Pro Plan or get the highest access with the Ultra Plan. Sign up at Gemini.com to get started and show us what you create. All right, so, we talked a lot about the importance of data for, you know, like helping your digital transformation strategy. But when it comes to agents, like that's when I even start thinking about data a little differently, right? Because even if it's a human, you know, operating a large language model powered system, there's still a human that kind of looks at that data.

Starting point is 00:16:33 At some point, you hope, and they're like, yeah, that's, that's correct. But when it comes to agentic AI and when these systems are going to start using our dynamic data and start executing decisions on our behalf, I think it even more so prioritizes the importance of correct data. Could you talk a little bit about what you've seen so far in your experience in that regard when it comes to having your data right specifically for agentic AI? Yeah, I mean, I'll tell you, right? Like the reasoning aspect of an agent, you know, is sort of what is very appealing about the fact that, you know, you can have a set of tasks being done on behalf

Starting point is 00:17:14 of a human or a machine by an agent, right? So think of an agent as something that knows how to reason through a set of complex tasks to arrive at an outcome when you feed it some data. I think where things get hard, we talk about, you know, agents behaving themselves or an agent registry or agent orchestration, all the nuances of getting agents to operate. And by the way, this notion that an agent is going to arrive within your world, you know, in a single fashion is sort of not true, right? You know, when you orchestrate an agent and when you operate from one agent to the other, you will transcend, you know, software or vendors or platforms or data, right?

Starting point is 00:17:53 So what you have to get right in essence is that the attribution of the data set that feeds that agent, you know, needs to be annotated correctly for you to be able to get that agent to sort of behave within the guardrails or boundaries of what you're expecting the answer to be. And the nuance of that is realized when you start to train the agent to start to do things and you realize that, you know, it is doing something that is not deterministic and it's doing things sort of that are, you know, not expected. And the reason why that is transpiring is because the attribution of the data that feeds that agent is sort of feeding it things that lead it to sort of, you know, an unexpected answer, right? That's the best way I can put it, or not what you would have expected. So I think that if you start to look at attribution for the purposes of agentic or if you look at attribution for the purposes of labeling for agentic, you'll pretty soon come to the conclusion that that is sort of one of the biggest drivers for why agent orchestration or registration or interoperability of agents becomes such an important component, which is why

Starting point is 00:19:02 protocols like open standard protocols for agent to agent are a big topic of conversation, you know, no matter where you go these days. Yeah. And yeah, talking about these different, you know, protocols. And maybe if you could explain a little bit for our less technical audience, kind of like what you said there is, you know, labeling data for agentic AI. Like, is it different, right? And how should, you know, especially those larger organizations that do have the resources,

Starting point is 00:19:32 how should they be treating their data differently if it's ultimately going to be going through a large language model type application with a human operating it versus an agentic operation. What is that, you know, what are the main differences, if any, for handling that data for agentic use? Yeah, no, the process of how it goes through sort of the curation in one technology versus the other is the nuance of, you know, whether you use an LLM or you use an LLM-centric agent or not, right? But the nuance of labeling is very evident even in structured data, what you do today, right? So if you didn't have the right cataloging or the business metadata or business glossary, right, usage today is a problem as well. I mean, when I talk to most organizations,

Starting point is 00:20:14 they talk about how they haven't conquered their structured data challenges. And they're talking about process-centric software instantiating data that needs to be labeled for usage, right? So if you look at sort of the world of how data is created within the four walls of most organizations today, you run process-centric software. That's SAP, that's ServiceNow, that's Salesforce, so on and so forth. And the process instantiates data, right? Once the process instantiates data, somebody needs to annotate or label that data for business context,

Starting point is 00:20:44 technical context, so that the usage, the persona that uses it, whether it's a business person that develops a report on the back of it or data engineer that builds a data pipeline on the back of that data, knows sort of what its intent is and starts to know the boundaries of usage of the data, right? That is a fundamental challenge,

Starting point is 00:21:02 irrespective of agent or LLM. The problem is magnified because in tomorrow's world, or an agent world, that data is not originating within the four walls. So guess where the burden of proof lies. It lies with the people that use something that is not happening within their four walls. Now you're talking about labeling, annotation, a business glossary, a technical catalog to be built for those data sets. Imagine, if it was hard to do it for your own data, how impossible it is to do it for something that happens outside of your four walls.
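The labeling requirement discussed in this answer (business context, technical context, and boundaries of usage, for data that may originate outside the organization) can be made concrete with a small sketch. The record fields and the `can_feed_agent` check are hypothetical illustrations under the assumption that an agent should only consume fully annotated data; they are not a description of any specific governance product.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class DatasetAnnotation:
    # Hypothetical annotation record; field names are assumptions.
    dataset: str
    origin: str                                 # "internal" or "external"
    business_definition: Optional[str] = None   # business glossary entry
    technical_schema: Optional[str] = None      # technical catalog entry
    permitted_uses: Tuple[str, ...] = ()        # boundaries of usage


def missing_labels(annotation):
    """List which annotation fields are still empty."""
    gaps = []
    if not annotation.business_definition:
        gaps.append("business_definition")
    if not annotation.technical_schema:
        gaps.append("technical_schema")
    if not annotation.permitted_uses:
        gaps.append("permitted_uses")
    return gaps


def can_feed_agent(annotation, use):
    """Allow a dataset only if it is fully labeled and its terms permit this use."""
    return not missing_labels(annotation) and use in annotation.permitted_uses


# Example records, invented for illustration.
internal_orders = DatasetAnnotation(
    dataset="erp_orders",
    origin="internal",
    business_definition="Confirmed customer orders by fiscal period",
    technical_schema="orders(order_id, customer_id, amount, period)",
    permitted_uses=("reporting", "agent_reasoning"),
)
broker_feed = DatasetAnnotation(dataset="broker_demographics", origin="external")

if __name__ == "__main__":
    print(can_feed_agent(internal_orders, "agent_reasoning"))
    print(missing_labels(broker_feed))
```

In this sketch the externally sourced feed is blocked until someone fills in its glossary, catalog, and usage terms, which mirrors the point that the burden of labeling falls on the consumer of outside data.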

Starting point is 00:21:36 Yeah, that's an interesting way to think about it is, you know, the data that is, you know, originates within those four walls and those four walls are, you know, yeah, how can you even define them, especially when we talk about, you know, multi-agentic orchestration. And, you know, if you have different agents going out there and creating new data points on their own, but there's maybe not, you know, maybe they're not working directly with a human in that regard and if it's this multi-agentic setup. Yeah, how can business leaders even start to think or plan for being able to collect that data that's way beyond those traditional four walls?

Starting point is 00:22:16 Yeah, which is where registration of agents and so the guardrails of how they behave against that registration and what invokes an agent and how do you register an agent and how you orchestrate an agent. I think we're still seeing the beginnings of that, right? Anybody that is claiming that they've done this at scale and it works seamlessly, you know, we don't buy it, right? Because, you know, we do our own experimentation and we realize how hard it is, right? And we're just getting started on multi-agent orchestration, you know, even before multi-agent.

Starting point is 00:22:45 We're just getting started on single agents, you know, sort of doing the intended outcome before we talk about agent to agent and handing off to other agents, right? Like that is still, that is still something that we need to conquer, right? I don't believe that, you know, that journey has come to its logical conclusion. I think we're just getting started. Yeah. And, you know, I think when I think about AI success and, you know, the companies that are doing it versus the companies that are, you know, maybe further behind, I think Deloitte obviously has been at the forefront, right? Like working with some of the largest organizations in the world on their AI strategy. what would you say if we rewind and we look at Deloitte as a case study?

Starting point is 00:23:28 What are some of those things, even internally, that really helped propel your own AI success as an organization, specifically when it came to your data strategy? You know, we sort of recognized early on that this was not something that we could wait and watch for it to get to a particular phase or stage, and turn around and say that's when we'll dip our toes in the water, right? Like, we figured that, you know, this would be done to us if we didn't do it to ourselves, right? There was a level of awareness about what it was doing to the value chain of our clients, that it needed sort of our intervention a lot earlier than,

Starting point is 00:24:03 you know, we typically, you know, would have thought about it, right? So the best way for me to describe it is if you look at biopharma or if you look at biotech, right, the evolution of disease pathology from pharmacology to gene editing, because somebody sequenced 210 proteins and you can tell what disease structure does to that. So hence, you know, gene editing is the way to treat disease pathology and not a bunch of biometrics where you go for blood tests and somebody says, oh, you know, your sodium is off or your potassium is off, hence, you know, disease pathology is this or that, right?

Starting point is 00:24:38 In reality, if you look at sort of what that does to life sciences where, you know, disease pathology is now very different, right? Or going to be very different. Drug discovery is going to be very different. Manufacturing, clinical trials, supply chain are going to be very different. That is why we are on this journey. I mean, we realize that the same aspect of what AI is doing to the value chain of pharma or health care or, you know, autonomous cars, you take the example, or retail, or, you know, there is no industry or vertical or sector that it is not going to touch in the short or long term.

Starting point is 00:25:14 The question becomes, if we don't participate in this, the portfolio of services that make us relevant today will make us irrelevant tomorrow because we didn't arrive at the time that AI arrived in the value chain. So we did it to ourselves knowing fully well the portfolio of services that we need to build. The best way for me to describe it is the menu when you walk into a hotel or a restaurant of your choice: if, you know, the menu doesn't evolve over a period of time, you stop going to the restaurant. So our menu needs to evolve in conjunction with the evolution of what's happening to these industries or sectors and the clients that we serve. And that was the reason for embracing it from the get-go.

Starting point is 00:25:52 Yeah. So, Ashish, we've covered a lot in today's conversation, but as we wrap up, what would you say is the one most important takeaway that you have for our listeners when it comes to the importance of their data strategy powering their AI success? I mean, what I would say is, you know, walk with the end in mind, right? Like, you know, if you sort of understand the outcome that you need to intend to do with your data, right? Like, that is your North Star, right?

Starting point is 00:26:18 Everything else that you do should be in the service of that, right? So, for example, if your ambition is to be agentic or if your ambition is to be, you know, agentic plus, you know, whatever the permutation or choice of tool that you use or consumption pattern, right? It means you've only used the data to consume it in a certain way, whether it's for reasoning, whether it's for LLMs, whether it's for conformed SQL, whatever it may be, AI/ML, you pretty much have to build your data strategy

Starting point is 00:26:44 anticipating that that is sort of, you know, the capability that you need to have, not when the use case arrives, not when your business partner arrives, but in anticipation of the fact that it is, you know, what I call Horizon 2, right, not even Horizon 3, you know. And most of these problems, when I classify them in my mind, they don't look like Horizon 2. They actually look like today's problems, right? And for us to be able to sort of be relevant to our business partners, we needed to have a data strategy that would serve the interest and needs of how we procure data, what data we procure, how we annotate it, how we label it, how we get it into a compute environment.

Starting point is 00:27:24 It was such great advice, really helping us lay the roadmap out, because everyone's worried and wondering about data and their strategies. So, Ashish, thank you so much for taking time out of your day to join the Everyday AI show. We really appreciate it. Thank you, Jordan. All right. Now as a reminder, y'all, if you missed something he said there, because there was a lot of great value, don't worry, we're going to be recapping it all in our newsletter. So make sure, if you haven't already, go to youreverydayai.com. If this is helpful, tell someone about it.

Starting point is 00:27:54 If you're listening on the podcast, please make sure to follow the show and subscribe. Thank you for tuning in. See you back tomorrow and every day for more Everyday AI. Thanks, y'all. Meet Firefly AI Assistant. Now live in Adobe Firefly, the All In One Creative AI Studio. Just describe what you want to create in your own words and the assistant handles the rest, orchestrating multi-step workflows across Adobe Creative Cloud apps, including Photoshop, Premiere Express, and more in one conversational interface.

Starting point is 00:28:27 You direct the outcome while the assistant accelerates execution. Stay in control with the ability to step in and refine at any time. See it today at firefly.adobe.com. And that's a wrap for today's edition of Everyday AI. Thanks for joining us. If you enjoyed this episode, please subscribe and leave us a rating. It helps keep us going. For a little more AI magic, visit youreverydayai.com and sign up to our daily newsletter so you don't get

Starting point is 00:29:00 left behind. Go break some barriers and we'll see you next time.
