Everyday AI Podcast – An AI and ChatGPT Podcast - EP 550: How a Transformative Data Strategy Powers AI Success
Episode Date: June 19, 2025

You think using AI is your moat? Nope. Just using LLMs isn't enough to power your company's AI success. But do you know the real fuel? Having your data right is the ACTUAL key. So how do you do it? And how does your company's data strategy change with agentic AI? Find out from Deloitte's US Chief Data Analytics Officer, Ashish Verma.

Newsletter: Sign up for our free daily newsletter
More on this Episode: Episode Page
Join the discussion: Have a question? Join the convo here.
Upcoming Episodes: Check out the upcoming Everyday AI Livestream lineup
Website: YourEverydayAI.com
Email The Show: info@youreverydayai.com
Connect with Jordan on LinkedIn

Topics Covered in This Episode:
Transformative Data Strategy for AI Success
Importance of Data Strategy in AI
Deloitte's Data Marketplace Approach
Multi-Agent Orchestration Challenges
Structured vs. Unstructured Data in AI
Synthetic Data and AI Transformation
Agentic AI and Data Labeling Essentials
AI's Impact on Business Value Chain

Timestamps:
00:00 "AI Success Requires Data Strategy"
05:27 Data Integration and Utilization Insights
10:31 Contextual Data Marketplace Evolution
13:06 Structuring Unstructured AI Insights
17:02 Agent Reasoning and Orchestration Insights
20:37 Data Annotation Challenges
23:39 AI's Impact on Industry Evolution
26:09 "Data Strategy: Begin with the End"

Keywords:
transformative data strategy, AI success, generative AI, non-technical people, data teams, data strategy, business leaders, companies, careers, unedited podcast, livestream, Deloitte, US chief data and analytics officer, data analytics, GenAI, data experiments, third-party data, synthetic data, data marketplace, data concierge, chief data officer, compute environment, deterministic, probabilistic, AI transformation, digital transformation, data minder, CFO, CMO, public domain data, business partner data, metadata, business glossary, technical catalog, agentic AI, multi-agent orchestration, agent registry, agent orchestration, open standard protocols, economic AI, digital transformation strategy, data advantages, structured data, unstructured data, hybrid data, PowerPoint, staffing optimization, resource management, query engine, relevance-ranked search, annotation, data regulation, governance, data procurement, data curation, data feeds, data platforms, information indexing, future predictions.

Send Everyday AI and Jordan a text message. (We can't reply back unless you leave contact info)
Transcript
This is the Everyday AI Show, the everyday podcast where we simplify AI and bring its power to your fingertips.
Listen daily for practical advice to boost your career, business, and everyday life.
Meet Firefly AI Assistant, now live in Adobe Firefly, the All In One Creative AI Studio.
Just describe what you want to create and the assistant handles the rest,
orchestrating multi-step workflows across Photoshop, Premiere Express, and more in one conversational interface.
You direct the outcome.
The assistant accelerates execution.
In the rush for AI success, it's really easy to overlook probably one of the more important things.
And that's your data strategy.
As generative AI has become more and more accessible to non-technical people, people that don't have, you know, huge data teams or maybe experience on data strategy, it can be pretty easy to overlook what is probably the biggest step.
And that's why I'm excited for today's conversation
on how a transformative data strategy
can power your AI success.
All right, thank you for tuning in
and welcome to Everyday AI.
What's going on y'all?
My name's Jordan Wilson and I'm the host of Everyday AI
and this is your daily live stream podcast
and free daily newsletter helping everyday business leaders
like you and me, not just keep up
with what's happening in the world of AI,
but how we can use it to get ahead,
to grow our companies and our careers.
So if that sounds like what you're doing, you are in the right place.
It starts here with our unedited, unscripted, live streaming podcast.
But where you actually are going to go and put this into practice is on our website.
So please, if you haven't already, go to YourEverydayAI.com, sign up for that free daily
newsletter.
There, we're going to be recapping the highlights from today's conversation, which I'm excited about.
But also in the newsletter, you're going to see everything else that's happening in the world
of AI, put simply for you to know and take advantage of,
so you can be the smartest person in AI at your company or in your department.
So please make sure to go check that out.
The AI news is going to be in there as well.
So without further ado, let's go ahead and bring on our guest for today.
I'm excited to have him.
So live stream audience, please help me welcome to the show.
We have Ashish Verma, the U.S. chief data and analytics officer at Deloitte.
Ashish, thank you so much for joining the Everyday AI show.
Jordan, thank you for having me. All right. Great conversation.
Yeah, I'm excited for it. So first, I'm sure everyone, or almost everyone, is aware of Deloitte. But, you know, could you just tell us a little bit about what you do in your role there?
Yeah, absolutely. So in my role as chief data and analytics officer, you know, there are a few mandates that I have for our journey into sort of the world of AI and agentic and, of course, GenAI, right? Not in that order or fashion, but, you know, whatever the flavor of the day, as you can imagine, right?
Data is sort of the underpinnings of all of these experiments that we do, right?
Some of them are for ourselves and some of them are for our clients, but nonetheless, right?
Like if you start to look at sort of all of the data that we need that fuels this experiment,
you know, we pretty soon began to realize that, you know, it was not just our data that we needed
to sort of do this at scale.
It was our data.
It was third party data.
It was business partner data.
It was synthetic data and so on and so forth.
As we talked through, you know, the process to procure that data,
to standardize data, to make it available in the data marketplace for people to be able to
interact with it, the data concierge function,
that entire mandate sort of rolls up to the office of the CTO.
So my mandate in essence is to make sure that if we're going to experiment with
AI or agents or algorithms, that our ambition is commensurate with our data strategy
and that we have the right data with the right compute environment to make it happen.
You hit all of my favorite keywords there, agents, algorithms,
data, strategy, this is going to be a fun conversation.
But let's just kind of skip ahead to the end here.
And then maybe we'll rewind a little bit, Ashish,
but why is data so incredibly important
when it comes to digital and AI transformation?
Why does it start there?
You know, if you were to look at the underpinnings
of sort of like the end outcome, right,
of any of these, whether it's an agent or it's an algorithm, right,
you would start to realize that, you know, data is what feeds it, right?
Data is what drives the outcome, right?
Now, whether it's deterministic or probabilistic, you know, we can get into sort of the nuances of,
you know, today's, you know, agent-centric coding platforms and reasoning versus sort of, like,
you know, how we coded in the past, but nonetheless, right?
You have to use data for the underpinnings of the attribution of sort of training these
models or training these agents or training these algorithms.
And pretty soon you realize that you don't have enough of that within the four walls of your organization, right? There is nobody in the world
today that can sort of point to their data strategy from, you know, a year ago or two years
ago where they said, like, you know, as long as I got my house in order, my internal data, that
met sort of the mandate of what I could do for my business partners, right, whether that
business partner is your CFO or your CMO or whoever, right? In essence, they were wanting to
make sure that you had the hygiene right. And in essence, you could, you know, procure for them
a compute environment for whatever they intended to do.
At best, it was conformed SQL or ad hoc querying or a report or a dashboard or some flavor
of that sort.
Now, when you extrapolate to where we are today and you start to see sort of what you need,
right, you never have enough of what you need, you know, within the four walls.
And, you know, what you're attempting to do, the reasoning or the algorithm or the agent, is
forcing you to sort of not just interface with your data, but with your data and somebody else's
data, that somebody else being public domain, right, depending upon sort of what you're doing,
or synthetic data depending upon what you're doing, or a business partner's data depending
upon what you're doing. So sort of the use case determines which path you take.
But irrespective of the use case, you pretty soon realize that it's not just your data.
It's your data. It's second-party data, which is the data with you and your business partners.
It's third-party data that you procure. We at Deloitte procure hundreds of millions of dollars'
worth of third-party datasets from, you know, every data broker that you can
conceive of in the world.
And of course, longitudinal data sets that you can sort of assemble that you have to do
through the synthetic data route.
And I do actually want to get back to the synthetic data because that's something I'm
curious about.
But it's interesting because I think that the landscape has changed a lot, right?
Specifically with the kind of introduction of generative AI over the last five or so years.
But before that, I think that, you know, certain enterprises, they could have a moat just in the
technology, right? You know, if you had, you know, big data rooms or, you know, AI and ML teams for,
you know, a couple of decades, like a lot of larger enterprises have, you know, that could be a
huge competitive advantage. But now the barrier of entry has gone down significantly. So, you know,
I'm curious both for your own firsthand experiences and with the worldwide clients that I know
Deloitte is working with, how important is data specifically? Is it even more important than the
technology? Because anyone can go out and use these agents, anyone can go out and use the state
of the art, you know, large language models. Is data actually the differentiator now?
Yeah, it absolutely is. And, you know, for those of you that have sort of done this or are in the middle
of this, you know, this is going to start to resonate, right? Like, you know, when people talk about
hallucination, right, they think something has fundamentally gone wrong. And I tell them it's a
feature set, right? Because in any probabilistic model, some aspect of, you know, getting to the
answer is sort of predicting the outcome, right? So the attribution of your data set and the
labeling of your data set is what makes the hygiene and/or the outcome possible, right? So if you
skip the part of the annotation or the labeling and you sort of don't understand the policy or
usage engine around these data sets, you pretty soon come to the conclusion that your ambition is
not commensurate, because your data doesn't support your ambition. And that is sort of where most
chief data officers begin to struggle to figure out sort of how they accelerate this. And the
acceleration part comes back to sort of where we started this conversation, right? What is your data strategy?
What are the key pillars of your data strategy, irrespective of whether we spoke about procurement
of the data set or the ambition of that, you know, data set as a result of whatever you're
attempting to procure?
And I love that.
Your data doesn't support your ambition.
I think that's an important one for our listeners to hear.
But, you know, could you maybe talk a little bit about some common threads that you all
have seen at Deloitte when it comes to, you know, companies trying to deliver AI at scale?
What are the things on the data side that you keep seeing big companies get right?
And what are the things that you see them keep getting wrong?
I think the first thing that is paramount to sort of getting this is what I call the data marketplace.
So we've been running the equivalent of an Amazon marketplace for data for the better part of about two and a half years now.
And think of it as a single landing spot, which basically is how you enter the universe to figure out
what data we have. We have roughly about 520 data feeds at this given point in time. Think of those
520 covering all permutations: public domain, Deloitte internal, synthetic, third party, so on and so forth.
And the reason why that data marketplace is very important in essence is that is sort of where
we understand the use case consumption criteria or usage criteria that sort of formulates a procurement
strategy, right? If we didn't have the data marketplace, it would be very, very difficult to interact with
our business user world.
I mean, there's 450,000 people at Deloitte,
455,000, 178,000 in the US, right?
So when 178,000 people come knocking to figure out what data you have,
what policy engine you need on that data,
and what it can feed and what it cannot feed,
what the terms and conditions are,
I don't think that you can have a human middleware in the equation,
concierging that dataset one user at a time.
So I think the biggest thing that I get asked about is, you know,
what led to a data marketplace and how does that data marketplace become contextual to people's ambition,
right? Like, so, you know, today we run a data marketplace that is sort of on its way to becoming
contextual. So almost like, hey, let me tell you what I have based on you telling me what
you need to do. Right. So the data interacts with sort of your behavior and use case to sort of lead you
down the path of the right data set with the right policy engine and the compute environment,
as opposed to deterministic search, which is sort of what
the old world was, right? You sort of showed up at the doorstep and you said, look, I want
conformed SQL or I want to pivot this or I want to build a dashboard. Give me so much of this and so
much of that. And then, you know, off I go and I, you know, I curate the data pipeline and I build
the end result, right? No longer true, right? Because it's multifaceted data sets.
It's not just your data. It's your data and external data and third-party data and synthetic data.
And it's not a single compute environment, depending upon what you're attempting to do.
I've got to give you CPUs. I've got to give you GPUs.
I've got to give you TPUs and, you know, some tooling on top of it above the compute for you to get to the answer.
So people pretty soon start to realize that the data concierge, the data marketplace, the compute environment, and the ambition all need to correlate to something that is sort of on the roadmap of a CIO or a CDO to put into place, right?
Or else you're doing this fairly sporadically,
you know, and you're reacting to sort of what people need as opposed to what
you need to have for the ambition to be true.
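The marketplace idea described here, where each data feed carries its own usage policy and compute targets and a user asks by use case, can be sketched as a small catalog lookup. This is a minimal illustration under stated assumptions, not Deloitte's system: the `DataFeed` fields, the feed names, and the use-case labels are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFeed:
    """One catalog entry. All field names are illustrative assumptions."""
    name: str
    source: str              # e.g. "internal", "third_party", "synthetic", "public"
    allowed_uses: frozenset  # use cases the feed's terms and conditions permit
    compute: frozenset       # compute targets the feed is available on


def find_feeds(catalog, use_case, compute=None):
    """Return feeds whose policy permits use_case, optionally on one compute target."""
    matches = []
    for feed in catalog:
        if use_case not in feed.allowed_uses:
            continue  # policy does not allow this use
        if compute is not None and compute not in feed.compute:
            continue  # feed is not available on the requested compute
        matches.append(feed)
    return sorted(matches, key=lambda f: f.name)


# Hypothetical catalog with three feeds.
CATALOG = [
    DataFeed("hr_resumes", "internal",
             frozenset({"staffing_search"}), frozenset({"cpu"})),
    DataFeed("market_prices", "third_party",
             frozenset({"reporting", "model_training"}), frozenset({"cpu", "gpu"})),
    DataFeed("synthetic_claims", "synthetic",
             frozenset({"model_training"}), frozenset({"gpu"})),
]

if __name__ == "__main__":
    for feed in find_feeds(CATALOG, "model_training", compute="gpu"):
        print(feed.name, feed.source)
```

The point of the sketch is that the policy travels with the feed, so the lookup can answer "what can I use for this purpose" without a person checking terms one request at a time.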
One thing I'm always thinking about is there's obviously different sectors in the business
world that naturally have access to more quantifiable data, right?
But then for those that maybe don't have as much, right?
They don't have as much structured data, but they have a lot of unstructured information,
right, that helps their company move forward.
How should those types of organizations be looking at their data?
Like, is there a way that they can, you know, really corral maybe more of the unstructured data
to really help propel their transformation forward?
Yeah.
I mean, like I talk about, you know, you will also sort of, you know, come to another conclusion
when you start this journey for agentic and AI, right?
Most of it is unstructured before it really is structured, right?
Like, so, you know, documents, PowerPoints, right?
Like the things that you pretty much didn't, you know, go mine before are sort of like, you know,
the secret sauce for, you know, how you lend it, you know, conformity for your ambition.
I'll give you the example, right?
In our world, something as simple as, you know, staffing people through a resource management function
is pretty much making sure that you can sort of tie the role description to the right resume.
So when you show up to an engagement, right, and whether we sold an engagement to migrate something to the cloud or, you know, we're building an agent in Salesforce or we're doing an SAP transformation, you need to have a particular skill set, right? That means you've done this before in a particular industry. You're certified in the technology. That's how a resource manager sort of matches you and your experience to the role. And every resume is either in a Word document or a PowerPoint. There is no humanly possible way for a resource manager to read 45,000 or 150,000 or
177,000 resumes to find you the right role.
So what they do is they do a keyword search, right?
Partly because the resume database is not contextualized or indexed for you to be able to do
sort of contextual search, like you are used to when you get into the interface of a Google
and the UI/UX prompt, you type in English what you need and you see relevance-ranked search
results, right?
But what actually happened is Google parsed the entire World Wide Web, parked it in a content
store, indexed that data set, and gave you sort
of contextuality through query to be able to figure out rank and relevance for you to get to the answer.
We did the same thing with a resume database, right?
We contextualized it.
We indexed it.
We gave it a query engine.
Now it's as simple as sort of entering, on the UI/UX prompt, like a role description.
It shows up in near real time with the resource and whether they're staffed or not staffed.
So my resume information and my staffing information are collated for the answer that you need.
That took a resource manager or several resource managers to do, one resource, one role at a time.
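The index-then-query pattern described here (parse the documents, index them, rank results against a plain-English role description, and join in staffing status) can be sketched in a few lines. This is a minimal sketch, not the system described in the episode: it uses TF-IDF term weighting as a simple stand-in for contextual ranking, and the resumes, names, and staffing flags are made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """docs maps id -> text. Returns per-document term counts and per-term IDF weights."""
    counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    n = len(docs)
    doc_freq = Counter()
    for c in counts.values():
        doc_freq.update(c.keys())  # each term counted once per document
    idf = {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in doc_freq.items()}
    return counts, idf


def search(query, counts, idf, staffed, top_k=3):
    """Rank documents by summed TF-IDF of the query terms; attach staffing status."""
    q_terms = tokenize(query)
    results = []
    for doc_id, c in counts.items():
        length = sum(c.values()) or 1
        score = sum((c[t] / length) * idf.get(t, 0.0) for t in q_terms)
        if score > 0:
            results.append((doc_id, round(score, 4), staffed.get(doc_id, False)))
    results.sort(key=lambda r: (-r[1], r[0]))  # best score first, then by id
    return results[:top_k]


# Hypothetical resume snippets and staffing flags.
RESUMES = {
    "alice": "SAP S/4HANA migration lead for life sciences clients, certified SAP architect",
    "bob": "Salesforce developer building service agents for retail clients",
    "carol": "Cloud migration engineer, AWS certified, financial services",
}
STAFFED = {"alice": True, "bob": False, "carol": False}

if __name__ == "__main__":
    counts, idf = build_index(RESUMES)
    for doc_id, score, is_staffed in search("certified cloud migration", counts, idf, STAFFED):
        print(doc_id, score, "staffed" if is_staffed else "available")
```

A production system would more likely use embeddings or a search engine rather than hand-rolled TF-IDF; the sketch only shows why indexing once makes each role lookup cheap compared with reading resumes one at a time.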
And I think that's a great use case, an example that a lot of people can relate to.
So I want to ask you a little bit here about agentic AI, but before we do, real quick, a quick break from our sponsors.
This podcast is supported by Google.
Hey, everyone, David here, one of the product leads for Google Gemini.
Check out Veo 3, our state-of-the-art AI
video generation model in the Gemini app, which lets you create high-quality eight-second
videos with native audio generation. Try it with a Google AI Pro plan or get the highest access
with the Ultra plan. Sign up at Gemini.com to get started and show us what you create.
All right. So we talked a lot about the importance of data for, you know,
helping your digital transformation strategy. But when it comes to agents, like that's when I
even start thinking about data a little differently, right? Because even if it's a human, you know,
operating a large language model powered system, there's still a human that kind of looks at that
data at some point you hope and they're like, yeah, that's, that's correct. But when it comes to
agentic AI and when these systems are going to start using our dynamic data and start
executing decisions on our behalf, I think it even more so prioritizes the importance of correct
data. Could you talk a little bit about, you know, what you've seen so far in your experience
in that regard when it comes to having your data right specifically for agentic AI?
Yeah. I mean, I'll tell you, right? Like the reasoning aspect of an agent, you know,
is sort of what is very appealing about the fact that, you know, you can have a set of tasks being
done on behalf of a human or a machine by an agent, right? So think of an agent as, you know,
something that knows how to reason through a set of complex tasks to arrive at an outcome when you feed it some data.
I think we talk about, you know, agents behaving themselves or an agent registry or
agent orchestration, all the nuances of getting agents to operate. And by the way, this notion that an agent
is going to arrive within your world, you know, in a single fashion is sort of not true, right?
You know, when you orchestrate an agent and when you operate from one agent to the other, you will transcend, you know, software or vendors or platforms or data, right?
So what you have to get right in essence is that the attribution of the data set that feeds that agent, you know, needs to be annotated correctly for you to be able to get that agent to sort of behave within the guardrails or boundaries of what you're accepting the answer to be.
And the nuances of that are realized when you start to train the agent to start to do things and you realize that, you know, it is doing something that is not deterministic and it's doing things sort of that are, you know, not expected.
And the reason why that is transpiring is because the attribution of the data that feeds that agent is sort of feeding it things that are leading it to sort of, you know, an unexpected answer, right?
That's the best way I can put it, or not what you would have expected.
So I think that if you start to look at attribution for the purposes of agentic,
or if you look at attribution for the purposes of labeling for agentic,
you'll pretty soon come to the conclusion that that is sort of one of the biggest drivers
for why agent orchestration or registration or interoperability of agents becomes such an important component,
which is why protocols like, you know, open standard protocols for agent to agent are a big topic of conversation,
you know, no matter where you go these days.
Yeah.
And yeah, talking about these different, you know, protocols.
And maybe if you could explain a little bit for our less technical audience,
kind of like what you said there is, you know, labeling data for agentic AI.
Like, is it different, right?
And how should, you know, especially those larger organizations that do have the resources,
how should they be treating their data differently if it's ultimately going to be going
through a large language model type application with a human operating it versus an agentic
operation.
What is that, you know, what are the main differences, if any, for handling that data for
agentic use?
Yeah, no, the process of how it goes through sort of the curation in one technology
versus the other is the nuance of, you know, whether you use an LLM or you use an LLM-centric
agent or not, right?
But the nuances of labeling are very evident even in the structured data that you work with today, right?
So if you didn't have the right cataloging or the business metadata or business glossary, right?
Usage today is a problem as well.
I mean, when I talk to most organizations, they talk about how they haven't conquered their structured data challenges.
And they're talking about process-centric software and instantiating data that needs to be labeled for usage, right?
So if you look at sort of the world of how data is created within the four walls of most organizations today, you run a process-centric software.
That's SAP, that's ServiceNow, that's Salesforce,
so on and so forth. And the process instantiates data. Once the process instantiates data,
somebody needs to annotate or label that data for business context, technical context,
so that the usage, the persona that uses it, whether it's a business person that develops a report
on the back of it or a data engineer that builds a data pipeline on the back of that data,
knows sort of what its intent is and starts to know the boundaries of usage of the data, right?
That is a fundamental challenge irrespective of agent or LLM. The problem is
magnified because in tomorrow's world, or an agent world, that data is not originating within
the world of the organization. So guess where the burden of proof lies. It lies with the people that
use something that is not happening within their four walls. Now you're talking about labeling,
annotation, business glossary, technical catalog to be built for those datasets. Imagine if it was
hard to do it for your own data. Imagine how impossible it is to do it for something that
happens outside of your four walls.
Yeah, that's an interesting way to think about it is, you know,
the data that, you know, originates within those four walls and those four walls are, you know,
yeah, how can you even define them, especially when we talk about, you know, multi-agentic orchestration?
And, you know, if you have different agents going out there and creating new data points on their
own, but there's maybe not, you know, maybe they're not working directly with a human in that
regard. And if it's this multi-agentic setup, yeah, how
can, you know, business leaders even start to, you know, think or plan for being able to collect
that data that's way beyond those traditional four walls?
Yeah, which is where registration
of agents and sort of the guardrails of how they behave against that registration and, you know,
what invokes an agent and how do you register an agent and how you orchestrate an agent,
I think we're still seeing the beginnings of that, right? Anybody that is claiming that they've done
this at scale and it works seamlessly, you know, we don't buy it, right? Because, you know,
we do our experimentation and we realize how hard it is, right? And we're just getting started
on multi-agent orchestration, you know, even before multi-agent. We're just getting started on
single agents, you know, sort of doing the intended outcome before we talk about
agent to agent and handing off to other agents, right? Like, that is still something
that we need to conquer, right? I don't believe that, you know, that journey has come to its
logical conclusion. I think we're just getting started.
Yeah.
And, you know, I think when I think about AI success and, you know, the companies that are doing it versus the companies that are, you know, maybe further behind, I think Deloitte obviously has been at the forefront, right?
Like working with some of the largest organizations in the world on their AI strategy.
What would you say if we rewind and we look at Deloitte, right, as a case study?
What are some of those things that even internally that really helped propel your own AI success as an organization,
specifically when it came to your data strategy?
You know, we sort of recognized early on that this was not something that we could wait and watch for it to get to a particular phase or stage,
and then turn around and say, that's when we'll dip our toes in the water, right?
Like we figured that, you know, this would be done to us if we didn't do it to ourselves, right?
There was a level of awareness about what it was doing to the value chain of our clients
that it needed sort of our intervention a lot earlier than, you know, we typically, you know,
would have thought about it, right? So the best way for me to describe it is if you look at
biopharma or if you look at biotech, right, the evolution of disease pathology from
pharmacology to gene editing because somebody sequenced 210 proteins and you can tell
what disease structure does to that.
So hence, you know, gene editing is the way to treat disease pathology
and not a bunch of biometrics where you go for blood tests
and somebody says, oh, you know, your sodium is off or your potassium is off.
Hence, you know, disease pathology is this or that, right?
In reality, if you look at sort of what that does to life sciences
where, you know, disease pathology is now very different, right?
Or going to be very different.
Drug discovery is going to be very different.
Manufacturing, clinical trials, supply chain are going to be very different.
That is why we are on this journey.
I mean, we realize that the same aspect of what AI is doing
to the value chain of pharma or healthcare
or, you know, autonomous cars,
you take the example, or retail, or, you know,
there is no industry or vertical or sector
that it is not going to touch in the short or long term.
The question becomes, if we don't participate in this,
the portfolio of services that make us relevant today
will make us irrelevant tomorrow
because we didn't arrive at the time that AI arrived in the value chain.
So we did it to ourselves, knowing full well that the portfolio of services that we need to build,
the best way for me to describe it is the menu when you walk into a hotel or a restaurant of your choice
and, you know, the menu doesn't evolve over a period of time.
You stop going to the restaurant.
So our menu needs to evolve in conjunction with the evolution of what's happening to these industries
or sectors and the clients that we serve.
And that was the reason for embracing it from the get-go.
Yeah. So Ashish, we've covered a lot in today's conversation, but as we wrap up, what would you say is the one most important takeaway that you have for our listeners when it comes to the importance of their data strategy powering their AI success?
I mean, what I would say is, you know, walk with the end in mind, right? Like, you know, if you sort of understand the outcome that you intend to achieve with your data, right? Like, that is your North Star, right? Everything else that you do should be in the service of
that, right? So, for example, if your ambition is to be agentic or if your ambition is to be,
you know, agentic plus, you know, whatever the permutation or choice of tool that you use or
consumption pattern, right, means you've only used the data to consume it in a certain way,
whether it's for reasoning, whether it's for LLM, whether it's for conformed SQL, whatever it may be,
AI/ML, you pretty much have to build your data strategy anticipating that that is sort of, you know,
the capabilities that you need to have, not when the use case
arrives, not when your business partner
arrives, but in anticipation of the fact that
it is what I call
Horizon 2, not even Horizon 3,
and most of these problems,
when I classify them in my mind, they don't look like
Horizon 2. They actually look like today's problems,
right? And for us
to be able to sort of be relevant
to our business partners,
we needed to have a data strategy
that would serve the interest and needs of
how we procure data, what data do we procure,
how do we annotate it,
how do we get it into a compute environment.
It was such great advice and really helping us lay the roadmap out because everyone's worried
and wondering about data and their strategies.
So Ashish, thank you so much for taking time out of your day to join the Everyday AI show.
We really appreciate it.
Thank you, Jordan.
All right.
Now as a reminder, y'all, if you missed something he said there, because there is a lot of
great value, don't worry.
We're going to be recapping it all in our newsletter.
So make sure, if you haven't already, go
to YourEverydayAI.com.
If this is helpful, tell someone about it.
If you're listening on the podcast, please make sure to follow the show and subscribe.
Thank you for tuning in.
Hope to see you back tomorrow and every day for more Everyday AI.
Thanks y'all.
And that's a wrap for today's edition of Everyday AI.
Thanks for joining us.
If you enjoyed this episode, please subscribe and leave us a rating.
It helps keep us going.
For a little more AI magic, visit YourEverydayAI.com
and sign up to our daily newsletter so you don't get left behind.
Go break some barriers and we'll see you next time.
