Everyday AI Podcast – An AI and ChatGPT Podcast - EP 595: Data First: The Strategic Playbook for AI Success
Episode Date: August 22, 2025
You think using AI is your moat? Nope. Just using LLMs isn't enough to power your company's AI success. But do you know the real fuel? Having your data right is the ACTUAL key. So how do you do it? And how does your company's data strategy change with agentic AI? Find out from Deloitte's US Chief Data Analytics Officer, Ashish Verma.
Newsletter: Sign up for our free daily newsletter
More on this Episode: Episode Page
Join the discussion: Have a question? Join the convo here.
Upcoming Episodes: Check out the upcoming Everyday AI Livestream lineup
Website: YourEverydayAI.com
Email The Show: info@youreverydayai.com
Connect with Jordan on LinkedIn
Topics Covered in This Episode:
Transformative Data Strategy for AI Success
Importance of Data Strategy in AI
Deloitte's Data Marketplace Approach
Multi-Agent Orchestration Challenges
Structured vs. Unstructured Data in AI
Synthetic Data and AI Transformation
Agentic AI and Data Labeling Essentials
AI's Impact on Business Value Chain
Timestamps:
00:00 "AI Success Requires Data Strategy"
05:27 Data Integration and Utilization Insights
10:31 Contextual Data Marketplace Evolution
13:06 Structuring Unstructured AI Insights
17:02 Agent Reasoning and Orchestration Insights
20:37 Data Annotation Challenges
23:39 AI's Impact on Industry Evolution
26:09 "Data Strategy: Begin with the End"
Keywords:
transformative data strategy, AI success, generative AI, non-technical people, data teams, data strategy, business leaders, companies, careers, unedited podcast, livestream, Deloitte, US chief data and analytics officer, data analytics, GenAI, data experiments, third-party data, synthetic data, data marketplace, data concierge, chief data officer, compute environment, deterministic, probabilistic, AI transformation, digital transformation, data minder, CFO, CMO, public domain data, business partner data, metadata, business glossary, technical catalog, agentic AI, multi-agent orchestration, agent registry, agent orchestration, open standard protocols, economic AI, digital transformation strategy, data advantages, structured data, unstructured data, hybrid data, PowerPoint, staffing optimization, resource management, query engine, relevance-ranked search, annotation, data regulation, governance, data procurement, data curation, data feeds, data platforms, information indexing, future predictions.
Send Everyday AI and Jordan a text message. (We can't reply back unless you leave contact info)
Transcript
This is the Everyday AI Show, the everyday podcast where we simplify AI and bring its power to your fingertips.
Listen daily for practical advice to boost your career, business, and everyday life.
Meet Firefly AI Assistant, now live in Adobe Firefly, the all-in-one creative AI studio.
Just describe what you want to create and the assistant handles the rest,
orchestrating multi-step workflows across Photoshop, Premiere Express, and more in one conversational interface.
You direct the outcome.
The assistant accelerates execution.
In the rush for AI success, it's really easy to overlook probably one of the more important things.
And that's your data strategy.
As generative AI has become more and more accessible to non-technical people, people that don't have, you know, huge data teams or maybe experience with data strategy, it can be pretty easy
to overlook what is probably the biggest step.
And that's why I'm excited for today's conversation
on how a transformative data strategy
can power your AI success.
All right, thank you for tuning in
and welcome to Everyday AI.
What's going on y'all?
My name's Jordan Wilson and I'm the host of Everyday AI
and this is your daily live stream podcast
and free daily newsletter helping everyday business leaders
like you and me, not just keep up
with what's happening in the world of AI,
but how we can use it to get ahead,
to grow our companies and our careers.
So if that sounds like what you're doing, you are in the right place.
It starts here with our unedited, unscripted, live streaming podcast.
But where you actually are going to go and put this into practice is on our website.
So please, if you haven't already, go to youreverydayai.com and sign up for that free daily
newsletter.
There, we're going to be recapping the highlights from today's conversation, which I'm excited about.
But also in the newsletter, you're going to see everything else that's happening in the world of AI.
It's all put simply, for you to know
and take advantage of, so you can be the smartest person in AI at your company or in your
department. So please make sure to go check that out. The AI news is going to be in there as well.
So without further ado, let's go ahead and bring on our guest for today. I'm excited to have him.
So live stream audience, please help me welcome to the show. We have Ashish Verma, the U.S.
chief data and analytics officer at Deloitte.
Ashish, thank you so much for joining the Everyday AI show.
Jordan, thank you for having me.
This should be a great conversation.
Yeah, I'm excited for it.
So first, I'm sure everyone or almost everyone is aware of Deloitte.
But could you just tell us a little bit about what you do in your role there?
Yeah, absolutely.
So in my role as chief data analytics officer, you know, there are a few mandates that I have for our journey into sort of the world of AI and agentic and, of course, GenAI,
not in that order or fashion, but, you know, whatever the flavor of the day, as you can imagine, right?
Data is sort of the underpinnings of all of these experiments that we do, right?
Some of them are for ourselves and some of them are for our clients, but nonetheless, right?
Like if you start to look at sort of all of the data that we need that fuels these experiments,
you know, we pretty soon began to realize that, you know, it was not just our data that we needed
to sort of do this at scale.
It was our data.
It was third party data.
It was business partner data.
It was synthetic data.
And so on and so forth, as we talked through, you know, the process to procure that data,
to standardize data, to make it available in the data marketplace for people to be able to
interact with it, the data concierge function. That entire mandate sort of rolls up to the office
of the CDO. So my mandate in essence is to make sure that if you're going to experiment
with AI or agents or algorithms, that our ambition is commensurate with our data strategy
and that we have the right data with the right compute environment to make it happen.
You hit all of my favorite keywords there, agents, algorithms, data,
strategy. This is going to be a fun conversation. But, you know, let's just kind of skip ahead to the
end here. And then maybe we'll rewind a little bit, Ashish, but, you know, why is data so incredibly
important when it comes to digital and AI transformation? Why does it start there?
You know, if you look at the underpinnings of sort of like the end outcome, right, of any of
these, whether it's an agent or it's an algorithm, right? You would start to realize that, you know,
data is what feeds it, right?
Data is what drives the outcome, right?
Now, whether it's deterministic or probabilistic,
we can get into sort of the nuances of
today's agent-centric coding platforms and reasoning
versus sort of like, you know, how we coded in the past.
But nonetheless, right?
You have to use data for the underpinnings
of the attribution of sort of training these models
or training these agents or training these algorithms.
And pretty soon you realize that you don't have enough of that
within the four walls of your organization, right?
There is nobody in the world today that can sort of point to their data strategy from,
you know, a year ago or two years ago where they said like, you know,
as long as I got my house in order, my internal data met sort of the mandate
of what I could do for my business partners, right, whether your business partner is
your CFO or your CMO or whoever, right?
In essence, they were wanting to make sure that you had the hygiene, right?
And in essence, you could, you know, procure for them a compute environment for whatever
they intended to do. At best, it was conformed SQL or ad hoc querying or a report or a dashboard
or some flavor of that sort. Now, when you extrapolate to where we are today and you start to see
sort of what you need, right, you never have enough of what you need, you know,
within the four walls. And, you know, what you're attempting to do, the reasoning or the algorithm
or the agent, is forcing you to sort of not just interface with your data, but with
your data and somebody else's data, that somebody else being public domain, right,
depending upon sort of what you're doing or synthetic data depending upon what you're doing,
or a business partner's data depending upon what you're doing.
So the sort of the use case determines which path you take.
But irrespective of the use case, you pretty soon realize that it's not just your data.
It's your data.
It's second party data, which is the data with you and your business partners.
It's third party data that you procure.
We at Deloitte procure hundreds of millions of dollars' worth of third party data sets from, you know,
every other data broker that you can conceive of in the world.
And of course, longitudinal data sets that you can sort of assemble
that you have to do through the synthetic data route.
And I do actually want to get back to the synthetic data
because that's something I'm curious about.
But it's interesting because I think that the landscape has changed a lot,
right?
Specifically with the kind of introduction of generative AI over the last five or so years.
But before that,
I think that, you know, certain enterprises, they could have a moat just in the technology,
right? You know, if, if you had, you know, big data rooms or, you know, AI and ML teams for,
you know, a couple of decades, like a lot of larger enterprises have, you know, that could be
a huge competitive advantage. But now the barrier to entry has gone down significantly. So, you know,
I'm curious both, you know, from your own firsthand experiences and with the, you know, worldwide
clients that I know Deloitte is working with, how important is data specifically? Is it even more important
than the technology? Because anyone can go out and use these agents, anyone can go out and use
the state-of-the-art, you know, large language models. Is data actually the differentiator now?
Yeah, it absolutely is. And, you know, for those of you that have sort of done this or are in the
middle of this, you know, this is going to start to resonate, right? Like when you realize that you can't
sort of get, you know, when people talk about hallucination, right, they think it's, you know,
something has fundamentally gone wrong. And I tell them it's a feature set, right, because in any
probabilistic model, like some aspect of, you know, getting to the answer is sort of predicting
the outcome, right? So the attribution of your data set and the labeling of your data set is
what makes the hygiene and or the outcome possible, right? So if you skip the part of the
annotation or the labeling and you sort of don't understand the policy or usage
engine around these data sets, you pretty soon come to the conclusion that your ambition is
not commensurate because your data doesn't support your ambition. And that is sort of where
most chief data officers begin to struggle to figure out sort of how do they accelerate this.
And the acceleration part comes back to sort of where we started this conversation, right?
What is your data strategy? What are the key pillars of your data strategy, irrespective of
whether we spoke about procurement of the dataset or the ambition of that, you know,
data set as a result of whatever you're attempting to procure.
And I love that. Your data doesn't support your ambition. I think that's an important one for our
listeners to hear. But could you maybe talk a little bit about some common threads that you all
have seen at Deloitte when it comes to companies trying to deliver AI at scale? What are the
things on the data side that you keep seeing big companies get right? And what are the
things that you see them keep getting wrong?
I think the first thing that is paramount to sort of, you know, getting this right is what I call
the data marketplace, right?
So we've been running the equivalent of an Amazon marketplace for data for the better part
of about two and a half years now.
And think of it as a single landing spot, which basically is how you enter the universe to
figure out what data we have.
We have roughly about 520 data feeds at this given point in time.
Think of those 520 covering all permutations, public domain,
Deloitte internal, synthetic, third party, so on and so forth.
And the reason why that data marketplace is very important in essence is that is sort of
where we understand the use case consumption criteria or usage criteria that sort of formulates
a procurement strategy, right?
If we didn't have the data marketplace, it was very, very difficult to interact with
our business user world.
I mean, there's 450,000 people at Deloitte, 455,000, and 175,000, 178,000 in the
US, right? So when 178,000 people come knocking to figure out what data you have, what policy
engine you need on that data, what it can feed and what it cannot feed, what the terms and
conditions are, I don't think that you can have a human middleware in the equation concierging
that data set one user at a time. So I think the biggest thing that I get asked about is, you know,
what led to a data marketplace and how does a data marketplace become contextual to people's
ambition, right? Like so.
Today we run a data marketplace that is sort of on its way to become contextual.
So almost like, hey, let me tell you what I have based on you telling me what do you need
to do.
So the data interacts with sort of your behavior and use case to sort of lead you down the path
of the right data set with the right policy engine and the compute environment, as opposed
to deterministic search, which is sort of what the old world was, right?
You sort of showed up to the doorstep and you said, look, I want conformed SQL or
I want to pivot this or I want to build a dashboard.
Give me so much of this and so much of that.
And then off I go and I, you know, I curate the data pipeline and I build the end result, right?
No longer true, right?
Because it's multifaceted data sets.
It's not just your data.
It's your data and external data and third-party data and synthetic data.
And it's not a single compute environment, depending upon what you're attempting to do.
I've got to give you CPUs.
I've got to give you GPUs.
I've got to give you TPUs and, you know, some tooling on top of the compute for you to get to the answer.
So people sort of pretty soon start to realize that the data concierge, the data marketplace, the compute environment, and the ambition all need to correlate to something that is sort of on the roadmap of a CIO or a CDO to put into place, right?
Or else you're doing this fairly sporadically, you know, and you're reacting to sort of what people need as opposed to what you need to have for the ambition to be true.
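As a rough illustration of the contextual marketplace idea described above, here is a minimal Python sketch that takes a stated use case, filters a small catalog of data feeds by a usage policy, and ranks what is left. Everything in it is an assumption made for illustration: the `DataFeed` fields, the three feed names, the `find_feeds` function, and the simple word-overlap scoring are hypothetical and do not describe Deloitte's actual marketplace.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataFeed:
    name: str
    source: str          # e.g. "internal", "third_party", "synthetic"
    description: str
    allowed_uses: frozenset = field(default_factory=frozenset)


# Hypothetical catalog entries, invented for illustration only.
CATALOG = [
    DataFeed("staffing_history", "internal",
             "employee staffing assignments and role history",
             frozenset({"reporting", "model_training"})),
    DataFeed("market_pricing", "third_party",
             "licensed pricing data from an external broker",
             frozenset({"reporting"})),
    DataFeed("synthetic_claims", "synthetic",
             "synthetic insurance claims for model training",
             frozenset({"model_training", "agent_grounding"})),
]


def find_feeds(use_case: str, intended_use: str, catalog=CATALOG):
    """Return names of feeds permitted for intended_use, ranked by word overlap with use_case."""
    wanted = set(use_case.lower().split())
    scored = []
    for feed in catalog:
        if intended_use not in feed.allowed_uses:
            continue  # policy check: the feed's terms do not permit this use
        overlap = len(wanted & set(feed.description.lower().split()))
        if overlap:
            scored.append((overlap, feed.name))
    # Highest overlap first; ties broken alphabetically so results are stable.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored]


print(find_feeds("training a claims model", "model_training"))
```

The point of the sketch is the ordering of the two checks: the policy filter runs before any ranking, so a feed whose terms and conditions exclude the intended use never reaches the person asking, however relevant it looks.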
One thing I'm always thinking about is there's obviously different sectors in the business world that naturally have access to more quantifiable data, right?
But then for those that maybe don't have as much, right?
They don't have as much structured data, but they have a lot of unstructured information, right, that helps their company move forward.
How should those types of organizations be looking at their data?
Like, is there a way that they can, you know, really corral maybe more of the unstructured data to really help propel their transformation forward?
Yeah.
I mean, like I talk about, you know, you will also sort of, you know, come to another conclusion when you start this journey for agentic and AI, right?
Most of it is unstructured before it really is structured, right?
like so, you know, documents, PowerPoints, right?
Like the things that you pretty much didn't, you know, go mine before is sort of like, you know,
the secret sauce for, you know, how you lend it, you know, conformity for your ambition.
I'll give you the example, right?
In our world, something as simple as, you know, staffing people through a resource management function
is pretty much making sure that you can sort of tie the role description to the right resume,
So when you show up to an engagement, right,
and whether we sold an engagement to migrate something in the cloud
or we build it, you know, we're building an agent in Salesforce
or we're doing an SAP transformation,
you need to have a particular skill set, right?
That means you've done this before in a particular industry.
You're certified in the technology.
That's how a resource manager sort of matches you and your experience to the role.
And every resume is either in a Word document or a PowerPoint.
There is no humanly possible way for a resource manager to read 455,000
or 177,000 resumes to find you the right role.
So what they do is they do a keyword search, right?
Partly because the resume database is not contextualized or indexed
for you to be able to do sort of contextual search,
like you are used to when you get into the interface of a Google
and the UI/UX prompt, you type in English what you need
and you see relevance-ranked search results, right?
But what actually happened is Google parsed the entire World Wide Web,
parked it in a content store, indexed that data set,
and gave you sort of contextuality through query
to be able to figure out rank and relevance
for you to get to the answer.
We did the same thing with a resume database, right?
We contextualized it, we indexed it,
we gave it a query engine.
Now it's as simple as sort of typing
a role description
on the UI/UX prompt.
It shows up in near real time
with the resource and whether they're staffed or not staffed.
So my resume information and my staffing information
are collated for the answer
that you need, which used to take a resource manager or several resource managers doing one resource,
one role at a time.
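The parse, index, and query steps described above can be sketched in a few lines of Python. This is a deliberately simplified stand-in: it uses a plain inverted index with TF-IDF scoring rather than whatever contextual search the real system uses, and the resumes, the staffing table, and the `build_index` and `search` functions are all hypothetical names invented for the example.

```python
import math
from collections import Counter, defaultdict

# Hypothetical resumes and staffing status, invented for illustration.
RESUMES = {
    "alice": "sap finance transformation lead certified sap consultant",
    "bob": "salesforce agent developer retail industry",
    "carol": "cloud migration architect aws certified banking industry",
}
STAFFED = {"alice": True, "bob": False, "carol": False}


def build_index(docs):
    """Build an inverted index: term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, count in Counter(text.lower().split()).items():
            index[term][doc_id] = count
    return index


def search(role_description, index, n_docs):
    """Rank resumes against a role description with a simple TF-IDF score."""
    scores = defaultdict(float)
    for term in set(role_description.lower().split()):
        postings = index.get(term, {})
        if not postings:
            continue  # term appears in no resume
        idf = math.log(1 + n_docs / len(postings))  # rarer terms weigh more
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    # Join the ranked resumes with staffing status, as in the example above.
    return [(doc_id, STAFFED[doc_id]) for doc_id, _ in ranked]


INDEX = build_index(RESUMES)
for person, staffed in search("certified cloud migration architect", INDEX, len(RESUMES)):
    print(person, "staffed" if staffed else "available")
```

The index is built once up front, so each role description becomes a lookup plus a sort rather than a read through every resume, which is the shift from keyword hunting to ranked results that the example describes.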
And I think that's a great use case, an example that a lot of people can relate to.
So I want to ask you a little bit here about agentic AI, but before we do, real quick,
a quick break from our sponsors.
This podcast is supported by Google.
Hey, everyone, David here, one of the product leads for Google Gemini.
If you dream it and describe it, Veo 3 and Gemini can help you bring it to life as a video.
Now with incredible sound effects, background noise, and even dialogue.
Try it with a Google AI Pro Plan or get the highest access with the Ultra Plan.
Sign up at Gemini.com to get started and show us what you create.
All right, so, we talked a lot about the importance of data for, you know,
like helping your digital transformation strategy.
But when it comes to agents, like that's when I even start thinking about data a little differently, right?
Because even if it's a human, you know, operating a large language model powered system, there's still a human that kind of looks at that data.
At some point, you hope, and they're like, yeah, that's, that's correct.
But when it comes to agentic AI and when these systems are going to start using our dynamic data and start executing decisions on our behalf,
I think it even more so prioritizes the importance of correct data.
Could you talk a little bit about what you've seen so far in your experience in that regard
when it comes to having your data right specifically for agentic AI?
Yeah, I mean, I'll tell you, right?
Like the reasoning aspect of an agent, you know, is sort of what is very appealing about
the fact that, you know, you can have a set of tasks being done on behalf
of a human or a machine by an agent, right?
So think of an agent as something that knows how to reason through a set of complex tasks
to arrive at an outcome when you feed it some data.
I think where things, we talk about, you know, agents behaving themselves or an agent registry
or an agent orchestration, all the nuances of getting agents to operate.
And by the way, this notion that an agent is going to arrive within your world, you know,
in a single fashion is sort of not true, right?
You know, when you orchestrate an agent and when you operate from one agent to the other, you will transcend, you know, software or vendors or platforms or data, right?
So what you have to get right in essence is that the attribution of the data set that feeds that agent, you know, needs to be annotated correctly for you to be able to get that agent to sort of behave within the guardrails or boundaries of what you're accepting the answer to be.
And the nuances of that is realized when you start to train the agent to start to do things and you realize that, you know, it is doing something that is not deterministic and it's doing things sort of that are, you know, not expected.
And the reason why that is transpiring is because the attribution of the data that feeds that agent is sort of doing or feeding it things that's leading it to sort of, you know, an unexpected answer, right?
That's the best way I can put it or not what you would have expected.
So I think that if you start to look at attribution for the purposes of agentic
or if you look at attribution for the purposes of labeling for agentic, you'll pretty soon come to the
conclusion that that is sort of one of the biggest drivers for why agent orchestration or
registration or interoperability of agents becomes such an important component, which is why
protocols like open standard protocols for agent to agent are a big topic of conversation,
you know, no matter where you go these days.
Yeah.
And yeah, talking about these different, you know, protocols.
And maybe if you could explain a little bit for our less technical audience,
kind of like what you said there is, you know, labeling data for agentic AI.
Like, is it different, right?
And how should, you know, especially those larger organizations that do have the resources,
how should they be treating their data differently if it's ultimately going to be going
through a large language model type application with a human operating it versus an agentic
operation. What is that, you know, what are the main differences, if any, for handling that
data for agentic use? Yeah, no, the process of how it goes through sort of the curation in one
technology versus the other is the nuance of, you know, whether you use an LLM or you use an LLM-centric
agent or not, right? But the nuances of labeling are very evident even in structured data, what
you do today, right? So if you didn't have the right cataloging or the business metadata or business
glossary, right, usage today is a problem as well. I mean, when I talk to most organizations,
they talk about how they haven't conquered their structured data challenges. And
they're talking about process-centric software instantiating data that needs to be
labeled for usage, right? So if you look at sort of the world of how data is created within the four
walls of most organizations today, you run a process-centric software. That's SAP, that's ServiceNow,
that's Salesforce, so on and so forth.
And the process instantiates data, right?
Once the process instantiates data,
somebody needs to annotate or label that data for business context,
technical context,
so that the persona that uses it,
whether it's a business person that develops a report on the back of it
or a data engineer that builds a data pipeline
on the back of that data,
knows sort of what its intent is
and starts to know the boundaries of usage of the data, right?
That is a fundamental challenge,
irrespective of agent or LLM.
The problem is magnified because in tomorrow's world, or an agent world, that data is not originating within the four walls.
So guess where the burden of proof lies.
It lies with the people that use something that is not happening within their four walls.
Now you're talking about labeling, annotation, business glossary, technical catalog to be built for those datasets.
Imagine if it was hard to do it for your own data.
Imagine how impossible it is to do it for
something that happens outside of your four walls.
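To make the labeling point above a little more concrete, the following Python sketch models a dataset's annotations (a business glossary entry, a technical catalog entry, and its permitted uses) and refuses to hand a dataset to an agent when any of them is missing. The `DatasetLabel` class, its field names, the `missing_annotations` and `can_feed_agent` functions, and the two sample datasets are all hypothetical, chosen only to illustrate the idea.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetLabel:
    name: str
    origin: str                          # "internal" or "external"
    business_definition: Optional[str]   # business glossary entry
    technical_schema: Optional[dict]     # technical catalog entry
    permitted_uses: tuple = ()           # boundaries of usage


def missing_annotations(label: DatasetLabel) -> list:
    """List which annotations are absent or empty for this dataset."""
    gaps = []
    if not label.business_definition:
        gaps.append("business_definition")
    if not label.technical_schema:
        gaps.append("technical_schema")
    if not label.permitted_uses:
        gaps.append("permitted_uses")
    return gaps


def can_feed_agent(label: DatasetLabel, use: str) -> bool:
    """Allow a dataset only if it is fully annotated and the use is permitted."""
    return not missing_annotations(label) and use in label.permitted_uses


# Hypothetical examples: a labeled internal dataset and an unlabeled external one.
internal = DatasetLabel(
    "sales_orders", "internal",
    "Confirmed customer orders from the order system",
    {"order_id": "string", "amount": "decimal"},
    ("reporting", "agent_grounding"),
)
external = DatasetLabel("broker_feed", "external", None, None)

print(can_feed_agent(internal, "agent_grounding"))
print(missing_annotations(external))
```

The external dataset fails on every annotation at once, which mirrors the point in the conversation: data that originates outside the organization arrives without the glossary, catalog, or usage terms, and the consumer has to build them before an agent can safely use it.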
Yeah, that's an interesting way to think about it is, you know, the data that is, you know,
originates within those four walls and those four walls are, you know, yeah, how can you even
define them, especially when we talk about, you know, multi-agentic orchestration.
And, you know, if you have different agents going out there and creating new data points on
their own, but there's maybe not, you know, maybe they're not working directly with a human
in that regard and if it's this multi-agentic setup.
Yeah, how can business leaders even start to think or plan for being able to collect
that data that's way beyond those traditional four walls?
Yeah, which is where registration of agents and so the guardrails of how they behave against
that registration and what invokes an agent and how do you register an agent and how you
orchestrate an agent.
I think we're still seeing the beginnings of that, right?
Anybody that is claiming that they've done this at scale and it works seamlessly,
you know, we don't buy it, right?
Because, you know, we do our own experimentation and we realize how hard it is, right?
And we're just getting started on multi-agent orchestration, you know, even before multi-agent.
We're just getting started on single agents, you know, sort of doing the intended outcome before we talk about agent to agent and handing off to other agents, right?
Like that is still, that is still something that we need to conquer, right?
I don't believe that, you know, that journey has come to its logical conclusion.
I think we're just getting started.
Yeah.
And, you know, I think when I think about AI success and, you know, the companies that are doing it versus the companies that are, you know, maybe further behind, I think Deloitte obviously has been at the forefront, right?
Like working with some of the largest organizations in the world on their AI strategy.
what would you say if we rewind and we look at Deloitte as a case study?
What are some of those things that even internally that really helped propel your own AI success as an organization
specifically when it came to your data strategy?
You know, we sort of recognized early on that this was not something that we could wait
and watch for it to get to a particular phase or stage,
then turn around and say that's when we'll dip our toes in the water, right?
Like we figured that, you know, this would be
done to us if we didn't do it to ourselves, right? There was a level of awareness about what it was
doing to the value chain of our clients, that it needed sort of our intervention a lot earlier than,
you know, we typically, you know, would have thought about it, right? So the best way for me to
describe it is if you look at biopharma or if you look at biotech, right, the evolution of
disease pathology from pharmacology to gene editing because somebody sequenced 210 proteins and
you can tell what disease structure does to that.
So hence, you know, gene editing is the way to treat disease pathology
and not a bunch of biometrics where you go for blood tests and somebody says,
oh, you know, your sodium is off or your potassium is off.
Hence, you know, disease pathology is this or that, right?
In reality, if you look at sort of what that does to life sciences where, you know,
disease pathology is now very different, right?
Or going to be very different.
Drug discovery is going to be very different.
Manufacturing, clinical trials, supply chain are going to be very different. That is why we are on this journey.
I mean, we realize that the same aspect of what AI is doing to the value chain of pharma or health care or,
you know, autonomous cars, you take the example, or retail, or, you know, there is no industry or vertical or sector
that it is not going to touch in the short or long term.
The question becomes, if we don't participate in this, the portfolio of services that make
us relevant today will make us irrelevant tomorrow because we didn't arrive at the time that
AI arrived in the value chain. So we did it to ourselves knowing fully well that the portfolio
of services that we need to build, the best way for me to describe it is the menu when you walk
into a hotel or a restaurant of your choice and, you know, if the menu doesn't evolve over a period
of time, you stop going to the restaurant. So our menu needs to evolve in conjunction with the evolution
of what's happening to these industries or sectors and the clients that we serve. And that was the reason
for embracing it from the get-go.
Yeah.
So, Ashish, we've covered a lot in today's conversation, but as we wrap up, what would
you say is the one most important takeaway that you have for our listeners when it comes
to the importance of their data strategy powering their AI success?
I mean, what I would say is, you know, walk with the end in mind, right?
Like, you know, if you sort of understand the outcome that you need to intend to do with
your data, right?
Like, that is your North Star, right?
everything else that you do should be in the service of that, right?
So, for example, if your ambition is to be agentic
or if your ambition is to be, you know, agentic plus,
you know, whatever the permutation or choice of tool that you use
or consumption pattern, right?
It means you've only used the data to consume it in a certain way,
whether it's for reasoning, whether it's for LLM, whether it's for conformed SQL,
whatever it may be, AI/ML, you pretty much have to build your data strategy
anticipating that that is sort of, you know,
the capabilities that you need to have, not when the use case arrives, not when your business partner
arrives, but in anticipation of the fact that it is, you know, what I call Horizon 2, right,
not even Horizon 3, you know. And most of these problems, when I classify them in my mind,
they don't look like Horizon 2. They actually look like today's problems, right? And for us to be
able to sort of be relevant to our business partners, we needed to have a data strategy that would
serve the interest and needs of how we procure data, what data do we procure, how do we annotate it,
how do we label it, how do we get it into a compute environment.
It was such great advice and really helping us lay the roadmap out because everyone's worried
and wondering about data and their strategies. So, Ashish, thank you so much for taking time
out of your day to join the Everyday AI show. We really appreciate it.
Thank you, Jordan. All right. Now as a reminder, y'all, if you missed something he said there,
because there is a lot of great value.
Don't worry, we're going to be recapping it all in our newsletter.
So make sure, if you haven't already, go to youreverydayai.com.
If this is helpful, tell someone about it.
If you're listening on the podcast, please make sure to follow the show and subscribe.
Thank you for tuning in. We hope to see you back tomorrow and every day for more Everyday AI.
Thanks, y'all.
Meet Firefly AI Assistant.
Now live in Adobe Firefly, the all-in-one creative AI studio.
Just describe what you want to create in your own words and the assistant handles the rest,
orchestrating multi-step workflows across Adobe Creative Cloud apps,
including Photoshop, Premiere Express, and more in one conversational interface.
You direct the outcome while the assistant accelerates execution.
Stay in control with the ability to step in and refine at any time.
See it today at firefly.adobe.com.
And that's a wrap for today's edition of Everyday AI.
Thanks for joining us.
If you enjoyed this episode, please subscribe and leave us a rating.
It helps keep us going.
For a little more AI magic, visit youreverydayai.com and sign up to our daily newsletter so you don't get
left behind. Go break some barriers and we'll see you next time.
