Storage Unpacked Podcast - #267 – The Essential Role of Data within AI (Sponsored)
Episode Date: March 14, 2025
In this episode, Chris talks to Sharad Kumar, Field CTO at Qlik, about the value of good-quality data when developing AI solutions....
Transcript
This is Chris Evans and today I'm joined by Sharad Kumar from Qlik. Sharad, how are you?
Hey Chris, how are you doing? Very good, thank you. Very good. Could you just tell everybody
what you do for Qlik and then we can get into our discussion for today? Yeah, sure thing. Hi
everyone, this is Sharad Kumar. I'm the field CTO of our data business and my responsibility is I sit
at the intersection of our product organization and our field organization. And really the purpose is to make sure
our customers and our partners are getting the full value from our
software. Okay, and in general, what does the company do? Because people might not have heard of you, certainly not through this podcast. Yeah, sure, sure. So Qlik, we are a data integration, data quality, and AI/machine learning company, right? So we provide a
platform that enables our customers to get from data to
outcomes quickly and reliably. So we have a suite of products in
a platform which allows you to move data, transform data, build
trust in it, be able to do analytics on it,
be able to build models, machine learning around it, and then be able to get answers from your data,
and then take that and embed them to generate outcomes. So we're a software company, data
analytics. Perfect. Excellent. That sets the scene for us, without a doubt. And certainly it does in
terms of what our conversation is going to be about today. So it's interesting if we look at the
market today, lots of talk about AI, but you know, and you see lots of experts around AI.
But really, what I'm hoping we can talk about today and we can very much focus on is the
fact that within that AI boom that we're seeing at the moment, especially gen AI, our data
is probably the key and most valuable
piece to this rather than a lot of discussion which is currently about the infrastructure.
So really, I want to sort of dig in today with you and understand exactly what we should be
thinking about when we think about what the data really means to companies, how they should be
using it, and what your experiences are and what your company's experiences are with that data and how it actually should be used and managed within
the enterprise.
So our topic really is AI, but really it's more about the data.
So why don't you start us off by talking about where we are in the current sort of gen AI
model space, because I think that would be a great place to start, Sharad.
Yeah, sure thing, Chris.
So if you look, right, the market certainly changed in January. DeepSeek released their R1 model, which is about 671 billion parameters. And really the interesting thing about it was the reported training cost of only six million dollars, which is about 20 to maybe 50 times cheaper than the leading models. And that really shook the market. Now you could argue whether that's accurate or not. You'll hear a lot of things. Is it under-reporting of hardware costs, or are the costs not accurate because they used model distillation? But whatever, that doesn't change the fact that this market is evolving very, very quickly and models are really getting commoditized.
So we used to think there's a duopoly of OpenAI and Anthropic, they were the two big giants in terms of the models, but what we are seeing now is a rapid evolution where a lot more models are coming into the market.
And it's really, everybody predicted it, but really nobody thought it'd come so quickly, right?
So that's one way I think.
So the other part of it is you're gonna see models with, I would say, varying degrees of largeness, if you may.
Right, so you have really large language models, like OpenAI GPT-4, which has about 1.5 to 1.7 trillion parameters. Same thing with Google Gemini Ultra, they're working on a big one, and OpenAI is previewing, I think it's still in preview, a model called o1, which is gonna be more like 2.8 trillion.
So that's on one side, you have really large language models, but on the other side you're also seeing smaller and smaller models. So even when they released DeepSeek, with that they released a series of smaller models which are distilled from it, which are going to be easier to use and consume less CPU. So I think what we're going to see is this whole spectrum, where you have really large language models with a lot of parameters for more general-purpose use, and then more specialized, potentially domain-based, industry-specific models, which are smaller, for specialized tasks and special purposes. So I think that's what we're going to see. The models are going to start getting commoditized, and this duopoly is gonna be, I would say, fractured.
It's already fractured, and you'll have a lot more companies providing models.
Okay, right.
So that's a great starting point.
And it's interesting you said $6 million, because that just immediately made me think they managed to train something for the cost of a bionic man, which is quite funny.
Going back, you know, $6 million seemed a lot 40 years ago when that TV series was on, but nowadays it isn't much.
So with that in mind, do you think businesses will create their own models?
Or do you think that they'll take off the shelf models and use things that
were already in place like open AI?
Or do you think there's going to be a mixture of that?
I mean, what are we likely to see in the market?
Yeah, yeah.
So I would say fewer companies will create models from scratch, right?
Because we know to create and build models, it requires a lot of infrastructure, whether
you say 6 million or 100 million, right?
It requires a lot of data to train those models and it requires special data science and expertise.
So I think not every company is going to have those types of resources to create models. What we'll see is that model creation, although it will get more commoditized, will be done by large tech and specialized tech companies,
or specialized companies. Like you would see, remember some time back, Bloomberg created a model called BloombergGPT, which is very specific to the financial services domain, because they had a lot of data that they could train the model on.
So I think not everybody is gonna be creating models.
What we are gonna see, on the other end of the spectrum, is most companies using off-the-shelf models and really using this mechanism called RAG, or retrieval augmented generation, where at inference time you pass your context, your specific data, to the model to get the answer. So that's the most prevalent way we see. But we're
also going to see people in the middle who are going to take existing models and either distill
them down to smaller models which are more specialized for tasks or fine tune, take existing
models and fine tune them with their own specific data for their own purpose.
And I think where people do create new models, they'll still be what I call small language models, which will be very specialized to the tasks or the industries they're operating in.
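To make the RAG mechanism described here concrete, below is a minimal Python sketch. Everything in it (the embed and call_llm functions, the documents) is an illustrative stand-in, not any particular vendor's API; a real system would call an embedding model and a hosted LLM.

```python
# Minimal RAG sketch, assuming an off-the-shelf model: at inference time we
# retrieve the customer's own data and pass it to the model as context.
# Every name here is an illustrative stand-in, not a vendor API.

def embed(text: str) -> list[float]:
    # Toy embedding (vowel frequencies) so the example runs without services;
    # a real system would call an embedding model here.
    return [text.count(c) / max(len(text), 1) for c in "aeiou"]

def dot(a: list[float], b: list[float]) -> float:
    # Dot product as a crude similarity measure.
    return sum(x * y for x, y in zip(a, b))

def call_llm(prompt: str) -> str:
    # Stub standing in for a hosted model call.
    return f"[model answer based on prompt of {len(prompt)} chars]"

documents = [
    "Policy 12 covers water damage claims up to 5,000 dollars.",
    "Claims must be filed within 30 days of the incident.",
    "Premiums are due on the first of each month.",
]
index = [(doc, embed(doc)) for doc in documents]

def answer(question: str, k: int = 2) -> str:
    # Retrieve the k most similar chunks, then pass them as context.
    q = embed(question)
    top = sorted(index, key=lambda pair: dot(q, pair[1]), reverse=True)[:k]
    context = "\n".join(doc for doc, _ in top)
    return call_llm(f"Answer using only this context:\n{context}\n\nQ: {question}")

print(answer("How long do I have to file a claim?"))
```

The point of the pattern is that the model itself stays generic; it is the retrieved, customer-specific context that makes the answer useful.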
Okay, so it sounds like the model side
isn't really that much of a challenge
because either people are gonna be building
these large language models or small language models.
There's gonna be a lot of that around.
So I'm guessing there must be another sort of step
past that within AI.
And I hear talk about things like the agentic architectures
and real-time data processing.
I think that's different from what you were meaning
when you mentioned RAG,
which obviously is retrieval augmented generation. So how will those technologies sort of come in and what do we think
that's going to happen there? Yeah, so I go back to a presentation I was listening to from Andrew Ng, who's kind of considered the father of AI, and the interesting thing he said was that AI is going to be like electricity, right? It's going to be everywhere. And the key thing that made me think of is, even if it's everywhere, you don't just get value from electricity existing, you get value by using it, right? Putting it to use and harnessing the electricity. So I think it's the same thing with AI. A lot of AI is going to get created, but how do you
move from lab to real world is going to be the key.
How you go from this experimentation stage to operationalizing AI is going to be the critical part, whether you take off-the-shelf models and build AI applications using them or you create your own model. So the model is the first step, but then how do you create AI applications?
And I'll come to agents in a bit,
but then how do you apply them?
That's gonna be extremely critical, kind of that movement into the real world to solve real-world challenges: take your models and put them in points of engagement,
where people actually can
use AI, that's going to be the key. Sorry to interrupt, but I just think that sort of,
you know, taking a step back and thinking about that, I guess that's the same with any IT
that we buy and any IT that we use, you know, you give somebody a computer, it's basically a brick until you actually put some applications on there and you run it with
something that actually helps your business. So I guess what you're saying with a lot of the AI stuff
is it will become so mainstream
and so sort of integral to our business process
that actually in reality, it's not the AI itself
that's the thing, it's the application of it
and how you use it and where you use it
and how you use it effectively.
Yeah, absolutely Chris.
And that's where I was coming to: the two key elements to making that happen.
So first is, and I think you said this earlier,
if your models are not the differentiation,
then your data is the differentiation.
So that means you need this trusted foundation for data.
So just like if you go back to the analogy of electricity,
if you need to produce electricity, you need
to make sure you can create it, you can distribute it, you can use it in a robust way.
So what do you need?
You need transformers, power lines, switches, fuses.
So same thing, if you want to do AI, you need to make sure you have a trusted foundation
for data, right?
That you can move data, extract data from different types of sources and structures in different forms, be able to combine data, be able to transform it to shape it for AI to use, and make sure it's good quality, all those things.
So what we're seeing in the market is that spending on AI products and services is increasing, and it's driving a renewed focus on data management and data foundations.
Customers are beginning to think about data quality, data protection, metadata, things like that.
Because what they are realizing is, I can experiment very quickly, but if I have to take that experiment and operationalize it, what if my data is not good quality? I can't take it and put it into production. What if my data is not timely? What if there are sensitive elements in the data? They're all fine for experimentation, but when I move it into the real world, when I'm going to embed that model into my website for customer service, all those things better be true about the data. So one thing is that data is a critical part of making that last step happen, because without it, you cannot really go from experimentation to operationalization.
And the second thing is when you take that final step,
you need to make sure you have the right set
of what I call guardrails around your AI
for doing AI responsibly.
So what are some of the things there, right?
So you need to make sure that the content generated by AI
is not toxic, harmful or biased, right?
So that means sometimes you may have to filter it
before it reaches the customer.
You have to make sure your model output is not hallucinating, that it's creating the right content and the machine is not providing factually wrong or misleading information, right?
Then a bunch of other things, like validating the data and making sure the answers are compliant with your regulatory policies, specific to your industry.
So I think if you just break it down, in order to make that last mile work, you need two things: you need a trusted foundation of data, and you need to make sure you're doing it responsibly. You need to have both of those things in order to really take advantage of AI.
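As a rough illustration of the output-side guardrails listed above, the sketch below filters generated answers for sensitive data and banned phrases before they reach a customer. The patterns are deliberately crude stand-ins; real deployments typically use trained classifiers, and factuality (hallucination) checking usually needs a second model, which this sketch omits.

```python
# Guardrail sketch: screen model output before it reaches the customer.
# The checks here are simple stand-ins for real content classifiers.

import re

BANNED = {"guaranteed win", "medical advice"}          # hypothetical policy list
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")     # US SSN-shaped strings

def guardrail(answer: str) -> str:
    # Sensitive-data leak check.
    if SSN_PATTERN.search(answer):
        return "[withheld: response contained sensitive data]"
    # Content-policy check.
    if any(phrase in answer.lower() for phrase in BANNED):
        return "[withheld: response violated content policy]"
    return answer

print(guardrail("Your claim reference is C-1009."))        # passes through
print(guardrail("Sure, the SSN on file is 123-45-6789."))  # blocked
```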
Okay, so I want to come back and talk about agentic architectures in a second, because we sort of skipped over that. I mean, don't forget that, because I rudely interrupted you. However,
I just, as you said those things, it just sort of made me think that there's two pieces to this. And we'll dig into what all of that data side of things means in a moment. But the two things that sort of flagged to me were, first of all, the quality of data in training the model must be incredibly good, because otherwise, as you said, you're gonna get hallucinations, you're gonna get falsities, I suppose, and various other things.
And then the quality of the data you feed in through something like RAG needs to be equally good, because you can't have bad data that is, for instance, looking at rules around what a customer's data looks like when they're giving answers to a query. Or for example, the Canadian issue where the airline said somebody could have a refund, and that was clearly a mistake. And I'm guessing that they must have had access to a database where the AI could look at that and say, here's the rules, and that was wrong, or something like that.
So I guess there's two bits to it, isn't there?
There's the value of the data that goes into training the model,
and the value of the data that's used by the model to actually do stuff
once you've actually got it trained.
Yeah, no, you're absolutely right.
If you train your model on bad data, data that is skewed, right, inaccurate, yes, you're going to get wrong results. And same thing, like you said, you take a trained model off the shelf, which, as we said, a lot of people are doing, and as part of RAG you're feeding it data. If you feed it data that's factually incorrect, that has missing data, incorrect data, that's not timely, the same applies. So we'll come to that. But in our view, quality of data is only one aspect of it. We
talk about trust in data, which is a lot broader than quality, and we can come to that.
Okay. So agentic then, let's do that.
Yeah. So what's happening today is we are moving into this agentic architecture,
which is all about automating tasks and work using AI agents. Now, you could probably ask me,
well, we have been automating workflows for a while, right?
Remember, not so far back,
we had this technology called RPA,
robotic process automation, which was all the rage.
But I think there's a big difference
between agentic architectures and something like RPA.
The way I describe it, RPA was instruction driven.
You tell it exactly what to do and it'll just automate that. Okay, my workflow is steps A, B, C, D in sequence, and RPA will automate that. Now, agentic AI is more intent driven. So you express your intent to the machine, and
what it'll do is it'll find the right agents and wire them together
to create that agentic workflow.
But each agent is capable of acting autonomously, because they have learned to do the task, and that's really the difference between co-pilots and agentic architectures.
Co-pilots help humans. The human is in the loop, doing the prompting, and co-pilots kind of help increase the productivity of the human. And then agentic architecture takes it to the next level, where these agents have learned, are autonomous, can make decisions, can reason. And then, based on an intent, you can actually pull together a set of agents and wire them together on the fly to achieve that intent that you expressed.
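One way to picture the instruction-driven versus intent-driven distinction is the sketch below: each "agent" advertises what it consumes and produces, and a planner wires them together at run time to satisfy a stated intent. The agents are trivial functions and the planner is a greedy chain, purely for illustration; real agents would be learned components.

```python
# Intent-driven sketch: agents register their capabilities, and a planner
# assembles a chain at run time. All names and logic are illustrative.

AGENTS = []  # (consumes, produces, function) triples

def agent(consumes: str, produces: str):
    def register(fn):
        AGENTS.append((consumes, produces, fn))
        return fn
    return register

@agent("raw_claim", "validated_claim")
def validate(data):
    data["valid"] = bool(data.get("policy_id"))
    return data

@agent("validated_claim", "priced_claim")
def price(data):
    data["payout"] = 500 if data["valid"] else 0
    return data

@agent("priced_claim", "filed_claim")
def file_claim(data):
    data["status"] = "filed"
    return data

def fulfil(intent: str, state: str, data: dict) -> dict:
    # Greedy planner: keep applying whichever agent advances the state
    # until the intent is met. A real planner would reason, not just chain.
    while state != intent:
        step = next((a for a in AGENTS if a[0] == state), None)
        if step is None:
            raise RuntimeError(f"no agent can advance from {state}")
        state, data = step[1], step[2](data)
    return data

print(fulfil("filed_claim", "raw_claim", {"policy_id": "P-12"}))
```

By contrast, an RPA-style workflow would hard-code validate, price, and file_claim in a fixed sequence; here the sequence is discovered from the intent.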
Okay. I can see a comparison here coming in with something like self-driving cars, for the sake of argument. And it sounds like what we used to do
was basically the GPS device in your car that gave you the route
and we looked at it and went okay it's worked out the route for me but it
didn't really do a lot more than that and then you drove the car to get to
where you wanted so it sort of augmented your information and gave you a plan.
Whereas self-driving cars are actually saying,
well, we're not just gonna now know the route,
we're gonna drive the route.
And if things happen on the route,
we'll make decisions and we'll change our plans.
And you know, there are vehicles to avoid or people to avoid.
The agentic self-driving car, for want of a better description,
is gonna make all those decisions on your behalf.
And it sounds to me that that's a reasonable comparison
about the difference between what was old style automation
and the agentic architectures.
Yeah, I think in this case you're right, Chris. So I think the key is gonna be building those sets of agents that do specific tasks and then being able to assemble them together. So yeah, I think the analogy makes total sense.
Great, okay, let's talk about data then,
because this is what your company does.
And this is, I think, boiling down to the crux of our discussion today because clearly
AI is the electricity that gets us to that point, but data is the real value here as
it always has been.
So how would we describe what good data really looks like within an organization?
Yeah.
So there's a couple of aspects of it, right?
So typically when people talk about good data,
a lot of time, they talk about quality of data, right?
So quality of data, we've been talking about it for a while,
which are things like, okay, is the data complete?
You have missing data, like, okay, address information is missing, right?
Is it accurate or correct, right? Does it match the real world? Yes, the address is there for a person, but is the address correct for that person? Do they really live where it says they live? Is it consistent from one record to the other, with your data coming from multiple systems? If I have a record in this system and that system, do they match? Which one is the most accurate? Is it valid? Is it formatted correctly? Let's take an example: if a person's driver's license number is populated but their age is less than 15, that's not really valid. So there are things like that in the data which say, okay, what is good data? It has all those characteristics. But at Qlik we use a broader term called trust in the data, which goes beyond just the quality aspect of it.
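The quality checks just listed, completeness, accuracy, consistency, validity, map naturally onto simple rules. A hypothetical sketch, including the driver's-license example, with invented field names:

```python
# Rule-based quality checks sketched from the dimensions described above.
# Field names and thresholds are made up for illustration.

def quality_issues(record: dict) -> list[str]:
    issues = []
    # Completeness: required fields must be present and non-empty.
    for field in ("name", "address"):
        if not record.get(field):
            issues.append(f"missing {field}")
    # Validity: a populated driver's license implies a plausible age.
    if record.get("drivers_license") and record.get("age", 0) < 15:
        issues.append("driver's license populated but age < 15")
    # Format: a UK-style postcode has an outward and inward part.
    postcode = record.get("postcode", "")
    if postcode and " " not in postcode:
        issues.append("postcode not in expected 'OUT IN' format")
    return issues

record = {"name": "A. Customer", "age": 12, "drivers_license": "DL998",
          "postcode": "MK43 7AB", "address": ""}
print(quality_issues(record))
# ['missing address', "driver's license populated but age < 15"]
```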
Okay, you can describe that in a second but I just wanted to highlight one example of
that.
So we moved into the house we live in now 25 years ago, and in the UK we have something called the Post Office Address File, sorry, which tracks all of the postcodes and has your address encoded. So when that was set up, our address was encoded incorrectly. It said that we were actually living in a bigger town called Bedford, although that's the nearest biggest town. So for about the first six to twelve months, every delivery we ever had would go somewhere 17 miles out of the way in the wrong direction, and then we'd get phone calls saying, well, we can't find you, where are you? And we're like, oh yeah. So every time I go to put an address into any system now, I always re-edit our address to correct it, to put that back in. And I've been doing that for like 25 years because that system-
That must have been super painful, I can only imagine.
It's crazy, but you know,
I think that's a good example of where
one bit of data goes into a system incorrectly,
and now that data's in that system
pretty much forever, incorrectly.
And if it had been tidied and cleaned up
at the very beginning,
it wouldn't have turned out to be a problem.
So certainly quality like that is key.
And I'm guessing, you know, the governance around how somebody even puts that data into that system is another one.
And, you know, it also makes me wonder about things like security and what happens
if somebody hacks in and changes something or, you know, how do I protect against all
of those sorts of things?
There's got to be a degree of what you just described as trust.
So how do we think of trust in that sort of concept?
Yeah, so the way we look at trust, we look at trust more holistically, right?
So there are different aspects of trust and we have actually come up with a framework
which consists of six dimensions of trust, and we actually call it the Qlik Talend Trust Score for AI, right? Because those six dimensions are important. So let's kind of break down what those things mean. So the first thing we talk about is your data has to be diverse, right? That means you don't want to build your analytics and models on narrow and siloed data. You want a wide range of data that has different variations, patterns, perspectives, and scenarios relevant to the problem being solved, because if you have bias in your data, you're gonna have bias in your AI, and the system is gonna make unfair decisions.
So the first dimension we talk about is data diversity. Make sure your data is diverse.
The second aspect is data timeliness, right?
Timely data ensures that the decisions made by the AI system are current and relevant, right? Because outdated data will lead to inaccurate predictions. So you want the freshest data available. Let's think of it this way: if you build a chatbot and you put it out on the website for a customer to interact with, and you're an insurance company and somebody is asking about claims, if the data behind the chatbot is not the latest and greatest, you're going to get an old answer, which you don't want. So timeliness of data is becoming more and more critical in this world of AI, and people are moving from batch-oriented systems to more real-time ones to get more timely data for AI.
So that's kind of the second dimension.
The third dimension we talk about is data accuracy, which we just discussed, which is all about quality of data. Make sure the data is of good quality, right? Things we talked about: it's complete, it's accurate, and so forth. The fourth dimension we talk about is the security of data, because more and more we're seeing that the data being fed to AI systems has sensitive information. It could have PII, financial records, proprietary business information, and this information needs to be protected, because there is a chance that this information could leak through the models. So you need to make sure your data is secure, right? That only the right people have access to it, that the right models have access to it, and that it doesn't get leaked, right?
So I see the security from two angles. I mean, I hadn't thought of the first one, but you're entirely right: security of how that data is exposed through the model to the customer, I guess. And my view of it was of course more about how do I make sure that my data doesn't get polluted by somebody injecting something in that shouldn't be in there. You know, for instance, adding something in that says, if my name's Fred Smith, then when I put a claim in, you're always going to approve my claim at the insurance company. An obtuse example, you know, but you could imagine somebody doing something like that with any sort of system where they're going through something that needs to give permissions or make decisions, and you influence that decision by injecting bad data. So I guess it could work in both directions.
Yeah, so data protection has multiple aspects to it. One is detecting things like sensitive elements, and then how do you protect them? Maybe you need to tokenize them, maybe you need to mask them. Then there are things like access control: authentication and authorization. Every time somebody comes in to access the data, they have to first authenticate themselves, prove who they are. Most companies have very robust things around, are you who you claim to be? That's authentication. And then authorization:
do you have access to data? And again, that's where you come up with fine-grained access control.
Like you were saying, somebody may have access to read the data, which a lot of people would,
but very few people can modify the data. So a lot of the data platforms where data is stored
have pretty extensive policy-based access control. So good practice is that as you load data, you build these policies to protect the data: who can access it, who can read, who can write, who can update, what kind of data is visible. If you have sensitive data like social security numbers, you mask it; maybe only a certain few people can look at it in the clear.
Most of them will see a masked version of it.
So I think putting those kinds of practices in place allows you to make sure your data is protected, both from the access perspective and with respect to the algorithms, and doesn't get leaked.
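A rough sketch of the combination just described, policy-based access control plus masking of sensitive fields for most readers, with invented roles, policies, and field names:

```python
# Fine-grained access control sketch: a policy table decides who can read,
# and sensitive fields are masked unless the role is explicitly exempted.

POLICIES = {
    "analyst":    {"read": True,  "write": False, "unmasked": set()},
    "adjuster":   {"read": True,  "write": True,  "unmasked": set()},
    "compliance": {"read": True,  "write": False, "unmasked": {"ssn"}},
}
SENSITIVE = {"ssn"}

def mask(value: str) -> str:
    # Keep only the last four characters visible.
    return "*" * max(len(value) - 4, 0) + value[-4:]

def read_record(role: str, record: dict) -> dict:
    policy = POLICIES.get(role)
    if not policy or not policy["read"]:
        raise PermissionError(f"{role} may not read this data")
    return {k: (mask(v) if k in SENSITIVE and k not in policy["unmasked"] else v)
            for k, v in record.items()}

rec = {"name": "Fred Smith", "ssn": "123-45-6789"}
print(read_record("analyst", rec))     # ssn masked
print(read_record("compliance", rec))  # ssn in the clear
```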
Okay.
So, two others then, because we've done four out of the six so far, I think.
Yeah, yeah. So the fifth one is your data needs to be what I call contextual and discoverable, right? So data should have meaning. It should have business context around what this data means. It should be relevant. It should be findable. Look, I could do all the things I talked about in the previous four, but if the data is not understandable and discoverable, whether by your data scientists who are looking to build models or fine-tune models, or by your machines that need to access and tap into the data, it's no use. So discoverability of data, in the right context, I think,
is absolutely critical. And then the sixth principle we talk about
is data should be in a form, in a shape
that can be consumed by AI.
So when we were in the analytics world, and we were building BI and visualization for a while, you had to shape the data a certain way. You had to put it into a data warehouse. You had to model it as facts and dimensions, because that's how BI systems understood it. But in this new world of generative AI, like we were talking about earlier with RAG-type use cases, to enable those you have to take your data and put it into a vector form. So you have to do multiple types of processing: you have to take your structured and unstructured data, you have to chunk it, you have to call an LLM to create embeddings, and then you have to store the embeddings in a vector store, which is a specialized type of store. And then during RAG, at inference time, you would call this vector store to get your context to pass along with the prompt to the machine.
So you have to build and shape data. So there's this new set of requirements around consumability of data. The sixth principle we talk about is that the data should be in a form which can be consumed by these AI systems. So those six principles, to us, really talk about building trust in the data for AI.
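The shaping step just described, chunk, embed, store in a vector store, can be sketched in a few lines. The embedding function below is a toy stand-in so the example runs without external services; a real pipeline would call an embedding model:

```python
# Ingestion-side sketch: chunk source text, create embeddings, store them
# for retrieval at inference time. All names are illustrative.

def embed(text: str) -> list[float]:
    # Toy vector (vowel frequencies); a real system calls an embedding model.
    return [text.count(c) / max(len(text), 1) for c in "aeiou"]

def chunk(text: str, size: int = 120) -> list[str]:
    # Naive fixed-size chunking; production systems usually split on
    # structure (paragraphs, headings) and overlap adjacent chunks.
    return [text[i:i + size] for i in range(0, len(text), size)]

vector_store: list[tuple[list[float], str]] = []  # (embedding, chunk) pairs

def ingest(document: str) -> None:
    for piece in chunk(document):
        vector_store.append((embed(piece), piece))

ingest("Claims must be filed within 30 days. Water damage is covered "
       "up to 5,000 dollars under policy class B.")
print(len(vector_store), "chunks indexed")
```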
And as a customer, how would you go about actually evaluating or measuring your data against
those metrics? Do you have a tool that would do that? Does somebody come in and help you
with that? Or is it automated? And that's, I think, quite an important thing to understand.
Yeah, so that's an interesting question, because, as you're probably getting at, it's very hard to just productize this as a one-size-fits-all thing, because each company is different.
We have created this framework. Behind it, we also have some tooling in terms of how we can measure along these dimensions. Then it would require some kind of services around it, in my mind, to work with the customer to put this in their environment and actually configure it.
Because as you can imagine, not every customer has the same type of data. The weightage of each dimension could be different in the scoring. What makes up quality of the data, we just talked about multiple things, and each could vary. So you need to be able to customize that scoring for each customer. A lot of it you could probably capture automatically; some could be more subjective, right? In terms of how they're doing.
So I think to us, it's a framework
with a set of tooling behind it, and then some services, I would say, along with that, to really go into a customer environment and be able to say, okay, what is the readiness of your data for AI?
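A weighted score across the six dimensions might look like the sketch below. To be clear, this is not Qlik's actual scoring method; the dimension scores and per-customer weights are invented purely to show the idea of customizable weightage:

```python
# Hypothetical weighted trust score across the six dimensions discussed.
# Scores would come from profiling tools; values here are invented.

DIMENSIONS = ("diversity", "timeliness", "accuracy",
              "security", "discoverability", "consumability")

def trust_score(scores: dict, weights: dict) -> float:
    # Each dimension is scored 0-100; unspecified weights default to 1.0,
    # and the result is normalised by the total weight.
    total_weight = sum(weights.get(d, 1.0) for d in DIMENSIONS)
    return sum(scores[d] * weights.get(d, 1.0) for d in DIMENSIONS) / total_weight

scores = {"diversity": 70, "timeliness": 55, "accuracy": 82,
          "security": 90, "discoverability": 40, "consumability": 35}
weights = {"timeliness": 2.0, "accuracy": 2.0}  # this customer values freshness
print(round(trust_score(scores, weights), 1))
```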
Okay, all right.
We'll talk about your tools and products a bit more in a second, but I guess I'd just like to sort of move on. Now, having listened to that, the one thing that comes to mind for me, if I was a data owner sitting there as a customer:
is my data in the right format? Do I need to have a better strategy about how I manage my data?
Do you know, do I need to secure it better? Do I need to process it better? Do I need to have better pipelines
and workflow around it?
So how is this the whole AI sort of evolution
changing the way that businesses are approaching
storing and managing their data?
Yeah, so we at Qlik just did a study with ESG, and a couple of interesting things came out of it, right?
So as you would expect, we found that 94% of businesses are investing more in AI, which makes sense. But only 21% have successfully operationalized it. Like we talked about earlier, right? There's a big gap between experimenting with AI and operationalizing it. And one of the drivers of that which came out was having data ready for AI.
What we're seeing is companies putting renewed focus on their data strategies. They say, before I dive in and start building all these models and RAG-type architectures, I need to make sure my data foundation is correct. They're investing more in data strategies, right? And another stat we found was that 83% of customers reported an increased focus on data management, where data management loosely covers all those things: quality, security, availability of data, right? And it's been demonstrated that there's a direct correlation between high maturity of your data management and productionized deployment of gen AI solutions.
So customers are saying, okay, I have to get down there. I've got to focus on my data strategy again. I've got to look at my data through these different lenses.
The data storage platforms are similar: I'm storing my data in the cloud, in a warehouse, in a data lake, things like that. But the real question is the data management around it. Am I able to acquire that broad set of data to make sure I have diversity in it? Do I have the tooling to make sure my data is captured in more real time? Things like that.
I think they're putting strategies in place, right?
They're trying to find tooling that they need
and the processes that they need around it
to make sure they're ready for AI.
So do you think that's driving people to, for example,
look at the applications that exist today
that are the sources for the data and say to themselves,
actually, maybe we should engineer our applications
to be a bit more precise when they capture data
from a customer or we should have an extra set
of validation stages in before we even commit that data
into our AI platform.
Do you think a lot of that sort of work is going on?
Because from my perspective, I think I could see
both happening where actually what I'd want to do
is I'd want to go back to the application owners and say,
do you know what, you're feeding me this data and it's wrong in the first place, so can we tighten up the accuracy of it at the source?
Can we add the stuff into the source application so that as it comes into the AI, I'm not having to constantly rejig it and change it?
Because that sounds to me like that's A, an overhead and B, a time factor in various other things.
Yeah, so that's a good point. So if you look at typically how it works, you extract data out of a source system
and you may put it into a data lake. And first thing you do is you profile your data, right?
Understand the shape of the data, which is where you'll find out, well, 30% of the records are, let's say, missing address information.
Then, as you do further quality checks on the data, you may find there are duplicate records.
So that's where two things happen.
One, you make a decision on, do we fix it here
as we flow the data to more downstream AI type applications?
So, let's say my address is wrong, to your point, like you said, your address was wrong. Maybe before pushing it to AI, I could go check other systems and correct the address as the data flows, so that's one way. The second way could be, but like you said, it will always take cycles to fix it that way. At the same time, I could put plans in place that say, look, I need to fix this upstream, where the data is getting generated. So why is it incorrect? Why is the address field not captured in 30% of the records when I need it? Is it a methods-and-procedures problem? Is the issue that my application doesn't make this field required? So you go back, and in certain cases you can make upstream changes in the process or the application itself, or decide to make the change as part of the flow.
And we see both, right? Because sometimes you can fix things upstream, and sometimes it may take time to change things upstream. So as part of your processing, your data pipelining, there's a lot of cleansing you can do.
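The profile-then-cleanse flow might be sketched like this: profile the landed data to find the completeness gap, then backfill from a second system in the pipeline while the upstream fix is planned. Sources, fields, and numbers are invented:

```python
# Profile-then-cleanse sketch: measure the gap, apply an in-pipeline fix.

records = [
    {"id": 1, "address": "12 High St"},
    {"id": 2, "address": ""},
    {"id": 3, "address": ""},
]
crm_addresses = {2: "4 Mill Lane"}  # hypothetical second source system

def profile(rows, field):
    # Report how many records are missing the field.
    missing = sum(1 for r in rows if not r[field])
    print(f"{field}: {missing}/{len(rows)} records missing "
          f"({100 * missing / len(rows):.0f}%)")

def backfill(rows, field, lookup):
    # Fill gaps from another system where a value exists.
    for r in rows:
        if not r[field] and r["id"] in lookup:
            r[field] = lookup[r["id"]]
    return rows

profile(records, "address")            # address: 2/3 records missing (67%)
records = backfill(records, "address", crm_addresses)
profile(records, "address")            # address: 1/3 records missing (33%)
```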
So it sounds like ultimately the need for good data within AI will drive the sort of better
behavior if you like from application owners and probably an iterative approach to correcting and
improving the quality. And I guess if I was a customer, I'd be looking at it and saying,
well, here's where we are today. And then where are we in six months? And I'd maybe
try and find a way of measuring that and saying, what's the difference in my quality to compare
to where I was six months ago? Have we improved where we were? And is that the sort of thing
that your products and services are helping customers with? I mean, is that what your
essential services would do for somebody?
Yeah, so that's a key thing. For most of my career I've been in kind of an advisory role. Before Qlik, I spent a lot of years on the consulting and services side, so I've been a practitioner myself as well as an advisor. So whenever I talk to customers, what I always tell them is, you need to measure your quality, but also define your service level objectives to measure against. Quality is not a one-time thing, where data comes in and you measure it once.
So you've got to build an environment where you're constantly measuring data quality,
and you're measuring it against a service level objective that you define. Because, again, 100% is never realistic for everybody, whether that's timeliness or quality or completeness. You can define service level objectives, saying I want to be at this level, and then be able to measure and track yourself against them.
So, like the point you made earlier, I could check: is my quality improving over time, right? Because I'm taking the right steps, whether that's changing my methods and procedures, making sure the right thing is captured upstream, or having data cleansing steps in my pipeline which clean the data. So am I improving the quality of the data? It's more about having a program in place. Tools are part of it, but really it's the rigor and the discipline to measure yourself. And like you're saying, if we create this six-dimensional trust score on day one, you know where you are. Then you put practices and programs in place: okay, how do I improve along each of these dimensions? What do I need to do? Right? And it's going to be a series of steps, and then you build a roadmap to get there, because it's not an overnight thing.
So you put things in place in a phased approach to improve the trustworthiness of your data over time.
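Measuring quality against a service level objective over time, rather than as a one-off check, could be sketched as follows; the SLO targets and measured values are invented:

```python
# SLO-tracking sketch: define targets per dimension, measure each run,
# flag breaches, and keep the history so the trend is visible.

SLOS = {"completeness": 0.98, "timeliness": 0.95}

history = []  # one entry per pipeline run

def record_run(measured: dict) -> None:
    breaches = {d: (measured[d], SLOS[d]) for d in SLOS if measured[d] < SLOS[d]}
    history.append({"measured": measured, "breaches": breaches})
    for dim, (got, want) in breaches.items():
        print(f"SLO breach: {dim} at {got:.2%}, objective {want:.2%}")

record_run({"completeness": 0.91, "timeliness": 0.97})  # six months ago
record_run({"completeness": 0.97, "timeliness": 0.96})  # today: improving

trend = [run["measured"]["completeness"] for run in history]
print("completeness trend:", trend)
```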
I think it's quite interesting because, you know,
we would have used KPIs quite heavily.
And I did when I was working in the infrastructure
side of things, we'd have KPIs and as you said,
service at either objectives or perhaps agreements,
depending on if it was an external provider,
to actually look at the infrastructure and say,
something as simple as uptime and availability
and performance and all that sort of stuff.
But really it's interesting to see it being applied
to the quality of data passing through a system
and the data that you store and applying the same sort of rules and logic to data as much as
you would do to anything else because ultimately I guess data is an asset to the business and
therefore you want that asset to be as valuable as possible.
Yeah, especially if you look at it from a consumer perspective. We look at a lot of this not just from the producer side, but from the consumer side. So if I'm the consumer of data, which could be the analytics group, or an application developer who's going to build a chatbot against the data, right, or someone building a BI report, I'm a consumer of data.
So what do they need, right? So they need to be able to find the data very easily.
They need to be able to understand the data, make sure the right business context.
They need to be able to build trust in the data, which comes with multiple things we
talked about.
They need to be able to know, well, when was this data last refreshed?
Is it the most recent data?
Is it good quality?
What's the quality level of this data?
Where did it come from?
What is the lineage and the provenance of the data? Because that builds trust in the data, right? So I think we always look at it from the lens of the consumer
side and what do they need in data to have trust so they can use it, have confidence in the data
and use it for building their AI applications. And that's what we enable from the data side of it to
make sure the data consumers can have trust in that data.
Yeah, interesting. I always think we hear the discussion about data being the new oil.
You know, that's a cliche we've heard for about 10 or 15, maybe 20 years.
But I always think actually data isn't the new oil. Actually, data is the new mining environment. It's that entire, you know, hole in the ground that you want to dig.
And actually the oil is the good data, is the valid data,
is the curated and managed data, because ultimately that's
the oil.
You could drill a hole somewhere and get nothing.
And if your data is rubbish, you could drill through your data
and still find you've got nothing,
because the data isn't good enough quality. So really good data is the oil, not necessarily just
data. I think there's a very subtle difference there to be taken.
Yeah. And one more thing we talk about, which I think is interesting, is bringing product thinking to data. Because I feel like data has always been treated as a byproduct of code.
Really, it's lacking ownership. I go to a lot of customers and I ask, who owns your data? Chris, mostly we hear crickets. Nobody raises their hand, or a data engineer from a corner of the room will raise their hand: I own it. So that's why data products are good, because if you look at products in real life, products have owners. They have product managers who are responsible for making sure the product has the right features, making sure it's used; they iterate upon it, it's versioned, it's trusted. Those are the qualities. So in our platform we apply these principles to what we call productizing data. And productizing data means applying those principles: there's an owner, there's somebody who's accountable. And inherently products need to be good quality, so a data product has quality, it's easy to use, easy to consume, it's reusable. So I think those are all the principles of product thinking that's coming to the world of data, looking at it from the lens of the consumers of data.
Right.
Okay.
So let's move on then and talk about where your customers are on that AI journey.
Because I think it sounds, the way we've talked so far, that there's lots of potential, but
I'm not sure whether you're saying yet that customers are all the way there.
You've talked about some customers starting that journey,
but in your experience and what are you finding in your experience
going out and talking to customers, where are they in terms of this journey?
Yeah, so I think I would characterize most of them still
in the experimentation stage. Look, definitely some of them have rolled out very specific use cases in certain areas, like customer service or sometimes internal workforce optimization, but largely I would characterize it as a work in progress.
So I would say customers are looking at it in parallel, right? One stream is, I'm experimenting with AI, learning about RAG, how can I do that? But people who are doing it right are also saying, in parallel to that, I'm going to make sure that my foundation for data is correct. Because I can't spend all this time experimenting and say, okay, now I know how to build this, but my data is not correct, so I can't move forward anyway.
So I think the ones who are doing it right
are doing it in parallel.
So, okay, let me experiment with AI and really work on the things we talked about earlier, like AI guardrails, because they know, even if I build a model, if I build RAG, how do I make sure I can put it into production? And at the same time, in parallel, make sure that the data foundation is correct, and all those things we talked about today.
So I think we're seeing customers who are doing it right
and doing those both things in parallel, right?
And I would say most of them are still,
I would say on their journey.
Right, so, okay, so let's just sort of dive into that
in a bit more detail because the obvious
thing that when you describe that to me is that, you know, there's that whole desire to improve the
quality of your data. But if you haven't improved the quality of your data, can you really go for
the more advanced options like the agentic type stuff where you're almost autonomously allowing
something to go off and make decisions on its own? Or are you sort of hamstrung in terms of the level of AI you can roll out, in the sense that if my data isn't good enough quality yet, I can only do very limited things
with my AI rollout because I could end up going down the wrong route and making lots
of mistakes. So it's a sort of a parallel that says, as I improve the quality of my data, so I can improve
the quality of the AI features I'm deploying and do more advanced sort of technology, techniques
or functions within my environment.
It seems like the two would probably go hand in hand.
Yes, absolutely.
That's truly the case.
Sometimes, without good quality, you can't roll anything out, or with a certain level of quality you can roll out something. But your analysis is absolutely correct: as you improve your quality of data, more and more advanced AI use cases can be addressed.
Now, one thing I want to mention, since you used the phrase agentic architecture: one interesting thing we are seeing is applying agents and AI to the data problem itself. We've always talked about creating data for AI use, but at the same time there's the other side, AI applied to data. I'll give you an example. In the future, think of applying agentic architecture to data quality. Right now, data quality is mostly about checking for quality of data, and then somebody has to create the rule, hand-code the rule for how you fix the data. That's very human-intensive today. So think of a future world where we could have an agentic architecture for that, where an agent is checking for quality of data and then making a decision, based on that, to invoke another agent which automatically starts to cleanse and fix the data as it flows through.
So that's the two sides: yes, data has to be cleansed to good quality to enable the use cases we talked about, but the other thing is applying AI to data itself.
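The agentic data-quality loop just described, one agent detecting an issue and deciding to invoke a fixing agent, might be sketched like this. The "agents" are ordinary functions here; real ones would use models to detect and repair, and the lookup source is a stand-in:

```python
# Agentic data-quality sketch: detect, decide, invoke a fixer, flow through.

def quality_agent(record: dict):
    # Detect: flag records with an empty address.
    return "missing_address" if not record.get("address") else None

def lookup_address(record_id: int) -> str:
    # Stand-in for consulting another system of record.
    return {7: "9 Station Rd"}.get(record_id, "UNKNOWN")

def fix_agent(record: dict, issue: str) -> dict:
    # Repair: resolve the issue using another source.
    if issue == "missing_address":
        record["address"] = lookup_address(record["id"])
    return record

def pipeline(records):
    for record in records:
        issue = quality_agent(record)
        if issue:                      # decision point: invoke the fixer
            record = fix_agent(record, issue)
        yield record

print(list(pipeline([{"id": 7, "address": ""},
                     {"id": 8, "address": "1 Main St"}])))
```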
That's an interesting angle, isn't it?
Yeah, that's an interesting angle because I guess
what you're saying is that there's lots of very sort of
low level pieces of work there that really,
it doesn't take much effort to do,
but actually it's just repetitive and it's almost predictable
in terms of what the answer is to fix that.
Go and look the data up from somewhere else
and get the more accurate version
and now insert it into this data.
So I suppose that makes sense to do.
And then it becomes a bit of a virtuous circle,
I guess, that constantly you're improving the quality
of the source data as part of that whole process.
So yeah, that sort of makes sense.
So, okay, I get all of that and I sort of see
where we're headed, and that idea of an AI agent sort of fixing the data is quite interesting. So, let's go back and talk about exactly what Qlik does, though, and what you can offer a customer, and how you'd offer that customer your tools and services, because clearly we can see the scale of the problem coming up for businesses.
It would be good to understand exactly how you help with that. Yeah, so we provide software solutions, right? One, to build a trusted foundation of data, with the breadth of capabilities around movement of data. You can get the data from databases, mainframes, supply chain applications, SaaS applications, wherever your data is, whether it's in databases, files, or streams. We can help them acquire the data, move the data, transform the data, join it, shape it into these different forms for BI, for gen AI, and then build trust in it: things like improving the quality of the data and protecting sensitive elements. All of that creates usable, ready data. So that's one part of our portfolio. The second part of our portfolio is being able to build analytics on the data. BI and visualization were kind of the heritage of Qlik, but from there we have evolved a lot, to be able to build AI on it. Rather than requiring data scientists to build AI models, we have something called AutoML, which allows you to build AI models on the data very quickly, and then be able to get answers from the data. So maybe apply generative AI, and be able to build these agents very quickly on top of the data. So we have an entire platform that allows you to do that and then further automate it, embed it into applications. So that's an end-to-end platform we offer. And then we are cloud-first, but not cloud-only. So some of our customers who want to operate in a customer-managed environment, we offer that also.
Okay, brilliant. So where should we direct people to go and look at? Just your main website?
Yeah, so I think the main thing is the website, right? And Qlik as a company, we are pretty active on LinkedIn. So a lot of the time, if people follow Qlik as a company on LinkedIn, a lot of announcements and a lot of updates are happening on LinkedIn, but also on our website. And if somebody wants to follow us, we have this thing called Qlik Insider. It's a webinar series. We do it pretty regularly on different topics. For example, we just had one yesterday, which was called a roadmap edition, which talks about where we are, what we are doing, what's new we are bringing to the mix, what they can look forward to in the next couple of quarters. So we do this periodically. So I would say our website, follow us on LinkedIn, and look for these Qlik Insider webinars and other events.
And then we also have our user conference coming up in May in Orlando.
That's where our customers kind of get together.
Excellent.
And I should just spell that out for people: it's Q-L-I-K.
Yes.
In case people are listening to this and hearing us say the word click and thinking, oh, okay.
But it's a slightly different spelling.
So it's worth just qualifying it: Q-L-I-K.
Yes, Chris.
Okay, brilliant.
I will put links to all of those things in the show notes
so that people can go find them through the show notes
and don't have to do the searches.
So we make sure we give people plenty of information.
This has been a really interesting discussion
because it really opened my mind to understanding
the data aspect of this a lot more.
And I think for me within the AI model,
the data side of it is more interesting
because the infrastructure side is just grunt
to get you to where you are.
And it sounds like the way you're describing AI
in terms of the models, they're very much,
as you said, the electricity.
So they're the engine of this,
but they're not necessarily the core of it and the data is the core. So this has been a really useful discussion for
me to sort of understand this. So thank you for your time. I really appreciate it. I hope
to get an opportunity that we can chat further and talk about some other aspects of this
because this has been really good. But for now, Sharad, just thank you for this and look
forward to catching up with you soon.
Yeah. Thank you, Chris. Thanks for the engaging conversation.