Investing Billions - E146: The 92% AI Failure: Unmasking Enterprise's Trillion-Dollar Mistake
Episode Date: March 14, 2025
In this episode of How I Invest, I sit down with Matt Fitzpatrick, CEO of Invisible Technologies and former head of McKinsey’s QuantumBlack Labs. Matt shares his deep insights on enterprise AI adoption, model fine-tuning, and the challenges businesses face when integrating AI into their workflows. We explore why only 8% of AI models make it to production, how enterprises can overcome friction points, and the future of AI-powered enterprise solutions. If you’re curious about the intersection of AI and business strategy, this episode is a must-listen.
Transcript
An AI native solution. The framework is not just that AI is replacing what a human is doing, but how would you design the model with AI in mind?
I think most of the material benefit you're going to see is when you clean-sheet any process and ask:
how would I design this process from scratch, knowing all the AI tools I have, and how do I use both technology and humans?
And by the way, I think the answer to that is going to involve both for a long, long time.
In fact, I think humans are a core part of this solution. At Invisible, we believe
that's the human-machine interface where all the value sits. But it's not necessarily just
giving the people on an existing process a tool. It's redesigning the process to
use all the tools at your disposal.
Let's talk about Invisible. Give me some specifics on how the company is doing today.
I joined in mid-January. We ended 2024 at
$134 million in revenue. Profitable. We were the third fastest growing AI business in America over
the last three years. So how will DeepSeek affect Invisible? The viral story was that it was $5
million to build the model they did. The latest estimates that have come out since in the FT
and elsewhere would say it's closer to $1.6 billion. I think the number that's been cited from a compute standpoint is like 50,000 GPUs.
So if you had just told that narrative as the exact same story, but with 1.6 billion of compute,
I don't even think it would have been a media story.
The fact that it costs over a billion dollars to build that model means it is just a continuation of the current paradigm.
Look, there are some interesting innovations they've had: mixture of experts,
and some interesting stuff around data storage
that does have some benefits on reducing compute costs.
But I think those are things we've seen
other model builders experiment with already.
If I think about types of data,
they basically went after things that are base truth logic
like math, where there's a fair amount
of synthetic data available.
That's a fairly small percentage
of the overall training tasks
that I'd say most model builders are focused on. Tell me more about that. Think about training as kind of three main
vectors. So you have base truth information where a lot of synthetic or kind of internet broad
based data exists. So math is a really good example of that. Then you have tasks like creative writing
where there is no real base truth and no existing synthetic data.
There's no way to train those models without human feedback. But the most interesting one is you have a whole set of base
truth information where you also don't have enough synthetic data. So an example of that I would give
would be computational biology in Hindi. The corpus of that is just not broad enough. Each
branch of that tree and each topic you train off of will have a different approach.
And tell me about what Invisible Technologies does exactly.
We have two big components of our business. The first is what I call reinforcement learning and feedback,
which is the process where, on any topic
a model is being trained on,
we can spin up a mix of expert agents
on that particular topic.
So that could be everything from,
I'm gonna use the example of computational biology in Hindi.
Our pool has a 1% acceptance rate
and about 30% of the pool is PhDs and masters.
So these are very high-end specific experts.
The funniest one I talked about recently was
somebody who's an expert in, like, falconry in the 1800s.
Things where there's just not a lot of good existing data.
And look, I think models are going to be built on the full corpus of information
that matters to humanity.
So there's a lot of branches of that tree and we bring all of the different
experts to help train those models.
But that's only half the business. Where we're seeing increased focus and demand is
on the enterprise side. The big challenge today is the kind of chasm that exists between, let's call it,
Silicon Valley and the enterprises.
There's a demand for broad-based model development, which is really
important, but I think what a lot of the enterprise is looking for is: how do I
get those models to then work at 99% accuracy in my specific context?
Tell me about some examples of enterprise models that have worked.
Therein lies a great question.
The stat that I've seen most frequently cited is that about 8% of
models today make it to production.
The two largest high-profile public enterprise cases I've seen are Moody's,
which had a chain-of-thought reasoning example,
and then probably the most often cited one is Klarna,
which basically built up an entirely GenAI-based contact center to replace
the old contact center
they had.
The reality is the impact in the enterprise has not materialized
the way people expected it would.
I am very bullish on where it will go.
But to date those are the only two examples I can cite.
I can cite some pretty public struggles, but there have not
been many other realized examples I've seen.
So there's hundreds of billions of dollars being put into this
problem set, and only two successful examples.
Where are the main frictions and how do you see that evolving over the next five, 10 years?
Most of that money to date has gone into building models that are extensible, generalizable,
and moving towards greater levels of intelligence and chain-of-thought reasoning.
We've seen unbelievable progress in that phase of the model-building process.
The challenge is, let's say you're an insurer and you need to build a claims
model.
What you need to know is that your model works with near-perfect accuracy.
You need 99% accuracy.
The investments have led to material improvements.
It's just that the motion of then taking those models and fine-tuning them in an enterprise context
has not been standardized yet.
The motion of how do I deploy a machine learning model with accuracy,
you've seen a bunch of real examples of that.
Straight-through processing of mortgage loans is one example
where those models are being
productionized, they're working, and there's a ton of examples of impact coming from
machine learning deployments.
AI has not really figured out what I call its production paradigm yet.
The OpenAIs, the Anthropics, the xAIs of the world are developing these
incredible generalized models.
And then you really only have two use cases for enterprises.
You mentioned fine tuning.
What are the other steps that a company needs to go through in order to make their AI work?
Let's take an asset manager that is going to build a system to do ongoing reviews of
its assets based on its internal investments, right?
The first step you need is you need all your internal data organized and structured and
in a place where you can use it and access it.
That's probably the biggest challenge most people face. There's a joke I like to tell
that when good AI meets bad data, the data
usually wins. And I think the challenge is, in your internal data environments, if you don't have a
clear definition of your assets, your products, if you don't have kind of what I'd call organized
core data domains, it's very hard to even use AI until you've got that. That's probably the biggest
challenge I think the enterprise faces right now: most of the data is on systems that are
a decade or more old. It's not organized or mastered across those systems
in a way that they can use it in GenAI models.
So that's one big problem.
I think the bigger issue is,
so let's say you get that all organized.
David, I'll put you on the spot to give you an example.
So let's say that you built a model, a GenAI model,
to produce summaries of investments
in the financial services space
and just kind of look at new investment ideas.
You built a chatbot to do that,
you spent money to do it, and at the end of that, you have a model that will start generating these kinds
of investment memos. How would you define a good memo or a bad memo at scale? So let's say it
generates 10,000 memos. How do you know it works? If you're on a fund, then you know,
having instant access to fund insights, modeling, and forecasting are critical for producing
returns, which is why I recommend Tactic by Carta.
With Tactic by Carta, fund managers can now run complex fund construction, round modeling
and scenario analysis, meaning for the first time ever, you could use a single centralized
platform to do everything from analyzing performance to modeling returns with real industry benchmarks.
Most importantly, communicate it all with precision to your LPs.
Discover the new standard in private fund management with access to key insights
and data-driven analysis that could change the trajectory of your fund.
Learn more at carta.com/howiinvest. That's C-A-R-T-A dot com,
slash H-O-W-I-I-N-V-E-S-T.
It's difficult to do at scale at 10,000, but I think on an individual basis, a
good model is a model that hits all the points and then has more clarity and more
details on the sub points.
So I would evaluate it based on, did it get all the main key points of the
investment thesis at a high level?
And then were the sub level points sufficient or covered the main topics?
What you're saying makes complete sense, which is you have a set
of parameters or a set of outcomes you're looking for in a memo. Even if you have
that, though, the question becomes: how do you evaluate that consistently
across 10,000 memos?
And I think this is the difference between backtesting an ML dataset versus
gen AI: you need a way to actually go back and validate that what is produced
works.
And I think that has been the real challenge that the enterprise has struggled with is
you may have a sense for what good looks like.
You might say, for example, the definition of a good investment memo would be like at least a
paragraph summary of competitive set, some context on the market, including growth rates.
Like you could set a set of parameters that you're looking for, but then you have to wade
through and kind of assess all that. And so what we've spent the last eight years doing for the
model builders and others
is building what's called semi-private custom evals,
where we effectively set parameters like that
that would say, these are the definitions of good,
these are the outcomes we're looking for.
And then we use human feedback to score those parameters.
So we could go at big scale and say,
does this outcome cover what you're looking for?
And we bring subject matter experts to bear
to actually do that scoring.
I think that's actually been the big gap is
these are often things you can't score
with a random person in the street.
You can't just put it into market and hope it works.
You need a subject matter expert to say,
this looks generally good
before any organization gets comfortable launching it.
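To make that concrete, here is a minimal sketch of what a rubric-style eval with human expert scoring could look like. The rubric criteria, field names, and pass threshold are hypothetical illustrations built around the investment-memo example above, not Invisible's actual eval parameters.

```python
# Minimal sketch of a rubric-style eval with human expert scoring.
# Criteria, weights, and the threshold are hypothetical illustrations.
from dataclasses import dataclass
from statistics import mean

# Parameters that define "good" for one use case (an investment memo here).
RUBRIC = {
    "competitive_set_summary": "At least a paragraph summarizing the competitive set",
    "market_context": "Context on the market, including growth rates",
    "thesis_coverage": "Covers the main points of the investment thesis",
}

@dataclass
class ExpertScore:
    memo_id: str
    reviewer: str           # a subject matter expert, not a random rater
    scores: dict[str, int]  # criterion -> 0/1 (or a 1-5 scale), one entry per rubric item

def memo_passes(expert_scores: list[ExpertScore], threshold: float = 0.9) -> bool:
    """A memo passes if the average score across rubric criteria and reviewers clears the bar."""
    all_scores = [es.scores[criterion] for es in expert_scores for criterion in RUBRIC]
    return mean(all_scores) >= threshold

def production_readiness(results: dict[str, list[ExpertScore]]) -> float:
    """Fraction of generated memos that pass, across the whole batch of, say, 10,000."""
    passed = sum(memo_passes(scores) for scores in results.values())
    return passed / len(results)
```

The point is the one Matt is making: the definition of good lives in the rubric, and the scores have to come from subject matter experts rather than generic raters.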
One way I've seen enterprises do that,
and I've seen a couple of customer experiments already,
is they'll actually have their own employees
evaluating this at huge scale.
But if you think about the time suck of like having large numbers of people
just reviewing general outputs, that's very hard to do.
So I think a lot of what we've now evolved to on the enterprise side is a
mix of these kinds of evals and assessments of the models that are happening,
which we then use to help customers fine-tune and improve their models.
Is there a gap between what a generalist searcher might want
and somebody domain specific?
In other words, if I'm making a hundred million dollar investment decision based on a memo,
that has to be much better than if I want to find out if, you know, dogs could eat a
certain type of food and, you know, what's the best practice for raising a healthy dog.
Post-earnings reports are more than just a data dump.
They're a goldmine of opportunities waiting to be unlocked.
With Daloopa, you can turn those opportunities into actionable insights. Daloopa's dynamic scenario-building tools integrate updated earnings data,
letting you model multiple strategic outcomes, such as what happens if a company revises guidance.
With automated sensitivity analysis, you can quickly understand the impact of key variables
like cost pressures, currency fluctuations, or interest rate changes.
This means you'll deliver more actionable insights for your clients, helping them navigate
risks and seize opportunities faster.
Ready to enrich your post-earnings narratives?
Visit daloopa.com/how, that's D-A-L-O-O-P-A dot com slash how, today to get started.
One of the questions you're asking here is what is the bar or the risk bar for production
depending on the use case?
And I do think it's different.
As an example, if the goal of a chatbot is just to do something like review restaurants,
and it's consumer-facing, the risk bar on that is: does it say anything toxic?
Is there any bias?
You can put some risk parameters around it, but you don't really need kind of subject
matter expert feedback on it.
I'll give an interesting one: legal. In law, the bar for accuracy is materially
different than a consumer example.
And many law firms are experimenting with this, but it's hard to assess something like
a debt covenant agreement without subject matter experts weighing in, at scale,
that the outputs are consistently good.
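As a rough illustration of that idea, here is a hypothetical sketch of how a risk bar could be encoded per use case before anything ships. The use cases, checks, and thresholds are made up for illustration; they are not a standard and not anyone's actual policy.

```python
# Hypothetical per-use-case risk bars; values are illustrative only.
RISK_BARS = {
    "consumer_restaurant_reviews": {
        "required_checks": ["toxicity", "bias"],
        "needs_subject_matter_expert_review": False,
        "minimum_accuracy": None,  # no hard factual-accuracy bar for this use case
    },
    "legal_debt_covenant_analysis": {
        "required_checks": ["toxicity", "bias", "factual_accuracy", "citation_grounding"],
        "needs_subject_matter_expert_review": True,
        "minimum_accuracy": 0.99,  # the "99% in my specific context" bar from the conversation
    },
}

def ready_for_production(use_case: str, measured_accuracy: float, sme_signed_off: bool) -> bool:
    """Gate a deployment on the risk bar defined for its use case."""
    bar = RISK_BARS[use_case]
    if bar["needs_subject_matter_expert_review"] and not sme_signed_off:
        return False
    if bar["minimum_accuracy"] is not None and measured_accuracy < bar["minimum_accuracy"]:
        return False
    return True
```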
How do AI models evolve when the parameters change? Is this something
that will always need to be refreshed? There are two ways that models generally get consumed.
Many models will be consumed by consumers and are effectively just going to be what the
model builders produced. Usually the way the enterprise is using these models is they're
tailoring those models to their corpus of information. I'll give an example. Let's say
that you have two wealth managers. Let's say, I'm going to make up two, Fidelity and T. Rowe Price, and they want to, you
know, experiment with things like question answering.
They're not going to just use an off the shelf framework for that.
They're going to tailor it off of all of the information that exists in their communications
and their training documentation.
Any model that's being trained at the enterprise is usually being trained off of the internal
knowledge management corpus that that institution has.
And so you're using the large language model from the model builder and you're tailoring
it to your specific context. That process is called fine tuning.
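For a sense of what that tailoring step involves in practice, here is a minimal sketch of turning a curated internal Q&A corpus into a supervised fine-tuning dataset. The file paths, field names, and system prompt are hypothetical; the chat-style JSONL layout shown is the general shape several hosted fine-tuning APIs accept, so check your provider's documentation for the exact schema.

```python
# Minimal sketch: build a fine-tuning dataset from an internal knowledge corpus.
# Paths, field names, and the system prompt are hypothetical placeholders.
import json
from pathlib import Path

SYSTEM_PROMPT = "You answer client questions using this firm's own research and policies."

def build_examples(corpus_dir: Path) -> list[dict]:
    """Turn curated internal Q&A pairs into chat-format training examples."""
    examples = []
    for path in sorted(corpus_dir.glob("*.json")):
        record = json.loads(path.read_text())  # expects {"question": ..., "answer": ...}
        examples.append({
            "messages": [
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": record["question"]},
                {"role": "assistant", "content": record["answer"]},
            ]
        })
    return examples

def write_jsonl(examples: list[dict], out_path: Path) -> None:
    """Write one JSON object per line, the usual layout for fine-tuning uploads."""
    with out_path.open("w") as f:
        for ex in examples:
            f.write(json.dumps(ex) + "\n")

if __name__ == "__main__":
    examples = build_examples(Path("internal_corpus/qa_pairs"))
    write_jsonl(examples, Path("finetune_train.jsonl"))
```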
Prior to becoming CEO of Invisible, you headed McKinsey's QuantumBlack Labs, which is their
AI labs. What did you do at McKinsey?
I focused on three main things. One, I had all of our large-scale data transformation,
data lakehouse, and data warehouse builds. So the first thing I mentioned, which is if your data is messy, it's very hard to use AI.
I spent a lot of time focusing on that.
I spent a lot of time doing custom application development.
So building all sorts of different applications, whether that be for retention, pricing, contact centers, kind of software,
custom software that people could use to deploy models.
And I do think that's an understated part of a lot of this is there is what a model does,
but then there is the way that somebody can understand it and interpret it. And a lot of that is
the user interface by which they consume it. And so I think that's something the enterprise
is spending a lot of time on is what is the user interface by which people consume and
think about and make decisions around these models.
And then the third area: I oversaw the GenAI lab, which is McKinsey's kind of global
GenAI tool build. When I was there, we were doing anything from 220 to 240 GenAI builds at the peak.
I want to double click on the enterprise side of what you did at McKinsey.
You mentioned those two high-profile use cases of successful enterprise builds.
Did you build any successful enterprise use cases while you were at McKinsey?
We definitely did.
And there's a public case you can reference that a couple of folks in
QuantumBlack built for ING, where it's effectively a chatbot.
And one of the things they mentioned in it, very similar to what I'm saying now, is a lot of what
was required to put that in production was getting it to 99% accuracy. So they had a lot
of parameters and fine-tuning that they did around testing it, quality controlling it, building audit
LLMs to make sure that the outcome was good. We definitely did a lot of that, but the rough math
you see across the industry is about 8% of GenAI models
make it to production. And that's broad-based. The amount that kind of stalls around the proof
of concept pilot phase is pretty material. I think it will get better over time, but to date,
the challenge has been a lot of things I mentioned, challenging data, an unclear definition of good,
and an unwillingness from the folks in the field to actually use and believe the outcomes of the
models. And I think that's going to take time. There's this concept in the productivity space, which is 80% done is 100% good.
Is there like an 80-20 rule here where you could use AI to solve many things
and dramatically decrease your need for sales representatives for customer support?
And does it have to be a hundred percent good?
That is a really complicated question.
So there's another analogy I'll use, which
would be manufacturing lines in a factory.
And so if I ask you the question, if I have
10 people on a manufacturing line,
and every one of them saves
5% of their time, what is the
line savings?
I'm guessing half a person.
Zero. Because you can't take out half a person,
and no person
can be taken off that line. You effectively just move to a world where everyone has a little bit more free time. And I think that's the challenge.
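As a toy illustration of that arithmetic, assuming savings only count once you can remove a whole person from the line:

```python
# Ten people each saving 5% of their time is half a person of theoretical
# savings, but you cannot take half a person off the line, so the realized
# headcount savings round down to zero.
import math

people_on_line = 10
time_saved_per_person = 0.05  # 5% of each person's time

theoretical_savings = people_on_line * time_saved_per_person  # 0.5 of a full-time person
realized_savings = math.floor(theoretical_savings)            # 0 whole people off the line

print(f"Theoretical savings: {theoretical_savings:.1f} FTE")
print(f"People you can actually take off the line: {realized_savings}")
```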
Things like copilots have had a very
interesting kind of last two years in that they are helpful, coding copilots, legal copilots, all these things, but it's unclear the degree to which they actually save any work. They kind of tweak a lot of things on the margin.
And I think the difference I'd say in 80-20 is,
I think to do that well,
you actually have to re-engineer processes.
You have to say, what does my end-to-end workflow look like
for claims processing or whatever that might be?
And how do I take out two full steps
to actually get to a better level of efficiency?
That's hard to use a software tool for.
You need kind of people on your team
to think about the workflow design. You need to redesign the actual process flow.
That's been a bit of the challenge of the last two years is a lot of people have just focused
on all different types of co-pilots across all different industries. And I think that's helpful.
But I think the next phase of this is actually process redesign and moving to ways where you
can actually totally restructure the way a line works as an example. An AI native solution.
The framework is not just that AI is replacing what a human is doing, but how
would you design the model with AI in mind?
Most of the material benefit you're going to see is when you clean-sheet any
process and ask: how would I design this process from scratch, knowing all the AI tools I
have, and how do I use both technology and humans?
And by the way, I think the answer to that is going to involve both for a long,
long time, and humans are a core part of this solution. At Invisible, we believe
that's the human-machine interface where all the value sits, but it's not necessarily just giving
the people on an existing process a tool. It's redesigning the process to use all the tools
at your disposal. Thank you for listening. To join our community and to make sure you do not
miss any future episodes, please click the follow button above to subscribe.
So let's talk about Invisible. Give me some specifics on how the company is doing today.
We ended 2024 at 134 million in revenue. Profitable. We were the third fastest growing AI business
in America over the last three years.
You just joined as CEO. What is your strategy for the next five to 10 years? And how do
you even conceptualize a strategy given how fast the industry is changing?
We've had explosive growth in the current kind of core of the business, which is AI training.
And we plan to continue to focus on that.
Our goal is to work with all the model builders to get these models as accurate as possible and support them any way we can with lots of human feedback.
So if you think about what Invisible has there, we have this kind of AI process platform where we break an individual task out into a set of stages and then insert kind of feedback analytics
at all of those different steps. We then have the AI training and evals motion I described,
which is a set of modules. On the back of that, we have a labor marketplace where we can source
all of those 5,000 different expert agents on any given topic. The core of that will remain
our focus. The shifts I envision are kind of twofold. One, deepening our focus on using that
for fine-tuning in the enterprise.
This is something I think all the model builders
are hopeful for as well:
the more that we can help all of the enterprise clients
figure out how to get the most out of the model builds
they're focused on, how to get those working,
that's better for everyone.
Everyone is hoping to see many more examples.
And I, by the way, am very, very optimistic
that over the next five, six years,
we're gonna see many, many more examples of great GenAI use cases in production. It's just been,
I think, a period of learning. The last two years have been kind of a proof of concept phase for the
enterprise. Really helping the enterprise get many of those in production is a core focus for us.
The other big area that I'm going to evolve Invisible into, the analogy I would use is we're
going to build a modern ServiceNow anchored in GenAI. So Invisible's process platform will include much more data infrastructure.
It'll include an application development environment and process builder tools, and it'll include our kind of really, really good
services delivery team around that.
So one belief I have is that it's very hard to do any of this with the push of a button.
I think the age of software has kind of relied on the idea that you build something and people take that as is, and I think AI is much more around
configuring and customizing different workflows exactly to what any given customer wants. You can envision what Invisible will evolve into as kind of an AI process platform with lots of process builder tools
where people can build very sector-specific applications like claims for insurance or onboarding for food and beverage or fund admin for private equity.
So you'll have a bunch of different verticalized use cases we'll go after and a lot of really
interesting core data infrastructure tools like data ontology, master data management,
things like that to help people get their data working.
How do you avoid being the victim of your own success?
So you come into enterprises, you streamline their AI models using the services model.
How do you avoid making yourself obsolescent?
The funny piece of context out there is 70%
of the software in America is over 20 years old.
The rate of modernization of that has been glacially slow.
I know there's been a lot of kind of hype
that says suddenly the whole world is gonna be hyper modern,
everything's gonna work in two years.
I think this is a long journey over the next two decades
where we get to a world where every enterprise runs off
of modern infrastructure, modern tech stacks, and functions much like the
digitally native companies built up over the last five years.
That will take time to get to, but I'm very excited about what our
platform can do to enable that.
Said another way, your total addressable market is every enterprise,
minus the two that have built models.
Even in those two companies, I'm sure they're looking to streamline
other parts of the business.
I think that's right. The interesting thing is, if you look at what I would call the application modernization market, so all of the modernization of legacy systems that happens annually,
No player right now is more than 3% of that.
So it's actually a very fragmented market that is painfully slow in how it moves.
And it's one of the main frustration points for most enterprises.
Like if you ask the average CEO in any company
that's over 10 years old, how happy are they
with their core data, the kind of tools they use
on a daily basis, most are pretty frustrated.
So I don't think this is something where what exists today
is really good and everyone's really happy.
There's a lot of frustration that we are hoping
to help fix.
And I think GenAI will be the root of doing a lot of that.
I think there's a lot of tooling you can build
to generate insights faster, to pull up reporting faster. And so we will be a GenAI-native kind of application development platform.
You have a very unique vantage point in that you're the CEO of one of the fastest growing AI companies. You ran McKinsey's lab. Walk me through the AI ecosystem today in terms of how you look at the ecosystem.
I've talked to a bunch of VCs about this in the past couple of days. There's the infrastructure layer, which is where most of the capital is going today.
And that's a mix of kind of things like data centers, as well as the model
builders. And you asked about the gap between kind of investment today and enterprise
modernization; the challenge is that above the infrastructure layer, you have what
I call the application layer, which is individual tools for individual use cases.
Right.
And that could be, I mentioned claims, it could be legal services.
It's all the verticalized applications that exist anchored in
GenAI to solve problems.
All of those applications today, for the most part are SaaS or traditional software based.
So they are designed, like all software over the last 20 years, to be kind of a push
button deployment of a specific use case that functions like traditional SaaS software.
I am skeptical that that is actually going to be the way that impact is realized with
GenAI for a couple different reasons.
Software as a paradigm has existed that way because the idea was it took so long to get
data schemas organized and structured and it took so long to build any custom tool that
you had to invest all the money upfront in building a perfect piece of software.
Once you got data locked in on that software, it was very hard for anyone to ever migrate off of that.
The term for this is your systems of record.
Once you're locked in on any sort of a system of record,
whether that be an ERP system,
whether it be an HR database,
you basically never leave as an enterprise
because the data is really painful to move.
And so that's been the conventional wisdom
on how to build software for a long time.
You've had some really public examples.
Satya Nadella mentioned it.
What GenAI may enable is a movement where the value moves from the system of record
layer to the agentic layer.
So you actually move to a world where people don't stay on software that's sticky just because
of the data.
They actually want the best possible software for their specific institution.
So you might have a world where people are building tooling
that is much more custom to their enterprise.
You might have a world where I have a React workflow
that uses analytics that are customized to my enterprise
in a cloud environment.
And I can stand that up in a couple months.
And I think that paradigm is a very different way
forward for technology.
Now, I'm sure there are some that would dispute that.
I'm sure there are some that will say software will exist
as it always has.
But I would say that the main feedback I heard from a lot of VCs is that most of the application
layer today focuses on the standard software paradigm.
And I think we're looking at something very different, which is we want to have kind of
an application development environment with a lot of configurability and customizability,
the ability to build verticalized applications for specific sectors.
That will allow us to say not this is our tool, take it, but much
more, what is the workflow you need?
To bring that to life, let's say a company is looking, a Fortune 500
company is looking to create a CRM.
What would an AI native CRM look like for a Fortune 500 company
versus just using a Salesforce?
What you would usually end up doing in that world is you'll look at Salesforce or Dynamics,
or ServiceNow, which has one of these now, and you will buy out-of-the-box functionality.
Like you'll buy, let's say, their contact center tooling, but then you will end up
customizing a fair amount of that to your enterprise.
So you'll say, my contact center is going to have this flow for services, this flow for calls.
And so even though you're buying the tool, you're going to spend a year customizing and configuring it for your workflow. CRM is a little bit
different in that you do have several large players, Salesforce, Dynamics, and now ServiceNow,
that have built fairly good builder applications for that use case.
If you're successful as CEO of Invisible, what will the company look like in 2030?
I use the ServiceNow analogy because I have a ton of respect for what they've done.
My main North Star metric is that every GenAI model we work on will reach production.
And so I'm really excited about working with all the model builders over the next couple
of years to continue to fine tune and train their models and get that working at huge
scale in the enterprise.
And I think that's something that we will be a huge driver of.
What would you like our audience to know about you, about Invisible Technologies, or anything
else you'd like to share?
I don't want any of what I've said to come across as pessimistic.
There's nobody that believes more than I do that AI will be positive for the enterprise over the
next five to 10 years.
I think the last two years did not live up to the hype cycle, partly because there was
a belief that you could just buy a product out of the box, push a button, and suddenly
all your GenAI will work.
My kind of advice or view on the path forward is I don't think that will be the paradigm.
I think every enterprise will have to build some capabilities around: what do I want to get out of these models?
How do I train and validate these models?
How do I make sure my data is adequately reflected in these models?
And that's a very doable thing. When we sit here five, ten years from now,
there'll be some really exciting deployments in this. The ability to stand up new software, new digital workflows, companies based on GenAI
is going to expand significantly.
But I do think it's been a bit of a reality check for the last two years that, you know,
this is not like I just stand up a piece of software, push a button and everything works.
How should people follow you and Invisible?
You can add me on LinkedIn.
I'll be posting about some of the updates we'll be adding there.
We're building kind of a, I call it a data insights function, at Invisible as well.
We're going to start to bring as much of the truth that we're seeing and what's exciting
and what we recommend to our enterprise clients so we can help them navigate what is a very
complex and difficult world.
Thank you for listening to my conversation with Matt.
If you enjoyed this episode, please share with a friend.
This helps us grow and also provides the best feedback when we review the episode's analytics.
Thank you for your support.