Investing Billions - E146: The 92% AI Failure: Unmasking Enterprise's Trillion-Dollar Mistake

Episode Date: March 14, 2025

In this episode of How I Invest, I sit down with Matt Fitzpatrick, CEO of Invisible Technologies and former head of McKinsey's QuantumBlack Labs. Matt shares his deep insights on enterprise AI adoption, model fine-tuning, and the challenges businesses face when integrating AI into their workflows. We explore why only 8% of AI models make it to production, how enterprises can overcome friction points, and the future of AI-powered enterprise solutions. If you're curious about the intersection of AI and business strategy, this episode is a must-listen.

Transcript
Starting point is 00:00:00 An AI native solution. The framework is not just that AI is replacing what a human is doing, but how would you design the model with AI in mind? I think most of the material benefit you're going to see is when you clean sheet any process to be like, how would I design this process knowing all the AI tools I have from scratch, and how do I use both technology and humans? And by the way, I think the example for that is going to involve both for a long, long time. In fact, I think humans are a core part of this solution. At Invisible, we believe that's the human machine interface where all the value sits. But it's not necessarily just giving all your people on an existing process a tool. It's redesigning the process to use all the tools at your disposal.
Starting point is 00:00:38 Let's talk about Invisible. Give me some specifics on how the company is doing today. I joined in mid-January. We ended 2024 at $134 million in revenue. Profitable. We were the third fastest growing AI business in America over the last three years. So how will DeepSeek affect Invisible? The viral story was that it was $5 million to build the model they did. The latest estimates that have come out since in the FT and elsewhere would say it's closer to $1.6 billion. I think the number that's been cited from a compute standpoint is like 50,000 GPUs. So if you had just told that narrative as the exact same story, but with 1.6 billion of compute, I don't even think it would have been a media story.
Starting point is 00:01:15 The fact that it costs over a billion dollars to build that model means it is just a continuation of the current paradigm. Look, there are some interesting innovations they've had: mixture of experts, and some interesting stuff around data storage that does have some benefits on reducing compute costs. But I think those are things we've seen other model builders experiment with already. If I think about types of data,
Starting point is 00:01:34 they basically went around things that are base truth logic like math, where there's a fair amount of synthetic data available. That's a fairly small percentage of the overall training tasks that I'd say most model builders are focused on. Tell me more about that. Think about training as kind of three main vectors. So you have base truth information where a lot of synthetic or kind of internet broad based data exists. So math is a really good example of that. Then you have tasks like creative writing
Starting point is 00:01:58 where there is no real base truth and there's no existing synthetic data. There's no way to train those models without human feedback. But the most interesting one is you have a whole set of base truth information where you also don't have enough synthetic data. So an example of that I would give would be computational biology in Hindi. The corpus of that is just not broad enough. Each branch of that tree and each topic you train off of will have a different approach. And tell me about what Invisible Technologies does exactly. We have two big components of our business. What I call reinforcement learning and feedback, which is the process on any topic
Starting point is 00:02:28 where a model is being trained, we can spin up a mix of expert agents on that particular topic. So that could be everything from, I'm gonna use the example of computational biology in Hindi. Our pool has a 1% acceptance rate and about 30% of the pool is PhDs and masters. So these are very high-end specific experts.
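To make the expert-feedback motion concrete, here is a minimal, hypothetical sketch of what a single preference-style feedback record from a domain specialist might look like. The schema, field names, and domain details are illustrative assumptions, not Invisible's actual format.

```python
import json

# Hypothetical expert-feedback record: a domain specialist compares two candidate
# model responses and explains the preference. Field names are illustrative only.
feedback_record = {
    "domain": "computational biology (Hindi)",
    "prompt": "Explain, in Hindi, how RNA-seq read counts are normalized.",
    "responses": {
        "a": "<model response A>",
        "b": "<model response B>",
    },
    "expert_preference": "b",  # which response the specialist judged better
    "rationale": "Response B uses correct domain terminology and the right normalization steps.",
    "expert_profile": {"credential": "PhD", "languages": ["hi", "en"]},
}

# Records like this are typically collected at scale and serialized (e.g. as JSONL)
# before being used for preference-based training or evaluation.
print(json.dumps(feedback_record, ensure_ascii=False, indent=2))
```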
Starting point is 00:02:43 The funniest one I talked about recently was somebody who's an expert in, like, falconry in the 1800s. Things where there's just not a lot of good existing data. And look, I think models are going to be built on the full corpus of information that matters to humanity. So there's a lot of branches of that tree, and we bring all of the different experts to help train those models. But that's only half the business. Where we're seeing increased focus and demand
Starting point is 00:03:02 is on the enterprise side. The big challenge today, and the kind of chasm that exists between, let's call it Silicon Valley, and the enterprises: there's a demand for broad based model development, which is really important, but I think what a lot of the enterprise is looking for is how do I get those models to then work at 99% accuracy in my specific context? Tell me about some examples of enterprise models that have worked. Therein lies a great question. The stat that I've seen most frequently cited is that about 8% of
Starting point is 00:03:30 models today make it to production. The two largest high profile public enterprise cases I've seen are Moody's, which had a chain of thought reasoning example, and then probably the most often cited one is Klarna, which had a contact center where they basically built up an entirely GenAI contact center to replace the old contact center they had. The reality is the impact in the enterprise has not materialized
Starting point is 00:03:48 the way people expected it would. I am very bullish on where it will go. But to date those are the only two examples I can cite. I can cite some pretty public struggles, but there have not been many other realized examples I've seen. So there's hundreds of billions of dollars being put into this problem set and only two successful examples. Where are the main frictions and how do you see that evolving over the next five, 10 years?
Starting point is 00:04:08 Most of that money to date has gone into building the models that are extensible, generalizable and moving towards greater levels of intelligent chain of thought reasoning. We've seen unbelievable progress in that phase of the model building process. The challenge is, let's say you're an insurer and you need to build a claims model. What you need to know is that your model works with near-perfect accuracy, you know, 99% accuracy. The investments have led to material improvements.
Starting point is 00:04:32 It's that the motion of then taking those models and fine tuning them in an enterprise context has not been standardized yet. For the motion of how do I deploy a machine learning model with accuracy, you've seen a bunch of real examples of that. Straight through processing of mortgage loans is one example where those are being productionized, they're working, and there's a ton of examples of impact coming from machine learning deployments.
Starting point is 00:04:49 AI has not really figured out what I call its production paradigm yet. The OpenAIs, the Anthropics, the xAIs of the world are developing these incredible generalized models, and then you only have really two use cases for enterprises. You mentioned fine tuning. What are the other steps that a company needs to go through in order to make their AI work? Let's take an asset manager that is going to build a system to do ongoing reviews of its assets based on its internal investments, right?
Starting point is 00:05:13 The first step is you need all your internal data organized and structured and in a place where you can use it and access it. That's probably the biggest challenge most people face. There's a joke I like to say that when good AI meets bad data, the data usually wins. And I think the challenge is, in your internal data environments, if you don't have a clear definition of your assets, your products, if you don't have kind of what I'd call organized core data domains, it's very hard to even use AI until you've got that. That's probably the biggest challenge I think the enterprise faces right now: most of the data is on systems that are
Starting point is 00:05:42 a decade or more old. It's not organized or mastered across those systems in a way that they can use it in GenAI models. So that's one big problem. I think the bigger issue is, let's say you get that all organized. David, I'll put you on the spot to give you an example. So let's say that you built a GenAI model
Starting point is 00:05:56 to produce summaries of investments in the financial services space and just kind of look at new investment ideas. You built a chatbot to do that, you spent money to do it, and at the end of that, you have a model that will start generating these kinds of investment memos. How would you define a good memo or a bad memo at scale? So let's say it generates 10,000 memos. How do you know it works? If you run a fund, then you know having instant access to fund insights, modeling, and forecasting is critical for producing
Starting point is 00:06:22 returns, which is why I recommend Tactic by Carta. With Tactic by Carta, fund managers can now run complex fund construction, round modeling and scenario analysis, meaning for the first time ever, you could use a single centralized platform to do everything from analyzing performance to modeling returns with real industry benchmarks. Most importantly, communicate it all with precision to your LPs. Discover the new standard in private fund management with access to key insights and data-driven analysis that could change the trajectory of your fund. Learn more at carta.com slash how I invest. That's C-A-R-T-A dot com
Starting point is 00:06:58 slash H-O-W-I-I-N-V-E-S-T. It's difficult to do at scale at 10,000, but I think on an individual basis, a good model is a model that hits all the points and then has more clarity and more details on the sub points. So I would evaluate it based on, did it get all the main key points of the investment thesis at a high level? And then were the sub-level points sufficient, and did they cover the main topics? What you're saying makes complete sense, which is you have a set
Starting point is 00:07:28 of parameters or a set of outcomes you're looking for in a memo. Even if you have that, though, the question then becomes, how do you evaluate that consistently across 10,000 memos? And I think this is the difference between backtesting an ML data set versus GenAI: you need a way to actually go back and validate that what is produced works. And I think that has been the real challenge that the enterprise has struggled with: you may have a sense for what good looks like.
Starting point is 00:07:49 You might say, for example, the definition of a good investment memo would be at least a paragraph summary of the competitive set, some context on the market, including growth rates. You could set a set of parameters that you're looking for, but then you have to wade through and kind of assess all that. And so what we've spent the last eight years doing for the model builders and others is building what are called semi-private custom evals, where we effectively set parameters like that that say, these are the definitions of good,
Starting point is 00:08:12 these are the outcomes we're looking for. And then we use human feedback to score those parameters. So we could go at big scale and say, does this outcome cover what you're looking for? And we bring subject matter experts to bear to actually do that scoring. I think that's actually been the big gap: these are often things you can't score
Starting point is 00:08:30 with a random person in the street. You can't just put it into market and hope it works. You need a subject matter expert to say, this looks generally good, before any organization gets comfortable launching it. One way I've seen enterprises do that, and I've seen a couple of customers experiment with this already, is they'll actually have their own employees
Starting point is 00:08:44 evaluating this at huge scale. But if you think about the time suck of having large numbers of people just reviewing general outputs, that's very hard to do. So a lot of what we've now evolved to on the enterprise side is a mix of these kinds of evals and assessments of the models, which we then use to help customers fine tune and improve their models. Is there a gap between what a generalist searcher might want and somebody domain specific?
Starting point is 00:09:04 In other words, if I'm making a hundred million dollar investment decision based on a memo, that has to be much better than if I want to find out if, you know, dogs could eat a certain type of food and, you know, what's the best practice for raising a healthy dog. Post-earnings reports are more than just a data dump. They're a goldmine of opportunities waiting to be unlocked. With Daloopa, you could turn those opportunities into actionable insights. Daloopa's dynamic scenario building tools integrate updated earnings data, letting you model multiple strategic outcomes, such as what happens if a company revises guidance. With automated sensitivity analysis, you could quickly understand the impact of key variables
Starting point is 00:09:41 like cost pressures, currency fluctuations, or interest rate changes. This means you'll deliver more actionable insights for your clients, helping them navigate risks and seize opportunities faster. Ready to enrich your post-earnings narratives? Visit daloopa.com slash how, that's D-A-L-O-O-P-A dot com slash how, today to get started. One of the questions you're asking here is, what is the bar, or the risk bar, for production depending on the use case? And I do think it's different.
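As a rough sketch of the custom-eval idea described a moment ago, the snippet below scores generated memos against a small rubric using subject matter expert marks and aggregates a pass rate across a batch. The criteria, threshold, and data are invented for illustration and are not a specific vendor's eval format.

```python
from statistics import mean

# Hypothetical rubric: the parameters that define a "good" investment memo.
RUBRIC = [
    "summarizes the competitive set (at least a paragraph)",
    "gives market context, including growth rates",
    "covers the main points of the investment thesis",
]

def memo_score(sme_marks: dict[str, bool]) -> float:
    """Fraction of rubric criteria a subject matter expert marked as satisfied."""
    return mean(1.0 if sme_marks[c] else 0.0 for c in RUBRIC)

def batch_pass_rate(all_marks: list[dict[str, bool]], threshold: float = 1.0) -> float:
    """Share of memos in a generated batch that meet the bar."""
    passed = sum(memo_score(marks) >= threshold for marks in all_marks)
    return passed / len(all_marks)

# Example: SME reviews for three generated memos (stand-ins for 10,000).
reviews = [
    {RUBRIC[0]: True,  RUBRIC[1]: True,  RUBRIC[2]: True},
    {RUBRIC[0]: True,  RUBRIC[1]: False, RUBRIC[2]: True},
    {RUBRIC[0]: True,  RUBRIC[1]: True,  RUBRIC[2]: True},
]
print(f"pass rate: {batch_pass_rate(reviews):.0%}")  # 67% with these toy marks
```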
Starting point is 00:10:10 As an example, if the goal of a chatbot is just to, say, review restaurants and it's consumer facing, the risk bar on that is, does it say anything toxic? Is there any bias? You can put some risk parameters around it, but you don't really need kind of subject matter expert feedback on it. I'll give an interesting one: legal. In law, the bar for accuracy is materially different than a consumer example. And many law firms are experimenting with this, but it's hard to assess something like
Starting point is 00:10:35 a debt covenant agreement without subject matter experts weighing in, at scale, that the outputs are consistently good. How do AI models evolve when the parameters change? Is this something that will always need to be refreshed? There are two ways that models are generally consumed. Many models will be consumed by consumers effectively as what the model builders produced. Usually the way the enterprise is using these models is they're tailoring those models to their corpus of information. I'll give an example. Let's say that you have two wealth managers. Let's make it up, say Fidelity and T. Rowe Price, and they want to, you
Starting point is 00:11:07 know, experiment with things like question answering. They're not going to just use an off the shelf framework for that. They're going to tailor it off of all of the information that exists in their communications and their training documentation. Any model that's being trained at the enterprise is usually being trained off of the internal knowledge management corpus that that institution has. And so you're using the large language model from the model builder and you're tailoring it to your specific context. That process is called fine tuning.
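To illustrate the fine-tuning motion just described, here is a minimal sketch of packaging a firm's internal knowledge into supervised fine-tuning examples in a generic chat-style JSONL layout. The documents, prompts, and file name are invented, and the exact schema varies by model provider.

```python
import json

# Hypothetical internal knowledge pairs pulled from a wealth manager's own
# training documentation and communications (contents invented for illustration).
internal_qa = [
    {
        "question": "What documents do we require before opening a managed account?",
        "answer": "Per our onboarding guide, we require identity verification, a signed IMA, and a completed risk questionnaire.",
    },
    {
        "question": "How do we describe our rebalancing policy to clients?",
        "answer": "Our client communications describe quarterly rebalancing with drift bands of five percentage points.",
    },
]

SYSTEM_PROMPT = "You answer client-service questions using this firm's internal policies."

# Write chat-style training examples to JSONL, a format many fine-tuning
# pipelines accept; field names here are a common convention, not a guarantee.
with open("finetune_examples.jsonl", "w", encoding="utf-8") as f:
    for pair in internal_qa:
        example = {
            "messages": [
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": pair["question"]},
                {"role": "assistant", "content": pair["answer"]},
            ]
        }
        f.write(json.dumps(example, ensure_ascii=False) + "\n")
```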
Starting point is 00:11:30 Prior to becoming CEO of Invisible, you headed McKinsey's QuantumBlack Labs, which is their AI labs. What did you do at McKinsey? I focused on three main things. One, I had all of our large scale data transformation, data lakehouse, and data warehouse builds. So the first thing I mentioned, which is if your data is messy, it's very hard to use AI, I spent a lot of time focusing on that. Two, I spent a lot of time doing custom application development. So building all sorts of different applications, whether that be for retention, pricing, contact centers, kind of custom software that people could use to deploy models.
Starting point is 00:11:59 And I do think that's an understated part of a lot of this: there is what a model does, but then there is the way that somebody can understand it and interpret it. And a lot of that is the user interface by which they consume it. And so I think that's something the enterprise is spending a lot of time on: what is the user interface by which people consume and think about and make decisions around these models. And then the third area, I oversaw the GenAI lab, which is McKinsey's kind of global GenAI tool build. When I was there, we were doing anywhere from 220 to 240 GenAI builds at the peak. I want to double click on the enterprise side of what you did at McKinsey.
Starting point is 00:12:30 You mentioned those two high profile use cases that successful enterprises built. Did you build any successful enterprise use cases while you were at McKinsey? We definitely did. And there's a public case you can reference that a couple of folks in QuantumBlack built for ING, where it's effectively a chatbot. And one of the things they mentioned in it, very similar to what I'm saying now, is a lot of what was required to put that in production was getting it to 99% accuracy. So they had a lot of parameters and fine tuning they did around testing, quality controlling it, building audit
Starting point is 00:12:56 LLMs to make sure that the outcome was good. We definitely did a lot of that, but the rough math you see across the industry is about 8% of GenAI models make it to production. And that's broad-based. The amount that kind of stalls around the proof of concept pilot phase is pretty material. I think it will get better over time, but to date, the challenge has been a lot of the things I mentioned: challenging data, an unclear definition of good, and an unwillingness from the folks in the field to actually use and believe the outcomes of the models. And I think that's going to take time. There's this concept in the productivity space, which is 80% done is 100% good. Is there like an 80-20 rule here where you could use AI to solve many things
Starting point is 00:13:34 and dramatically decrease your need for sales representatives for customer support? And does it have to be a hundred percent good? That is a really complicated question. So there's another analogy we shall use, which would be manufacturing lines in a factory. And so if I ask you the question, if I have 10 people on a manufacturing line, and every one of them saves
Starting point is 00:13:54 5% of their time, what is the line savings? I'm guessing half a person. Zero. Because you can't take anyone out of the line; no person can be taken off that line. You effectively just move to a world where everyone has a little bit more free time. And I think that's been the challenge of the last two years. Things like copilots have had a very interesting last two years in that they are helpful, coding copilots, legal copilots, all these things, but it's unclear the degree to which they actually save any work. They kind of tweak a lot of things on the margin.
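As a toy restatement of the factory-line arithmetic above: ten stations each saving 5% of a person's time adds up to half a full-time equivalent on paper, but because every station still has to be staffed, the headcount you can actually remove is zero. The numbers below simply rerun the example from the conversation.

```python
import math

workers = 10
time_saved_per_worker = 0.05  # each person saves 5% of their time

# On paper, the line "saves" half a person's worth of time...
paper_savings_fte = workers * time_saved_per_worker  # 0.5 FTE

# ...but no single station frees up a whole role, so nobody can come off the line.
removable_headcount = sum(math.floor(time_saved_per_worker) for _ in range(workers))  # 0

print(f"aggregate time saved: {paper_savings_fte:.1f} FTE")
print(f"people you can actually take off the line: {removable_headcount}")
```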
Starting point is 00:14:27 And I think the difference I'd say in 80-20 is, I think to do that well, you actually have to re-engineer processes. You have to say, what does my end-to-end workflow look like for claims processing or whatever that might be? And how do I take out two full steps to actually get to a better level of efficiency? That's hard to use a software tool for.
Starting point is 00:14:43 You need kind of people on your team to think about the workflow design. You need to redesign the actual process flow. That's been a bit of the challenge of the last two years is a lot of people have just focused on all different types of co-pilots across all different industries. And I think that's helpful. But I think the next phase of this is actually process redesign and moving to ways where you can actually totally restructure the way a line works as an example. An AI native solution. The framework is not just that AI is replacing what a human is doing, but how would you design the model with AI in mind?
Starting point is 00:15:11 Most of the material benefit you're going to see is when you clean sheet any process to be like, how would I design this process knowing all the AI tools I have from scratch, and how do I use both technology and humans? And by the way, I think the example for that is going to involve both for a long, long time, and humans are a core part of this solution. At Invisible, we believe that's the human machine interface where all the value sits, but it's not necessarily just giving all your people on an existing process a tool. It's redesigning the process to use all the tools at your disposal. Thank you for listening. To join our community and to make sure you do not
Starting point is 00:15:42 miss any future episodes, please click the follow button above to subscribe. So let's talk about Invisible. Give me some specifics on how the company is doing today. We ended 2024 at 134 million in revenue. Profitable. We were the third fastest growing AI business in America over the last three years. You just joined as CEO. What is your strategy for the next five to 10 years? And how do you even conceptualize a strategy given how fast the industry is changing? We've had explosive growth in the current kind of core of the business, which is AI training. And we plan to continue to focus on that.
Starting point is 00:16:15 Our goal is to work with all the model builders to get these models as accurate as possible and support them any way we can with lots of human feedback. So if you think about what Invisible has there, we have this kind of AI process platform where we break out an individual task into a set of stages and then insert kind of feedback analytics at all of those different steps. We then have the AI training and evals motion I described, which is a set of modules. On the back of that, we have a labor marketplace where we can source all of those 5,000 different expert agents on any given topic. The core of that will remain our focus. The shifts I envision are kind of twofold. One, deepening our focus on using that for fine tuning in the enterprise. This is something I think all the model builders
Starting point is 00:16:49 are hopeful for as well: the more that we can help all of the enterprise clients figure out how to get the most out of the model builds they're focused on, how to get those working, the better for everyone. Everyone is hoping to see many more examples. And I, by the way, am very, very optimistic that over the next five, six years,
Starting point is 00:17:04 we're gonna see many, many more examples of great GenAI use cases in production. It's just been, I think, a period of learning. The last two years have been kind of a proof of concept phase for the enterprise. Really helping the enterprise get many of those into production is a core focus for us. The other big area that I'm going to evolve Invisible into, the analogy I would use is we're going to build a modern ServiceNow anchored in GenAI. So Invisible's process platform will include much more data infrastructure. It'll include an application development environment and process builder tools, and it'll include our kind of really, really good services delivery team around that. So one belief I have is that it's very hard to do any of this with the push of a button.
Starting point is 00:17:38 I think the age of software has kind of relied on the idea that you build something and people take that as is, and I think AI is much more around configuring and customizing different workflows to exactly what any given customer wants. You can envision what Invisible will evolve into as kind of an AI process platform with lots of process builder tools where people can build very sector-specific applications like claims for insurance or onboarding for food and beverage or fund admin for private equity. So you'll have a bunch of different verticalized use cases we'll go after and a lot of really interesting core data infrastructure tools like data ontology, master data management, things like that to help people get their data working. How do you avoid being the victim of your own success? So you come into enterprises, you streamline their AI models using the services model.
Starting point is 00:18:23 How do you avoid making yourself obsolete? The funny piece of context out there is that 70% of the software in America is over 20 years old. The rate of modernization of that has been glacially slow. I know there's been a lot of kind of hype that says suddenly the whole world is gonna be hyper modern, everything's gonna work in two years. I think this is a long journey over the next two decades
Starting point is 00:18:41 where we get to a world where every enterprise runs off of modern infrastructure, modern tech stacks, and functions much like the digitally native companies built up over the last five years. That will take time to get to, but I'm very excited about what our platform can do to enable that. Said another way, your total addressable market is every enterprise, minus the two that have built models. Even in those two companies, I'm sure they're looking to streamline
Starting point is 00:19:03 other parts of the business. I think that's right. The interesting thing, if you look at what I would call the application modernization market, so all of the modernization of legacy systems that happens annually: no player right now is more than 3% of that. So it's actually a very fragmented market that is painfully slow in how it moves. And it's one of the main frustration points for most enterprises. Like if you ask the average CEO in any company that's over 10 years old how happy they are with their core data, the kind of tools they use
Starting point is 00:19:30 on a daily basis, most are pretty frustrated. So I don't think this is something where the existing state is really good and everyone's really happy. There's a lot of frustration that we are hoping to help fix. And I think GenAI will be at the root of doing a lot of that. I think there's a lot of tooling you can build to generate insights faster, to pull up reporting faster. And so we will be a GenAI-native kind of application development platform.
Starting point is 00:19:52 You have a very unique vantage point in that you're the CEO of one of the fastest growing AI companies. You ran McKinsey's lab. Walk me through the AI ecosystem today in terms of how you look at the ecosystem. I've talked to a bunch of VCs about this in the past couple of days. There's the infrastructure layer, which is where most of the capital is going today, and that's a mix of kind of things like data centers as well as the model builders. You asked about the gap between kind of investment today and enterprise modernization. The challenge is that above the infrastructure layer, you have what I call the application layer, which is individual tools for individual use cases. Right.
Starting point is 00:20:23 And that could be, I mentioned claims, it could be legal services. It's all the verticalized applications that exist anchored in GenAI to solve problems. All of those applications today, for the most part, are SaaS or traditional software based. So they are designed, like all software of the last 20 years, to be kind of a push button deployment of a specific use case that functions like traditional SaaS software. I am skeptical that that is actually going to be the way that impact is realized with GenAI, for a couple different reasons.
Starting point is 00:20:50 Software as a paradigm has existed that way because the idea was it took so long to get data schemas organized and structured and it took so long to build any custom tool that you had to invest all the money upfront in building a perfect piece of software. Once you got data locked in on that software, it was very hard for anyone to ever migrate off of that. The term for this is your systems of record. Once you're locked in on any sort of a system of record, whether that be an ERP system, whether it be an HR database,
Starting point is 00:21:15 you basically never leave as an enterprise because the data is really painful to move. And so that's been the conventional wisdom on how to build software for a long time. You've had some really public examples. Satya Nadella mentioned it. What GenAI may enable is a movement where the value moves from the system of record layer to the agentic layer.
Starting point is 00:21:34 So you actually move to a world where people don't stay on software that's sticky just because of the data. They actually want the best possible software for their specific institution. So you might have a world where people are building tooling that is much more custom to their enterprise. You might have a world where I have a React workflow that uses analytics that are customized to my enterprise in a cloud environment.
Starting point is 00:21:54 And I can stand that up in a couple months. And I think that paradigm is a very different way forward for technology. Now, I'm sure there are some that would dispute that, some that will say software will exist as it always has. But I would say that the main feedback I heard from a lot of VCs is that most of the application layer today focuses on the standard software paradigm.
Starting point is 00:22:13 And I think we're looking at something very different, which is we want to have kind of an application development environment with a lot of configurability and customizability, the ability to build verticalized applications for specific sectors. That will allow us to say not, this is our tool, take it, but much more, what is the workflow you need? To bring that to life, let's say a Fortune 500 company is looking to create a CRM. What would an AI native CRM look like for a Fortune 500 company
Starting point is 00:22:40 versus just using a Salesforce? What you would usually end up doing in that world is you'll look at Salesforce or Dynamics, or ServiceNow has one of these now, and you will buy out of the box functionality. Like you'll buy, let's say, their contact center tooling, but then you will end up customizing a fair amount of that to your enterprise. So you'll say, my contact center is going to have this flow for service, this flow for calls. And so even though you're buying the tool, you're going to spend a year customizing and configuring it for your workflow. CRM is a little bit different in that you do have several large players, Salesforce, Dynamics, and ServiceNow
Starting point is 00:23:14 now, that have built fairly good builder applications for that use case. If you're successful as CEO of Invisible, what will the company look like in 2030? I use the ServiceNow analogy because I have a ton of respect for what they've done. My main North Star metric is that every GenAI model we work on will reach production. And so I'm really excited about working with all the model builders over the next couple of years to continue to fine tune and train their models and get that working at huge scale in the enterprise. And I think that's something that we will be a huge driver of.
Starting point is 00:23:41 What would you like our audience to know about you, about Invisible Technologies, or anything else you'd like to share? I don't want any of what I've said to come across as pessimistic. There's nobody that believes more than I do that AI will be positive for the enterprise over the next five to 10 years. I think the last two years did not live up to the hype cycle, partly because there was a belief that you could just buy a product out of the box, push a button, and suddenly all your GenAI will work.
Starting point is 00:24:01 My kind of advice or view on the path forward is, I don't think that will be the paradigm. I think every enterprise will have to build some capabilities around: what do I want to get out of these models? How do I train and validate these models? How do I make sure my data is adequately reflected in these models? And that's a very doable thing. When we sit here five, ten years from now, there'll be some really exciting deployments in this,
Starting point is 00:24:26 like the ability to stand up new software, new digital workflows, new companies based on GenAI is going to expand significantly. But I do think it's been a bit of a reality check for the last two years that, you know, this is not like I just stand up a piece of software, push a button, and everything works. How should people follow you and Invisible? You can add me on LinkedIn. I'll be posting about some of the updates we'll be adding there. We're building kind of a, I call it, data insights function at Invisible as well. We're going to start to bring as much of the truth that we're seeing and what's exciting and what we recommend to our enterprise clients so we can help them navigate what is a very
Starting point is 00:24:51 complex and difficult world. Thank you for listening to my conversation with Matt. If you enjoyed this episode, please share with a friend. This helps us grow and also provides the best feedback when we review the episode's analytics. Thank you for your support.
