The AI Daily Brief: Artificial Intelligence News and Analysis - Why Data is the Biggest Barrier to AI Readiness (And What to Do About It)

Starting point is 00:00:00 Welcome back to the AI Daily Brief. Today we are once again doing a special Operators Cut edition of the AI Daily Brief, and it is part two of the Agent Readiness series that we started last week. You might remember the inspiration for this was that I recently did an episode that was all about everything that we've learned as part of the thousands of surveys we've done across just an absolute boatload of agent readiness audits as part of Superintelligent. People were super interested in some of the learnings there, and so I invited Superintelligent's head of research, Nufar, onto the show to go a little

Starting point is 00:00:30 bit deeper on some of the key topics. Last week, in the kickoff of the three-part series, we talked about culture and what it takes to make an AI-ready culture in your organization. This week, we're plunging into the single biggest barrier that we see, which is around data and technology. Once again, we're trying to make this extremely practical, useful, and applicable, and I hope it's super useful for you. If you want to dig in deeper and figure out how to discover what parts of this are relevant for your organization and how you might think about your AI plans going forward, shoot me a note at nlw@besuper.ai and I will get you in touch with the right people. For now, let's dive into part two of the Agent Readiness series.

Starting point is 00:01:04 Nufar, welcome back to the show for part two of this Agent Readiness series. Thanks for having me. You know, in the previous session of this series, we discussed the cultural readiness for agent adoption, and we emphasized that it's one of the most important and often underrated aspects of agent readiness. This time, we will focus on the data and the technology readiness for agent adoption. Our data shows that there is one clear universal truth, and that is that motivation and ideas and desire for AI outpace the willingness to fix the underlying infrastructure, meaning that the technical readiness scores are frequently the lowest across all the dimensions of our audit.

Starting point is 00:01:42 The question is why most companies get stuck. And sometimes it's motivation, ideas, and FOMO that get them not to take the right infrastructure actions, both from a data and a technology perspective. And we try to categorize the main archetypes of companies or executives getting these decisions wrong. And the first archetype that we identify is the magpie. These are companies that are after all the fun and games of building agents; they want to brag about it in marketing and on social media.

Starting point is 00:02:15 They just can't be bothered with the drudge work of sorting through data and systems. And this is where companies get stuck in pilot hell, and unfortunately there are many, many companies that fall into this category. The other archetype that we're sometimes seeing, those are the ones that look at all the issues in a typical company, especially one with many legacy processes and systems, and realizing how much they need to do in order to fix the infrastructure for full-blown agent adoption, they become overwhelmed,

Starting point is 00:02:47 sometimes to the point of getting stuck or even paralyzed in analysis paralysis. And then the last one, the monk: they look at everything they need to do, and they go about it in a very, very structured manner. And this leads them to initiate long and sometimes very comprehensive kind of foundational projects in order to sort out the data and the processes as a gateway to even start the very first agent use case. And the reality is that in today's market, there is no time to wait for data or infra foundational projects before you start working on a use case. And by the way, these foundational projects didn't really work even prior to the days of Gen AI.

Starting point is 00:03:26 So I didn't believe them then and I don't believe them now. And it's very clear that these three archetypes will not get you far with the desired results for data and technology readiness. And the question is what will? And I think that what you should do here is introduce an approach that does work. And I call it intentional opportunism. And from what I've seen working here, I recommend taking a very pragmatic approach that blends both opportunistic kind of low-hanging fruit ROI projects with a more structured

Starting point is 00:03:57 vision for the future. So it's about starting now, but starting smart. And the reason why this is the approach that I recommend is, on the one hand, I want you to get going on agents sooner rather than later, because there is so much to learn just by doing. And also, if a use case is chosen well, the value to be realized fairly quickly will get you way more motivation. And that's the best way for you to get all of your stakeholders on board and everyone to understand what the technology can and cannot do. So for these first couple of use cases, what I can allow you is to get like a free pass or a waiver for serious foundational work and just get the use cases out of the door,

Starting point is 00:04:37 just start working. And this is why I want you to choose the use case that is a good combination of enough feasibility and enough value. But in parallel, and as you start to accumulate more understanding of agents and your shortcomings, you should identify some areas where there is a good justification and clear line of sight to start building more foundations and reusable building blocks to get ready for full-blown agent adoption. So it's not either-or, it's both, but being very, very opportunistic at the beginning and then intentionally and gradually getting your infrastructure in order.

Starting point is 00:05:07 So let's walk through the key challenges and the solutions that are related to the data and the technology readiness. And here I'm going to draw a lot on the audit findings, as well as my experience with multiple companies and many, many years working in this field. And we're going to start here with the data readiness. The big issue here is that data fragmentation and quality and access are typically the most common and severe limiters to scaling AI and agents. And this is where many initiatives basically die. In literally all the companies that we've audited, there is some version of this statement.

Starting point is 00:05:41 The internal systems rarely connect well to one another, and it goes way beyond almost the cliche of garbage in, garbage out. If your sales data appears in three different systems and there is no easy way to unify it automatically, not just because the systems are separate, but because the same entity has three different names, this can become a showstopper. So there are two major challenges that I see in most companies. The first is compliance and data privacy concerns. Also, there are many risks because a new technology being adopted by the masses,

Starting point is 00:06:12 often in our case, like a grassroots movement, presents multiple challenges. So data security and privacy are among the most significant risks. But on the flip side, many companies are so afraid of data leakage that they just won't approve anything. And that also creates a problematic stagnation. And moreover, in many cases, the customer contract will include a "don't use by AI" clause, making it also legally risky and very hard to navigate. Another challenge that we often see with companies is that the business process know-how

Starting point is 00:06:41 is tribal knowledge that is being maintained by a select few, leaving others extremely puzzled and dependent on them. And with no documentation and the knowledgeable people often being too busy to share with others, we're left with a process we cannot teach an agent how to execute. But don't despair. I have a playbook exactly for you. So these are my top five recommendations on what you should do as an intentional opportunist to accelerate the data readiness. And pay attention because these are all very important. First and foremost, and fortunately for us, we can now use AI for many data challenges. We can use AI to connect the data entities to natural language.

Starting point is 00:07:19 We can refine and improve the data. And we can also use or build retrieval-augmented generation, aka RAG, systems, and semantic similarity systems for fetching related pieces of data. This is something that we didn't have prior to the days of Gen AI. And it's an extremely powerful way to overcome many of these challenges that I just mentioned. Second is to help you tackle some undocumented data processes. The best thing that you can do for your group or your company is to have the subject matter expert work in front of a recording,

Starting point is 00:07:49 whichever device or method you want to record with, and just work regularly and narrate what they do. Then what you do, you take this video recording of their screen and their narration into your LLM of choice and you get it to output a standard operating procedure, an SOP, and then you give it back to the expert just to review. By doing that, you can save a significant amount of time documenting the processes, and I've seen multiple companies utilizing this process very well.

Starting point is 00:08:18 With a handful of hours being spent by the experts, they now have fully documented processes and a great starting point for any agent to work on. Next, I want you to identify the common foundational data sources and focus on creating a dedicated solution and easy access, especially for them. So don't go for the niche ones or the very complicated ones. Identify the ones where there will be the highest ROI if you go in and make them accessible. Sometimes even to a third party, I'm seeing many companies taking these extremely important data sources into a unified third-party vendor and giving the access to that.

Starting point is 00:08:56 But focus on them. You don't need to connect everything because it's just going to be a never-ending story. And then I want you to invest in the data cleanup or the radical changes only if the ROI is extremely large. From what I've seen historically, and I mentioned that, most data lake projects failed since they tried to do everything, which is way too expensive versus what you're getting as a return. But if you are very opportunistic here and select the ones that will yield the return on your investment of cleanup and processing, you should find it much more manageable. And lastly, for the newer systems or companies, you can just build it right from the get-go.

Starting point is 00:09:35 Like, you know that agents are coming. We're talking about it all the time. If your company is new or your data source is new or you just have the ability to do a clean slate option, just build it right such that agents will have access. This means data is accessible, logically organized with a lot of metadata everywhere you can put it. And ideally, with as many structured standard operating procedures such that they just will be able to work from that. So these are the main things that you should do in order to be data ready. Let's talk a little bit with regards to the security and governance that we just talked about.

Starting point is 00:10:12 And there are a few kind of non-negotiables when it comes to security and privacy. The main one will be you have to be very, very clear and strict about the data access roles. And in many cases, you don't have even an option but to anonymize the data and use it as best you can. That's unfortunate, but that's the reality in many cases. And finally, and we mentioned it before, the true unlock is often to create a secure sandbox where employees know that they can experiment within well-defined boundaries without any risks to the company data. And that's the best way for you to make sure that whatever you're building does not create an unexpected data leakage. So that's the security part.

Starting point is 00:10:51 And now let's switch gears and talk technology readiness. And here I want to show you that there is a lot of nuance in the key decisions that companies face. And in many cases, it's not like an either-or kind of dilemma. There are dials that you should intentionally set. So the first question that we're often being asked is do we need to centralize the team that is building or purchasing for everyone, or should we let each team build for themselves? And the answer, not surprisingly, is probably both. And with how easy it is to build with AI,

Starting point is 00:11:27 teams and individuals want and need to make agent choices in their day-to-day. So that's the reality of the technology, and I don't believe that you should take this power away from them. However, there are shared problems, often complex and important, that individual teams cannot address on their own. And these are the type of things that the centralized team should do for them. And you should probably define some kind of a boundary condition of what will be handled centrally versus locally based on the value, the complexity, and how widespread the use case is. And of course, with regards to data access.

Starting point is 00:12:01 And in general, you need to define very clear read and write permissions to all the systems. In this way, you ensure that sensitive actions are not easily accessible to an unauthorized agent. And for example, only a solution initiated or approved by a central AI team would write into, let's say, a finance system or address a sensitive customer, and that's how you're being safe, but also allowing for widespread building and more democratization. All right. So the second dilemma is should you allow a point solution or should you have a unified platform that everything is built or purchased on top of, or deployed on top of?

Starting point is 00:12:40 And here I want to provide a blueprint of how I believe you should structure your agent tech stack. So I think that you should buy or build or wrap a horizontal agent building platform. And ideally, you should have three types of these horizontal platforms such that you can cater to all types of use cases and all types of populations. So you should start with a prompt-based platform. This is going to be the easiest way to build agents by less technical teams. As an example, Relevance or Lindy or some of the other simpler platforms, and we're now seeing some of the major players also getting into this field. You then need to have more of an automation or low-code platform to allow more flexibility and easy integration with coding.

Starting point is 00:13:22 So those will be the n8n and the Zapiers and the Makes and others. And finally, if you have relevant developers, you need to allow them full flexibility. You want them to be able to integrate packages or frameworks built by others, whether it's like the Google ADK or OpenAI's or other packages that make sense to your developers. If you have all three of these platforms enabled, then everyone who wants to build an agent has a platform in which they can work. And these horizontal platforms are very good for general-purpose and company-wide or bespoke solutions.

Starting point is 00:13:54 But for very specific verticals, these are cases where you have a vertical that is probably not unique and for which there are proven solutions out there, I believe that you should buy and customize as needed, rather than competing with the vendor that will probably do a better job than you in something that they do day in and day out. Examples can include legal,

Starting point is 00:14:15 customer support, coding, and so on. And finally, if you want to serve diverse use cases, I encourage you to identify the reusable utilities and build them as internal, well-defined services that include both data and tools access, monitoring, governance, and guardrails. If you serve these to everyone, then you don't just build, you build it well and safely.

Starting point is 00:14:38 There are already a few companies that nailed it, but I have to say that in most cases, companies are still grappling with the best way to do that. I guarantee that if you do, you will create amazing momentum. The next one that I wanted to talk about, most companies contemplate, should you build or should you buy and when? And in today's ecosystem, this is not the question, in my opinion, since in most cases, it's a build or adapt on top of something that you buy. So this applies to all mundane use cases of automating simple things as well as the more complex use cases.

Starting point is 00:15:14 And the recent data shows that hybrid is probably the most dominant strategy. This is from just a few weeks ago. My pragmatic rules for more bespoke use cases are the following. So first, if the tool covers about 80% of your use case, buy it. Don't try to build it. Second, if there is a native business system, Salesforce, Workday, everything that you work with day to day as an example, if they inevitably will build the thing that you need, wait or create a temporary patch. Don't waste precious time to build something that for sure they will come out with, because I've seen many cases of glorious builds that went up in flames once a major vendor got their act together. And lastly, I think that you should build when there is no solution in the market,

Starting point is 00:16:02 nor in the line of sight, or when building it precisely to your needs is a competitive advantage, and you have the skills to get it right. And don't forget that you can always hire a builder, so you don't have to do it alone. For the last dilemma, I wanted to talk about velocity versus quality, and the reality is that those who don't test rigorously pay a very, very heavy price, and often the quickest road to value is the one that slows down for proper evaluation. And having said that, it doesn't have to be endless. So the key best practice here will be to talk about the topic that everyone is gradually talking more and more about: evals. And those are datasets and methods to test your AI system.

Starting point is 00:16:41 And while this is an entire topic which I will not go into, I will give you one data point. And that is that most companies with good evals won't share them and view them as their secret sauce for getting quick and quality results. So if they don't share them, you know that they got something right. This is probably one of the highest ROI areas that you should invest in. Additionally, you can use agents to test other agents, like a QA on QA. This is an emerging practice that shows lots of promise and allows you to overcome the problem of the human bottleneck to test everything that the agent does. And finally, as mentioned, a safe environment to test and to fail and only then to go to

Starting point is 00:17:20 production. So these are the key dilemmas and some solutions. To summarize this dense session, I want to encourage you all as you become intentional opportunists, do that as much as you can when it comes to the data and technology decisions around agent enablement. You don't have to be a perfectionist, but also don't be driven by FOMO or technology hype. You can make the right decisions and create momentum while intentionally making more thoughtful, longer-term decisions to create a safe and sustainable agent adoption. In terms of an action plan, I want to encourage you this quarter to be very opportunistic

Starting point is 00:17:55 and launch your next one or two agent projects that combine feasibility with value. They will be your key to learning for future decisions. And following this very first quarter into the next year, form a solid roadmap that takes into consideration also the nuances we discussed and structure them into a more concrete roadmap that focuses on concrete and critical gaps first. And I know that I said a lot, but as an intentional opportunist, you get my permission to aim realistically: get these suggestions about 70% right and you'll be way ahead of the curve. Next time, we'll talk use case readiness. All right. So this is obviously the densest section we have. I mean, not only could each slide be its own 20-minute separate section, each bullet of a slide, I think, could be. So there's a couple things that I wanted to double-click on. The first of which is, this really is over and over and over again. Every time we run analysis across all of our surveys and interviews that we've done, the most common thread is challenges with data. And interestingly, it's data specifically. It's not like people have outdated technology systems

Starting point is 00:18:59 or things like that. Most people have moved over to modern cloud providers. It's not like there's some massive difference between Azure versus AWS or anything like that that's holding people back. It really is data, right, the set of issues that you were talking about. And one of the things that I noticed

Starting point is 00:19:13 is that the subtext of basically everything that you said is a posture of moving intentionally, but also not getting stuck by perfection. And you have these three archetypes, but I think the two ends of the spectrum are the ones that stand out the most to me, the folks who are just looking for kind of the race to pilot demonstration case, first shiny agent to show off. And then on the other

Starting point is 00:19:35 end of the spectrum, frankly, I think there are actually more over on this side than there are even on the shiny side is the folks who are so overwhelmed or so focused on kind of perfection that they will constantly add intermediate steps that aren't actually doing the thing in order to feel like they're making progress without actually moving the ball down the field, right? Just like endless rounds of assessments and revisions and meetings and planning and strategy. versus just ripping off the band-aid and getting in. And so I think it's a valuable framework to try to avoid both ends of that. I think that particularly heading into next year, one of my big beliefs is that, to use your phrase, that magpie idea, the idea of collecting the shiny object and just focusing on your demonstration case, I think that is so out the window. I think that corporations and enterprises are going to have permission next year, even from a trend perspective, to really invest in data, to invest in

Starting point is 00:20:29 context engineering. We've even got this new set of terminology around it, right? Context engineering instead of just data readiness, which is such a much more boring term. We have new tooling and resources, MCP. So I really think that next year is, it's a really good year to invest in some of these foundational things because I think a lot of people are going to be doing it. A lot of the case studies that people are going to be sharing are going to be around this, again, as opposed to just look at how much time we saved by replacing people with customer service agents and things like that. I agree. I think a few thoughts. First of all, I've seen within the same company, the pendulum of moving from magpie to mountaineer, because they start with something, they realize that data is a big issue, and then they get overwhelmed and go all the way to let's stop everything or almost everything and go create a foundation. So that's also not a good response that some companies have. So find a good place in the middle. Don't sway back and forth just because you can. With regards to the theme of 2026, I'm thinking both, like, a year of foundations, but also a year of value realization. So companies will no longer buy or

Starting point is 00:21:36 build just for FOMO. They will buy or build because they believe that there is enough value, because they've learned a lot during 2024 and 2025, and now they want to get their act together. So I think there will be a lot of pressure to show value, and thereby they will not be able to just do foundation. But like you said, because of MCP, because of some of these recent developments, unlike in historical days where data foundation or building a data lake was a daunting task and a very expensive one, you can now have many, many workarounds that will let you do that in a more sustainable way. Having said that, there is no way around doing the drudge work of cleaning your data, connecting your data, and collecting the evals, at least to some extent.

Starting point is 00:22:18 I couldn't agree more, by the way, on the value realization. It's why we're adding products around implementation and deployment benchmarking, not just agent preparation. I think that's going to be a huge theme. So maybe to talk a little bit about this idea of one of the things you had on the data plan slide with sort of these five different bullets is the big theme was don't boil the ocean, right? And this is going to be an enormously overwhelming problem. One of the things that I've seen as an interesting consequence of Model Context Protocol, MCP, existing, right? So MCP is like a discrete API for a particular data set that plugs easily into agents, right? So if you get a set of data in an MCP server, anyone who's building an agent or anyone who has access to that MCP server can plug in an agent to that really quickly.

Starting point is 00:23:03 It makes things faster. And one of the things that's interesting that I'm noticing enterprises can run with is that it's almost allowing them to look at the data problem incrementally rather than, again, thinking about everything at once. So you'll start to see projects where organizations say, okay, what are our handful of most important sort of buckets of data to get organized and structured in a way that they can be put into an MCP server, because from an 80-20 perspective, those are going to be a lot of where agents and AI are going to be useful for us. And I think things like those sort of standards coming on, which again, allow people to be more, still intentional, but more incremental about data organization could be a really valuable part of this journey for them. Yes, and, but they'll still need to, in some cases, do some work around the

Starting point is 00:23:48 data. So the fact that it's more easily connectable doesn't mean that it will work out of the door if you have loads of garbage or if none of your data entities are defined the same. No, absolutely. But at least it becomes a manageable project for doing that rather than looking at 287 disconnected data sources that you somehow have to all bring together and rationalize before you feel like you can make progress. Right. And probably don't put them all on one sheet because you will be overwhelmed and will want to switch jobs if you look at it like that. One of the other sort of trends that I think is worth noting just because it was a part of what you said, and we didn't even have that much of a chance to dig into it, is there is a

Starting point is 00:24:28 very broad realization that evals are going to be an important part of the next phase and actually understanding how these tools perform in reality. There was a study, I'm going to get it wrong, it was maybe by ICONIQ, which is an investment firm and an asset manager out of San Francisco. It started as a wealth management firm. It's morphed into sort of more broad asset management. But they did a study of basically how the companies building AI use AI. And one of the really interesting and revealing parts of that was people were asked about evals, and more than a quarter of the people who were surveyed, and these are senior leaders in these organizations, again, that are building AI, said that they either weren't using any sort of evals in terms of their

Starting point is 00:25:11 own internal use cases or they didn't know what it was. So even the companies that are building AI, who you would think are like most forward, are behind when it comes to evals. And that's slowly starting to catch up. This is a major trend. You heard OpenAI talk about it a little bit at their Dev Day announcement. So it's a big theme. So for people who are thinking about how to evaluate these systems, you're not going to be off on your own in the wilderness. There is going to be an increasingly large set of people thinking about this and companies building towards it, I think. Yeah, it is. And I think that up until now, for the ones that got it right, it was like their moat. So they didn't talk about it a lot. And now that kind of people are opening this

Starting point is 00:25:48 black box of some of these more successful companies that are able to deliver very quickly and change models and make decisions very quickly, they now realize that they just have amazing evals, an amazing evaluation system, and others are trying to catch up. Amazing. Well, I think what I'm most excited for with this episode is to see where people reach out and ask for kind of the deeper dive on some part of this topic. But for now, we'll wrap there and move over into the third part of this series, which is all about use cases. Thank you again. Thanks.
