The AI Daily Brief: Artificial Intelligence News and Analysis - The Models Trying to Fill the Fable Gap

Starting point is 00:00:00 Today on the AI Daily Brief, the models trying to replace Fable. Before that in the headlines, what we learned about AI and global politics at the G7. The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI. All right, friends, quick announcements before we dive in. First of all, thank you to today's sponsors, KPMG, Section, Assembly, and OutSystems. To get an ad-free version of the show, go to patreon.com, or you can subscribe on Apple Podcasts. To learn more about sponsoring the show, send us a note at sponsors at aidailybrief.ai.

Starting point is 00:00:39 You should also check out the new aidailybrief.ai. Anyways, one of the big things that I have heard from folks is that they want easier ways to share specific parts of these episodes with folks inside their organizations. So that's what we've tried to build with a new website. It divides every episode up into dozens of short, easily shareable cards. Lastly today, there is a link down in the show notes to check out a preview of something that is coming soon, training.b.b.a.a.a.a. If you've been following along with the AIDB learning program journey, keep an eye there for some more announcements to come soon. With that, let's talk G7. Now, spoiler alert, we do not have any particularly big updates when it comes to when we're getting

Starting point is 00:01:17 Fable 5 back and the resolution between Anthropic and the U.S. government. However, what we did have was a number of the key players all in the same room as a slew of AI leaders joined the usual heads of state at this year's G7 meeting in France. Sam Altman, Demis Hassabis, Meta's Alexandr Wang, and yes, Dario Amodei were all present as part of the U.S. contingent. France brought along Mistral CEO Arthur Mensch, while Cohere CEO Aidan Gomez attended as part of the Canadian delegation.

Starting point is 00:01:45 Another half-dozen executives from regional AI champions also attended. Now, it is not unusual for corporate executives to attend this sort of diplomatic and trade meeting, but it is certainly the first time that the G7 has seen such a heavy representation from the AI industry. And frankly, their attendance makes even more sense in the context of the U.S. government's effective banning of Mythos and Fable. At a meeting that is all about international cooperation, for the first time, the global community is reckoning with the idea that access to U.S.-made frontier models is not a given. Now, the pivotal discussion came at a closed-door lunch meeting focused on AI and innovation. Flanking Donald Trump on either side were Google DeepMind CEO Demis Hassabis and OpenAI's Sam Altman, with Anthropic's Dario Amodei being on the exact opposite side of the table,

Starting point is 00:02:28 next to France's Emmanuel Macron. At the meeting, Dario Amodei and Demis Hassabis reportedly led the call for international cooperation on AI risk with the U.S. taking the lead. In his address, Amodei said that international cooperation should include structured access to frontier models, chip trade deals that exclude China, and a unified approach to AI risks, including cyber attacks and bioterrorism. Amodei implored G7 leaders to, quote, resist the temptation to splinter over the deployment of advanced AI. Meanwhile, while Canadian Prime Minister Mark Carney, who has recently pushed sovereign AI policies and called for cooperation between AI middle powers, agreed that the U.S. could lead an AI coalition, France's Macron voiced the concerns of European leaders, warning that the Trump administration had

Starting point is 00:03:09 now made it clear that the U.S. government holds the AI kill switch. He told reporters after the meeting that he made a forceful plea for the U.S. to not keep frontier AI to themselves. Macron said the U.S. and Europe have a shared interest in keeping the technology from authoritarian regimes. So let us move forward together, he commented. Our relevant agencies must first cooperate so that in the areas of security and cybersecurity, we have a smooth government-to-government relationship.

Starting point is 00:03:32 Sam Altman was aligned on this view that AI is now in the domain of government and regulation should not just be left to corporate policy alone. He said that the technology must be shaped by people, democratic institutions, and society as a whole, not just, in his words, by the companies building the most capable systems. Altman added, We need an international forum for discussion that establishes globally accepted standards for testing, provides expert and impartial analysis of capabilities and risks, and serves as a venue for cooperation among nations.

Starting point is 00:03:59 OpenAI head of global affairs, Chris Lehane, framed the discussion as moving towards international regulation. He said there really is a coalescing around a forum or a space for the different democratic countries to be able to work together to ultimately see if there's a way to establish some type of AI safety standards. Lehane said the U.S. would lead this body, adding, the ability to generate or create standards would be an avenue or pathway helping to ensure ongoing and continued access to frontier models. Now, all of this is well and good. Kind of platitudinal, but what do you expect from a G7 meeting?

Starting point is 00:04:27 But when it came to the rubber hitting the road, i.e. global access to Mythos, it doesn't appear that the U.S. government gave any ground. Now, there's essentially no commentary on what the U.S. delegation said during the meeting itself. President Trump made some classically generic comments in a press conference stating that the meeting was, quote, excellent, and that AI is, quote, going to be the biggest thing ever. We have to be careful with it. It's both great and could be bad. We have to be careful with it, but we're leading China. We're leading the world on that. The European commentary struck a very different tone. Euronews framed the mood in EU policy circles as particularly sour.

Starting point is 00:05:00 They noted that European leaders expected to be discussing the need to form a united front against China and the need to rebuild AI supply chains to route around that Eastern superpower. Instead, they found themselves pleading for access to frontier models that are viewed as critical to securing shared financial infrastructure. Thomas Regnier, the European Commission spokesperson for tech sovereignty, said, We are a trusted partner. I would challenge you to find a more trusted partner than Europe. We got news that UK Prime Minister Keir Starmer had requested a carve-out for British nationals and companies from the export control restrictions and was denied.

Starting point is 00:05:31 And perhaps for that reason, even as they try to get access to Fable and Mythos, there is clearly a shift in European thinking as well. Italian politician Brando Benifei said it plainly, commenting, The Anthropic kill switch shows that tech sovereignty was never abstract. The G7 should not lock allies into competing AI dependencies. Europe must cooperate with the U.S., Canada and democratic partners, but from a position of strength. Still, ultimately, it's becoming very clear that strength in the AI era comes from one thing, putting GPUs on racks. And on that front, Europe is lagging badly behind. In April, the European

Starting point is 00:06:04 Commission unveiled a grand plan to build up to five AI gigafactories to support the training of frontier models. The only problem is that only 20 billion euros were committed to the project, which expects to deploy around 100,000 GPUs. For comparison, the hyperscalers are on track to spend three times as much every month building out AI data centers in the U.S. Now, when it came to the Mythos situation specifically, when President Trump was asked by a reporter about the negotiations with Anthropic, he said simply, they're going fine. Looking over at Commerce Secretary Howard Lutnick, Lutnick reiterated, going fine. Now, meanwhile, as this was all happening, reporting from Wired added some context to the China

Starting point is 00:06:42 dimension of the export control ban. The TL;DR was that when Anthropic expanded access to Mythos a few weeks ago, one of the companies that got access was Korean telecom giant SK Telecom. The U.S. government, concerned about ties to China, ordered the company to revoke SK Telecom's access a few days before the ban, adding at least some credence to the China reasoning for the ban, as opposed to just personality politics. Now, what one thinks of SK Telecom's supposed connections to China is a different matter. Citrini analyst Jukan writes,

Starting point is 00:07:10 My God, I'm honestly beyond disappointed with the Trump administration. SK Telecom has absolutely nothing to do with Huawei or China. In fact, the only Korean telecom operator that uses Huawei equipment is LG. In D.C., China-linked company is sometimes a real thing and sometimes an utterly thought-terminating process. When I heard it was a Korean company, I immediately thought SK Telecom and went, ah, yes, the network where some of the most valuable IP in the entire AI hardware field is being transacted daily. When all is said and done, I don't think anyone was particularly surprised at the amount of talk versus action from a G7 meeting. I think what people found notable was the extent to which

Starting point is 00:07:43 the White House's actions around Anthropic have shifted the tone globally, and how little we got from the White House about any sort of timeline or sense of how things are actually going when it comes to the Anthropic issue. I would say that on average, most people who are watching this just had their timelines for when we get Fable back extended, not shortened. Now, moving back into the AI industry itself, we got a really big personnel move with legendary AI researcher Noam Shazeer leaving Google to join OpenAI.

Starting point is 00:08:12 In 2017, Shazeer was one of the lead authors on the seminal research paper, Attention Is All You Need, which introduced the transformer architecture and kicked off the entire LLM revolution. In 2021, after Google refused to release a chatbot of his design, Shazeer left the company to found Character AI. In 2024, Google rehired Shazeer as the technical lead on the Gemini project, and in order to retain Shazeer, Google spent $2.7 billion licensing Character AI's technology in one of the first big acquihire deals of the modern AI era. In short, Shazeer is one of an elite group of

Starting point is 00:08:45 AI researchers that can command a multi-billion dollar investment, right up there with Andrej Karpathy, Noam Brown, and Ilya Sutskever. And yet, less than two years after Google paid up for Shazeer, he's already out the door. Sam Altman has said this move has been a long time coming, posting, Noam is one of the people I have most wanted to work with since the very beginning of OpenAI. It only took 10 years. I think it will be worth the wait. OpenAI reportedly told employees that Shazeer would be working on creating new architectures for AI models. Google, meanwhile, was magnanimous in losing one of the world's preeminent researchers. A spokesperson said,

Starting point is 00:09:15 We're grateful for Noam's meaningful contributions to Google over the years, and we wish him well. Still, for many, the news raises even more questions about the future of Google's AI roadmap. The rumor mill has been awfully quiet about the release of Gemini 3.5 Pro, which they said would be coming in June. Yuchen Jin wrote, Noam's leaving Google makes Gemini's future feel uncertain. More than one DeepMind person has told me Noam saved Gemini. There's even lore that he tweaked a few lines of training code and Gemini's quality instantly jumped. Gemini's coding ability still feels behind.

Starting point is 00:09:45 I really hope Gemini can find its way back to its former glory. We need more model choices. Now, one more little product update from OpenAI. The quest to remove the side quest continues, as ChatGPT announces that they'll be sunsetting Pulse. Pulse was introduced last year and served as a daily AI briefing. Users could tune Pulse to generate relevant daily content based on their interests. OpenAI said the feature would be removed within the next two weeks and encouraged users to build their own daily briefing using scheduled tasks.

Starting point is 00:10:12 Now, OpenAI is presenting this as an expansion of the feature set, coupling the removal of Pulse to the expansion of the more generalized scheduled tasks feature. As part of that expansion, scheduled tasks will now be available to all paid ChatGPT subscribers, even those on the cut-price Go tier. Now, for some, this paints a very clear picture of what type of users OpenAI is prioritizing now. On the announcement thread, ChatGPT subscriber Diav wrote,

Starting point is 00:10:35 After sunsetting 4.5 and Pulse, will there be any reason to keep a Pro subscription for someone who is not a coder and has zero interest in Codex? And the short answer is, I'm not sure that OpenAI cares right now. That's going to do it for this slightly extended edition of the headlines. Next up, the main episode. One of the most important AI questions right now isn't who's using AI. It's who's using it well. KPMG and the University of Texas at Austin just analyzed 1.4 million real workplace AI interactions, and found something surprising.

Starting point is 00:11:10 The highest impact users aren't better prompt engineers. They treat AI like a reasoning partner. They frame problems, guide thinking, iterate, and push for better answers. And the good news? These behaviors are teachable at scale. If you're trying to move from AI access to real capability, KPMG's research on sophisticated AI collaboration is worth your time. Learn more at KPMG.com slash us slash sophisticated.

Starting point is 00:11:34 That's KPMG.com slash sophisticated. Here's a harsh truth. Your company is probably spending thousands or millions of dollars on AI tools that are being massively underutilized. Half of companies have AI tools, but only 12% use them for business value. Most employees are still using AI to summarize meeting notes. If you're the one responsible for AI adoption at your company, you need Section. Section is a platform that helps you manage AI transformation across your entire organization. It coaches employees on real use cases, tracks who's using AI for business impact, and shows you exactly where AI is and isn't creating value. The result, you go from rolling out tools to driving measurable AI value. Your employees

Starting point is 00:12:14 move from meeting summaries to solving actual business problems, and you can prove the ROI. Stop guessing if your AI investment is working. Check out Section at sectionai.com. That's S-E-C-T-I-O-N-AI.com. You know AssemblyAI for having the most accurate streaming speech-to-text out there, but they just went a step further and launched a full voice agent API. The idea is simple: one connection and they handle everything, the listening, the thinking, the speaking. You just stream audio in and get your agent's voice response back. We're talking about things like outbound sales calls that actually qualify leads, customer support that handles complex requests without a script,

Starting point is 00:12:52 scheduling agents that sound like a human assistant, and you can build one in five minutes with one API. And importantly, their streaming model is the best at catching all the stuff that breaks on other voice agents, things like phone numbers, emails, names, and medical terms. And for those of you who are still in experimentation mode, there are no contracts and unlimited concurrency so you can actually test it out without any friction. Head to assemblyai.com slash brief

Starting point is 00:13:12 and try the live voice agent demo right there on the site, no sign up needed. This episode of the AI Daily Brief is brought to you by OutSystems, a leading Agentic Systems platform built for the enterprise. Organizations all over the world are building, orchestrating, and governing agentic systems

Starting point is 00:13:28 on the OutSystems platform and with good reason. OutSystems' open and unified platform allows teams to architect, deliver, and scale governed agentic systems with agility. Teams of any size and technical depth can use OutSystems to build, deploy, and manage AI apps and agents quickly and cost-effectively without compromising reliability and security. With OutSystems, you can rapidly launch ideas from concept to completion.

Starting point is 00:13:51 It's the leading Agentic Systems platform that is unified, agile, and enterprise proven, allowing you to accelerate growth, reduce operational friction, and deliver real enterprise impact with AI. OutSystems. Build your Agentic Future. As the discussions between Anthropic and the U.S. government continue, we are firmly in the fallout phase of the Fable 5 loss. Indeed, even by Monday, when Fable hadn't come back online by the beginning of the work week and the markets opening, it was pretty clear that this was going to end up being a bigger fight than just a weekend annoyance. Over at the G7 meetings this week, we started to see the

Starting point is 00:14:27 geopolitical ramifications of the Fable shutdown, with Europe in particular and other U.S. allies trying to figure out both where they fit within the U.S.'s prioritization and what they needed to do to, on the one hand, retain access to U.S. models while also not being totally reliant on the U.S. Over among AI builders, meanwhile, while the first couple days were disbelief and mourning, since then it's been all about what sort of systems we can MacGyver together to get close to Fable-type performance. Now, for organizations and enterprises, the question is even more interesting. While few, if any, enterprises had actually shifted any sort of meaningful workflows over to Fable, it was yet more fuel to the fire of needing to think beyond just blindly

Starting point is 00:15:05 using whatever the most powerful state-of-the-art model is. Now, up until the Fable ban, the reason that that conversation had started among enterprises was not a question of access, but a question of cost. As agentic workloads actually came online, people's AI bills were going up in meaningful ways, and that led many, if not most organizations using AI extensively to start to think about more comprehensive strategies that, again, weren't just slapping the most powerful model on top of every single use case they had. And yet what's very clear, as we now come up on a week of Fable being gone, is that the question of the very value of the state of the art looms larger than ever.

Starting point is 00:15:37 Now, for some, this is a market question. For a couple of months now, one of the lurking bear narratives among investors has been that if American frontier models remain expensive compared to cheaper Chinese models, at some point, the concern was, buyers would just shift their behavior, and all of a sudden that revenue that seemed at least for a time to be justifying the big infrastructure buildout might no longer be as durable. For individual organizations, though, who aren't thinking about market implications, it is a moment in which many are considering alternatives.

Starting point is 00:16:05 Chubby on X pointed to headlines in Bloomberg and CNBC and wrote, All the major news outlets agree. The biggest winner in the Anthropic controversy is open source. Whether it's Bloomberg, Fortune, or CNBC, the consensus is clear. As Bloomberg put it, making the model open means that companies, governments, or organizations with sufficient hardware can run it locally and never have to worry about it being yanked on a whim. In short, what companies are recognizing is that using open weight or open source models is potentially not just a cost issue, but also one of predictability around access. If we're getting to the point where the power of AI is such that governments are going to have

Starting point is 00:16:40 kill switches, that makes it very hard to build mission-critical workflows around. As Citrini Research put it, the risk of the government deciding that a model is too dangerous should only add to the reasons why open-source models running on local hardware can be a reasonable alternative. So what are some of the new model solutions, Chinese or otherwise, that companies are starting to look towards? Well, the first thing that we should note is the Chinese labs themselves are certainly taking advantage of this particular moment. Right around the same time that Fable came out, we got Kimi K2.7 Code. The official account wrote, Kimi K2.7 Code, our latest coding model, is now

Starting point is 00:17:15 released and open-sourced. Compared to 2.6, it saw about a 22% improvement on Kimi CodeBench v2, 11% on Program Bench. They also argued that it had new reasoning efficiency, with 30% lower reasoning token usage compared to 2.6. Clearly, cost is a consideration, both in terms of raw per token cost, but also in terms of efficiency and how many tokens a model uses to solve a problem. Unfortunately, for many who tried it, it seemed like the benchmarks didn't totally match the reality. As opposed to some past Kimi models, people aren't really raving about this one yet and are finding some issues. VentureBeat argued that while teams who are using K2.6 in production right now can swap in K2.7 Code and immediately

Starting point is 00:17:54 expect lower inference costs, for people who aren't using Kimi, this won't necessarily be the reason to switch over. Putting a fine point on that, on the Agent Arena Leaderboard, K2.7 Code ranked 19th overall and only sixth among open models. Another model that's getting some buzz in the last 24 hours or so is called VibeThinker 3B from Weibo AI. And this one has people talking because of the ease with which you can actually run it on local hardware. Orcus 108 wrote, What is happening in AI? A 3 billion parameter model just put up coding benchmark scores in the same league as Claude Opus 4.5. 3 billion. The weights are on Hugging Face, anyone can test it.

Starting point is 00:18:30 I genuinely don't know if this is a breakthrough or if the benchmarks are broken. Now, I don't want to assume everyone knows what a 3 billion parameter model is, but TL;DR, it's very, very small. The frontier models are now well into the trillions of parameters, so you're talking about something that is a tiny fraction of the size. Now, it seems like what's going on here is that this is super tuned for reasoning and really bad at knowledge. Software engineer Drew Black wrote,

Starting point is 00:18:53 This is territory I started researching too. It's great to see something out in the wild. Take a small model and crank its reasoning power up to 11. Then knowledge can live outside the model in a database. Something like this reduces the hardware and power requirements needed to run it. Humans don't know everything all the time, so why should an AI model? It just needs the intelligence to figure things out. So once again, here we have something that is not really at this stage for enterprises, but is all pointing in this similar trajectory of the viability of smaller and open models running on local hardware. The Chinese open model

Starting point is 00:19:24 getting the most buzz right now by far is Z.ai's GLM 5.2. Bridgemind AI wrote, Two days ago, the U.S. banned Claude Fable 5. Yesterday, China dropped GLM 5.2. Today, GLM 5.2 is number one on Bridgebench and number one on reasoning, beating Fable 5 at one-tenth of the cost and 300 tokens per second. You cannot export control your way out of an open source race. The ban didn't slow China down. Indeed, a lot of the early coverage has been around GLM 5.2 beating models like GPT-5

Starting point is 00:19:54 on a variety of highly valuable tasks, including long horizon coding tasks, for a fraction of the cost. On the front-end code arena, Arena.ai found GLM 5.2 behind Fable 5, but ahead of all the Opus models. And on design arena, GLM 5.2 even went ahead of Fable 5. Now, in some tests, there was evidence of at least a little bit of benchmark maxing. In other words, where a model is specifically tuned to try to do well on benchmarks for exactly this sort of first impression. AI entrepreneur Bindu Reddy wrote, GLM 5.2 is mind-blowingly good on benchmarks. Yes, it even beats Opus 4.8 and GPT-5.5 on some of them.

Starting point is 00:20:33 However, it is also bench-maxed. Internal evals have it behind them. Still, a huge win for open-source AI. And when it came specifically to certain use cases like design, a lot of folks are sharing examples where you don't have to trust the benchmarks and you can just see it with your own two eyes. Hassan from Together wrote, This model is insane at design.

Starting point is 00:20:52 I asked GLM 5.2 and Opus 4.8 to build me a landing page and you can't even tell the difference. GLM costs 6 cents while Opus costs 49 cents. More than 6x cheaper while being faster and more token efficient. Another win for open source AI. Now, obviously, there is a lot of talk about this model being distilled from Anthropic. Pete Cooper wrote, GLM 5.2 is absolutely convinced that it is actually Claude.

Starting point is 00:21:14 When I tell it that it's GLM 5.2, it refuses to believe me. So the argument here is not that all of a sudden everyone is going to run out and start using GLM 5.2, but that there is more consideration for doing so than there has been because of the gap left by Fable 5. Gmoney wrote, Is anyone running GLM 5.2 locally on a Mac Studio? How is it?

Starting point is 00:21:34 Seriously considering a Mac Studio for the first time now. Now, when it comes to actual changes in enterprise behavior, the maybe bigger news is that Microsoft is apparently considering using a locally hosted fine-tune of DeepSeek V4 to power Copilot Cowork. Microsoft, as we know, has moved Cowork to usage-based pricing and is thus looking for a way to provide cheaper access to their enterprise customers. Based on Axios reporting, it seems like Microsoft isn't just fixated on DeepSeek. They're trying a variety of open source models as lower cost alternatives to the models

Starting point is 00:22:06 that are coming out of Anthropic and OpenAI. At the same time, this doesn't seem like a theoretical plan, with Axios reporting that Microsoft says it expects to make a lower cost model available in the coming weeks. Reporter Deirdre Bosa wrote, Was only a matter of time, DeepSeek already hosted on hyperscaler clouds since last year. Microsoft moving closer to it for enterprise just normalizes DeepSeek and gives cover for others to embrace and adopt it, which is already happening. Bigger question, does this give the Chinese stack a foot in the door in the U.S. since DeepSeek is optimizing for Huawei chips? As an aside, Gail Weiner pointed out the

Starting point is 00:22:37 absolute irony of this when it comes to U.S. policy. The U.S. government bans Fable 5 and Mythos 5 worldwide because frontier models are too dangerous to let foreigners touch, won't even exempt the U.K. because the threat model says the weights themselves are a national security asset. And simultaneously, the most deeply embedded U.S. enterprise software company on Earth quietly fine-tunes a Chinese model and prepares to ship it inside the productivity stack of every Fortune 500 that runs Microsoft 365. Importantly, though, it's not just raw Chinese models that are potentially changing the business AI landscape. One of the models that we've talked about most recently, specifically around the token efficiency question, is Cursor's Composer 2.5. Now, this one was built on a foundation of one of the Kimi models, but was post-trained to be

Starting point is 00:23:18 specifically good at coding tasks, with very impressive results in the benchmarks, scoring up in the range of Opus 4.7 and GPT-5.5. Now, of course, the most important thing is not just the benchmark scores, but the fact that it does so at a fraction of the cost of either of those comparable models. And now that it's been a couple of weeks, we're starting to get reports from the ground around how good Composer is in practice, not just on the benchmarks. Talking about his experience with Composer 2.5, Ryan Shaw wrote, I haven't bothered with anything else in weeks.

Starting point is 00:23:45 Stronger than 5.5 medium oftentimes, even though I know nobody believes me. Engineer Yasser writes, Composer 2.5, for a dollar it scored 65%. Fable, for $12, it scored 70%. Why would I use Fable for only a 5% increase and pay 12x the price? And yet, of course, this has not been everyone's experience. Ethan Novak wrote, We're trying out Cursor's Composer 2.5, results are not what I expected. I found the model indirectly changing files and items without my approval.

Starting point is 00:24:11 Opus 4.8 doesn't go on a rogue UI overhaul off one prompt versus Composer. Many people told me to try it for UI, but I'm not seeing effective results. And after Artificial Analysis updated its benchmarks to be more focused on agentic coding tasks, throwing out some of the more saturated benchmarks, Composer 2.5 fell fairly significantly, being closer to the open Chinese models like GLM 5.1 as compared to where it previously was around GPT-5.5 and Opus 4.7. Now, another interesting experiment that has a lot of people excited is OpenRouter's Fusion API. They're calling it the smartest compound model in the market, achieving Fable-level intelligence

Starting point is 00:24:47 at half the price. OpenRouter writes, We benchmarked Fusion on 100 hard research tasks and found one, panels of models consistently outperform individual models, two, beyond frontier performance can be achieved with frontier panels, and three, panels of budget models can surpass frontier models at a much lower cost. So basically what you have here is an API that routes model tasks more automatically and performs better or comparable to state-of-the-art models based on that routing. Explaining how it works, they write,

Starting point is 00:25:16 When you send a prompt to Fusion, we fan it out to a panel of models in parallel, each with web search and bash tools enabled. A judge model reads every response and extracts the structure. Then a synthesizer writes the final answer grounded in that analysis. Shawnee-Matthew wrote, This seems to be pretty huge and validates the future where each model will be called upon to do the specific tasks that they excel at for intelligence versus cost tradeoffs versus all other models. As each lab further gets better at specific tasks, this will become more of the default

Starting point is 00:25:42 assuming labs and model providers allow for this. Multimodel becomes the default and model panels or councils are the way. Investor Inesha Chari writes, Been saying for a bit that this compound architecture makes sense for both aggregators and labs. Model consumers want the right capability and cost controls on a per task and even perhaps per token basis. Labs need to protect otherwise depreciating model assets and one way to do it is to selectively expose new features via specific token paths. E.g., I've long thought slash ultra code was a Mythos-class model exposed via 4.X. The compound workflow matches the way many of us work, which is using adversarial

Starting point is 00:26:15 models to generate, review, iterate, and test. More generally, where all this is going feels like it will be labs vertically integrating to sell capabilities so that they can capture the downstream economics from their models without exposing them to distillation. Which brings me to one of the most interesting experiments where some of these ideas are being put into practice, which is around Harvey. Explaining the background, Harvey President and co-founder Gabe Perriero wrote, The belief a few months ago was that model costs were halving every six months,

Starting point is 00:26:39 meaning tokens would get cheap and so application layer companies would need to find a way to charge for the value of tokens by selling the worker services. What actually happened is AI got much more expensive than people realized at the time. The shift from chat to agents led to an explosion in costs. One user could trigger hundreds of agents, and each of those agents could trigger more agents. Agents started running longer and more autonomously. On top of that, frontier models like Mythos are getting more expensive, not less. The problem for application layer companies, like Harvey, is how do you take that large token cost and convert it into something useful for your customers? A rough analogy is that every company is about to get the ability to hire infinite employees. The main challenge is going to be

Starting point is 00:27:13 figuring out how to manage those employees and make your business model work the same way it did with human employees. For Harvey, this means we don't have to become a services company. The infrastructure for every law firm to deploy, train, and manage a large number of agents is going to be so complex that model and cloud providers and law firms likely won't build all of it. Now, this dovetails perfectly with an experiment that Harvey recently discussed, where they worked with Fireworks to build a worker-advisor agent. The idea, as described by head of applied research, Nico, is that an open-weight worker, in this case GLM 5.1, delegates high-stakes and complex tasks to a closed frontier advisor. In their test, this was Opus 4.7. This combination of models not only allowed them to do

Starting point is 00:27:52 things much more cheaply than just using Opus 4.7, but actually got increased performance as well. Patrick Gojo writes, The insight isn't that open source beat frontier, it's that smart routing beat brute force. Using the most expensive model for every task is not a quality strategy. It's a laziness tax. The teams building routing layers that send each task to the right model at the right cost are now demonstrably ahead on both dimensions simultaneously. Inference optimization just became a first-class competitive advantage.
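For readers who want to see the shape of the two patterns discussed in this segment, here is a minimal Python sketch. It is an illustration under stated assumptions, not OpenRouter's or Harvey's actual implementation. Every name in it (Model, run_panel, run_worker_advisor, needs_advice) is hypothetical, a "model" is just a function from a prompt string to a reply string, and the rule for deciding when the worker consults the advisor is an assumption, since the episode does not describe how that decision is made.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

# Hypothetical stand-in: a "model" is any function from a prompt to a reply.
Model = Callable[[str], str]


def run_panel(prompt: str, panel: List[Model], judge: Model, synthesizer: Model) -> str:
    """Panel pattern: fan out in parallel, judge the replies, then synthesize."""
    if not panel:
        raise ValueError("panel must contain at least one model")
    # Send the same prompt to every panel model at once; map() keeps panel order.
    with ThreadPoolExecutor(max_workers=len(panel)) as pool:
        replies = list(pool.map(lambda model: model(prompt), panel))
    numbered = "\n".join(f"[{i}] {reply}" for i, reply in enumerate(replies, start=1))
    # The judge reads every reply and produces an analysis of them.
    analysis = judge(f"Question: {prompt}\nCandidate replies:\n{numbered}")
    # The synthesizer writes the final answer from the judge's analysis.
    return synthesizer(f"Question: {prompt}\nAnalysis:\n{analysis}")


def run_worker_advisor(
    task: str,
    worker: Model,
    advisor: Model,
    needs_advice: Callable[[str], bool],
) -> str:
    """Worker-advisor pattern: a cheap worker does the task and consults
    an expensive advisor only when needs_advice says so (assumed rule)."""
    if not needs_advice(task):
        return worker(task)
    guidance = advisor(task)
    # The worker still produces the final output, now with the advisor's guidance.
    return worker(f"{task}\nAdvisor guidance: {guidance}")
```

In this sketch the cost saving comes from the advisor being called only on the tasks that needs_advice flags, while the panel function always pays for every panel model plus the judge and synthesizer. In a real system each Model would wrap an API call, and needs_advice could be a classifier or the worker's own judgment.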

Starting point is 00:28:18 Now, Harvey says that this is just the beginning of their experimentation with models, and I think that they are going to be the vanguard for something that lots of others start experimenting with as well. We don't know when Fable is coming back. What's clear is that as powerful as it is, in a world where frontier costs continue to go up, companies are simply going to have to start to get more sophisticated about the combinations of models they use to get the best results. If we are looking for a bright side to what is an incredibly confusing and chaotic situation,

Starting point is 00:28:46 it's that it puts more emphasis on the point that this sort of inference optimization and token efficiency exploration was coming for us no matter what. And now companies have more of a chance to get ahead of it than they might have otherwise had, when everyone could just get lost in the sauce of the glory of Fable 5. Anyways, interesting trends to continue to watch. For now, that's going to do it for the AI Daily Brief. Appreciate you listening or watching as always. And until next time, peace.
