The AI Daily Brief: Artificial Intelligence News and Analysis - The Models Trying to Fill the Fable Gap
Episode Date: June 18, 2026
As the fallout from the Fable shutdown continues, the AI world is racing to figure out what comes next: Chinese open models, Cursor’s Composer, OpenRouter Fusion, and new routing strategies that promise frontier-level performance at lower cost. NLW looks at why the loss of Fable may accelerate the shift toward token efficiency, model diversity, and smarter enterprise AI architecture. In the headlines: G7 leaders debate frontier model access, Noam Shazeer leaves Google for OpenAI, and ChatGPT sunsets Pulse.
Sneak preview: http://training.besuper.ai/
Brought to you by:
KPMG – Research from KPMG and the University of Texas at Austin shows the highest-impact AI users treat AI like a reasoning partner, and those skills can be taught at scale. Learn more at kpmg.com/us/Sophisticated
Section - Section turns AI investment into workforce transformation and ROI - https://www.sectionai.com/
Outsystems - Stop wondering how AI will change your business and start building the agents that will lead it - http://outsystems.com/
Scrunch - The AI customer experience platform - https://scrunch.com/
Zenflow Work - Agents for knowledge work - https://zenflow.free/
Blitzy - Want to accelerate enterprise software development velocity by 5x? https://blitzy.com/
MissionCloud - Eliminate AWS complexity with end-to-end cloud and AI services https://www.missioncloud.com/
AssemblyAI - The best way to build Voice AI apps - https://www.assemblyai.com/brief
Robots & Pencils - Cloud-native AI solutions that power results https://robotsandpencils.com/
The AI Daily Brief helps you understand the most important news and discussions in AI.
Subscribe to the podcast version of The AI Daily Brief wherever you listen: https://pod.link/1680633614
Our Newsletter is BACK: https://aidailybrief.beehiiv.com/
Interested in sponsoring the show? sponsors@aidailybrief.ai
Transcript
Today on the AI Daily Brief, the models trying to replace Fable.
Before that in the headlines, what we learned about AI and global politics at the G7.
The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI.
All right, friends, quick announcements before we dive in.
First of all, thank you to today's sponsors, KPMG, Section, AssemblyAI, and OutSystems.
To get an ad-free version of the show, go to patreon.com,
or you can subscribe on Apple Podcasts.
To learn more about sponsoring the show, send us a note at sponsors@aidailybrief.ai.
You should also check out the new aidailybrief.ai. Anyways, one of the big things that I have
heard from folks is that they want easier ways to share specific parts of these episodes with
folks inside their organizations. So that's what we've tried to build with a new website.
It divides every episode up into dozens of short, easily shareable cards.
Lastly today, there is a link down in the show notes to check out a preview of something that is
coming soon, training.besuper.ai. If you've been following along with the AIDB learning program
journey, keep an eye there for some more announcements to come soon. With that, let's talk G7.
Now, spoiler alert, we do not have any particularly big updates when it comes to when we're getting
Fable 5 back and the resolution between Anthropic and the U.S. government. However, what we did have
was a number of the key players all in the same room as a slew of AI leaders joined the usual heads of state
at this year's G7 meeting in France.
Sam Altman, Demis Hassabis,
Meta's Alexandr Wang, and yes,
Dario Amodei were all present as part of the U.S. contingent.
France brought along Mistral CEO Arthur Mensch,
while Cohere CEO Aidan Gomez attended as part of the Canadian delegation.
Another half-dozen executives from regional AI champions also attended.
Now, it is not unusual for corporate executives to attend this sort of diplomatic and trade meeting,
but it is certainly the first time that the G7 has seen such a heavy representation from the AI industry.
And frankly, their attendance makes even more sense in the context of the U.S. government's effective banning of Mythos and Fable.
At a meeting that is all about international cooperation, for the first time, the global community is reckoning with the idea that access to U.S.-made frontier models is not a given.
Now, the pivotal discussion came at a closed-door lunch meeting focused on AI and innovation.
Flanking Donald Trump on either side were Google DeepMind CEO Demis Hassabis and OpenAI's Sam Altman,
with Anthropic's Dario Amodei on the exact opposite side of the table,
next to France's Emmanuel Macron. At the meeting, Dario Amodei and Demis Hassabis reportedly led the call
for international cooperation on AI risk with the U.S. taking the lead. In his address, Amodei said
that international cooperation should include structured access to frontier models, chip trade deals that
exclude China, and a unified approach to AI risks, including cyber attacks and bioterrorism. Amodei implored
G7 leaders to, quote, resist the temptation to splinter over the deployment of advanced AI. Meanwhile,
Canadian Prime Minister Mark Carney, who has recently pushed sovereign AI policies and called for
cooperation between AI middle powers, agreed that the U.S. could lead an AI coalition.
France's Macron voiced the concerns of European leaders, warning that the Trump administration had
now made it clear that the U.S. government holds the AI kill switch.
He told reporters after the meeting that he made a forceful plea for the U.S. to not keep
frontier AI to themselves.
Macron said the U.S. and Europe have shared interest in keeping the technology from authoritarian
regimes.
So let us move forward together, he commented.
Our relevant agencies must first cooperate so that in the areas of security and cybersecurity,
we have a smooth government-to-government relationship.
Sam Altman was aligned on this view that AI is now in the domain of government
and regulation should not just be left to corporate policy alone.
He said that the technology must be shaped by people, democratic institutions, and society
as a whole, not just in his words by the companies building the most capable systems.
Altman added,
We need an international forum for discussion that establishes globally accepted standards for testing,
provides expert and impartial analysis of capabilities and risks,
and serves as a venue for cooperation among nations.
OpenAI head of global affairs, Chris Lehane, framed the discussion as moving towards international
regulation. He said there really is a coalescing around a forum or a space for the different
democratic countries to be able to work together to ultimately see if there's a way to establish
some type of AI safety standards.
Lehane said the U.S. would lead this body, adding,
the ability to generate or create standards would be an avenue or pathway helping to ensure
ongoing and continued access to frontier models.
Now, all of this is well and good. Kind of platitudinal, but what do you expect from a G7 meeting?
But when it came to the rubber hitting the road, i.e. global access to Mythos, it doesn't appear that
the U.S. government gave any ground. Now, there's essentially no commentary on what the U.S.
delegation said during the meeting itself. President Trump made some classically generic comments
in a press conference stating that the meeting was, quote, excellent, and that AI is, quote,
going to be the biggest thing ever. We have to be careful with it. It's both great and could be bad.
We have to be careful with it, but we're leading China. We're leading the world on that.
The European commentary struck a very different tone.
Euro News framed the mood in EU policy circles as particularly sour.
They noted that European leaders expected to be discussing the need to form a united front
against China and the need to rebuild AI supply chains to route around that Eastern superpower.
Instead, they found themselves pleading for access to frontier models that are viewed
as critical to securing shared financial infrastructure.
Thomas Regnier, the European Commission spokesperson for tech sovereignty, said,
We are a trusted partner. I would challenge you to find a more trusted partner than Europe.
We got news that UK Prime Minister Keir Starmer had requested a carve-out for British nationals and
companies from the export control restrictions and was denied.
And perhaps for that reason, even as they try to get access to Fable and Mythos, there is clearly
a shift in European thinking as well.
Italian politician Brando Benifei said it plainly, commenting,
The Anthropic kill switch shows that tech sovereignty was never abstract.
The G7 should not lock allies into competing AI dependencies.
Europe must cooperate with the U.S., Canada and democratic partners, but from a position of
strength. Still, ultimately, it's becoming very clear that strength in the AI era comes from one thing,
putting GPUs on racks. And on that front, Europe is lagging badly behind. In April, the European
Commission unveiled a grand plan to build up to five AI gigafactories to support the training of
frontier models. The only problem is that only 20 billion euros were committed to the project,
which expects to deploy around 100,000 GPUs. For comparison, the hyperscalers are on track to spend
three times as much every month building out AI data centers in the U.S.
Now, when it came to the Mythos situation specifically, when President Trump was asked by a
reporter about the negotiations with Anthropic, he said simply, they're going fine.
He looked over at Commerce Secretary Howard Lutnick, who reiterated, going fine.
Now, meanwhile, as this was all happening, reporting from Wired added some context to the China
dimension of the export control ban.
The TLDR was that when Anthropic expanded access to Mythos a few weeks ago,
one of the companies that got access was Korean telecom giant SK Telecom.
The U.S. government concerned about ties to China, ordered the company to revoke SK Telecom's access
a few days before the ban, adding at least some credence to the China reasoning for the ban,
as opposed to just personality politics.
Now, what one thinks of SK Telecom's supposed connections to China is a different matter.
Citrini analyst Jukan writes,
My God, I'm honestly beyond disappointed with the Trump administration.
SK Telecom has absolutely nothing to do with Huawei or China.
In fact, the only Korean telecom operator that uses Huawei equipment is LG.
In D.C., China-linked company is sometimes a real thing and sometimes an utterly thought-terminating
process. When I heard it was a Korean company, I immediately thought SK Telecom and went,
ah, yes, the network where some of the most valuable IP in the entire AI hardware field is being transacted daily.
When all is said and done, I don't think anyone was particularly surprised at the amount of talk
versus action from a G7 meeting. I think what people found notable was the extent to which
the White House's actions around Anthropic have shifted the tone globally,
and how little we got from the White House about any sort of timeline
or sense of how things are actually going when it comes to the Anthropic issue.
I would say that on average, most people who are watching this
just had their timelines for when we get Fable back extended, not shortened.
Now, moving back into the AI industry itself,
we got a really big personnel move with legendary AI researcher Noam Shazeer
leaving Google to join OpenAI.
In 2017, Shazeer was one of the lead authors on the seminal research paper,
Attention Is All You Need, which introduced the transformer architecture and kicked off the
entire LLM revolution.
In 2021, after Google refused to release a chatbot of his design, Shazeer left the company
to found Character AI.
In 2024, Google rehired Shazeer as the technical lead on the Gemini project, and in order
to retain Shazeer, Google spent $2.7 billion licensing Character AI's technology in one of the
first big acquihire deals of the modern AI era. In short, Shazeer is one of an elite group of
AI researchers that can command a multi-billion dollar investment, right up there with Andrej Karpathy,
Noam Brown, and Ilya Sutskever. And yet, less than two years after Google paid up for Shazeer,
he's already out the door. Sam Altman has said this move has been a long time coming, posting,
Noam is one of the people I have most wanted to work with since the very beginning of
OpenAI. It only took 10 years. I think it will be worth the wait. OpenAI reportedly told
employees that Shazeer would be working on creating new architectures for AI models.
Google, meanwhile, was magnanimous in losing one of the world's preeminent researchers,
a spokesperson said,
We're grateful for Noam's meaningful contributions to Google over the years, and we wish him well.
Still, for many, the news raises even more questions about the future of Google's AI roadmap.
The rumor mill has been awfully quiet about the release of Gemini 3.5 Pro, which they said would be coming in June.
Yuchen Jin wrote,
Noam's leaving Google makes Gemini's future feel uncertain.
More than one DeepMind person has told me Noam saved Gemini.
There's even lore that he tweaked a few lines of training code and Gemini's quality instantly jumped.
Gemini's coding ability still feels behind.
I really hope Gemini can find its way back to its former glory.
We need more model choices.
Now, one more little product update from OpenAI.
The quest to remove the side quest continues, as ChatGPT announces that they'll be sunsetting Pulse.
Pulse was introduced last year and served as a daily AI briefing.
Users could tune Pulse to generate relevant daily content based on their interests.
OpenAI said the feature would be removed within the next two weeks
and encouraged users to build their own daily briefing using scheduled tasks.
Now, OpenAI is presenting this as an expansion of the feature set,
coupling the removal of Pulse to the expansion of the more generalized scheduled
tasks feature.
As part of that expansion, scheduled tasks will now be available to all paid ChatGPT
subscribers, even those on the cut-price Go tier.
Now, for some, this paints a very clear picture of what type of users OpenAI is prioritizing
now.
On the announcement thread, ChatGPT subscriber Diav wrote,
After sunsetting 4.5 and Pulse, will there be any reason to keep Pro subscription for someone who is not a coder and has zero interest in Codex?
And the short answer is, I'm not sure that OpenAI cares right now.
That's going to do it for this slightly extended edition of the headlines.
Next up, the main episode.
One of the most important AI questions right now isn't who's using AI.
It's who's using it well.
KPMG and the University of Texas at Austin just analyzed 1.4 million real workplace AI interactions,
and found something surprising.
The highest impact users aren't better prompt engineers.
They treat AI like a reasoning partner.
They frame problems, guide thinking, iterate, and push for better answers.
And the good news?
These behaviors are teachable at scale.
If you're trying to move from AI access to real capability,
KPMG's research on sophisticated AI collaboration is worth your time.
Learn more at kpmg.com/us/sophisticated.
That's kpmg.com/us/sophisticated.
Here's a harsh truth. Your company is probably spending thousands or millions of dollars on AI tools that are being massively underutilized.
Half of companies have AI tools, but only 12% use them for business value.
Most employees are still using AI to summarize meeting notes.
If you're the one responsible for AI adoption at your company, you need Section.
Section is a platform that helps you manage AI transformation across your entire organization.
It coaches employees on real use cases, tracks who's using AI for business impact, and shows you exactly where AI is and isn't creating
value. The result, you go from rolling out tools to driving measurable AI value. Your employees
move from meeting summaries to solving actual business problems, and you can prove the ROI.
Stop guessing if your AI investment is working. Check out Section at sectionai.com. That's
S-E-C-T-I-O-N-AI.com. You know AssemblyAI for having the most accurate streaming speech-to-text
out there, but they just went a step further and launched a full voice agent API. The idea is simple:
one connection and they handle everything, the listening, the thinking, the speaking.
You just stream audio in and get your agent's voice response back.
We're talking about things like outbound sales calls that actually qualify leads,
customer support that handles complex requests without a script,
scheduling agents that sound like a human assistant,
and you can build one in five minutes with one API.
And importantly, their streaming model is the best at catching all the stuff that breaks on other voice agents,
things like phone numbers, emails, names, and medical terms.
And for those of you who are still in experimentation mode,
there are no contracts and unlimited concurrency
so you can actually test it out without any friction.
Head to assemblyai.com/brief
and try the live voice agent demo right there on the site,
no sign up needed.
This episode of the AI Daily Brief
is brought to you by OutSystems,
a leading agentic systems platform built for the enterprise.
Organizations all over the world
are building, orchestrating,
and governing agentic systems
on the OutSystems platform
and with good reason.
OutSystems' open and unified platform
allows teams to architect, deliver,
and scale governed agentic systems with agility.
Teams of any size and technical depth can use OutSystems to build, deploy, and manage
AI apps and agents quickly and cost-effectively without compromising reliability and security.
With OutSystems, you can rapidly launch ideas from concept to completion.
It's the leading agentic systems platform that is unified, agile, and enterprise-proven,
allowing you to accelerate growth, reduce operational friction, and deliver real enterprise
impact with AI.
OutSystems. Build your Agentic Future.
As the discussions between Anthropic and the U.S. government continue, we are firmly in the fallout
phase of the Fable 5 loss. Indeed, even by Monday, when Fable hadn't come back online by the beginning
of the work week and the markets opening, it was pretty clear that this was going to end up being a bigger
fight than just a weekend annoyance. Over at the G7 meetings this week, we started to see the
geopolitical ramifications of the Fable shutdown, with Europe in particular and other U.S. allies
trying to figure out both where they fit within the U.S.'s prioritization and what
they needed to do to, on the one hand, retain access to U.S. models while also not being totally
reliant on the U.S. Over among AI builders, meanwhile, while the first couple days were
disbelief and mourning, since then it's been all about what sort of systems we can MacGyver
together to get close to Fable-type performance. Now, for organizations and enterprises, the question
is even more interesting. While few, if any, enterprises had actually shifted any sort of meaningful
workflows over to Fable, it was yet more fuel to the fire of needing to think beyond just blindly
using whatever the most powerful state-of-the-art model is. Now, up until the Fable ban,
the reason that that conversation had started among enterprises was not a question of access,
but a question of cost. As agentic workloads actually came online, people's AI bills were
going up in meaningful ways, and that led many, if not most organizations using AI extensively
to start to think about more comprehensive strategies that, again, weren't just slapping the
most powerful model on top of every single use case they had. And yet what's very clear,
as we now come up on a week of Fable being gone, is that the question of the very
value of the state of the art is more pressing than ever.
Now, for some, this is a market question.
For a couple of months now, one of the lurking bear narratives among investors has been
that if American frontier models remain expensive compared to cheaper
Chinese models, at some point the concern was, buyers would just shift their behavior,
and all of a sudden that revenue that seemed at least for a time to be justifying the big
infrastructure buildout might no longer be as durable.
For individual organizations, though, who aren't thinking about market implications,
it is a moment in which many are considering alternatives.
Chubby on X pointed to headlines in Bloomberg and CNBC and wrote,
All the major news outlets agree. The biggest winner in the Anthropic controversy is open source.
Whether it's Bloomberg, Fortune, or CNBC, the consensus is clear. As Bloomberg put it,
making the model open means that companies, governments, or organizations with sufficient
hardware can run it locally and never have to worry about it being yanked on a whim.
In short, what companies are recognizing is that using open weight or open source models
is potentially not just a cost issue, but also one of predictability around access.
If we're getting to the point where the power of AI is such that governments are going to have
kill switches, that makes it very hard to build mission-critical workflows around.
As Citrini Research put it,
the risk of the government deciding that a model is too dangerous should only add to the
reasons why open-source models running on local hardware can be a reasonable alternative.
So what are some of the new model solutions, China or otherwise, that companies are starting to look
towards. Well, the first thing that we should note is the Chinese labs themselves are certainly
taking advantage of this particular moment. Right around the same time that Fable came out, we got
Kimi K2.7 Code. The official account wrote, Kimi K2.7 Code, our latest coding model, is now
released and open-sourced. Compared to 2.6, it saw about a 22% improvement on Kimi CodeBench v2,
11% on Program Bench. They also argued that it had new reasoning efficiency, with 30% lower
reasoning token usage compared to 2.6. Clearly,
cost is a consideration, both in terms of raw per token cost, but also in terms of efficiency
and how many tokens a model uses to solve a problem. Unfortunately, for many who tried it,
it seemed like the benchmarks didn't totally match the reality. As opposed to some past Kimi
models, people aren't really raving about this one yet and are finding some issues. VentureBeat argued
that while teams who are using K2.6 in production right now can swap in 2.7 code and immediately
expect lower inference costs, for people who aren't using Kimi, this won't necessarily be the reason
to switch over. Putting a fine point on that,
on the Agent Arena Leaderboard, 2.7 code ranked 19th overall and only sixth among open models.
Another model that's getting some buzz in the last 24 hours or so is called VibeThinker 3B from Weibo
AI. And this one has people talking because of the ease with which you can actually run it on
local hardware. Orcus 108 wrote, What is happening in AI? A 3 billion parameter model just put
up coding benchmark scores in the same league as Claude Opus 4.5. 3 billion. The weights are on
Hugging Face, anyone can test it.
I genuinely don't know if this is a breakthrough or if the benchmarks are broken.
Now, I don't want to assume everyone knows what a 3 billion parameter model is,
but TLDR, it's very, very small.
The frontier models are now well into the trillions of parameters,
so you're talking about something that is a tiny fraction of the size.
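To put that size difference in rough numbers, here is a minimal back-of-the-envelope sketch in Python. It counts only the memory needed to hold the weights and ignores activations and caches, so treat it as a lower bound. The function name, the 16-bit and 4-bit precisions, and the 1-trillion-parameter comparison point are illustrative assumptions, not figures from the episode.

```python
def weight_memory_gb(num_params: float, bytes_per_param: float = 2.0) -> float:
    """Approximate memory for model weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


# A 3-billion-parameter model stored at 16-bit precision (2 bytes per weight).
small_16bit = weight_memory_gb(3e9)

# The same model quantized to 4 bits (0.5 bytes per weight), an assumed setting.
small_4bit = weight_memory_gb(3e9, bytes_per_param=0.5)

# A hypothetical 1-trillion-parameter model at 16-bit precision, for contrast.
large_16bit = weight_memory_gb(1e12)

print(f"3B @ 16-bit: {small_16bit:.1f} GB")
print(f"3B @ 4-bit:  {small_4bit:.1f} GB")
print(f"1T @ 16-bit: {large_16bit:.1f} GB")
print(f"Size ratio:  {large_16bit / small_16bit:.0f}x")
```

Under these assumptions, the small model's weights fit in the memory of an ordinary laptop, while the trillion-parameter case needs a multi-GPU server, which is why people get excited about running models like this locally.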
Now, it seems like what's going on here is that this is super tuned for reasoning
and really bad at knowledge.
Software engineer Drew Black wrote,
This is territory I started researching too.
It's great to see something out in the wild.
Take a small model and crank its reasoning power up to
11. Then knowledge can live outside the model in a database. Something like this reduces the hardware
and power requirements needed to run it. Humans don't know everything all the time, so why should an AI
model? It just needs the intelligence to figure things out. So once again, here we have something
that is not really at this stage for enterprises, but is all pointing in a similar trajectory
of the viability of smaller and open models running on local hardware. The Chinese open model
getting the most buzz right now by far is Z.ai's GLM 5.2. BridgeMind AI wrote,
Two days ago, the U.S. banned Claude Fable 5.
Yesterday, China dropped GLM 5.2.
Today, GLM 5.2 is number one on BridgeBench and number one on reasoning, beating Fable
5 at one-tenth of the cost and 300 tokens per second.
You cannot export control your way out of an open source race.
The ban didn't slow China down.
Indeed, a lot of the early coverage has been around GLM 5.2 beating models like GPT-5
on a variety of highly valuable tasks, including long horizon coding tasks, for a fraction
of the cost. On the front-end code arena, arena.ai found GLM 5.2 behind Fable 5, but ahead of all the
Opus models. And on Design Arena, GLM 5.2 even went ahead of Fable 5. Now, in some tests,
there was evidence of at least a little bit of benchmark maxing, in other words, where a model is
specifically tuned to try to do well on benchmarks for exactly this sort of first impression.
AI entrepreneur Bindu Reddy wrote,
GLM 5.2 is mind-blowingly good on benchmarks.
Yes, it even beats Opus 4.8 and GPT-5.5 on some of them.
However, it is also bench-maxed.
Internal evals have it behind them.
Still, a huge win for open-source AI.
And when it came specifically to certain use cases like design,
a lot of folks are sharing examples where you don't have to trust the benchmarks
and you can just see it with your own two eyes.
Hassan from Together wrote,
This model is insane at design.
I asked GLM 5.2 and Opus 4.8 to build me a landing page
and you can't even tell the difference.
GLM costs 6 cents while Opus costs 49 cents.
More than 6x cheaper while being faster and more token efficient.
Another win for open source AI.
Now, obviously, there is a lot of talk about this model being distilled from Anthropic.
Pete Cooper wrote,
GLM 5.2 is absolutely convinced that it is actually Claude.
When I tell it that it's GLM 5.2, it refuses to believe me.
So the argument here is not that all of a sudden everyone is going to run out and start
using GLM 5.2,
but that there is more consideration for doing so than there
has been because of the gap left by Fable 5.
Gmoney wrote,
Is anyone running GLM 5.2 locally on a Mac Studio?
How is it?
Seriously considering a Mac Studio for the first time now.
Now, when it comes to actual changes in enterprise behavior,
the maybe bigger news is that Microsoft is apparently considering using a locally
hosted fine-tune of DeepSeek V4 to power Copilot Cowork.
Microsoft, as we know, has moved Cowork to usage-based pricing
and is thus looking for a way to provide cheaper access to their enterprise customers.
Based on Axios reporting, it seems like Microsoft isn't just fixated on DeepSeek.
They're trying a variety of open source models as lower-cost alternatives to the models
that are coming out of Anthropic and OpenAI.
At the same time, this doesn't seem like a theoretical plan, with Axios reporting that
Microsoft says it expects to make a lower cost model available in the coming weeks.
Reporter Deirdre Bosa wrote,
Was only a matter of time, DeepSeek already hosted on hyperscaler cloud since last year.
Microsoft moving closer to it for enterprise just normalizes DeepSeek and gives cover for others to embrace
and adopt it, which is already happening. Bigger question, does this give the Chinese stack a foot in the
door in the U.S. since DeepSeek is optimizing for Huawei chips? As an aside, Gail Weiner pointed out the
absolute irony of this when it comes to U.S. policy. The U.S. government bans Fable 5 and Mythos 5 worldwide
because frontier models are too dangerous to let foreigners touch, won't even exempt the U.K.
because the threat model says the weights themselves are a national security asset.
And simultaneously, the most deeply embedded U.S. enterprise software company on Earth
quietly fine-tunes a Chinese model and prepares to ship it inside the productivity stack of every Fortune 500 that runs Microsoft 365.
Importantly, though, it's not just raw Chinese models that are potentially changing the business AI landscape.
One of the models that we've talked about most recently, specifically around the token efficiency question, is Cursor's Composer 2.5.
Now, this one was built on a foundation of one of the Kimi models, but was post-trained to be
specifically good at coding tasks, with very impressive results in the benchmarks, scoring up in the
range of Opus 4.7 and GPT-5.5.
Now, of course, the most important thing is not just the benchmark scores, but the fact
that it does so at a fraction of the cost of either of those comparable models.
And now that it's been a couple of weeks, we're starting to get reports from the ground around
how good composer is in practice, not just on the benchmarks.
Talking about his experience with Composer 2.5, Ryan Shaw wrote,
I haven't bothered with anything else in weeks.
Stronger than 5.5 medium oftentimes, even though I know nobody believes me.
Engineer Yasser writes,
Composer 2.5, for a dollar it scored 65%. Fable, for $12, it scored 70%.
Why would I use Fable for only a 5% increase and pay 12x the price?
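The arithmetic behind that comparison can be sketched quickly. This is just an illustration using the figures quoted above ($1 for 65%, $12 for 70%); the function and variable names are made up for the example, and the quote doesn't say which benchmark the scores come from. One thing the numbers show is that the "5%" is really 5 percentage points, which works out to roughly a 7.7% relative improvement.

```python
def marginal_cost_per_point(cheap, pricey):
    """Extra dollars paid per extra percentage point of score."""
    (cheap_cost, cheap_score), (pricey_cost, pricey_score) = cheap, pricey
    return (pricey_cost - cheap_cost) / (pricey_score - cheap_score)


# (cost in dollars, score in percent), as quoted in the post above.
composer = (1.0, 65.0)
fable = (12.0, 70.0)

price_ratio = fable[0] / composer[0]        # how many times more expensive
point_gain = fable[1] - composer[1]         # gain in percentage points
relative_gain = point_gain / composer[1]    # gain relative to the cheaper score
extra_per_point = marginal_cost_per_point(composer, fable)

print(f"Price ratio: {price_ratio:.0f}x")
print(f"Score gain: {point_gain:.0f} points ({relative_gain:.1%} relative)")
print(f"Marginal cost: ${extra_per_point:.2f} per extra point")
```

Whether that marginal cost is worth paying depends on the task: for some workloads the last few points are the whole job, for others they're noise.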
And yet, of course, this has not been everyone's experience.
Ethan Novak wrote,
We're trying out Cursor's Composer 2.5, results are not what I expected.
I found the model indirectly changing files and items without my approval.
Opus 4.8 doesn't go on a rogue UI overhaul off one prompt versus Composer.
Many people told me to try it for UI, but I'm not seeing effective results.
And after Artificial Analysis updated its benchmarks to be more focused on agentic coding tasks,
throwing out some of the more saturated benchmarks, Composer 2.5 fell fairly significantly,
being closer to the open Chinese models like GLM 5.1 as compared to where it previously was
around GPT-5.5 and Opus 4.7.
Now, another interesting experiment that has a lot of people excited is OpenRouter's Fusion API.
They're calling it the smartest compound model in the market, achieving Fable-level intelligence
at half the price.
OpenRouter writes,
We benchmarked Fusion on 100 hard research tasks and found one, panels of models consistently outperform
individual models, two, beyond frontier performance can be achieved with frontier panels,
and three, panels of budget models can surpass frontier models at a much lower cost.
So basically what you have here is an API that routes model tasks more automatically
and performs better or comparable to state-of-the-art models based on that routing.
Explaining how it works, they write,
When you send a prompt to Fusion, we fan it out to a panel of models in parallel,
each with web search and bash tools enabled.
A judge model reads every response and extracts the structure.
Then a synthesizer writes the final answer grounded in that analysis.
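The fan-out, judge, synthesize flow described there can be sketched in a few lines of Python. To be clear, this is a toy illustration of the general pattern and not OpenRouter's implementation or API: the three panel "models" are hypothetical stubs that return canned answers, the judge just tallies agreement, and the synthesizer reports the majority answer. A real system would call actual models, with tools enabled, at each of these steps.

```python
import asyncio
from collections import Counter


# Hypothetical stand-ins for real model calls (assumption: canned answers).
async def model_a(prompt: str) -> str:
    return "Paris"


async def model_b(prompt: str) -> str:
    return "Paris"


async def model_c(prompt: str) -> str:
    return "Lyon"


PANEL = [model_a, model_b, model_c]


async def fan_out(prompt: str) -> list:
    """Send the same prompt to every panel member concurrently."""
    return list(await asyncio.gather(*(model(prompt) for model in PANEL)))


def judge(responses: list) -> dict:
    """Toy judge: tally how many panelists gave each answer."""
    counts = Counter(responses)
    top_votes = counts.most_common(1)[0][1]
    return {"counts": counts, "agreement": top_votes / len(responses)}


def synthesize(analysis: dict) -> str:
    """Toy synthesizer: report the majority answer and its support."""
    answer, votes = analysis["counts"].most_common(1)[0]
    total = sum(analysis["counts"].values())
    return f"{answer} ({votes}/{total} panelists agree)"


async def fuse(prompt: str) -> str:
    responses = await fan_out(prompt)
    return synthesize(judge(responses))


result = asyncio.run(fuse("What is the capital of France?"))
print(result)
```

The design point worth noticing is that the panel calls run in parallel, so latency is set by the slowest panelist plus the judge and synthesizer steps, while cost is the sum of everything. That is the trade the quoted benchmark claims are about.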
Shawnee-Matthew wrote,
This seems to be pretty huge and validates the future where each model will be called upon
to do the specific tasks that they excel at for intelligence versus cost tradeoffs versus all other models.
As each lab further gets better at specific tasks, this will become more of the default
assuming labs and model providers allow for this. Multimodel becomes the default and model panels or
councils are the way. Investor Inesha Chari writes,
Been saying for a bit that this compound architecture makes sense for both aggregators and labs.
Model consumers want the right capability and cost controls on a per task and even perhaps per token
basis. Labs need to protect otherwise depreciating model assets and one way to do it is
to selectively expose new features via specific token paths.
E.g., I've long thought slash ultra code was a Mythos-class model exposed via 4.X.
The compound workflow matches the way many of us work, which is using adversarial
models to generate, review, iterate, and test.
More generally, where all this is going feels like it will be labs vertically integrating
to sell capabilities so that they can capture the downstream economics from their models
without exposing them to distillation.
Which brings me to one of the most interesting experiments where some of these ideas are being
put into practice, which is around Harvey.
Explaining the background, Harvey President and co-founder Gabe Pereyra wrote,
The belief a few months ago was that model costs were halving every six months,
meaning tokens would get cheap and so application layer companies would need to find a way to charge for the value of tokens
by selling the work or services. What actually happened is AI got much more expensive than people realized at the time.
The shift from chat to agents led to an explosion in costs. One user could trigger hundreds of agents,
and each of those agents could trigger more agents. Agents started running longer and more autonomously.
On top of that, frontier models like Mythos are getting more expensive, not less.
The problem for application layer companies, like Harvey, is how do you take that large token
cost and convert it into something useful for your customers? A rough analogy is that every company
is about to get the ability to hire infinite employees. The main challenge is going to be
figuring out how to manage those employees and make your business model work the same way it did
with human employees. For Harvey, this means we don't have to become a services company. The infrastructure
for every law firm to deploy, train, and manage a large number of agents is going to be so complex
that model and cloud providers and law firms likely won't build all of it. Now, this dovetails perfectly
with an experiment that Harvey recently discussed, where they worked with Fireworks to build a
worker-advisor agent. The idea, as described by head of applied research, Nico, is that an
open-weight worker, in this case GLM 5.1, delegates high-stakes and complex tasks to a closed frontier
advisor. In their test, this was Opus 4.7. This combination of models not only allowed them to do
things much more cheaply than just using Opus 4.7, but actually got increased performance as well.
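As a rough illustration of the worker-advisor pattern, here is a minimal sketch. It is not Harvey's or Fireworks' implementation: the `Task` fields, the `should_delegate` rule, and the 0.7 threshold are hypothetical, and in a real agent the worker model itself would likely decide when to call the advisor rather than following a fixed rule.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Task:
    text: str
    high_stakes: bool = False
    complexity: float = 0.0  # hypothetical score in [0, 1]


def should_delegate(task: Task, threshold: float = 0.7) -> bool:
    # Hypothetical rule: escalate anything high-stakes or sufficiently complex.
    return task.high_stakes or task.complexity >= threshold


def run_tasks(tasks: List[Task],
              worker: Callable[[str], str],
              advisor: Callable[[str], str],
              threshold: float = 0.7) -> List[Tuple[str, str]]:
    results = []
    for task in tasks:
        if should_delegate(task, threshold):
            # Expensive closed frontier model, used only when it matters.
            results.append(("advisor", advisor(task.text)))
        else:
            # Cheap open-weight model handles the routine work.
            results.append(("worker", worker(task.text)))
    return results


if __name__ == "__main__":
    tasks = [
        Task("summarize memo"),
        Task("draft merger clause", high_stakes=True),
        Task("cross-jurisdiction analysis", complexity=0.9),
    ]
    for handler, output in run_tasks(tasks, lambda t: "w:" + t, lambda t: "a:" + t):
        print(handler, output)
```

The cost saving in a setup like this comes from how rarely the advisor branch fires, so the delegation rule is the part worth tuning.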
Patrick Gojo writes,
The insight isn't that open source beat frontier, it's that smart routing beat brute force.
Using the most expensive model for every task is not a quality strategy.
It's a laziness tax.
The teams building routing layers that send each task to the right model at the right cost
are now demonstrably ahead on both dimensions simultaneously.
Inference optimization just became a first-class competitive advantage.
Now, Harvey says that this is just the beginning of their experimentation with models,
and I think that they are going to be the vanguard for something
that lots of others start experimenting with as well.
We don't know when Fable is coming back.
What's clear is that as powerful as it is, in a world where frontier costs continue to go up,
companies are simply going to have to start to get more sophisticated
about the combinations of models they use to get the best results.
If we are looking for a bright side to what is an incredibly confusing and chaotic situation,
it's that it puts more emphasis on the point that this sort of inference optimization
and token efficiency exploration was coming for us no matter what.
And now companies have more of a chance to get ahead of it than they might have otherwise had,
when everyone could just get lost in the sauce of the glory of Fable Five.
Anyways, interesting trends to continue to watch.
For now, that's going to do it for the AI Daily Brief.
Appreciate you listening or watching as always.
And until next time, peace.
