The AI Daily Brief: Artificial Intelligence News and Analysis - Anthropic Accidentally Revealed Their Most Powerful Model Ever
Episode Date: March 27, 2026
Intercom and Cursor have both shown that post-training open-weight models on domain-specific interaction data can match or beat the best frontier models, cheaper and faster. It's a development that could reshape the business model of the major AI labs and validate the idea that experience data, not just scale, is the next frontier of model performance. In the headlines: Anthropic's Claude Mythos model leaks, Google drops a real-time voice model, Shopify launches Tinker, and OpenAI shelves adult mode.
Brought to you by:
KPMG - Agentic AI is powering a potential $3 trillion productivity shift, and KPMG’s new paper, Agentic AI Untangled, gives leaders a clear framework to decide whether to build, buy, or borrow. Download it at www.kpmg.us/Navigate
Mercury - Modern banking for business and now personal accounts. Learn more at https://mercury.com/personal-banking
Recall - The API for meeting recording. Get started today with $100 in free credits at https://www.recall.ai/aidb
AIUC-1 - Get your agents certified to communicate trust to enterprise buyers - https://www.aiuc-1.com/
Blitzy - Want to accelerate enterprise software development velocity by 5x? https://blitzy.com/
AssemblyAI - The best way to build Voice AI apps - https://www.assemblyai.com/brief
Robots & Pencils - Cloud-native AI solutions that power results https://robotsandpencils.com/
The Agent Readiness Audit from Superintelligent - Go to https://besuper.ai/ to request your company's agent readiness score.
The AI Daily Brief helps you understand the most important news and discussions in AI. Subscribe to the podcast version of The AI Daily Brief wherever you listen: https://pod.link/1680633614
Our Newsletter is BACK: https://aidailybrief.beehiiv.com/
Interested in sponsoring the show? sponsors@aidailybrief.ai
Transcript
Today on the AI Daily Brief, are we entering the era of vertical AI models?
Before that in the headlines, a big leak with Anthropic confirming the existence of Claude Mythos,
what they call by far the most powerful AI model we've ever developed.
The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI.
All right, friends, quick announcements before we dive in.
First of all, thank you to today's sponsors, KPMG, Blitzy, AssemblyAI, and Robots & Pencils.
To get an ad-free version of the show, go to patreon.com slash AI Daily Brief.
And if you are interested in sponsoring the show, send us a note at sponsors@aidailybrief.ai.
Late breaking one last night, a data leak revealed that Anthropic
is testing a new model referred to as Claude Mythos. Anthropic has confirmed the existence
of this model, with a spokesperson saying that it was a step change, their words, in performance,
and quote, the most capable we've built to date. They said the model is currently being trialed
by early access customers. So here's what happened. On Thursday evening, a
draft blog post describing the model was left in an unsecured publicly searchable database.
The blog post says we've finished training a new AI model, Claude Mythos.
It's by far the most powerful AI model we've ever developed.
Mythos, they write, is a new name for a new tier of model,
larger and more intelligent than our Opus models, which were, until now, our most powerful.
We chose the name to evoke the deep connective tissue that links together knowledge and ideas.
Compared to our previous best model, Claude Opus 4.6,
Mythos gets dramatically higher scores on tests of software coding, academic reasoning,
and cybersecurity among others. In preparing to release Claude Mythos, however, they say,
we want to act with extra caution and understand the risks it poses, even beyond what we learn
in our own testing. In particular, we want to understand the model's potential near-term risks
in the realm of cybersecurity and share the results to help cyber defenders prepare.
Mythos is also a large compute-intensive model. It's very expensive for us to serve and will be
very expensive for our customers to use. We're working to make the model much more efficient
before any general release. For those reasons, we're taking a slower, more gradual approach to
releasing Mythos than we have with our other models. We're beginning with a small number of
early access customers who will explore the model's cybersecurity applications and report back what they
find. Now, this blog post is very undercooked. It ends not too long after that. Now, if you hear
the term Capybara thrown around, apparently the model was also referred to as that. I'm not sure
if Capybara was the codename and Mythos is the intended launch name. But regardless, this draft
blog post was in a cache of unsecured documents. In total, Fortune reports, there appear to be
close to 3,000 assets linked to Anthropic's blog that had not previously been published.
Now, there is a lot of chatter about this one, not least of which is the choice of name,
which many people associate with the Cthulhu mythos, which, given how much the AI safety folks
use those sort of literary reference points to describe their concerns about AI may not be
the most advisable name. People also compared it to the recently revealed Spud from OpenAI,
with Jason Botterill writing, I like how Anthropic's mysterious booky-new model is codenamed
Mythos, while OpenAI named theirs after a freaking potato.
Still, the broader sentiment was captured by Gavin Purcell, who says,
it will only go faster from here.
Obviously, there will be a lot to watch with this one.
Unfortunately, for those of us who want to get our hands on the most powerful models
at any given time, it kind of looks like the blog post was not even an announcement of the
release of the model, just an advance warning about it, so who knows how long it'll take
before we actually see it in practice.
Now, one model that is available now, Google has dropped a small voice model that could have
big implications. The model is Gemini 3.1 Flash Live, which brings real-time dialogue to voice
models. Up until now, most voice models have been turn-based, causing awkward stumbles and terrible
interruption handling. Flash Live is designed to work more like a human conversation, with a
continuous back-and-forth rather than a jarring stilted experience. The model apparently shows a step-change
improvement on multiple audio benchmarks, including one designed to measure multi-step function calling.
That's the feature that converts voice commands into complex agentic actions. Some customers like
Home Depot have already deployed the model, and Google noted a big improvement in handling
complex details like alphanumeric product codes and noisy environments.
So the obvious implication is the quality of personal voice agents on mobile devices,
and especially given that Apple is looking to Gemini to power the new version of Siri,
the long winter of our discontent of Siri not understanding a single damn word we say
may finally be coming to an end.
Next, one small product announcement from Shopify that I actually think could be fairly significant.
One of my weirder or more out-there predictions for 2026 was that Shopify has kind of an outsized role to play in the positive normalization of AI.
The reason for that is that Shopify is where a ton of small business entrepreneurship lives.
Shopify's tools have already, even in the pre-AI era, given people who felt overwhelmed by what they needed to do to start a business, enough help to get over the hump.
Although, as you well know, I am not a jobs doomer, I do think that we're going to see a lot of shifts in the average way that people get employed and make money.
One piece of that, I believe, will be an increase in small business entrepreneurship.
If Shopify is the home of where a lot of that new energy goes,
the way that they use AI to provide value for their people
could make a big difference in people's perceptions of it.
It's one thing when the only thing you hear about AI
is that it's going to take your job and it uses all the water.
It's another thing when you see your income rise 30% from the month before
because of the tools you were able to use through your store's hosting platform.
So what Tinker is, is a free mobile app with more than 100 AI tools for e-commerce.
Merchants can generate logos, product photos, advertising videos, and much more.
It's an iterative, experimental, playful canvas where you can try out all sorts of
different brand identities, product placements, and more.
The entire concept is about flattening the learning curve.
Apps are arranged by outcome, so merchants only need to select what they want to create.
Once inside an app, they can see a range of examples demonstrating what it can do and how to use
it.
They can then describe a desired outcome in natural language, drop in a reference image,
and Tinker automatically turns those inputs into high-quality prompts on the back end.
Shopify's director of product for so, Kasi said,
if you want more artists, lower the cost of paint.
And cost isn't just money.
It's the time spent keeping up, the friction of signing up for everything separately,
and the learning curve of figuring it all out.
We wanted to lower all of it.
So like I said, may seem small,
but I really do believe that Shopify potentially has an outsized role to play
in the positive integration of AI into the broader economy,
and I think Tinker, from my first glances, looks awesome.
By the way, hopefully this goes without saying, but this is a completely unsponsored opinion.
Over in OpenAI land, Codex gets a big upgrade with the integration of plugins.
The OpenAI Devs account writes, with plugins, Codex can now support more real work,
including the planning, research, and coordination that happens before you write code and
the workflows that follow.
The team at OpenAI also used the occasion of the plugins launch to go for Anthropic's
throat around some controversy over recent changes from Claude. Thariq from the Claude
team writes, To manage growing demand for Claude, we're adjusting our five-hour session limits
for Free, Pro, and Max subs during peak hours.
During weekdays between 5 a.m. and 11 a.m. Pacific time,
you'll move through your five-hour session limits faster than before.
People were not happy about that.
And OpenAI took full advantage.
Tibo from the Codex team writes,
Hello, we have reset Codex usage limits across all plans
to let everyone experiment with the magnificent plugins we just launched.
You can just build unlimited things with Codex.
Have fun.
Speaking of OpenAI,
the company has made a decision which I think is extremely the right one,
putting their erotica plans on hold.
The Financial Times reports that OpenAI has decided to shelve plans for adult mode indefinitely
as they consolidate resources around coding and enterprise sales.
This is, to put it mildly, not all that surprising.
Earlier this month, the Wall Street Journal reported that OpenAI's independent advisory council
was unanimously against the feature.
Reportedly, their age detection system had a 12% failure rate,
and the experts on the council weren't even satisfied
that adult mode would be safe for adults,
warning it could encourage an unhealthy emotional dependence on ChatGPT.
The feature was also controversial among staff, with some departing the company over the issue.
Speaking with the Financial Times, sources said that OpenAI wanted to have more long-term research
on the effects of sexually explicit chatbots and emotional attachment to AI before they released
the product.
Now, my feeling about this, as I said last fall, is that on the one hand, I have a very
socially libertarian bent that basically thinks that adults should be able to do whatever
they want as long as it's not hurting other people.
That said, viewing this question from an entrepreneur's lens, it did not make sense to me for
OpenAI to be the one to offer this. There is going to be, I promise you, no shortage of adult
AI experiences that are available to any adults who want them. And I just think that all of the
costs of going down this route were so obviously going to be higher than the upside for OpenAI.
So one other thing that I did want to note about OpenAI's recent moves, there is a lot of chatter
right now about how many products are being killed by OpenAI: Instant Checkout, Sora, the erotic
chatbot, with people seeming to suggest that it's the company flailing. I actually
think in many ways it's the opposite. It would be the worst business decision that OpenAI could make
to stick with something that wasn't the right move, even if it looked like the right move just a
couple of months ago. Nothing will kill a business faster than sunk cost fallacy, and OpenAI being
willing to scrap efforts, even where a lot of effort went in, is, net-net, a good thing for that
company. And it couldn't come at a better time because boy, oh boy, is the competition going to do
nothing but heat up. Latest rumors suggest that Anthropic is discussing going public as soon as the
fourth quarter, with follow-up Bloomberg reporting saying that they might be looking to IPO
as soon as October. That, of course, puts OpenAI on the clock, as Sam Altman has reportedly said
he would prefer to go first. Meaning all in all, I think my prediction that we actually don't get
IPOs this year might be one that is wrong. Noel Moldvey writes, according to the Zodiac,
2026 is the year of the mega IPO. Indeed. For now, that is going to do it for the headlines.
Next up, the main episode. All right, folks, quick pause. Here's the uncomfortable truth.
If your enterprise AI strategy is we bought some tools, you don't actually have a strategy.
KPMG took the harder route and became their own client zero.
They embedded AI and agents across the enterprise,
how work gets done, how teams collaborate, how decisions move,
not as a tech initiative but as a total operating model shift.
And here's the real unlock.
That shift raised the ceiling on what people could do.
Humans stayed firmly at the center while AI reduced friction,
surfaced insight, and accelerated momentum.
The outcome was a more capable, more empowered workforce.
If you want to understand what that actually looks like in the real world,
go to www.kpmg.us slash AI. That's www.kpmg.us slash AI.
Blitzy is driving over 5x engineering velocity for large-scale enterprises.
A publicly traded insurance provider leveraged Blitzy to build a bespoke payments processing
application, an estimated 13-month project, and with Blitzy, the application was completed
and live in production in six weeks. A publicly traded vertical SaaS provider used Blitzy to
extract services from a 500,000 line monolith, without disrupting production, 21 times faster than
their pre-Blitzy estimates. These aren't experiments. This is how the world's most innovative
enterprises are shipping software in '26. You can hear directly about Blitzy from other Fortune 500
CTOs on the Modern CTO or CIO Classified podcasts. To learn more about how Blitzy can impact your
SDLC, book a meeting with an AI Solutions consultant at blitzy.com. That's B-L-I-T-Z-Y.com.
You've heard me talk about AssemblyAI and their insanely accurate voice AI models,
but they just shipped something big.
Universal 3 Pro is a first-of-its-kind class of speech language model that lets you prompt
speech recognition with your own domain context and vocabulary, instead of fixing transcripts
in post-processing.
It's more flexible than traditional ASR and more deterministic than LLMs, so you get accurate
output at the source and can capture the emotion behind human speech that transcripts often miss,
all without custom models or post-processing hacks.
And to celebrate the launch, they're making it free to try for all of February.
If you're building anything with voice, this one's worth a look.
Head to assemblyai.com slash free offer to check it out.
Most companies don't struggle with ideas.
They struggle with turning them into real AI systems that deliver value.
Robots and Pencils is a company built to close that gap.
They design and deliver intelligent cloud-native systems powered by generative and agentic AI,
with focus, speed, and clear outcomes.
Robots and Pencils works in small, high-impact pods.
Engineers, strategists, designers, and applied AI specialists working together to move from
idea to production without unnecessary friction.
Powered by RoboWorks, their agentic acceleration platform, teams deliver meaningful results
including initial launches in as little as 45 days depending on scope.
If your organization is ready to move faster, reduce complexity, and turn AI ambition into
real results, Robots and Pencils is built for that moment.
Start the conversation at robotsandpencils.com slash AI Daily Brief.
That's robotsandpencils.com slash AI Daily Brief.
Robots and Pencils, Impact at Velocity.
Welcome back to the AI Daily Brief.
I noticed this really interesting story yesterday,
where Intercom announced that their new dedicated customer service-focused model for Fin
had achieved something very significant.
CEO Eoghan McCabe called it objectively the highest-performing, fastest, and cheapest model
for customer service, beating the very best models in the industry,
including GPT 5.4 and Opus 4.5.
Now, it has been a persistent question in AI about how much custom models would matter.
You might remember way back in the immediate post-ChatGPT fever, a number of companies figured,
well, since we have such unique proprietary data, training our own model on that data surely will outperform.
Maybe the best known of those efforts was BloombergGPT, which they called a 50 billion parameter
large language model purpose built from scratch for finance.
Now, it turned out that in practice, that model got absolutely smoked by the general models,
reminding everyone once again of the bitter lesson.
The bitter lesson is a very famous essay from computer scientist Rich Sutton from back in 2019.
He writes,
The biggest lesson that can be read from 70 years of AI research is that general methods that leverage computation
are ultimately the most effective and by a large margin.
He gave as one of his first examples, computer chess.
He says, in computer chess, the methods that defeated the world chess
champion Kasparov in 1997 were based on massive deep search. At the time, this was looked upon
with dismay by the majority of computer chess researchers who had pursued methods that leveraged
human understanding of the special structure of chess. When a simpler search-based approach,
with special hardware and software, proved vastly more effective, these human knowledge-based
chess researchers were not good losers. They said that brute force search may have won this time,
but it was not a general strategy. And anyway, it was not how people played chess. These researchers
wanted methods based on human input to win, and were disappointed when they did not.
So basically what this essay is arguing is that throughout AI history, and as a reminder, because
this often surprises people, AI as a field, at least as a named field, is older than computer
science. If you go back to the 50s and look at the laboratories at places like MIT, there were
already back then artificial intelligence labs, but the idea of computer science as a field
wouldn't come until a little bit later. In any case, what the bitter lesson is arguing is that
throughout AI history, researchers have tried two basic approaches. The first is encoding human
knowledge and clever tricks into systems, essentially trying to teach computers how humans think.
The second is giving computers massive amounts of data and compute and letting them figure
things out on their own through search and learning. The bitter lesson is that the second approach
wins every single time. It's bitter because it's a blow to human ego. Researchers spend years
crafting elegant, domain-specific solutions, encoding chess strategy or linguistic rules or visual
perception models. And then a brute force method powered by more compute just steamrolls all that careful
work. Now we have that example from chess, but that example repeated across Go, speech recognition,
computer vision, and now language. The systems that scale with more computation always eventually
beat the systems built on human design shortcuts. And so taking the bitter lesson and applying it
to LLMs, kind of explains why Bloomberg's highly specialized model ultimately got beat by much bigger
and more computationally intensive models. And yet, coming into 2026,
there was an interesting question, specifically of whether a specific type of data might actually
change this equation.
That data that people were interested in is last mile usage data, basically user interaction data
at the very edge of the experience.
And the specific place where many were watching this was around AI coding.
The question was whether a company like Cursor could ultimately have some advantage in their
own proprietary model because they had such a tremendous amount of experiential data around the
actual interaction point. Now, it wasn't really so much a question of whether that data is valuable.
Obviously, it is. But there's a difference in it being valuable for product design versus model
design. Inevitably, that data is extremely useful in figuring out the right products or the right
harnesses for models. That was never in question. What was a question is whether all that
information could actually change the destiny of customized vertical models. Latent Space
wrote about this last year in November, in their piece titled The Agent Labs Thesis.
The point that swyx and Latent Space were making was that if it is the case that we are
close to hitting the limits of pre-training data, that perhaps shifts the future of model
performance to post-training.
The Agent Labs thesis asked, can post-training make up the gap between the best open models
and the best frontier models, and how long until they start exceeding?
In other words, the tweak here is that a company like Cursor isn't training a model from
scratch, they are taking the best available open weights models that are out there, which are
admittedly a little bit behind the state-of-the-art, and adding in this post-training process
with the idea of actually performing better in a specific domain than the general state-of-the-art
model can.
Now, Cursor placed a pretty high importance on this.
The company had said explicitly that they needed to train state-of-the-art coding models
to keep up with competitors, which some reports suggested was a financial imperative, with
Cursor burning too much money reselling API access to OpenAI and Anthropic.
Now, earlier this month, we got the release of their Composer 2 model.
The model was in the same ballpark as GPT 5.4 and actually beat Opus 4.6 on coding benchmarks while being
much cheaper to run, meaning, of course, that it fit Cursor's needs extremely well. However, an X user
called Flynn triggered a controversy, revealing that Composer 2 was just, and boy, is this just
doing a lot of heavy lifting, Kimi K2.5 with some extra reinforcement learning applied. Cursor themselves
did not deny this. Dev relations rep Lee Robinson commented, yep, Composer 2 started from an
open source base. We will do full pre-training in the future. Only a quarter of the compute spent on
the final model came from the base. The rest is from our training. This is why evals are very different.
Now, some amount of the controversy was about Cursor, in the eyes of some, failing to disclose their
use of an open source base model, but others seemed genuinely dismissive of the practice.
As Flynn had done, they wrote off the model as quote-unquote just Kimi K2.5 without a second
thought. Others thought, though, that maybe something important was going on here. Leet-LLLLM
writes, as someone who basically lives in Opus 4.6, seeing an open-weight Kimi 2.5
fine-tune actually beat it on coding benchmarks is wild. If Composer 2 really does perform
that well, Cursor seems to have demonstrated that reinforcement learning on a quality
dataset can actually go quite a long way, vaulting an adequate base model into the top
tier. This, of course, in some ways, seems to run counter to the bitter lesson. But if it's correct,
it would suggest that there's a lot of fertile ground for training models around particular verticals.
Which gets us to the announcement yesterday from Intercom.
Intercom's chief product officer Paul Adams tweets,
We have a very significant announcement here
that will change how we think about the AI landscape.
We have built a brand new model for Fin called Apex,
which has a higher resolution rate, fewer hallucinations,
and is far cheaper than any other model provided by any other company in the world,
and it isn't close.
This is an incredibly hard thing to achieve
and is only possible with the domain-specific proprietary evals
from our billions of human and agent customer service interaction data points.
We also have a flywheel here where we will continue to get better at the edges.
This is, you might recognize, exactly what we were talking about in my 2026 predictions,
when we talked about the lab loop and the importance of this last mile usage data.
Paul continues,
So what does this mean?
It means that vertical models can and will outperform general models.
It means that many successful companies in the future will need to be full stack,
app layer, AI layer, and model layer.
And critically, as it becomes much easier to copy and clone at the app layer,
durable differentiation will move down the stack and ultimately to the model layer.
Now, this got a ton of chatter.
BNAFOG writes,
The story isn't that Apex beat frontier models.
It's that domain-specific post-training closed the gap this fast.
Any vertical SaaS with enough labeled interaction data is sitting on an untapped fine-tuning asset.
The infrastructure moat is eroding faster than most realized.
Abhijid, who's on the board of Intercom but does new products at OpenAI, writes,
model quality depends a lot on judgment, and that judgment lives in proprietary evals,
real-world usage, and fast feedback loops, being close to the work.
This creates all kinds of opportunities for companies that are willing to think big and bet on themselves.
Now, while he doesn't seem worried for his main employer OpenAI,
the implications for them are certainly where many people's heads went.
Theo Bloshae writes,
Very cool feat from Intercom,
though reading this makes me wonder what value the Frontier Lab companies
actually deliver long term, if every industry, Cursor for coding, now Fin for CS, can build better
and cheaper specialized models from open source bases. And interestingly, this wasn't the only
story around these themes. Decagon co-founder Ashwin Sreenivas writes, over 80% of model traffic at Decagon
now runs on models we've trained in-house, structured as a network of specialized models
handling different parts of the interaction. Now, this is a little bit different because there
is actually an architectural change here. In their announcement post, they write,
Instead of relying on a single model, we built a network of specialized models each responsible for a specific part of the interaction,
detection, orchestration, response generation, and evaluation.
That separation lets us optimize each layer independently and drive better speed and quality across the system.
Regardless, though, the point is that here you have another company that is shifting off reliance on the major closed foundation models and towards models that they've trained, at least in part, themselves.
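To make that layered structure a little more concrete, here is a minimal Python sketch of a four-stage pipeline using the stage names from the Decagon announcement. Everything else is an assumption for illustration: the `Turn` record, the function names, and the keyword rules standing in for what would be separately trained models. It is not Decagon's implementation, just one way the separation of stages could look.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Turn:
    """Hypothetical record passed between stages for one customer message."""
    user_message: str
    intent: str = ""
    plan: List[str] = field(default_factory=list)
    reply: str = ""
    approved: bool = False


# Each stage is a plain function here; in a real system each would wrap
# its own specialized model. The rules below are illustrative stand-ins.
Stage = Callable[[Turn], Turn]


def detect(turn: Turn) -> Turn:
    # Detection: classify what the customer is asking for.
    text = turn.user_message.lower()
    turn.intent = "refund" if "refund" in text else "general"
    return turn


def orchestrate(turn: Turn) -> Turn:
    # Orchestration: decide which steps are needed for this intent.
    if turn.intent == "refund":
        turn.plan = ["look_up_order", "issue_refund"]
    else:
        turn.plan = ["answer_from_docs"]
    return turn


def generate(turn: Turn) -> Turn:
    # Response generation: draft the reply from the plan.
    turn.reply = f"Handling '{turn.intent}' via: {', '.join(turn.plan)}"
    return turn


def evaluate(turn: Turn) -> Turn:
    # Evaluation: a trivial check that a reply and a plan both exist.
    turn.approved = bool(turn.reply) and len(turn.plan) > 0
    return turn


PIPELINE: List[Stage] = [detect, orchestrate, generate, evaluate]


def run(message: str) -> Turn:
    """Pass one message through every stage in order."""
    turn = Turn(user_message=message)
    for stage in PIPELINE:
        turn = stage(turn)
    return turn


if __name__ == "__main__":
    print(run("I want a refund for my order"))
```

Because each stage only reads and writes the shared record, any one of them could in principle be swapped for a better model without touching the others, which is the kind of independent optimization the announcement describes.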
Chakar says, I think this is a trend we'll see going forward.
The reliance on general-purpose frontier models will hit a wall for domain-specific tasks.
Custom post-training pipelines will be the way forward.
Clem Delangue from Hugging Face agrees, writing,
after Pinterest, Airbnb, Notion, Cursor,
today it's Eoghan and Intercom publicly sharing that they're finding it better, cheaper, faster,
to use and train open models themselves rather than use APIs for many tasks,
and hundreds of other companies are doing the same without sharing.
Ultimately, I believe the majority of AI workflows will be in-house based on open source versus API.
It took much more time than we anticipated, but it's happening now.
Now, obviously, if this is the case, there are significant business model implications.
Adriana Sabato writes,
The API tax is starting to look like the cloud markup of 10 years ago.
Once teams realize they can run fine-tuned open models for a fraction of the cost,
the switch becomes obvious.
Eoghan from Intercom agrees that this is the beginning of something bigger,
writing a companion post called The Age of Vertical Models is here.
He reinforces that the model just is better across numerous dimensions.
It has a 2.8% higher resolution rate, but he writes, importantly, it's also dramatically
faster, has fewer hallucinations, in fact a 65% reduction in hallucinations, and is far cheaper
than all other available models.
In his post, Eoghan referenced the recent interview with Andrej Karpathy, where Karpathy said,
I do think we should expect more speciation in the intelligences.
The animal kingdom is extremely diverse in the brains that exist, and there's lots of different
niches of nature, and I think we should be able to see more speciation.
And you don't need this oracle that knows everything, you kind of speciate it, and then you put it on a specific task.
And we should be seeing some of that because you should be able to have much smaller models that still have the cognitive core.
From there, Eoghan picks up, the frontier labs still have the very best models, but the open weight models are not that far behind.
So it's not hard to see pre-training as a commodity of sorts.
Where we think the frontier will move next is to post-training.
Karpathy's prediction is exactly what we're seeing with Apex and Cursor's Composer 2,
and what we're going to see significantly going forward.
As such, the labs are in an interesting position
where on one hand, the horizontal general-purpose models
are actually over-serving the market for specific use cases,
e.g., their models are more generally intelligent than is needed for customer service,
and on the other hand, the open-weight models are more than good enough
where high-quality domain-specific post-training
can make the resulting model superior at the special-purpose jobs
and in the way that matters to that particular job.
Personally, I'm still very bullish on the labs.
We remain very heavy customers of Anthropic.
Yet classic disruption is now at their door.
The only way out is to disrupt themselves by building cheaper specialized models too.
And the only way to do that is to acquire the evals, or the companies with the evals,
needed for that specific task.
Which means there will be some interesting data partnerships or M&A consolidation,
and you're going to see some hyper-specific model providers who go it alone
and compete with the labs head-to-head.
Likely, all of the above.
Now, going back to the bitter lesson, it kind of feels at first glance,
like this would run counter to that, right?
That in the long run, the sheer additional volume of computational data
should beat out the specialized knowledge and data of the edge providers.
Except the bitter lesson isn't just about the amount of data.
It's about brute force data and compute as opposed to human knowledge.
But we're not exactly talking about human knowledge here.
Instead, we're talking about experience.
The data that a Cursor has, or an Intercom has,
is not the data of some human expert.
Instead, it's millions of interactions
which show how things actually happen in the real world.
It turns out that Richard Sutton himself actually discussed this very thing as an example of the next phase of the bitter lesson on the Dwarkesh podcast last year.
Will they reach the limits of the data and be superseded by things that can get more data just from experience rather than from people?
In some ways, it's a classic case of the bitter lesson.
With the more human knowledge we put into the large language models, the better they can do,
and so it feels good.
And yet, one, well, I in particular expect there to be systems that can learn from experience.
And those could well perform much, much better and be much more scalable,
in which case it will be another instance of the bitter lesson,
that the things that used human knowledge were eventually superseded by things
that just trained from experience and computation.
Putting it simply, these new models, Apex and
Composer 2, are post-trained from experience, exactly as Sutton said.
Now, this might feel like an inside baseball kind of story, but I think that the implications
could be massive in terms of how the whole industry evolves.
One thing I don't think that this means is that every company that has any sort of customer
data is all of a sudden going to be successfully able to spin their own model.
There are ultimately not that many people who are good at doing post-training, and so I don't
think that we're going to see this massive fragmentation of vertical models, but you
better believe that these results are encouraging enough that many, many more companies who do have
this type of data, and the post-training talent or the ability to get it, are going to be doing some
experimenting in this area. It's something we will continue to watch and explore, but for now,
that's going to do it for today's AI Daily Brief. Appreciate you listening or watching,
as always, and until next time, peace.
