TBPN - Meta’s AI Comeback Moment, Claude Mythos | Diet TBPN
Episode Date: April 9, 2026
Diet TBPN delivers the best of today’s TBPN episode in 30 minutes. TBPN is a live tech talk show hosted by John Coogan and Jordi Hays, streaming weekdays 11–2 PT on X and YouTube, with each episode posted to podcast platforms right after.
Described by The New York Times as “Silicon Valley’s newest obsession,” the show has recently featured Mark Zuckerberg, Sam Altman, Mark Cuban, and Satya Nadella.
Follow TBPN:
https://TBPN.com
https://x.com/tbpn
https://open.spotify.com/show/2L6WMqY3GUPCGBD0dX6p00?si=674252d53acf4231
https://podcasts.apple.com/us/podcast/technology-brothers/id1772360235
https://www.youtube.com/@TBPNLive
Transcript
The big news today is that Meta Platforms has launched a new AI model.
Alex Wang, the chief AI officer at Meta Platforms,
announced a new large language model today.
It's Meta's first major new artificial intelligence model in more than a year.
The rollout of the model, called Muse Spark, is a critical moment for Meta,
which is up 7.5% already,
and which has spent billions of dollars hiring AI talent in a bid to catch up to OpenAI,
Anthropic, and Google DeepMind. The leading labs
have been putting out models at an accelerating pace.
In a departure from its previous models,
which were open source, Muse Spark is a closed model
that will power Meta's AI chatbot and AI features within it.
John Luttig has a very interesting post about open source AI
and sort of predicted this.
Predicted that Meta would eventually bail.
Yeah, "The future of foundation models is closed-source."
He said, given Meta is the primary deep-pocketed large open-source model builder,
open-source AI has become synonymous with Meta AI.
He wrote this maybe three or four years ago.
So the operative question for open source AI is, what game is Meta playing?
In a recent podcast, Zuckerberg explains Meta's open source strategy.
One, he was burned by Apple's closedness for the past two decades and doesn't want to suffer
the same fate with the next platform shift.
It's a safer bet to commoditize your complements.
He likes building cool products and cheap, performant AI enhances Facebook and Instagram.
That's 100% true.
We've seen this in the ads product and the growth there.
There's some call option value if AI assistants become the next platform.
And that makes sense in Manus and the Meta AI app.
He bought hundreds of thousands of H100s for improving social feed algorithms across products.
And this seems like a good way to use the extras.
That all makes sense.
And Llama has been great developer marketing for Facebook.
But Zuck also suggests several times that there's some point at which open source AI no longer
makes sense, either from a cost or safety perspective.
When asked whether Meta will open source the future $10 billion cost model, the answer was,
as long as it's helping us.
At some point, they'll shift their focus towards profit.
And that's what John Luttig wrote.
He says, unlike the other model providers,
Meta is not in the business of selling model access via API.
So while they'll open source as long as it's convenient for them,
developers are on their own for model improvements thereafter.
That begs the question: if Meta is only pursuing open source insofar as it benefits themselves,
what is the tipping point at which Meta stops open sourcing their AI?
Sooner than you think, he says. Exponential data:
frontier models are trained on the corpus of the internet,
but that data is a commodity. Model differentiation over
the next decade will come from proprietary data, both via model usage and private sources.
Exponential CAPEX: he highlighted this two years ago. A lagging-edge model that requires just a few
percent of Meta's 40 billion in CAPEX is easy to open source. No one will ask questions.
But when you reach $10 billion or more in CAPEX spend for model training, shareholders will want
clear ROI on that spend. The Metaverse raised some question marks at a certain scale, too.
Diminishing returns on model quality within Meta: there's a large upfront benefit for
Meta building an open source AI model, even if it's worse
than the frontier closed source counterpart.
There are lots of small AI workloads,
think feed algorithms, recommendations,
and image generation where Meta doesn't want to rely
on a third-party provider
like they had to rely on Apple.
And so the news has been,
back in December, there was reporting
that Alex Wang disclosed in an internal company Q&A
that his team was working on two new models.
One was this text-based LLM, code-named Avocado,
and then a separate model
that was for image and video.
Mango.
Yeah.
And so have they clarified
if this is Avocado?
This feels like what Avocado
should be, this Muse Spark.
Is that what it's called?
Yeah,
I see what it is.
I don't know what else.
So the image model
should be coming soon.
The question that I had was
will a code-focused
agentic coding harness
be a separate model,
a different train?
It feels like it's not a coincidence
that this news
is dropping on the heels
of Anthropic's
new model, Mythos, which sort of was announced loosely and the model card dropped yesterday,
even though the model is not available yet to play around with.
They break out Muse Spark thinking against Opus 4.6, Gemini 3.1 Pro high, GPT 5.4 X high, and then Grok 4.2.
The way that they position it, it looks like somewhat of a chart crime, but if you look,
when you basically look at the top and you say that Muse Spark gets an 86.4
and it's in blue.
Yep.
And then you look over and it's outperforming all the other models on that benchmark.
Okay.
You just sort of assume that like the highlighted blue means that it's outperforming.
Oh, yeah, but it's not frontier at MMLU.
Because there's a bunch of them as you get further and further down.
So like ARC-AGI-2, Muse Spark dramatically underperforms.
Oh, interesting.
The other leading models.
Yeah.
And there's a number of others where it is, there's plenty where it's overperforming.
Yeah.
There's plenty where it's underperforming.
I, of course, we started asking it jokes right away.
So I'll pull up the joke.
I said, tell me a joke.
Okay.
And it said, gotcha.
Here's one for you.
Why don't skeletons fight each other?
They don't have the guts.
Okay.
But then it doesn't even let me react.
It just says, want a different vibe?
I've got dad jokes, nerdy ones, or Malibu-appropriate surf puns too.
And so I'm like, wait, why would you think I want Malibu-appropriate surf puns?
You look in your previous chats.
I have no previous chats.
Because Meta AI crosses up Instagram, so when you search on Instagram,
sometimes that prompts Llama or previous.
Yeah, yeah.
So I'm trying to get to the bottom of it, right?
Like this feels like an oddly specific type of joke to ask me.
And so I said, why would you think I want Malibu-appropriate surf puns?
Ha-ha, fair call.
Tossed that in without you asking for it.
I try to mix in different flavors of jokes.
And surf puns just popped into my head as a random option,
no special reason behind it.
I just said, I don't believe it was random.
And it says, you got me.
You got me?
That wasn't my smoothest moment.
But then it denies it again.
Weird.
It says, it's just like, what kind of joke do you actually want?
No hidden motives here.
I said, you gave a hyper-specific example based on my life,
so I have to assume you were looking at my other account for inspiration.
I get why that felt weird.
I'm sorry, it came off that way.
And then it denies it again.
I said, you're clearly lying.
I mean, you have talked about being in Malibu on the internet for a full year.
It's possible it got baked into the pre-training or something.
I don't know.
But yeah, I mean, what is personal superintelligence if it doesn't even know your name?
Like that feels like they haven't dialed in the harness or whatever the tuning is to actually fine-tune the responses.
And of course, like Meta is going to be hyper aware.
We don't want a PR cycle.
Yeah, yeah.
Like they trained on your data, right?
Everyone's been, oh, that ad was a little bit too close to home.
And you remember every once in a while one of those like,
a screenshot that's been screenshotted like a thousand times, like goes viral.
And it's like, I do not give Mark Zuckerberg.
Oh, yeah, yeah, yeah, yeah.
Like that works.
Yeah.
It's hilarious.
This is, is this a rebuttal to the bench hacking allegations that happened last week,
or last year.
So according to Meta's internal benchmark tests,
Muse Spark outscored Google Gemini on some tests and was competitive with models from
OpenAI and Anthropic on others. It significantly outscored xAI's Grok on most tests.
Alexandr Wang's hiring followed the disappointing release of Meta's previous model, called Llama 4.
The company was accused of and later admitted to gaming a third-party benchmark that it used to rank
various models against each other on performance.
It also delayed the rollout of its biggest model, called Behemoth, which it never ultimately released.
And so when I look at a model card like this where you could call it a chart crime where, you know,
it's highlighted in blue and it feels like it's the best, but it's actually
doing better on some. It does well on HealthBench Hard. It underperforms on
ARC-AGI-2, as you mentioned. But this maybe is the bull case here: that they have
at least moved on from the culture of like optimizing for the benchmarks, right? Isn't
that a good thing? There were rumors about them. Like there were extra bonuses if
they got number one on LMArena. I think that was like something like the
rumor. Yeah. But yeah I mean you've seen a lot of the labs kind of move away from
benchmarks generally because I think they're just not that meaningful anymore. Like a lot of
them are like basically so saturated. It's like they're competing between 89 and 91%. Yeah.
And they're just like not very meaningful like you see. And you won't like actually feel that in
the product necessarily. Yeah. You kind of need to talk to these things for a long time before
you can actually get the vibe. Yeah. But I do think, um, uh, this news is very interesting in the context
of the, you know, Claudeonomics stuff. The dashboard, yeah. Because like what, okay, what does it mean
if the entire company has been like maxing their Claude tokens. Yeah.
Uh, over the past month. It means that they weren't using this model. Yeah.
To me it means they need to commoditize their complements, right?
They need to bring down that cost potentially.
And if they're, I mean, we sort of dug into, are they spending a billion dollars a month?
Seems like absolutely not, but they're clearly spending a lot.
And if you can turn that OPEX into CAPEX and train your own model and then inference it
much cheaper on your own hardware, that feels like just an economic opportunity that makes
a ton of sense in the context of just 10,000, 20,000 engineers writing a lot of code using their own
I think there's basically like two ways to square those two things happening. Either, one,
this model's not that good because the engineers aren't using it or, you know, your theory that they're just distilling Claude.
So one of those is true.
That is not my theory.
That is the schizo theory.
The news this morning from The Information: Meta Platforms has taken down an internal employee-built leaderboard tracking how many tokens staffers were using.
It showed total usage over a recent 30-day period
amounted to over 60 trillion tokens.
The dashboard now displays a message that it is offline.
It says we really enjoyed building this app on Ness for everyone.
It was meant to be a fun way for people to look at tokens,
but due to data from this dashboard,
being shared externally, we've made the decision to shutter it for now.
It seemed like a fun side project.
Mike Isaac was reporting on it here.
He said it's down, unclear to me if this was a homespun one by employees
or an official one, employee projects come and go frequently,
conspicuous timing, though.
But yeah, you want to measure the output,
the impact, not necessarily the input and how much is going on there.
Lisan al Gaib says,
Meta might actually be back with Muse Spark,
still behind OpenAI, Anthropic, and Google,
but ahead of xAI and Chinese labs.
Muse Spark scores 52 on the Artificial Analysis Intelligence
Index, behind only Gemini 3.1 Pro, GPT 5.4, and Claude Opus 4.6.
Muse Spark is the first new release since Llama 4 in April 2025
and also Meta's first release that's not open weight.
So a huge jump up in performance across a variety of benchmarks.
So all good stuff there.
The market is thrilled that Meta has released a close to frontier level model, right?
This is a new group.
They've been at it for less than a year.
The stock is up almost 8% today.
And again, so much of the pricing pressure, the downward pressure on Meta, has just been kind of uncertainty on what all these tens of millions of dollars
will actually go towards and what will be accomplished.
And still unclear, like, are they going to go after CodeGen at all?
Are they just going to try to compete on the consumer LLM side?
Very, very.
And can you economically go after CodeGen if you're just using it for internal models?
If you're not selling it externally, can you justify the CAPEX just purely on the internal usage?
Having this model be vended into all the different family of apps makes a lot of sense
because they have billions of users that will wind up interacting with this in one way
or another.
Yeah, the question is, will they try to send Meta Vibes,
again, with the new model,
all the way up to the top of the App Store charts?
Meta's new family of AI models can reach the same performance as Kimi K2 with only 30% of the compute
and only 10% of the compute to reach Llama 4 Maverick's.
So a much more efficient compute frontier here.
Muse Spark is an early data point on our trajectory and we have larger models in
development. So the mythical 10 trillion parameter model. That is the 10T is what everyone's
working on right now, 10 trillion. Yeah, probably in that range. Yeah, it's all rumored at this
point. Yeah, rumored GPT-4 was something like a trillion, right? You remember those memes where
it's like a small circle and then the big circle is a huge circle. GPT-4, GPT-5? Yeah. Martin
Casado has a little bit more context on like what actually unlocks new capabilities in AI models.
He says, Mythos appears to be the first class of models trained at scale on Blackwells.
Then there will be Vera Rubins.
Pre-training isn't saturated.
Narrative violation.
RL works.
And there's so much compute coming online soon.
Buckle your chin straps.
It's going to be wild.
The scaling laws.
And you know Brad Gerstner had to come in with a hundred.
A hundred.
Yep.
For sure.
Yeah, there's a crazy bull case for Nvidia in The Information, arguing it should be worth, what, $22 trillion.
That is a wild move.
There's a lot going on.
The scaling laws holding is the most important part of this.
Article from The Information Finance.
Nvidia worth $22 trillion?
This old school financial model says yes.
The big news yesterday was Anthropic's new model, Mythos.
Some really impressive statistics and anecdotes yesterday,
both the model card, the benchmarks,
and some stories about breaking out of a variety of,
what do they call them?
Walled gardens or test environments?
I don't know.
The simulation.
The sandbox.
Yeah, breaking out of the sandbox, sending emails, all sorts of stuff like that.
The model preview is only available right now to about 50 companies that maintain critical infrastructure
because the model is particularly good at finding zero days, bugs, and exploits in technical systems.
And if they, you know, they leak that out before big companies have time to go and address all the bugs,
there could be serious, you know, serious ramifications for cybersecurity.
And so key partners include Apple, Google, Microsoft, Amazon, NVIDIA, JPMorgan,
Broadcom, the Linux Foundation, Cisco, CrowdStrike, and Palo Alto Networks.
They're all listed on the cybersecurity focus page for Project Glasswing.
Chris Bakke was having a little bit of fun because he noticed Anthropic put their own logo on the partner page,
which is a little bit funny, but at the same time, it's kind of smart because a lot of people are just going to see the image quickly,
and it's good to position yourself with the other companies.
So, yeah, it is interesting.
I mean, people have predicted that AI models would be particularly good at cyber attacks,
and this is one of the main sort of vectors of AI fears.
It feels like this is maybe what Dario was referring to
when he was talking about the end of the exponential. Finding and exploiting software bugs
is sort of perfectly in the sweet spot for coding agents and reinforcement learning:
combing through piles of code, tirelessly trying different
exploits to find bugs, having a clear verifiable reward. Did you crash the system or not? Did you
break into the system or not? It's a very clear binary signal that you can send to the model to
determine were you successful in breaking into that system. And it requires basically no
time delay. There's no lag. So there was one snarky tweet I saw that was something to the effect
of like, okay, then if it's so good, go cure cancer. But any application that requires a
real-world feedback cycle, even if it's just a few minutes of human interaction in the cancer
example, you're going to need to be testing the drugs in vitro, in mice, in monkeys, in humans
at some point, or even if you're just sequencing DNA or doing anything in the lab, pipetting
anything, if it's even just a few minutes, all of a sudden every iteration, every attempt
is going to take a few minutes, and that's going to put you on just a wildly different exponential,
as opposed to being able to spin up a virtual machine with basically every single piece of software out there
and then try every single exploit against every single piece of software,
and you wind up with a ton of exploits.
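As a rough sketch of the iteration-speed argument above: the only number taken from the discussion is "a few minutes" per real-world attempt. The one-second sandbox check, the `attempts_per_day` helper, and the binary `reward` function are illustrative assumptions, not anything reported about how Mythos was trained.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def reward(broke_in: bool) -> float:
    """Binary, verifiable reward: 1.0 if the exploit worked, otherwise 0.0."""
    return 1.0 if broke_in else 0.0


def attempts_per_day(feedback_seconds: float, parallel_workers: int = 1) -> float:
    """Number of attempts that fit in one day when each needs feedback_seconds."""
    if feedback_seconds <= 0:
        raise ValueError("feedback_seconds must be positive")
    return SECONDS_PER_DAY / feedback_seconds * parallel_workers


sandbox = attempts_per_day(1.0)     # assumption: 1 second per sandboxed check
wet_lab = attempts_per_day(5 * 60)  # "a few minutes" per lab step, taken as 5

print(f"sandbox: {sandbox:,.0f} attempts/day")
print(f"wet lab: {wet_lab:,.0f} attempts/day")
print(f"speedup: {sandbox / wet_lab:,.0f}x per worker")
```

Under these assumed numbers the gap is a constant factor per worker; the bigger difference in practice would likely come from `parallel_workers`, since virtual machines are far easier to multiply than lab benches.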
And very, very bullish for cybersecurity that this is being done preemptively.
There's a whole bunch of different discussions.
Ben Thompson has a good piece on the whole decision to release the model or not
and stage it out and the go-to-market
there, but even if the bio research, the other impacts are on sort of a slower
exponential, there's still so much opportunity in even a software-only singularity.
There's also risk in a software-only singularity.
We've seen this story before, though, a model that's too powerful to release but then
works its way out and has pretty moderate impact on the world.
This was the story of GPT-2, the story of ChatGPT, the question of, is this the model
that's dangerous to put in the hands of people?
Yeah, a headline from February 22, 2019 by Aaron Mak:
OpenAI says its text-generating algorithm GPT-2 is too dangerous.
So there is a, I think Ben Thompson called it like the boy who cried wolf syndrome,
the Mythos wolf.
He says there's a lot of skepticism about Anthropic's announcement.
This tweet was representative, from Bucco Capital Bloke:
Anthropic's marketing strategy is so funny.
Like, ah, the government is treading on me.
our models are so good, we can't release them.
It would be too dangerous.
Someone stop me, I'm going to destroy the economy.
The rolling of the eyes is exacerbated by the fact
that Anthropic has reasons to not make Mythos widely available
beyond a lack of compute.
Another factor is surely trying to avoid having
Mythos distilled by Chinese model makers.
So there's actually two good reasons to gate access.
And when you're looking at those logos,
when you're looking at the world's largest tech companies,
there's much more
ability to scale rollout, demand, set pricing. These companies might be able to pay more. The model is
very expensive, but if you're justifying that against bug bounties for zero-day exploits in your most
critical system, when you look at like JPMorgan Chase, it's a bank. Like, what is the price of
finding an exploit in that system? It's pretty high. It probably clears the token hurdle a lot.
And if the rollout is paced like evenly across all the different companies, they'll all sort of
understand that they're getting allocation, inference allocation at the efficient price that
clears the cost to actually serve the model.
So I do think all of these 10 trillion parameter models will be released soon broadly.
And the main reason is that an AI that's smart enough to find zero-day exploits should be able
to recognize that it's being used by a bad actor to find zero-day exploits.
It's only been a few months since the last flurry of competing models from OpenAI,
Anthropic and Google. And the next cycle is already off to an aggressive start. We had meta. And then
the other news is that Elon Musk announced that he is getting ready to do another larger model with
xAI. He's got a few. He's doing seven models in training. Wow. That is a lot. Imagine V2,
two variants of one trillion, two variants of 1.5 trillion, a six trillion model and a 10 trillion model.
He says there's some catching up to do. But he says
he will never give up, never. So he is continuing to grind and train more models.
Mike from Also Capital, former guest, says we've decided not to release our latest investment
strategy. It's so powerful. Releasing it might end the entire venture asset class as we know it.
Yeah.
He says you should release it to a handful of trusted partners so that we can harden ourselves.
George Hotz says: Anthropic's marketing strategy. It's amazing. It's so powerful. It's terrifying.
And the best part is you can't come. By the way, if Anthropic
had any way to ship this, they would.
Trained AI models are the fastest depreciating asset in history.
GPT-4 cost $100 million to train two years ago
and is now worth less than
Qwen 3.5 27B, 1 million.
Sending the FOMO back, clock is ticking, boys.
It needs something like an NVL72 to run at a decent speed
and even absurd API pricing doesn't cover it.
There's more to be made on investor hype than API access.
I just wish for honesty instead of a whole fake spiel about safety.
Who remembers when GPT-2, 1.5B, was too dangerous?
And so lots of back and forth.
Dean Ball has some more thoughts on Mythos.
It's a longer post, so we'll let you go and read it.
The main take is just that, you know, this is technology that, whether it comes from Anthropic
or another lab, clearly needs to go into the supply chain of the world and the U.S.
government and the U.S. economy because no one is doubting, even though some of the exploits
were somewhat minor, no one thinks that we need less cybersecurity.
We want the most secure systems possible, and we probably want a lot of competition between different companies to provide that service to the government.
And so hopefully if the war comes to an end and, you know, different discussions can happen and, you know, ice can thaw, there's a way for these companies to work together.
Even if the supply chain thing doesn't go through, then Anthropic can vend technology through Project Glasswing, through CrowdStrike, through
Oracle and other partners to Cisco so that at least the systems are secure because everyone wants
that.
So he says a lot of people, including people in positions of authority, told us recently that models
of Mythos' capability wouldn't be a thing, that models with obvious national security implications
would not be forthcoming.
Those people were wrong.
There's nothing to do about it, but you should remember it.
Mythos is the first model where theft of the weights by an adversarial actor feels like it would
be a major deal.
You better believe they will try.
and if they don't succeed with Mythos, they will eventually.
We are thoroughly in the era where the labs' best models
may well not be public the way they used to be.
This is because of a combination of compute constraints,
economic reality, competitive advantage and safety concerns.
Three means the most relevant models
may be decreasingly legible to the general public.
And depending on the extent and duration of the coming compute squeeze,
we could enter a market dynamic where the best models
are only available to the highest bidder.
In other words, where compute is a seller's market
rather than a buyer's market. Interesting. Imagine competing firms in the economy, bidding against one another
for access to the best and most tokens and the frontier labs as, in essence, kingmakers.
The governance regime I have described above in four is not designed to stop that dynamic.
Scoop from Steven Nelson. The CIA used a secret tool called Ghost Murmur to find airmen in Iran.
Yeah. Ghost Murmur pairs long-range quantum magnetometry
sensors with AI to find human heartbeats.
I was wondering about this over the weekend while
there was a search going on.
How does somebody like an airman that's down send a signal
that can be picked up by one group, but not another?
This is very odd.
So there are some community notes on this saying that quantum magnetometry.
Magnetometry.
I imagine that's what you pronounce it.
Detects heart magnetic fields.
And I believe this technology works in
labs, but only up to a few meters, not 40 miles as claimed. Fields decay with one over
r cubed, making long-range detection implausible.
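To put a number on the community note's one-over-r-cubed point, here is a minimal sketch. The 40 miles and the inverse-cube falloff come from the note as read out; the 3 meter lab range is an assumed stand-in for "a few meters", and `dipole_attenuation` is a hypothetical helper that ignores noise, shielding, and sensor sensitivity.

```python
MILES_TO_METERS = 1609.344


def dipole_attenuation(near_m: float, far_m: float) -> float:
    """Field strength at far_m relative to near_m, assuming 1/r^3 decay."""
    if near_m <= 0 or far_m <= 0:
        raise ValueError("distances must be positive")
    return (near_m / far_m) ** 3


lab_range_m = 3.0                       # assumption: "a few meters"
claimed_range_m = 40 * MILES_TO_METERS  # the 40 miles in the claim

ratio = dipole_attenuation(lab_range_m, claimed_range_m)
print(f"claimed range: {claimed_range_m:,.0f} m")
print(f"signal at 40 miles vs. at {lab_range_m:.0f} m: {ratio:.2e}")
```

Under those assumptions the heartbeat signal at 40 miles would be roughly thirteen orders of magnitude weaker than at lab range, which is the gap the note is pointing at.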
So unclear if this is what worked, but there has to be some sort of device that you could
carry on your person, like in your shoe, like an air tag that can talk to a satellite
almost.
Like you look at the Starlink receiver dish.
It would fit in a backpack, but that's very high bandwidth.
I imagine if you had something, I mean, there's sat phones that are the size of large cell phones.
That was available in the 80s and 90s.
You have to imagine that if you're just trying to put out a signal to GPS or a Starlink network,
you must be able to shrink that down significantly to the place where it could be carried on your body.
But it's probably classified.
So I would be surprised. It's just very hard to read into what's real and
what's not here. There is a different community note
pushing back saying no note needed. This new technology is a classified system
developed in secret by Lockheed Skunk Works and the CIA that was just
used, revealed publicly for the first time. Naturally, its reported
capabilities far exceed the known public state of the art. The note is
relevant. So it's very, very interesting. Anyway, thank you so much for
tuning in today. A bit of a shorter show. We're experimenting with different
things. Obviously, we don't have ad reads anymore and so we are going to be
mixing it up with more stories, more interviews, different timing, and more flexibility.
And so we hope you enjoyed this show, and we will see you tomorrow at 11 a.m. Pacific,
sharp. Goodbye, smoke.
We love you.
