TBPN - Meta’s AI Comeback Moment, Claude Mythos | Diet TBPN

Starting point is 00:00:00 The big news today is that Meta Platforms has launched a new AI model. Alexandr Wang, the chief AI officer at Meta Platforms, announced a new large language model today, its first major new artificial intelligence model in more than a year. The rollout of the model, called Muse Spark, is a critical moment for Meta (the stock is already up 7.5%), which has spent billions of dollars hiring AI talent in a bid to catch up to OpenAI, Anthropic, and Google DeepMind, the leading labs, which

Starting point is 00:00:28 have been putting out models at an accelerating pace. In a departure from its previous models, which were open source, Muse Spark is a closed model that will power Meta's AI chatbot and the AI features within it. John Luttig has a very interesting post about open-source AI that sort of predicted this. He predicted that Meta would eventually bail. Yeah, the future of foundation models is closed source.

Starting point is 00:00:50 He said, given Meta is the primary deep-pocketed large open-source model builder, open-source AI has become synonymous with Meta AI. He wrote this maybe three or four years ago. So the operative question for open-source AI is, what game is Meta playing? In a recent podcast, Zuckerberg explains Meta's open-source strategy. One, he was burned by Apple's closedness for the past two decades and doesn't want to suffer the same fate with the next platform shift. It's a safer bet to commoditize your complements.

Starting point is 00:01:16 He likes building cool products, and cheap, performant AI enhances Facebook and Instagram. That's 100% true. We've seen this in the ads product and the growth there. There's some call-option value if AI assistants become the next platform. And that makes sense with Manus and the Meta AI app. He bought hundreds of thousands of H100s for improving social feed algorithms across products, and this seems like a good way to use the extras. That all makes sense.

Starting point is 00:01:37 And Llama has been great developer marketing for Facebook. But Zuck also suggested several times that there's some point at which open-source AI no longer makes sense, either from a cost or a safety perspective. When asked whether Meta will open source a future $10 billion model, the answer was, as long as it's helping us. At some point, they'll shift their focus towards process. And that's what John Luttig wrote. He says, unlike the other model providers,

Starting point is 00:02:00 Meta is not in the business of selling model access via API. So while they'll open source as long as it's convenient for them, developers are on their own for model improvements thereafter. That raises the question: if Meta is only pursuing open source insofar as it benefits itself, what is the tipping point at which Meta stops open sourcing its AI? Sooner than you think, he says. Exponential data: frontier models are trained on the corpus of the internet, but that data is a commodity. Model differentiation over

Starting point is 00:02:25 the next decade will come from proprietary data, both via model usage and private sources. Exponential capex (he highlighted this two years ago): a lagging-edge model that requires just a few percent of Meta's $40 billion in capex is easy to open source. No one will ask questions. But when you reach $10 billion or more in capex spend for model training, shareholders will want clear ROI on that spend. The metaverse raised some question marks at a certain scale, too. Diminishing returns on model quality within Meta: there's a large upfront benefit for Meta in building an open-source AI model, even if it's worse than the frontier closed-source counterpart.

Starting point is 00:02:57 There are lots of small AI workloads (think feed algorithms, recommendations, and image generation) where Meta doesn't want to rely on a third-party provider the way they had to rely on Apple. And so the news: back in December, there was reporting that Alexandr Wang disclosed in an internal company Q&A

Starting point is 00:03:16 that his team was working on two new models. One was a text-based LLM, code-named Avocado, and the other was a separate model for image and video. Mango. Yeah. And so have they clarified if this is Avocado?

Starting point is 00:03:31 This Muse Spark feels like what Avocado should be. Is that what it's called? Yeah, I think that's what it is. I don't know what else it would be. So the image model should be coming soon.

Starting point is 00:03:40 The question I had was: will a code-focused agentic coding harness be a separate model, a different training run? It feels like it's not a coincidence that this news is dropping on the heels

Starting point is 00:03:52 of Anthropic's new model, Mythos, which was sort of loosely announced, and the model card dropped yesterday, even though the model is not available yet to play around with. They break out Muse Spark (Thinking) against Opus 4.6, Gemini 3.1 Pro High, GPT-5.4 xhigh, and then Grok 4.2. The way that they position it looks like somewhat of a chart crime: when you look at the top, Muse Spark gets an 86.4 and it's in blue. Yep.

Starting point is 00:04:25 And then you look over and it's outperforming all the other models on that benchmark. Okay. You just sort of assume that the highlighted blue means that it's outperforming. Oh, yeah, but it's not frontier at MMLU. Because there's a bunch of them as you get further and further down. On ARC-AGI-2, for example, Muse Spark dramatically underperforms. Oh, interesting. The other leading models.

Starting point is 00:04:47 Yeah. And there are a number of others: there's plenty where it's overperforming. Yeah. There's plenty where it's underperforming. Of course, we started asking it for jokes right away. So I'll pull up the joke. I said, tell me a joke. Okay.

Starting point is 00:05:03 And it said, gotcha, here's one for you: Why don't skeletons fight each other? They don't have the guts. Okay. But then it doesn't even let me react. It just says, want a different vibe? I've got dad jokes, nerdy ones, or Malibu-appropriate surf puns too.

Starting point is 00:05:18 And so I'm like, wait, why would you think I want Malibu-appropriate surf puns? Did it look in your previous chats? I have no previous chats. Because Meta AI crosses over with Instagram, so when you search on Instagram, sometimes that prompts Llama, or previous... Yeah, yeah. So I'm trying to get to the bottom of it, right? This feels like an oddly specific type of joke to tell me.

Starting point is 00:05:42 And so I said, why would you think I want Malibu-appropriate surf puns? Ha-ha, fair call. I tossed that in without you asking for it. I try to mix in different flavors of jokes, and surf puns just popped into my head as a random option, no special reason behind it. I said, I don't believe it was random. And it says, you got me.

Starting point is 00:05:59 You got me? That wasn't my smoothest moment. But then it denies it again. Weird. It says, it's just like, what kind of joke do you actually want? No hidden motives here. I said, you gave a hyper-specific example based on my life, so I have to assume you were looking at my other account for inspiration.

Starting point is 00:06:16 I get why that felt weird. I'm sorry it came off that way. And then it denies it again. I said, you're clearly lying. I mean, you have talked about being in Malibu on the internet for a full year. It's possible it got baked into the pre-training or something. I don't know. But yeah, I mean, what is personal superintelligence if it doesn't even know your name?

Starting point is 00:06:33 That feels like they haven't dialed in the harness, or whatever the tuning is, to actually fine-tune the responses. And of course, Meta is going to be hyper-aware. We don't want a PR cycle. Yeah, yeah. Like, they trained on your data, right? Everyone's been like, oh, that ad was a little bit too close to home. And you remember every once in a while one of those screenshots that's been screenshotted like a thousand times goes viral.

Starting point is 00:06:57 And it's like, I do not give Mark Zuckerberg... Oh, yeah, yeah, yeah, yeah. Like that works. Yeah. It's hilarious. Is this a rebuttal to the benchmark-hacking allegations from last week, or rather last year? So according to Meta's internal benchmark tests,

Starting point is 00:07:17 Muse Spark outscored Google Gemini on some tests and was competitive with models from OpenAI and Anthropic on others. It significantly outscored xAI's Grok on most tests. Alexandr Wang's hiring followed the disappointing release of Meta's previous model, Llama 4. The company was accused of, and later admitted to, gaming a third-party benchmark that it used to rank various models against each other on performance. It also delayed the rollout of its biggest model, called Behemoth, which it never ultimately released. And so when I look at a model card like this, where you could call it a chart crime because it's highlighted in blue and it feels like it's the best, but it's actually only

Starting point is 00:07:52 doing better on some. It does well on HealthBench Hard. It underperforms on ARC-AGI-2, as you mentioned. But maybe the bull case here is that they have at least moved on from the culture of optimizing for the benchmarks, right? Isn't that a good thing? There were rumors about them. Like, there were extra bonuses if they got number one on LMArena. I think that was the rumor. Yeah. But yeah, I mean, you've seen a lot of the labs kind of move away from benchmarks generally, because I think they're just not that meaningful anymore. A lot of them are basically so saturated that the models are competing between 89 and 91%. Yeah.

Starting point is 00:08:27 And they're just not very meaningful, like you see. And you won't necessarily feel that in the product. Yeah. You kind of need to talk to these things for a long time before you can actually get the vibe. Yeah. But I do think this news is very interesting in the context of the, you know, Claudeonomics stuff. The dashboard, yeah. Because, okay, what does it mean if the entire company has been maxing their Claude tokens? Yeah. Over the past month. It means that they weren't using this model. Yeah. To me it means they need to commoditize their complements, right? They need to bring down that cost, potentially.

Starting point is 00:08:56 And if they're... I mean, we sort of dug into it: are they spending a billion dollars a month? Seems like absolutely not, but they're clearly spending a lot. And if you can turn that opex into capex, train your own model, and then inference it much cheaper on your own hardware, that feels like an economic opportunity that makes a ton of sense in the context of 10,000 or 20,000 engineers writing a lot of code using their own model. I think there are basically two ways to square those two things happening: either one, this model's not that good, because the engineers aren't using it, or two, you know, your theory that they're just distilling Claude. So one of those is true.

Starting point is 00:09:32 That is not my theory. That is the schizo theory. The news this morning, from The Information: Meta Platforms has taken down an internal, employee-built leaderboard tracking how many tokens staffers were using. It showed that total usage over a recent 30-day period amounted to over 60 trillion tokens. The dashboard now displays a message that it is offline. It says, we really enjoyed building this app on Ness for everyone. It was meant to be a fun way for people to look at tokens,

Starting point is 00:09:56 but due to data from this dashboard being shared externally, we've made the decision to shutter it for now. It seemed like a fun side project. Mike Isaac was reporting on it here. He said it's down; unclear to me if this was a homespun one by employees or an official one; employee projects come and go frequently; conspicuous timing, though. But yeah, you want to measure the output,

Starting point is 00:10:19 the impact, not necessarily the input and how much is going on there. Lisan al Gaib says Meta might actually be back with Muse Spark: still behind OpenAI, Anthropic, and Google, but ahead of xAI and the Chinese labs. Muse Spark scores 52 on the Artificial Analysis Intelligence Index, behind only Gemini 3.1 Pro, GPT-5.4, and Claude Opus 4.6. Muse Spark is the first new release since Llama 4 in April 2025

Starting point is 00:10:45 and also Meta's first release that's not open weight. So a huge jump up in performance across a variety of benchmarks. So all good stuff there. The market is thrilled that Meta has released a close-to-frontier-level model, right? This is a new group. They've been around for less than a year. The stock is up almost 8% today. And again, so much of the downward pressure on Meta's stock has just been uncertainty about what all these tens of billions of dollars

Starting point is 00:11:17 will actually go towards and what will be accomplished. And it's still unclear: are they going to go after codegen at all? Are they just going to try to compete on the consumer LLM side? Very, very. And can you economically go after codegen if you're just using it for internal models? If you're not selling it externally, can you justify the capex purely on internal usage? Having this model be vended into the whole family of apps makes a lot of sense, because they have billions of users that will wind up interacting with this in one way

Starting point is 00:11:47 or another. Yeah, the question is, will they try to send Meta Vibes, again with the new model, all the way up to the top of the App Store charts? Meta's new family of AI models can reach the same performance as Kimi K2 with only 30% of the compute, and reach Llama 4 Maverick's performance with only 10% of the compute. So a much more efficient compute frontier here. Muse Spark is an early data point on our trajectory, and we have larger models in

Starting point is 00:12:14 development. So, the mythical 10 trillion parameter model. 10T is what everyone's working on right now, 10 trillion. Yeah, probably in that range. Yeah, it's all rumored at this point. Yeah, GPT-4 was rumored to be something like a trillion, right? You remember those memes with a small circle and then a huge circle? GPT-4, GPT-5? Yeah. Martin Casado has a little bit more context on what actually unlocks new capabilities in AI models. He says, Mythos appears to be the first class of models trained at scale on Blackwells. Then there will be Vera Rubins. Pre-training isn't saturated.

Starting point is 00:12:53 Narrative violation. RL works. And there's so much compute coming online soon. Buckle your chin straps. It's going to be wild. The scaling laws. And you know Brad Gerstner had to come in with a hundred. A hundred.

Starting point is 00:13:06 Yep. For sure. Yeah, there's a crazy bull case for Nvidia in The Information, arguing it should be worth, what, $22 trillion. That is a wild move. There's a lot going on. The scaling laws holding is the most important part of this. An article from The Information Finance: Nvidia worth $22 trillion?

Starting point is 00:13:25 This old-school financial model says yes. The big news yesterday was Anthropic's new model, Mythos. Some really impressive statistics and anecdotes yesterday: the model card, the benchmarks, and some stories about breaking out of a variety of... What do they call them? Walled gardens, or test environments? I don't know.

Starting point is 00:13:48 The simulation. The sandbox. Yeah, breaking out of the sandbox, sending emails, all sorts of stuff like that. The model preview is only available right now to about 50 companies that maintain critical infrastructure, because the model is particularly good at finding zero-days, bugs, and exploits in technical systems. And if they, you know, released that before big companies have time to go and address all the bugs, there could be serious ramifications for cybersecurity. And so key partners include Apple, Google, Microsoft, Amazon, Nvidia, JPMorgan,

Starting point is 00:14:22 Broadcom, the Linux Foundation, Cisco, CrowdStrike, and Palo Alto Networks. They're all listed on the cybersecurity-focused page for Project Glasswing. Chris Bakke was having a little bit of fun because he noticed Anthropic put its own logo on the partner page, which is a little bit funny, but at the same time, it's kind of smart, because a lot of people are just going to see the image quickly, and it's good to position yourself with the other companies. So, yeah, it is interesting. I mean, people have predicted that AI models would be particularly good at cyberattacks, and this is one of the main vectors of AI fears.

Starting point is 00:14:53 It feels like this is maybe what Dario was referring to when he was talking about the end of the exponential. Finding and exploiting software bugs is sort of perfectly in the sweet spot for coding agents and reinforcement learning: combing through piles of code, tirelessly trying different exploits to find bugs, with a clear, verifiable reward. Did you crash the system or not? Did you break into the system or not? It's a very clear binary signal that you can send to the model to determine whether it was successful in breaking into that system. And it requires basically no time delay. There's no lag. So there was one snarky tweet I saw that was something to the effect

Starting point is 00:15:34 of, okay, then if it's so good, go cure cancer. But take any application that requires a real-world feedback cycle, even if it's just a few minutes of human interaction. In the cancer example, you're going to need to test the drugs in vitro, in mice, in monkeys, and in humans at some point. Even if you're just sequencing DNA or doing anything in the lab, pipetting anything, if it takes even just a few minutes, all of a sudden every iteration, every attempt, is going to take a few minutes, and that's going to put you on a wildly different exponential, as opposed to being able to spin up a virtual machine with basically every single piece of software out there and then try every single exploit against every single piece of software,

Starting point is 00:16:15 and you wind up with a ton of exploits. And it's very, very bullish for cybersecurity that this is being done preemptively. There's a whole bunch of different discussions. Ben Thompson has a good piece on the whole decision of whether to release the model, staging it out, and the go-to-market there. But even if bio research and the other impacts are on sort of a slower exponential, there's still so much opportunity in even a software-only singularity. There's also risk in a software-only singularity.
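As a rough illustration of the "clear, verifiable reward" described above (did you crash the system or not?), here is a minimal Python sketch. Everything in it is hypothetical: `crash_reward` and `toy_parser` are made-up names, an unhandled exception stands in for "crashing the system", and nothing here reflects how Anthropic or any other lab actually trains its models. The point is only that the signal is binary and computed instantly, with no real-world lag.

```python
from typing import Callable


def crash_reward(target: Callable[[bytes], object], candidate: bytes) -> float:
    """Binary, verifiable reward: 1.0 if the candidate input crashes the target, else 0.0."""
    try:
        target(candidate)
    except Exception:
        # Hypothetical stand-in: any unhandled exception counts as "you crashed the system".
        return 1.0
    return 0.0


def toy_parser(data: bytes) -> int:
    """Hypothetical system under test with a planted bounds-check bug."""
    declared_len = data[0]                # first byte claims the payload length
    checksum = data[1 + declared_len]     # bug: never checks the claim against len(data)
    return checksum


if __name__ == "__main__":
    well_formed = bytes([2, 10, 20, 30])  # declared length matches the payload
    malformed = bytes([9, 1, 2])          # declared length overruns the buffer
    for candidate in (well_formed, malformed):
        print(candidate.hex(), crash_reward(toy_parser, candidate))
```

Here the well-formed input earns 0.0 and the malformed one earns 1.0. A reinforcement-learning loop could, in principle, score many such candidates back to back with no waiting, which is the contrast with wet-lab work drawn above.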

Starting point is 00:16:49 We've seen this story before, though: a model that's too powerful to release but then works its way out and has a pretty moderate impact on the world. This was the story of GPT-2, the story of ChatGPT, the question of, is this the model that's dangerous to put in the hands of people? Yeah, a headline from February 22, 2019, by Aaron Mack: OpenAI says its text-generating algorithm GPT-2 is too dangerous. So there is, I think Ben Thompson called it, the boy-who-cried-wolf syndrome. The Mythos wolf.

Starting point is 00:17:24 He says there's a lot of skepticism about Anthropic's announcement. This tweet from BuccoCapital Bloke was representative: Anthropic's marketing strategy is so funny. Like, ah, the government is treading on me, our models are so good we can't release them, it would be too dangerous, someone stop me, I'm going to destroy the economy. The rolling of the eyes is exacerbated by the fact

Starting point is 00:17:40 that Anthropic has reasons not to make Mythos widely available beyond a lack of compute. Another factor is surely trying to avoid having Mythos distilled by Chinese model makers. So there are actually two good reasons to gate access. And when you're looking at those logos, at the world's largest tech companies, there's much more

Starting point is 00:18:03 ability to scale the rollout, meet demand, and set pricing. These companies might be able to pay more. The model is very expensive, but you're justifying that against bug bounties for zero-day exploits in your most critical systems. When you look at JPMorgan Chase, it's a bank. What is the price of finding an exploit in that system? It's pretty high. It probably clears the token hurdle by a lot. And if the rollout is paced evenly across all the different companies, they'll all sort of understand that they're getting an inference allocation at the efficient price that clears the cost of actually serving the model. So I do think all of these 10 trillion parameter models will be released broadly soon.

Starting point is 00:18:47 And the main reason: an AI that's smart enough to find zero-day exploits should be able to recognize that it's being used by a bad actor to find zero-day exploits. It's only been a few months since the last flurry of competing models from OpenAI, Anthropic, and Google, and the next cycle is already off to an aggressive start. We had Meta. And then the other news is that Elon Musk announced that he is getting ready to do another, larger model with xAI. He's got a few. He has seven models in training. Wow. That is a lot. Imagine V2, two variants of 1 trillion, two variants of 1.5 trillion, a 6 trillion model, and a 10 trillion model. He says there's some catching up to do. But he says,

Starting point is 00:19:30 he will never give up, never. So he is continuing to grind and train more models. Mike from Also Capital, a former guest, says: We've decided not to release our latest investment strategy. It's so powerful, releasing it might end the entire venture asset class as we know it. Yeah. He says you should release it to a handful of trusted partners so that we can harden ourselves. George Hotz says: Anthropic's marketing strategy. It's amazing. It's so powerful. It's terrifying. And the best part is you can't use it. By the way, if Anthropic had any way to ship this, they would.

Starting point is 00:20:02 Trained AI models are the fastest-depreciating asset in history. GPT-4 cost $100 million to train two years ago and is now worth less than Qwen 3.5 27B, 1 million. Sending the FOMO back, clock is ticking, boys. It needs something like an NVL72 to run at a decent speed, and even absurd API pricing doesn't cover it. There's more to be made on investor hype than API access.

Starting point is 00:20:24 I just wish for honesty instead of a whole fake spiel about safety. Who remembers when GPT-2 1.5B was too dangerous? And so, lots of back and forth. Dean Ball has some more thoughts on Mythos. It's a longer post, so we'll let you go and read it. The main take is that this is technology that, whether it comes from Anthropic or another lab, clearly needs to go into the supply chain of the world, into the U.S. government and the U.S. economy, because no one doubts (even though some of the exploits

Starting point is 00:20:53 were somewhat minor) that we need more cybersecurity, not less. We want the most secure systems possible, and we probably want a lot of competition between different companies to provide that service to the government. And so hopefully, if the war comes to an end, different discussions can happen, the ice can thaw, and there's a way for these companies to work together. Even if the supply chain thing doesn't go through, Anthropic can vend technology through Project Glasswing, through CrowdStrike, through Oracle and other partners, to Cisco, so that at least the systems are secure, because everyone wants that. So he says a lot of people, including people in positions of authority, told us recently that models of Mythos' capability wouldn't be a thing, that models with obvious national security implications

Starting point is 00:21:45 would not be forthcoming. Those people were wrong. There's nothing to do about it, but you should remember it. Mythos is the first model where theft of the weights by an adversarial actor feels like it would be a major deal. You better believe they will try, and if they don't succeed with Mythos, they will eventually. We are thoroughly in the era where the labs' best models

Starting point is 00:22:02 may well not be public the way they used to be. This is because of a combination of compute constraints, economic reality, competitive advantage, and safety concerns. This means the most relevant models may be decreasingly legible to the general public. And depending on the extent and duration of the coming compute squeeze, we could enter a market dynamic where the best models are only available to the highest bidder.

Starting point is 00:22:22 In other words, where compute is a seller's market rather than a buyer's market. Interesting. Imagine competing firms in the economy bidding against one another for access to the best and most tokens, and the frontier labs as, in essence, kingmakers. The governance regime I have described above in point four is not designed to stop that dynamic. A scoop from Stephen Nelson: the CIA used a secret tool called Ghost Murmur to find airmen in Iran. Yeah. Ghost Murmur pairs long-range quantum magnetometry sensors with AI to find human heartbeats. I was wondering about this over the weekend,

Starting point is 00:23:00 while there was a search going on. How does somebody like a downed airman send a signal that can be picked up by one group but not by another? This is very odd. So there are some community notes on this saying that quantum magnetometry (magnetometry, I imagine that's how you pronounce it) detects heart magnetic fields,

Starting point is 00:23:23 and I believe this technology works in labs, but only up to a few meters, not 40 miles as claimed. Fields decay as one over r cubed, making long-range detection implausible. So it's unclear if this is what worked, but there has to be some sort of device that you could carry on your person, like in your shoe, like an AirTag that can talk to a satellite, almost. Like, you look at the Starlink receiver dish. It would fit in a backpack, but that's very high bandwidth.
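To put the community note's objection in numbers: if the heart's magnetic field behaves like a dipole at long range, its strength falls off as one over r cubed. The short Python sketch below compares a few meters with 40 miles. Treating "a few meters" as 3 m is an assumption, the function name is made up for illustration, and the calculation ignores sensor noise, shielding, and everything else a real detector would face.

```python
METERS_PER_MILE = 1609.344


def dipole_attenuation(r_near_m: float, r_far_m: float) -> float:
    """How many times weaker a field that decays as 1/r^3 is at r_far_m than at r_near_m."""
    return (r_far_m / r_near_m) ** 3


if __name__ == "__main__":
    near = 3.0                     # assumption: "a few meters" taken as 3 m
    far = 40 * METERS_PER_MILE     # the claimed 40-mile range, in meters
    print(f"Field is ~{dipole_attenuation(near, far):.1e}x weaker at 40 miles")
```

The ratio comes out on the order of 10^13, roughly ten trillion times weaker, which is the intuition behind the note calling long-range detection implausible.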

Starting point is 00:23:54 I imagine if you had something... I mean, there are sat phones that are the size of large cell phones. Those were available in the '80s and '90s. You have to imagine that if you're just trying to put out a signal to GPS or a Starlink network, you must be able to shrink that down significantly, to the point where it could be carried on your body. But it's probably classified. So it's very hard to read into what's real and what's not here. There is a different community note pushing back, saying no note needed: this new technology is a classified system

Starting point is 00:24:31 developed in secret by Lockheed Skunk Works and the CIA that was just used and revealed publicly for the first time. Naturally, its reported capabilities far exceed the known public state of the art. The note is relevant. So it's very, very interesting. Anyway, thank you so much for tuning in today. A bit of a shorter show. We're experimenting with different things. Obviously, we don't have ad reads anymore, and so we are going to be mixing it up with more stories, more interviews, different timing, and more flexibility. And so we hope you enjoyed this show, and we will see you tomorrow at 11 a.m. Pacific,

Starting point is 00:25:03 sharp. Goodbye, smoke. We love you.
