The AI Daily Brief: Artificial Intelligence News and Analysis - The Week Where AI Changed (Or Did It?)

Starting point is 00:00:00 When the history books look back at this week in AI, they will definitely point to DeepSeek as the driving force. But did this week actually change everything as it seemed like it might at the beginning? Or was it all a bit overblown? The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI. To join the conversation, follow the Discord link in our show notes. Welcome back to the AI Daily Brief. Quick note, obviously we've had a couple of days of interviews in a row and there has been a lot going on. So this episode is going to be kind of an

Starting point is 00:00:32 extended headlines mixed with the main episode. The big dominant theme continues to be DeepSeek and everything we've learned about it. And so we are going to pick up the story that we left off a couple of days ago. But as you'll see, we're getting into AI earnings season, as well as a bunch of OpenAI rumors. So we'll connect all the dots. But where we are going to start, like I said, is where we left off when it comes to this DeepSeek conversation. Now, as we were leaving off, one of the big questions was how legitimate these breakthroughs were. Was this real innovation? Was this stolen innovation? Were the costs actually that low? Were they being subsidized? Was it true that these models were actually trained for what they cost? Or were there secret Nvidia chips? One of the

Starting point is 00:01:09 notable things about this story was how fast it rose up the ranks in the White House. Clearly, there was a geopolitical dimension to this. And perhaps unsurprisingly, the White House seized upon accusations that DeepSeek hadn't actually been all that innovative. Speaking with Fox News on Tuesday, AI czar David Sacks said, there's a technique in AI called distillation when one model learns from another. They can essentially mimic the reasoning process that they learn from the parent model. There's substantial evidence that what DeepSeek did here is they distilled the knowledge out of OpenAI's models. One of the things you're going to see over the next few months is our leading AI companies taking steps to try to prevent distillation. Sacks appears to be referring to reporting

Starting point is 00:01:44 from the Financial Times, who quoted OpenAI sources as stating that they had, quote, seen some evidence of distillation, which it suspects to be from DeepSeek. Now, Sacks explained it pretty well, but basically distillation allows AI labs to train a model based on synthetic data created by a larger, more performant model and retain much of the performance. OpenAI has actively encouraged model distillation in the past, even launching their own platform for carrying out the process in October. DeepSeek also documented their model distillation technique in the technical paper but didn't identify the parent model. Now, OpenAI obviously isn't going to try to pursue an IP lawsuit against a China-based company, but from the perspective of the administration, at least in the short term,

Starting point is 00:02:19 it allows for some amount of narrative to downgrade DeepSeek's achievement. The argument goes basically, training a capable frontier model from scratch is a difficult and time-consuming task, with assembly of the training data being one of the biggest hurdles. Distilling a new model from the outputs of a more capable model is a substantially easier task. It also implies that R1 only proves that DeepSeek is capable of replicating leading U.S. models, rather than topping the benchmark with performance breakthroughs. Now, I will say that one of the more common responses to OpenAI raising questions about model distillation was a sort of pot calling the kettle black argument, given that they are embroiled in a number of copyright lawsuits.
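
To make the distillation idea described above concrete, here is a minimal toy sketch. It is not DeepSeek's or OpenAI's method: the `teacher` function, the linear student, and every name and hyperparameter below are hypothetical stand-ins chosen for illustration. It only shows the general shape of the process, where a capable model generates synthetic data and a second model is trained on those outputs.

```python
import random


def teacher(x: float) -> float:
    """Hypothetical stand-in for a large, capable model."""
    return 3.0 * x + 1.0


def build_synthetic_dataset(n: int, seed: int = 0) -> list[tuple[float, float]]:
    """Query the teacher on sampled inputs and record its outputs."""
    rng = random.Random(seed)
    prompts = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, teacher(x)) for x in prompts]


def train_student(data: list[tuple[float, float]],
                  epochs: int = 500,
                  lr: float = 0.1) -> tuple[float, float]:
    """Fit a student y = w*x + b to the teacher's outputs.

    Uses plain gradient descent on mean squared error; the student never
    sees the teacher's internals, only its input/output pairs.
    """
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


dataset = build_synthetic_dataset(200)
w, b = train_student(dataset)
print(f"student learned w={w:.3f}, b={b:.3f}")
```

In a real setting the teacher would be a large language model, the synthetic data would be prompts paired with generated responses, and the student would be a smaller network fine-tuned on those pairs. The toy keeps only the structure: the hard part (producing good training targets) is delegated to the teacher.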

Starting point is 00:02:52 But ultimately, like I said, at this point, I think that this distillation idea is much more about controlling the narrative than anything else. The claim of less than $6 million in training costs for DeepSeek has been another big part of the conversation. And for some technical folks, the claim seems to stand up. Jack Clark of Anthropic commented, The most surprising part of DeepSeek R1 is that it only takes around 800,000 samples of good reinforcement learning reasoning to convert other models into RL reasoners. Now that DeepSeek R1 is available, people will be able to refine samples out of it to convert any

Starting point is 00:03:21 other model into an RL reasoner. Accelerate Harder did the math on how much compute would be needed to build DeepSeek's foundation model, posting, DeepSeek V3 has 37 billion active parameters. They trained on 14.8 trillion tokens. FLOPs estimate to train 37 billion parameters times 14.8 trillion tokens is 3.3e24 FLOPs, totally achievable with 2.8 million H800 hours. For people who don't buy this, where exactly do you think the extra compute is being spent?
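
The arithmetic in that post can be checked with a short sketch. The parameter count, token count, and GPU hours come from the quoted post. The factor of 6 FLOPs per parameter per token is a common rule of thumb for training compute and is an assumption here, since the post as read out does not state it; without it, the product of the two quoted numbers does not reach 3.3e24.

```python
# Rule-of-thumb training compute: about 6 FLOPs per active parameter per token.
# ASSUMPTION: the factor of 6 is not stated in the quoted post.
FLOPS_PER_PARAM_PER_TOKEN = 6

active_params = 37e9   # active parameters, from the quoted post
tokens = 14.8e12       # training tokens, from the quoted post
gpu_hours = 2.8e6      # H800 hours, from the quoted post

# Total compute for the training run under the rule of thumb.
total_flops = FLOPS_PER_PARAM_PER_TOKEN * active_params * tokens

# Sustained throughput each GPU would need to finish within the quoted hours.
required_flops_per_gpu_second = total_flops / (gpu_hours * 3600)

print(f"total training compute: {total_flops:.2e} FLOPs")
print(f"implied sustained throughput: "
      f"{required_flops_per_gpu_second / 1e12:.0f} TFLOP/s per GPU")
```

Under that assumption the total lands at roughly 3.3e24 FLOPs, in line with the post. Whether the implied per-GPU rate is realistic depends on the numeric precision used and the hardware utilization achieved, neither of which the post spells out.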

Starting point is 00:03:47 Earlier in the week, investor Naval Ravikant had posted, small technical teams are already starting to confirm that the techniques and resulting cost savings are real. And of course, we have a way to actually figure this out or at least get closer to the truth. Small-scale experiments to replicate and uncover whether the breakthroughs were genuine were spun up basically as soon as the paper was available last Monday. A Berkeley lab led by Jiayi Pan has already completed a tiny proof of concept. They trained an extremely small 1.5 billion parameter reasoning model for just $30 in compute. Junxian He, an assistant professor at Hong Kong University of Science and Technology, has published a larger replication. His team added reasoning

Starting point is 00:04:21 to the Qwen 7B model using 8,000 reinforcement learning samples. Technically, this was actually an independent concurrent discovery of the process, with his team working on the project for the last two months. A full-scale replication attempt is currently underway at Hugging Face. Their team is repeating the method described in the DeepSeek paper using their own data set, as DeepSeek hasn't disclosed theirs. The project uses the Hugging Face science cluster, which contains 768 Nvidia H100s.

Starting point is 00:04:45 This should be roughly equivalent to the limited resources claimed by the DeepSeek team. Elie Bakouch, one of the engineers on the project, said, The R1 model is impressive, but there's no open dataset, experiment details, or intermediate models available, which makes replication and further research difficult. Fully open sourcing R1's complete architecture isn't just about transparency. It's about unlocking its potential. Whatever the truth of the claims,

Starting point is 00:05:07 the fact that some people are skeptical has not at all stopped U.S.-based AI startups from beginning to adopt DeepSeek R1. Generally, there are two categories that we're seeing. The first are AI startups that serve other companies' models through their own UX, and these were, of course, some of the fastest to add R1. Perplexity is a leading example, standing up access earlier in the week. Interestingly, the Perplexity team have managed to set up a version of DeepSeek with their own system prompts to circumvent Chinese content controls.

Starting point is 00:05:32 Practically, that means the model will now explain what happened in Tiananmen Square, or why Winnie the Pooh memes were popular in Hong Kong a few years ago. On Wednesday, Microsoft announced that they had made R1 available on Azure AI Foundry and GitHub. Amazon followed suit the next day, adding R1 to AWS Bedrock and SageMaker. Apple didn't make any integration moves, but CEO Tim Cook did say during an earnings call, I think innovation that drives efficiency is a good thing, and that's what you see in that model. As a whole separate conversation, beyond the scope of what we're doing here, this ability to plug and play different models and switch them out at will

Starting point is 00:06:03 highlights how small the moat is in many circumstances for AI model companies. The switching costs are so low that startups and enterprises can quickly plug the latest model into their existing infrastructure. Now, the second type of adoption that we're seeing is startups using R1 in their own work. Pat Gelsinger, former Intel CEO and chairman of Gloo, told TechCrunch, my Gloo engineers are running R1 today. They could have run o1. Well, they could only access o1 through the APIs. Their team is currently working on an AI service called Kallm, which will offer a chatbot and related features. Gelsinger said that with the help of R1, his team

Starting point is 00:06:34 expects to have rebuilt Kallm, quote, with our own foundation model that's all open source. That's exciting. Gelsinger's big picture view is that R1 has proved not only that AI will be affordable enough to be everywhere, but high-performance AI will be everywhere, commenting, I want better AI in my Oura ring, I want better AI in my hearing aid, I want more AI in my phone, I want better AI in my embedded devices. Framing his view on technology, he wrote on X, Wisdom is learning the lessons we thought we already knew. DeepSeek reminds us of three important learnings from computing history. One, computing obeys the gas law. Making it dramatically cheaper will expand the market for it. The markets are getting it wrong, and this will make AI much more

Starting point is 00:07:08 broadly deployed. Two, engineering is about constraints. The Chinese engineers have limited resources and they had to find creative solutions. Three, open wins. DeepSeek will help reset the increasingly closed world of foundational AI model work. Box CEO Aaron Levie commented, anyone building enterprise AI applications knows that the cost and quality of AI

Starting point is 00:07:26 are the only two factors that matter in AI adoption right now. This is why DeepSeek's breakthroughs are such a big deal. Enterprise AI is the rare category of technology where the use case demand generally far exceeds the ability to satisfy all of these use cases well. This is fantastic. The opposite would mean that there's less demand than the tech is capable of,

Starting point is 00:07:42 but of course, is only good news if you can eventually meet the demand. Chipmaking startup Cerebras is using DeepSeek as an opportunity to demonstrate their technology. They plan to host a version of R1 on their U.S. servers, powered by their wafer-scale hardware. Traditional GPUs are built using a single chip cut out of a larger wafer during the manufacturing process. These are then networked together to construct AI training and inference clusters. The architecture Cerebras has built allows for multiple GPU cores on a larger wafer to function as one large chip, about the size of a manhole cover. This places the networking on the chip, allowing for much faster communication than external wiring.

Starting point is 00:08:14 Cerebras says their servers can run the 70B version of DeepSeek R1 57 times faster than GPU-based solutions. This is particularly important for reasoning models, which use significantly more compute at the inference stage to generate responses. And while R1 seems to be more efficient than some U.S. models, inference demands are still very high. Remember, the market crash at the beginning of the week was largely about the fear that DeepSeek meant that demand would drop for AI chips and data centers.

Starting point is 00:08:37 It seems most market analysts believe the bulk of the hundreds of billions of spending for AI labs went into infrastructure for training. Meta chief scientist Yann LeCun refuted this idea, commenting, major misunderstanding about AI infrastructure investments. Much of those billions are going into infrastructure for inference, not training. Running AI assistant services for billions of people requires a lot of compute. Once you put video understanding, reasoning, large-scale memory, and other capabilities in AI systems, inference costs are going to increase. The only real question is whether users will be willing to pay enough directly or not to justify the capex and opex.

Starting point is 00:09:07 So the market reactions to DeepSeek are woefully unjustified. Indeed, one of the really big takeaways of this week is that much of the AI race is about inference. In other words, models from multiple labs in both China and the U.S. are good enough for many tasks at this stage, and the competition is instead based on who can serve the cheapest, fastest, and most stable AI. Kojo Osei, a partner at Matrix Ventures, writes, under-discussed DeepSeek implication. If we can turn any decent base model into a powerful reasoning model,

Starting point is 00:09:34 compute spend shifts more dramatically to inference. Meanwhile, Perplexity CEO Aravind Srinivas believes the implications go much deeper, posting, Test-time compute is currently just inference with chain of thought. We haven't started doing test-time training, where the model updates weights to go figure out new things or ingest a ton of new context without losing generality and raw IQ. Going to be amazing when that happens. Another big conclusion being drawn is that everything in the AI stack is getting commoditized at a breakneck pace.

Starting point is 00:09:59 Other than the final user experience. Peter Yang, principal product lead at Roblox, wrote, soon people will care more about their favorite AI apps than the models powering them. I don't care which model is powering Perplexity, Granola, or Replit. I care more that they have high craft and thoughtful UX, lightning fast speed, and seamless integration into my workflows. It's a great time to build AI apps. Responding to a revelation that Cursor has 100% adoption among Stanford CS undergrads and Y Combinator founders, Suhail Doshi, the founder of Playground AI, commented, App layer will win. Everything else will get commoditized.

Starting point is 00:10:31 You won't even know what model is used under Cursor soon. It's just the best one because you trust them. And just as we got some comments from OpenAI, we also got comments from the leadership of Anthropic, specifically Dario Amodei. On Wednesday, he condensed his thoughts into a blog post entitled On DeepSeek and Export Controls. His central premise was that DeepSeek is not significantly ahead of U.S. labs.

Starting point is 00:10:51 He noted that the media narrative has latched onto the idea that DeepSeek had spent $6 million to achieve a model that would cost U.S. labs billions to train. Amodei disclosed that Claude 3.5 Sonnet didn't cost billions to train. Its costs were in the tens of millions of dollars range. The expensive part is the gigantic data centers required to serve inference for the models once they're released to the public. He claimed,

Starting point is 00:11:10 DeepSeek produced a model close to the performance of U.S. models 7 to 10 months older for a good deal less cost, but not anywhere near the ratios people have suggested. Amodei explained that U.S. companies have observed an annual 4x reduction in training costs for several years, adding, DeepSeek V3 is not a unique breakthrough or something that fundamentally changes the economics of LLMs. It's an expected point on an ongoing cost reduction curve. What's different this time is

Starting point is 00:11:31 that the company that was first to demonstrate the expected cost reductions was Chinese. Amodei did acknowledge that some of the compression and optimization techniques present in the DeepSeek paper are genuine innovations. However, he expects these techniques to now be applied at a much larger scale by leading labs in the U.S. and China, keeping them on the same cost reduction curve. He concluded, the performance of DeepSeek does not mean the export controls failed. DeepSeek had a moderate to large number of chips, so it's not surprising that they were able to develop and then train a powerful model. They were not substantially more resource constrained than U.S. AI companies and the export controls were not the main factor causing them to innovate. They are

Starting point is 00:12:03 simply very talented engineers and show why China is a serious competitor to the U.S. Now, another area where discussion of DeepSeek showed up this week was in big tech earnings calls. You heard that Apple's Tim Cook was asked about it, but Meta, who reported this week, could be more impacted than most after the company bet it all on being the leader in open source AI. However, during Wednesday's earnings call, CEO Mark Zuckerberg didn't seem the least bit concerned about new competition out of China. He said, I think there's a number of novel things that they did that I think we're still digesting. And there are a number of things where they have advances that we will hope to implement in our systems. And that's part of the nature of how this works,

Starting point is 00:12:39 whether it's a Chinese competitor or not. With such a short time since the world recognized the pace of Chinese development, Zuckerberg added, it's probably too early to really have a strong opinion on what this means for the trajectory around infrastructure and capex and things like that. There are a bunch of trends that are happening here all at once. Note that Meta has committed to spending $60 billion on new data centers this year. And if anything, DeepSeek has increased Zuckerberg's conviction that the investment will pay off. He commented, I continue to think that investing very heavily in capex and infra is going to be a strategic advantage over time. It's possible that we'll learn otherwise at some point, but it's way too early to call that. At this point, I would bet that

Starting point is 00:13:14 the ability to build out that kind of infrastructure is going to be a major advantage for both the quality of the service and being able to serve the scale that we want to. Now, internally at Meta, the mood seems a little more urgent. A recording of the company's first all-hands of the year was leaked late this week. We will skip aggressively over discussions of changes to content policies, the end of the company's DEI training, and impending layoffs. Suffice it to say, there seems to be a reasonable level of discontent within the company, but much of the meeting was focused on their AI strategy. Zuckerberg is gunning to get penetration with Llama's free and open source approach, stating, I'm always looking for ways that we can convert the strength

Starting point is 00:13:49 of our business model into delivering a higher quality product to people. We have a model that's competitive with the best models out there and we offer it for free. We're not charging $20 or $200 a month or whatever. Now, I think that there might be an opportunity to do even more. We can deliver even higher quality answers than other people in the industry could deliver and also make that free. Addressing DeepSeek, he said, whenever I see someone else do something, I'm like, ah, come on, we should have been there, right? We've got to make sure that we're on it. Zuckerberg also tried to assure his team that they weren't going to be replaced by AI. Referencing the company's plan to build a high-quality coding agent, he said, does that mean that we're

Starting point is 00:14:20 not going to need engineers? Actually, the opposite. If an engineer can now do a hundred times more work, I want a lot more engineers, right? I would guess that we're going to be able to train AIs to do a better job than a lot of the human reviewers. It's probably not the case that that kind of flip will happen until next year. Overall, Zuckerberg left with a parting message, which I think listeners to this show this week will have no trouble understanding. It's going to be an intense year, he said, so buckle up. We've got a lot to do. I'm excited about it. Today's episode is brought to you by Vanta. Trust isn't just earned, it's demanded. Whether you're a startup founder navigating your first audit or a seasoned security

Starting point is 00:14:54 professional scaling your GRC program, proving your commitment to security has never been more critical or more complex. That's where Vanta comes in. Businesses use Vanta to establish trust by automating compliance needs across over 35 frameworks like SOC 2 and ISO 27001. Centralize security workflows, complete questionnaires up to 5x faster, and proactively manage vendor risk. Vanta can help you start or scale up your security program by connecting you with auditors and experts to conduct your audit and set up your security program quickly. Plus, with automation and AI throughout the platform, Vanta gives you time back so you can focus on building your company. Join over 9,000 global companies like Atlassian, Quora, and Factory who use Vanta to manage risk and prove security in real time.

Starting point is 00:15:39 For a limited time, this audience gets $1,000 off Vanta at vanta.com slash NLW. That's VANTA.com slash NLW for $1,000 off. If there is one thing that's clear about AI in 2025, it's that the agents are coming. Vertical agents by industry, horizontal agent platforms, agents per function. If you are running a large enterprise, you will be experimenting with agents next year. And given how new this is, all of us are going to be back in pilot mode. That's why Superintelligent is offering a new product for the beginning of this year. It's an agent readiness and opportunity audit.

Starting point is 00:16:20 Over the course of a couple quick weeks, we dig in with your team to understand what type of agents make sense for you to test, what type of infrastructure support you need to be ready, and to ultimately come away with a set of actionable recommendations that get you prepared to figure out how agents can transform your business. If you are interested in the agent readiness and opportunity audit,

Starting point is 00:16:38 reach out directly to me, nlw@besuper.ai. Put the word agent in the subject line so I know what you're talking about, and let's have you be a leader in the most dynamic part of the AI market. Hello, AI Daily Brief listeners. Taking a quick break to share some very interesting findings from KPMG's latest AI Quarterly Pulse Survey. Did you know that 67% of business leaders expect AI to fundamentally transform their

Starting point is 00:17:02 businesses within the next two years? And yet, it's not all smooth sailing. The biggest challenges that they face include things like data quality, risk management, and employee adoption. KPMG is at the forefront of helping organizations navigate these hurdles. They're not just talking about AI. They're leading the charge with practical solutions and real-world applications. For instance, over half of the organizations surveyed are exploring AI agents to handle tasks like administrative duties and call center operations. So if you're looking

Starting point is 00:17:28 to stay ahead in the AI game, keep an eye on KPMG. They're not just a part of the conversation, they're helping shape it. Learn more about how KPMG is driving AI innovation at KPMG.com slash US. Now, from a narrative perspective, that would be a wonderful place to close, but we've got to hit a few more stories before we get out of here. First, a couple quick model updates, which, while not being about DeepSeek, are clearly still about trying to advance and being seen to advance. Google appears to be on the cusp of releasing the next iteration of their flagship model, Gemini 2.0 Pro. The model showed up in a changelog for the Gemini chatbot app. In the model's blurb, the changelog said, whether you're tackling advanced coding challenges like generating

Starting point is 00:18:05 a specific program from scratch, or solving mathematical problems like developing complex statistical models, 2.0 Pro Experimental will help you navigate even the most complex tasks with greater ease and accuracy. During this week's hype, many noted that Google already offers a model that's pretty close to DeepSeek R1 and o1-mini in quality. Gemini 2.0 Flash is priced similarly to R1 for API access once the Chinese model's introductory offer expires next month. This week, Google has made it the default model for use in the Gemini app, and their reasoning mode is also available for free via Google's AI Studio. Back in China, other labs are demonstrating their capabilities as well. Alibaba released Qwen 2.5 Max, claiming to outperform DeepSeek R1, GPT-4o, and Claude 3.5 Sonnet across a range

Starting point is 00:18:45 of reasoning and knowledge benchmarks. Alibaba also highlighted that they use a mixture-of-experts architecture to increase inference efficiency. This is one of the approaches to deal with resource scarcity issues that we've seen in DeepSeek's V3 and R1 as well. And then there's OpenAI. Outside of discussions of how their o1 model stacks up compared to R1, or was the actual progenitor of R1, there were a bunch of other stories surrounding them as well. Maybe most notably, OpenAI's investors don't seem to be worried about Chinese rivals, with the Wall Street Journal reporting that OpenAI is in early talks to raise up to $40 billion in a round that would see the company valued as high as $300 billion. SoftBank is reportedly leading the round and looking

Starting point is 00:19:22 to take the bulk of the deal by investing between $15 and $25 billion. Curiously, the Wall Street Journal originally published the valuation at $340 billion, but later revised the story. They commented, after the Wall Street Journal published that figure in an earlier version of the story, our source said newer negotiations lowered the proposed valuation to as much as $300 billion. They also clarified that that $300 billion figure is a post-money valuation, so it seems this was a genuine price drop during negotiations. Still, wherever the figure lands, it's a record-making deal. OpenAI's last round in October raised $6.6 billion at a $157 billion valuation. To double that in just a few months would be extraordinary even by OpenAI's own

Starting point is 00:20:00 standards. The deal would make OpenAI the second-highest-valued startup in history, behind only SpaceX. The Wall Street Journal also reported that the deal is intended to fund OpenAI's $18 billion share in Project Stargate as well as general operations. Part of the justification is that OpenAI's premium subscription seems to be driving a revenue boom. When OpenAI launched their $200 per month Pro tier, to many, it seemed like a stretch. The subscription allowed unrestricted access to all of OpenAI's models, including the Sora video model and the o1 reasoning model, and also added a pro mode for o1 that gave more extensive answers that replicated research reports. This month's release of Operator was also exclusive to the Pro tier. Still, $200 per month is a hefty price tag for all but the power users.

Starting point is 00:20:41 In fact, Sam Altman complained earlier this month that the Pro tier was actually being sold at a loss because, quote, people use it much more than we expected. Still, according to The Information, the price tag doesn't seem to be turning that many people away. They reported that revenue from Pro tier subscriptions has now surpassed business team subscriptions, meaning that the Pro tier has hit $300 million in annualized revenue. OpenAI also launched ChatGPT for government, a new version of the chatbot platform designed, as you would expect, for government use. It's similar to the enterprise tier of ChatGPT, allowing users to create custom GPTs and share conversations across a workspace, but also allows agencies to host a selection of OpenAI models in government cloud infrastructure. They'll be able to

Starting point is 00:21:18 configure their own security, privacy, and compliance standards, and OpenAI says that the tailored product could help expedite the approval of the company's tools to handle non-public sensitive data. Finally, and relatedly, on Thursday, OpenAI announced one of their largest-scope government projects to date. The company will provide access to their o1 reasoning model to U.S. national laboratories, the network of R&D labs operated by the Department of Energy. According to OpenAI, up to 15,000 scientists will use o1 to, quote, accelerate basic science, identify new approaches for treating and preventing diseases, enhance the cybersecurity of the U.S. power grid, and deepen our understanding of the, quote, forces that govern the

Starting point is 00:21:52 universe from fundamental mathematics to high energy physics. The most chattered about part of this is that one of the research programs partnering with OpenAI involves nuclear defense. The company framed the program as being, quote, focused on reducing the risk of nuclear war and securing nuclear material and weapons worldwide, and OpenAI capped off their announcements by stating, this is the beginning of a new era where AI will advance science, strengthen national security and support U.S. government initiatives. Still, as you might imagine, a lot of the chatter on X followed this pattern by Pink Moon Kate, who wrote, OpenAI devs, we don't know how to control superintelligence. Also OpenAI, let's give them nuclear codes. Why does every day sound more and more

Starting point is 00:22:27 like the plot of a bad sci-fi movie? Now, of course, that's not exactly what's happening here, but the concern is perhaps understandable. And so, friends, that wraps what was a crazy week. Perhaps I should say another crazy week in artificial intelligence. Maybe the craziest part about this is that I think I've probably only said the word agent once or twice. In any case, I hope you feel now up to date. We will be back over the weekend with the Long Reads episode and back to our normal approach on Monday. For now, appreciate you listening or watching as always.

Starting point is 00:22:56 And until next time, peace.
