The AI Daily Brief: Artificial Intelligence News and Analysis - The o3-to-AGI-Hype Pipeline

Episode Date: January 22, 2025

OpenAI's upcoming o3 model has sparked widespread speculation about its capabilities and potential impact. From hints at advanced reasoning to its implications for AGI development, the excitement is palpable. Meanwhile, rivals like DeepSeek are shaking up the playing field with cost-effective, high-performance alternatives. This episode unpacks the facts, dispels the hype, and explores the broader implications for AI innovation and policy. Brought to you by: KPMG – Go to www.kpmg.us/ai to learn more about how KPMG can help you drive value with our AI solutions. Vanta - Simplify compliance - https://vanta.com/nlw The Agent Readiness Audit from Superintelligent - Go to https://besuper.ai/ to request your company's agent readiness score. The AI Daily Brief helps you understand the most important news and discussions in AI. Subscribe to the podcast version of The AI Daily Brief wherever you listen: https://pod.link/1680633614 Subscribe to the newsletter: https://aidailybrief.beehiiv.com/ Join our Discord: https://bit.ly/aibreakdown

Transcript
Starting point is 00:00:00 Today on the AI Daily Brief, OpenAI's o3-mini seems to be coming soon, but could we also get PhD-level superagents? Before that, on the headlines: in one of his first acts as president, Donald Trump has revoked Biden's executive order on AI. The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI. To join the conversation, follow the Discord link in our show notes. Welcome back to the AI Daily Brief Headlines edition, all the daily AI news you need in around five minutes. We kick off today with something that was expected but is no less significant. Kickstarting the Trump era for AI development in the United States, the new president has repealed Biden's AI executive order.
Starting point is 00:00:40 President Trump spent much of last night repealing executive orders from the previous administration and signing his own. Among them was the October 2023 order, which governed the, quote, safe, secure, and trustworthy development and use of artificial intelligence. It was largely directed at government departments, instructing them to begin research on topics including AI safety and AI standards. It established the AI Safety Institute within the National Institute of Standards and Technology, a body tasked with analyzing safety reports from frontier labs and considering guardrails that should be established in the future.
Starting point is 00:01:10 Functionally, the EO didn't do anything like ban research, but it still came with additional administrative process, which raised the ire of congressional Republicans. The TL;DR of their argument was that the rules were anti-innovation. And now it's pretty clear that the restraints are coming off as we head into this new Trump administration. Bob Gourley, the CTO of OODA, wrote, and just like that, the executive order the AI doomers and decels worked so hard to put in place has been rescinded. We have lots of problems in AI today, most of which require an ability to innovate faster. So rescinding this is a great move. BASSO commented, total e/acc victory. We're just getting started.
Starting point is 00:01:44 Others, of course, are a little bit more hesitant. Former OpenAI policy researcher Miles Brundage said, so now that the AI EO is repealed, there's no legal obligation for AI companies to give the U.S. government any kind of status updates on the technology they're building, which leaders in the field think could threaten humanity. Staying with the government theme, although in a very different dimension, the Federal Trade Commission has raised concerns about partnerships between big tech and AI startups. Most recently, in a staff report on Friday, the FTC highlighted the competition issues stemming from partnerships between Microsoft and OpenAI, as well as Google and Amazon partnering
Starting point is 00:02:17 with Anthropic. FTC Chair Lina Khan said in a statement, the FTC's report sheds light on how partnerships by big tech firms can create lock-in, deprive startups of key AI inputs, and reveal sensitive information that can undermine fair competition. The report specifically focuses on the provision of cloud services. It claims that the partnerships could impact access to computing resources and engineering talent. It was also concerned that these partnerships could create a lock-in effect by increasing switching costs for customers. For example, OpenAI customers might face artificial barriers if they try to switch away from Microsoft. Finally, the report highlighted the risk that cloud providers could gain unique access to sensitive information.
Starting point is 00:02:52 It noted that at least one agreement granted access to model output data, which could be used as synthetic data for training. Now, of course, it feels like this is the FTC positioning for a new administration. In addition to everything mentioned already, the FTC also questioned the circular spending inherent in these deals, in other words, investment coming in the form of cloud credits, or dollars that were likely to be spent on cloud services, basically giving those big tech firms protection from loss. Still, Microsoft is standing by the partnership, with its deputy general counsel stating
Starting point is 00:03:19 that the deal, quote, enabled one of the most successful AI startups in the world, and spurred a wave of unprecedented technology investment and innovation in the industry. At this point, the FTC has not filed any AI-related antitrust suits. Over in another area, however, the FTC has referred its investigation into Snap's AI chatbot to the Justice Department. The FTC's non-public complaint involves allegations that Snapchat's My AI chatbot poses, quote, risks and harms to young users. The agency noted that, quote, although the commission does not typically make public the fact that it has referred a complaint, we have determined that doing so here is in the public
Starting point is 00:03:52 interest. The investigation stemmed from compliance monitoring following a 2014 settlement over allegations that Snap deceived the public about its data collection. Snap has admitted that its chatbot is prone to hallucinations and willing to answer inappropriate questions. In a 2023 investigative report, a Washington Post reporter posing as a teenager was able to get advice on hiding the smell of alcohol and marijuana. Notably, both Republican commissioners were absent from the meeting where the decision to refer was made. Commissioner Andrew Ferguson issued a dissenting opinion. He said he was not allowed to comment on the case because the details were not public, but said it ran afoul of free speech protections. He commented,
Starting point is 00:04:26 I did not participate in this farcical closed meeting at which this matter was approved. Snap also bit back, saying that the company is focused on the thoughtful development of generative AI, and adding, unfortunately, on the last day of this administration, a divided FTC decided to vote out a proposed complaint that does not consider any of these efforts, is based on inaccuracies, and lacks concrete evidence. It also fails to identify any tangible harm and is subject to serious First Amendment concerns. Safe to say that when it comes to AI policy, a lot of the next 100 days is going to be crazy jockeying and transition between two very different administrations. I'm sure there will be much more significant news than what we've
Starting point is 00:05:01 covered today, but for now, that is going to do it for this set of headlines. Appreciate you listening, and up next, the main episode. Today's episode is brought to you by Vanta. Trust isn't just earned, it's demanded. Whether you're a startup founder navigating your first audit or a seasoned security professional scaling your GRC program, proving your commitment to security has never been more critical or more complex. That's where Vanta comes in. Businesses use Vanta to establish trust by automating compliance needs across over 35 frameworks like SOC 2 and ISO 27001. Centralize security workflows, complete questionnaires up to 5x faster, and proactively manage vendor risk. Vanta can help you start or scale up your security program by connecting you
Starting point is 00:05:44 with auditors and experts to conduct your audit and set up your security program quickly. Plus, with automation and AI throughout the platform, Vanta gives you time back, so you can focus on building your company. Join over 9,000 global companies like Atlassian, Quora, and Factory, who use Vanta to manage risk and prove security in real time. For a limited time, this audience gets $1,000 off Vanta at vanta.com slash NLW. That's VANTA.com slash NLW for $1,000 off. If there is one thing that's clear about AI in 2025, it's that the agents are coming. Vertical agents by industry, horizontal agent platforms, agents per function. If you are running a large enterprise, you will be experimenting with agents this year. And given how new this is, all of us are going to be back in pilot mode.
Starting point is 00:06:36 That's why Superintelligent is offering a new product for the beginning of this year. It's an agent readiness and opportunity audit. Over the course of a couple of quick weeks, we dig in with your team to understand what type of agents make sense for you to test and what type of infrastructure support you need to be ready, and ultimately you come away with a set of actionable recommendations that prepare you to figure out how agents can transform your business. If you are interested in the agent readiness and opportunity audit, reach out directly to me, nlw@besuper.ai. Put the word agent in the subject line so I know what you're talking about, and let's have you be a leader in the most dynamic part of the AI market. Hello, AI Daily Brief listeners. Taking a quick break to
Starting point is 00:07:15 share some very interesting findings from KPMG's latest AI quarterly Pulse survey. Did you know that 67% of business leaders expect AI to fundamentally transform their businesses within the next two years? And yet, it's not all smooth sailing. The biggest challenges that they face include things like data quality, risk management, and employee adoption. KPMG is at the forefront of helping organizations navigate these hurdles. They're not just talking about AI, they're leading the charge with practical solutions and real-world applications. For instance, over half of the organizations surveyed are exploring AI agents to handle tasks like administrative duties and call center operations. So if you're looking to stay ahead in the AI game,
Starting point is 00:07:53 keep an eye on KPMG. They're not just a part of the conversation. They're helping shape it. Learn more about how KPMG is driving AI innovation at KPMG.com slash US. Welcome back to the AI Daily Brief. Is OpenAI about to ship artificial general intelligence? The conversation that we're having today got started on Friday afternoon when Sam Altman announced that OpenAI's o3-mini reasoning model is close to release. He posted, thank you to the external safety researchers who tested o3-mini. We have now finalized a version and are beginning the release process, planning to ship in a couple of weeks. Also, we heard the feedback. We'll launch API and ChatGPT at the same time. It's very good. The hype cycle began
Starting point is 00:08:32 immediately. Santigeneshats writes, o3 is coming. Brace for the AGI. In fact, there was so much of this type of discussion that Altman dove in, participating in a long discussion in the replies to level-set expectations. After McKay Wrigley asked, are you able to speak to how capable o3-mini is compared to o1 pro? Altman said, worse than o1 pro at most things, but fast. When Terris Bob wrote, sad, I want a model even smarter than o1 pro, willing to pay, Altman said, o3 is much smarter, we're turning our attention to that now. And o3 pro [mind-blown emoji]. In terms of who has access to this, the new model will be available to at least OpenAI Pro subscribers, in other words, the folks who are paying $200 per month.
Starting point is 00:09:12 Overall, after the weekend, Sam Altman came back to Twitter to say, Twitter hype is out of control again. We are not going to deploy AGI next month, nor have we built it. We have some very cool stuff for you, but please chill and cut your expectations 100x. Now, of course, when OpenAI first previewed o3 at the end of December, to many it was the first model that looked a little bit like AGI. It was the first to score 75% on the ARC-AGI benchmark, maybe the best yardstick we have right now for testing AGI-style performance. However, that testing was done on the full model and used an incredible amount of compute. ARC-AGI tests allow a budget of $10,000 for inference for official ranking.
Starting point is 00:09:46 Unofficially, OpenAI also completed a run using over $100,000 of inference and performed much higher. But that level of compute isn't feasible to deliver to the public, so we're getting something much smaller and consequently less powerful. Still, that doesn't mean this model won't be a paradigm shift in its own right. Chubby, for example, wrote, to explain again why o3-mini is so important: we get a reasoning model that is better than full o1 and costs only a fraction of it. At medium compute, o3-mini is still at least a tiny bit cheaper than o1-mini, but outperforms full o1 on Codeforces by more than 100 Elo. That means better reasoning for more applications and more users. Wider application leads to more insights and more breakthroughs. That's why o3-mini is so important. Henry Mao, the founder of Jenni AI, got specific. If o3-mini is cheap enough, it might just supplant
Starting point is 00:10:29 GPT-4o and Sonnet 3.5 for daily coding tasks. Blake C, an app developer, wrote, o1 pro will sometimes take five minutes when you ask it to, say, fix some code, but it's like 2 to 3x better than Sonnet most of the time. If o3-mini is 2x Sonnet and the same speed, that will be nuts. TDM suggested that this isn't really about releasing a more performant model, but rather a step towards making OpenAI's reasoning models more cost-effective. They posted, so o3-mini is basically just a faster o1. I think the primary reason they are releasing this is that o1 costs can't be reduced enough to sustain scale without losing money on it.
Starting point is 00:11:01 Another would be for API devs to start using o3-mini more instead of Sonnet, since it would be faster and smarter. And so, taking cues from Sam Altman, this really doesn't sound like consumer-grade AGI. And yet, there are other hints that OpenAI is approaching some very big things. Axios reported over the weekend that Sam Altman has been invited to brief the Trump White House next week. The article stated that, quote, a top company, possibly OpenAI, in coming weeks will announce a next-level breakthrough that unleashes PhD-level superagents to do complex human tasks. OpenAI sources said that they are, quote, both jazzed and spooked by recent progress. Interestingly, there haven't really been any public rumblings about OpenAI launching agents,
Starting point is 00:11:37 but it does seem to many that this is an area where the company has been lagging behind. And yet it seems like this might not be the case for long. Tibor Blaho, for example, found references to agents in OpenAI's code. He tweeted, confirmed, the ChatGPT macOS desktop app has hidden options to define shortcuts for the desktop launcher to toggle Operator and force quit Operator. Operator is the name of OpenAI's forthcoming general-purpose agent. The Information previously reported that January was the intended launch month
Starting point is 00:12:03 for Operator. Chubby once again noted that OpenAI already has a comparison page on its website showing Operator's performance against Anthropic's Computer Use mode and Google's Mariner agent. They wrote, looks like release is imminent. The benchmarks in this leaked graphic, which we can't confirm is real, show a substantial step up from Anthropic's model and a slight improvement over Google's dedicated web browsing agent in that domain. Still, it doesn't seem as though OpenAI has perfected computer use. For example, the leaked testing showed the agent could successfully sign up for a cloud services account and launch a virtual machine only 60% of the time. Responding to some of the hype, Kumar Aparanji, the head of automation at Cognizant, tried to tamp down
Starting point is 00:12:40 expectations of what these agents can do. He posted, no, this is not going to get us ASI or AGI. These are agents, real-time yes, and can be useful too, uniquely in narrow cases, expensively in others, but agents nevertheless, which means they call the models. The models need to provide the AGI and ASI, and they're not doing that anytime soon. Not even DeepSeek R1, although it is 27x cheaper than o1. Speaking of which, while these release rumors from OpenAI set imaginations racing, a rival Chinese lab sucked a lot of the oxygen out of the room with its latest model. Over the weekend, DeepSeek released the full version of its R1 reasoning model. Now, you might remember that we've talked about DeepSeek a number of times.
Starting point is 00:13:19 Economist Tyler Cowen used it as his example of why Trump should think differently about Biden's chip export policies. In terms of what was released, the model performs in line with o1 on most benchmarks, in particular SWE-bench Verified, which focuses on programming tasks. R1 is now fully available as an open-source model for commercial use and can serve outputs via API at less than 5% of the cost of o1. Hobbyists are also able to run the model at home, with several demonstrating it running on a cluster of Mac minis. Accompanying the full release of R1 was a technical paper describing the post-training process, which develops reasoning capability on top of a foundation model. DeepSeek said they tried multiple forms of post-training
Starting point is 00:13:56 before landing on a relatively simple reinforcement learning process. Max Winga, a research engineer at Conjecture AI, posted, it's wild to me that they did this with no fine-tuning prior to the RL stage. R1 learns to reason on its own like AlphaZero. During training, they observe the model learning to use advanced reasoning techniques, an aha moment. We're playing with alien minds, not just tools. AI entrepreneur Elvis Saravia writes,
Starting point is 00:14:18 The DeepSeek R1 paper is a gem. It's clear that LLM reasoning capabilities can be learned in different ways. Reinforcement learning, if applied correctly and at scale, can lead to some really powerful and interesting scaling and emergent properties. Now, all of this has some people thinking ahead to future possibilities. The AI for Success account, for example, tweets, in a few years, China will create AGI and open-source it for all. DeepSeek R1 costs 96% less compared to OpenAI o1, and it's almost as good as o1. Intelligence too cheap to meter. 2025 is going to be crazy. I can feel it. Indeed, the rapid development going on in China has
Starting point is 00:14:52 major implications for AI policy. In announcing the latest round of export controls, the Biden administration made it clear that international competitiveness was a key issue. The policy statement set an explicit goal to ensure that U.S. models are dominant across the world, especially in the Global South. Dean W. Ball, a research fellow at George Mason University, posted DeepSeek R1 takeaways for policy. One, Chinese labs will likely continue to be fast followers in terms of reaching similar benchmark performance to U.S. models. Two, the impressive performance of DeepSeek's distilled models, smaller versions of R1, means that very capable reasoners will continue to proliferate widely and be runnable on local hardware, far from the eyes of any top-down control regime, including
Starting point is 00:15:30 the U.S. diffusion rule. Three, open models are going to have strategic value for the U.S., and we need to figure out ways to get more frontier open models out to the world. We rely exclusively on Meta for this right now, which, while great, is just one firm. Why do OpenAI and Anthropic not open-source their older models? What would be the harm? Mostly, though, people are just feeling the acceleration. Perplexity CEO Aravind Srinivas writes, it's kind of wild to see reasoning get commoditized this fast. We should fully expect an o3-level model
Starting point is 00:15:57 that's open-source by the end of the year, probably even mid-year. So, friends, lots going on as we dig deeper into January. That, however, is going to do it for today's AI Daily Brief. Until next time, peace.
