The AI Daily Brief: Artificial Intelligence News and Analysis - 2025 AI Battlelines: Agents, Reasoning, and World Models

Episode Date: December 21, 2024

Brought to you by: Vanta - Simplify compliance - https://vanta.com/nlw The AI Daily Brief helps you understand the most important news and discussions in AI. Subscribe to the podcast version of The AI Daily Brief wherever you listen: https://pod.link/1680633614 Subscribe to the newsletter: https://aidailybrief.beehiiv.com/ Join our Discord: https://bit.ly/aibreakdown

Transcript
Starting point is 00:00:00 Today on the AI Daily Brief, why the 2025 AI battle lines are around reasoning models, agents, and world models. Before that in the headlines, is a new AI mode coming to Google Search? The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI. To join the conversation, follow the Discord link in our show notes. Welcome back to the AI Daily Brief Headlines edition, all the daily AI news you need in around five minutes. One of the really fascinating phenomena that is somehow kind of taking a backseat relative to everything else going on in AI is that for the first time in a very long time, 20 years basically,
Starting point is 00:00:39 there's competition around what search means. Perplexity is obviously one of the most beloved AI products right now and continues to add to their war chest, as well as their feature set. OpenAI, as part of their 12 Days of Shipmas, expanded access to ChatGPT search to the wider world, and now The Information is reporting that Google is planning to add an AI mode option to their search as well. The Information cites someone who's working on the product who says that Google is planning to, quote, give its billions of search users the option to switch to an AI mode that looks nearly identical to its Gemini AI chatbot.
Starting point is 00:01:10 Others have found indications that this shift won't just be about what you type into your computer, but also that there will be a way to talk to search. 9to5Google found some indications in the code that suggest you'll be able to use mobile inputs, including voice and photos, as a way to search. A Google spokesperson was circumspect about all of this, saying, as our state-of-the-art models continue to advance, there's a huge opportunity to bring these new capabilities into search, helping people discover even more of the web. And to some extent, this is absolutely obvious.
Starting point is 00:01:38 It feels very likely that in the future, there will simply be multiple types of search experiences for different types of queries. The Perplexity or ChatGPT type of search, where you're actually trying to get an answer to a question, is going to become the default for lots and lots of types of queries. I don't think that means that Google's traditional search has no hope, but being able to toggle between the two could be really valuable.
Starting point is 00:01:58 The challenge, of course, is that a federal judge has already ruled that Google's search engine is an illegal monopoly. As The Information writes, the Department of Justice has suggested that it wants to make it harder for Google to leverage its search engine to beat AI chatbot rivals, which could create a legal barrier to something like AI mode. This is tough. On the one hand, I absolutely want competition, but on the other, this is just sort of the obvious place to take search for Google, and artificially blocking them from doing so is basically just forcing them to lose. Next up, former Twitch CEO and very briefly OpenAI CEO Emmett Shear is reportedly working on a new
Starting point is 00:02:31 AI startup with some intriguing goals. You might remember that Shear was very briefly named as replacement for Sam Altman as CEO of OpenAI during the leadership controversy in November 2023. Interestingly, he was credited by the Wall Street Journal with clearing the path for Sam Altman's return by almost immediately threatening to resign if he wasn't given evidence by the board to support Altman's removal. TechCrunch is now reporting that Shear has founded a company called Stem AI with incorporation documents filed in June of last year. The company is still in stealth, so details are very limited, but what TechCrunch uncovered does sound interesting. According to a trademark filed last year, Stem AI is developing software to create AI that, quote,
Starting point is 00:03:10 understands, cooperates with, and aligns with human behavior, human preferences, human biology, human morality, and human ethics. More hints come from the presence of Adam Goldstein as a co-founder. After selling a travel website called Hipmunk in 2016, Goldstein became a visiting partner at Y Combinator. He also founded an incubator called Astonishing Labs to back bio-research startups. According to his LinkedIn page, Goldstein spent a year at Tufts University's Levin Lab as a visiting scientist, where he, quote, developed new models for biological systems with a focus on cancer. According to PitchBook, Stem received backing from Andreessen Horowitz back in August,
Starting point is 00:03:43 and while that's all we know right now about the actual company, Shear has been growing increasingly vocal about AI safety and regulation over the past month. In December, for example, he posted, almost all currently proposed regulation is a bad idea. He added, though, that ideas around regulating firms rather than AI models and increasing transparency, are some of the few reasonable ideas currently being floated. Back in November, he wrote, not being scared of AGI indicates either pessimism about the rate of future progress synthesizing digital intelligence or severe lack of imagination about the power of intelligence. In June, around the time that California's SB 1047 legislation was being debated,
Starting point is 00:04:17 he said on a podcast appearance that his greatest concern was self-improving models that could grow out of human control. He said at the time, I'm in favor of creating some kind of fire alarm, like maybe no AI is bigger than X. I think there's good options for international collaboration and treaties about some sort of AI test ban treaty. TL;DR, Shear is a good operator, and this is likely one to watch. Lastly today, how the mighty have fallen. Intel is courting bids to buy out its Altera programmable chip arm. Altera specializes in the design of low-power programmable chips for use in AI-enabled devices. The company was spun off as a separate entity in February as Intel attempted to
Starting point is 00:04:51 right the ship after a disappointing few years. Bloomberg reports interest from multiple private equity firms, including Francisco Partners, Silver Lake Management, Apollo Global Management, and Bain Capital. Intel is giving potential buyout partners until January to formalize their offers. Deal terms presented in November range from taking a 20 to 30% stake in the company all the way up to taking full control. Bloomberg reports that Altera is being valued in the range of $9 to $12 billion, a steep discount from the $17 billion Intel paid back in 2015. The move comes, of course, in the shadow of the departure of CEO Pat Gelsinger. After being brought in three years ago to get Intel back on track, Gelsinger retired from his position earlier this
Starting point is 00:05:28 month at the request of the board. That, however, is going to do it for today's AI Daily Brief Headlines edition. Next up, the main episode. Today's episode is brought to you by Vanta. Whether you're starting or scaling your company's security program, demonstrating top-notch security practices and establishing trust is more important than ever. Vanta automates compliance for ISO 27001, SOC 2, GDPR, and leading AI frameworks like ISO 42001 and NIST AI Risk Management Framework, saving you time and money while helping you build customer trust. Plus, you can streamline security reviews by automating questionnaires and demonstrating your security posture with a customer-facing Trust Center, all powered by Vanta AI.
Starting point is 00:06:08 Over 8,000 global companies like LangChain, Leela AI, and Factory AI use Vanta to demonstrate AI trust and prove security in real time. Learn more at vanta.com slash NLW. That's vanta.com slash NLW. If there is one thing that's clear about AI in 2025, it's that the agents are coming. Vertical agents by industry, horizontal agent platforms, agents per function. If you are running a large enterprise, you will be experimenting with agents next year. And given how new this is, all of us are going to be back in pilot mode.
Starting point is 00:06:42 That's why Superintelligent is offering a new product for the beginning of this year. It's an agent readiness and opportunity audit. Over the course of a couple quick weeks, we dig in with your team to understand what type of agents make sense for you to test and what type of infrastructure support you need to be ready, and ultimately you come away with a set of actionable recommendations that get you prepared to figure out how agents can transform your business. If you are interested in the agent readiness and opportunity audit, reach out directly to me, nlw at besuper.ai.
Starting point is 00:07:11 Put the word agent in the subject line so I know what you're talking about, and let's have you be a leader in the most dynamic part of the AI market. Welcome back to the AI Daily Brief. I love it when a set of stories converge in a way that really tells a bigger story than just an individual piece of news, and boy, is that the case today. We are looking at what are shaping up to be the 2025 battle lines in AI: reasoning models, agents, and world models, all of which got some interesting news yesterday. Now, of course, at this point, you probably don't need the background on reasoning models,
Starting point is 00:07:42 but effectively, this is a new approach to scaling that uses different strategies than just throwing more compute and data into pre-training. This is clearly where OpenAI is putting its emphasis. It released o1-preview back in September. Subsequent to that, we've had Amazon announcing Nova and talking about a reasoning model in the lineup, and Meta releasing Llama 3.3, also emphasizing its reasoning capabilities. Several Chinese labs have also released very competent reasoning models. And now, Google has joined the party.
Starting point is 00:08:11 A few days after their initial launch, Google has added a reasoning model to the Gemini 2.0 Flash lineup. The model is called Gemini 2.0 Flash Thinking Experimental. Hopefully this is just a working title, guys. And the way it describes itself is that it's the best in the lineup for, quote, multimodal understanding, reasoning, and coding. In demonstrations, it seems to perform well on puzzles involving both visual and text clues. And as far as novel features that make it stand out from the pack,
Starting point is 00:08:35 the model shows its chain of logic so you can see what's going on under the hood. OpenAI co-founder Andrej Karpathy wrote, The prominent and pleasant surprise here is that unlike o1, the reasoning traces of the model are shown. As a user, I personally really like this because the reasoning itself is interesting to see and read. The models actively think through different possibilities, ideas, debate themselves, etc. The case against showing these is typically a concern of someone collecting the reasoning traces and training to imitate them on top of a different base model to gain reasoning ability
Starting point is 00:09:03 possibly and to some extent. The model is also extremely fast compared to rivals and available for free on Google AI Studio. This in and of itself is pretty surprising, as reasoning models so far have been extremely expensive to operate compared to their non-reasoning counterparts. One interesting thing here is that Google's naming convention implies that this is just a fine-tuned version of 2.0 Flash, or perhaps simply the base model with some system prompts to ask the model to think for longer and check its work before answering. Compare that to o1, where OpenAI went out of their way to present it as an entirely new model. Sam Altman even framed the release as
Starting point is 00:09:36 the beginning of a different branch of LLMs for the company. One of the big questions I think heading into next year is just how different these reasoning models are from their non-reasoning counterparts, and more particularly, whether they really do evolve in different ways going forward. Now, speaking of OpenAI, the other big news for reasoning models is that OpenAI is preparing to release the second generation of their o1 model. Funny enough, speaking of weird naming conventions, according to The Information the model is going to be called o3 to avoid intellectual property disputes with British telco O2. Sam Altman all but told us the model would be released today, so it's probably out by the time you're listening to this. The release could answer another big question surrounding reasoning models, which is whether they can show major improvements on the model layer. At the time o1 came out, it was suspected that OpenAI was pivoting to reasoning because adding
Starting point is 00:10:22 training data and compute to training runs was showing diminishing returns. In the following months, it was confirmed that noticeable improvements could be made to reasoning models by getting them to think longer. Assuming o3 is a brand new model and not a tweak of o1, it should reveal whether reasoning models themselves can scale in ability, or instead if all of the improvements are only possible by scaling up inference time. The community is pretty excited to check it out. Chubby on X writes, o3 equals Orion.
Starting point is 00:10:47 There is probably no more GPT-4.5 or 5. Everything is summarized in Orion, i.e. o3. Surely, Orion was fed with a lot of synthetic data from o1 and now has evolved into o3. Chubby also got at the competitive dynamics in the field, writing, time to take back the crown from Google. We have one more normal episode coming on Monday before we get into end-of-the-year specials, so I will have a chance to follow up on what exactly came out on Friday. Now, moving on to the next dimension of competition, the one that's even more obvious than reasoning models in some ways, which is the race to deploy agents. Yesterday, OpenAI announced a long list of new integrations for ChatGPT. The desktop application can now access data from a gigantic list of coding
Starting point is 00:11:26 platforms, as well as Apple Notes, Notion, and Quip. For now, ChatGPT can only read these apps in context. It can't take actions within those programs. But Chief Product Officer Kevin Weil made it clear that's where this is all going. He said, we've been putting a lot of effort into our desktop apps. As our models get increasingly powerful, ChatGPT will become more and more agentic. That means we'll go beyond just questions and answers. ChatGPT will begin doing things for you. A few weeks ago, Wharton Professor Ethan Mollick posted, OpenAI has a lot of pieces on the board right now: multimodal vision and voice, small, large, and reasoning models, image and video creation, code execution, mobile and desktop apps, web search, some agentic stuff. Very curious when it will be
Starting point is 00:12:03 glued together into a singular thing. Interestingly, going back to this quote from Kevin Weil, it seems extremely notable to me that he says ChatGPT will begin doing things for you. This indicates pretty clearly to me that we have the singular thing, that it is and has always been ChatGPT. It's just that over time, ChatGPT is going to be a lot more than and, frankly, different from what ChatGPT originally was. A couple days ago, we also got an update from Salesforce on their agent platform, Agentforce. Just three months after announcing Agentforce in September, the company announced Agentforce 2.0. They write, This release introduces a new library of pre-built skills and workflow integrations for rapid customization,
Starting point is 00:12:46 the ability to deploy Agentforce in Slack, and advancements in agentic reasoning and RAG. These advancements will enable companies to scale their workforce with customized agents capable of handling complex multi-step tasks with even more precision and accuracy. And if you need a sense of just how important this is to Salesforce, go check out the piece in The Information, AI is Marc Benioff's friend and foe. It talks about how Salesforce is facing increasing competition from companies like Sierra, which are bringing agents to market, and in fact, in some cases, winning business away from Salesforce.
Starting point is 00:13:14 The final vector of competition that I want to discuss today is this new world model approach. These models are trained in a fundamentally different way from LLMs. Where LLMs are trained on a large corpus of text, image, and voice data, world models are trained by observing real or simulated worlds. We've seen a few working prototypes of this style of AI, with two big examples, one coming out of Fei-Fei Li's World Labs and another coming out of Google DeepMind. A third big player is Decart, who released a model in October capable of generating a fully playable Minecraft-like game. While the demo was buggy and
Starting point is 00:13:43 rudimentary, it clearly made investors sit up and pay attention. TechCrunch reports that the company has now raised their Series A. The startup raised $32 million at a $500 million valuation. CEO and co-founder Dean Leitersdorf said the company wants to compete at the highest level, building a, quote, fully vertically integrated AI research lab alongside enterprise and consumer products. He said the aim was to create what he's calling a kilocorn, in other words, a trillion-dollar company. You gotta love ambition, man. Part of the reason that people are so interested in world models, especially recently, is the sense that perhaps their understanding of physics could be something that allows them to make more fundamental breakthroughs. Still, in many ways, this class of models feels closer to where
Starting point is 00:14:22 GPT-based LLMs were a few years ago, demonstrating some fascinating emergent properties, but still nowhere near the full scale that they're going to reach. Along those lines, a group of researchers across 19 different universities have just revealed something they're calling a comprehensive physics simulation platform. Named Genesis, the platform is, the researchers claim, quote, capable of simulating a wide range of materials and physical phenomena. Researcher Zhou Xian writes, after a 24-month large-scale research collaboration involving over 20 research labs, a generative physics engine able to generate 4D dynamic worlds, powered by a physics simulation platform designed for general-purpose robotics and physical AI
Starting point is 00:14:57 applications. We aim to build a universal data engine that leverages an upper-level generative framework to autonomously create physical worlds, together with various modes of data, including environments, camera motions, robotic task proposals, reward functions, robot policies, character motions, fully interactive 3D scenes, open-world articulated assets, and more, aiming towards fully automated data generation for robotics, physical AI, and other applications. So basically, Genesis, as I understand it, can be used as both a robotic simulation platform and as a photorealistic rendering platform. The platform accepts natural language prompts and can be used as a data engine to produce a range
Starting point is 00:15:32 of different modalities of synthetic or simulated data. In the immediate term, it's a massive boost to robotics training in terms of speed and accuracy. It could lead to immediate improvements in that field and possibly even unlock more complex use cases. Roboticist Ben Duffy commented, quote, with Genesis, you'll be able to train a locomotion policy that's deployable in real world in less than 26 seconds. That sentence tells us about a future we are not ready for. For reference, that's 430,000 times faster than the previous leading physical simulators, to give a sense of how dramatic this change could be. One of the other potentials is that a platform like this could produce the gigantic datasets required to scale up world models. Currently, they've been trained using either
Starting point is 00:16:09 datasets from self-driving cars or by observing video games. There are a few projects strapping camera rigs to hikers to gather real-world data, but if this platform is as performant as the researchers claim, we could soon see near-infinite synthetic datasets available to train the next generation of world models. Effectively, all of the response to this is some version of wow. Viewing their announcement video, AI evangelist Linus writes, this is all generated and simulated in 4D. Mindblown emoji, mindblown emoji, mindblown emoji. Bilawal Sidhu writes, Think instant physics-accurate environments, camera paths, and character animations, all from natural language.
Starting point is 00:16:46 Mela writes, What the? This Genesis project is like something out of a sci-fi movie. I mean, generating entire 4D worlds with physics simulations? That's mind-blowing. I'm just sitting here stunned trying to wrap my head around how this could change everything from robotics to video games. And I think, friends, if we had to sum this up, if you thought that 2025 was going to be any slower than 2024 and 2023 had been, boy, do you need to think again. That's going to do it for today's AI Daily Brief.
Starting point is 00:17:11 Appreciate you listening as always. Until next time, peace.
