The AI Daily Brief: Artificial Intelligence News and Analysis - The State of AI for Robotics

Starting point is 00:00:00 Today on the AI Daily Brief, Google's new model for embodied AI. Before that in the headlines, more information on Google's investment in Anthropic. The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI. To join the conversation, follow the Discord link in our show notes. We kick off today with a report from the New York Times around Google's relationship with Anthropic. The headliner statistic was that documents obtained by the New York Times show that Google owns about 14% of Anthropic. Now, of course, we knew that Google had been an investor in Anthropic, so that's nothing new. Instead, this just gives a little bit more of a

Starting point is 00:00:43 background picture around one of these very interesting deals that is, quite frankly, novel to the AI space. OpenAI's deal with Microsoft set the template for this, and the catalyst for it is the fact that AI needs so much money that the traditional venture capital establishment, which kind of taps out at a billion or two billion dollars usually, just couldn't keep up with the demand for tens of billions of dollars of capital. That effectively left the frontier labs with their only choice being to team up with one of the big tech giants. Part of the reason that the news media is so interested in this is that it's caught up in the Google antitrust case. You might remember that back in August, a federal court found that Google had acted as a monopolist in Internet search,

Starting point is 00:01:23 and the Justice Department has made a set of proposals around how to remedy the situation, including forcing Google to sell any AI products that could possibly compete with search. That puts their relationship with Anthropic, whose Claude chatbot is used as a form of search by some, squarely in the crosshairs. Now, Anthropic has argued that Google should not be forced to divest. They said that a forced divestment would, quote, harm both Anthropic and competition more generally. They said that it would depress Anthropic's value and hinder its ability to raise capital. Ultimately, this is just another interesting artifact in what is a fast-changing financial landscape alongside the AI startup scene. Speaking of the fast-changing AI startup scene, a company that has gotten more attention than just

Starting point is 00:02:02 about any other over the last week or two is, of course, the AI agent startup Manus. Well, that company has now announced that it's teaming up with Alibaba to be officially able to launch their product in China. In a statement, they said that they were engaging in strategic cooperation with Alibaba's Qwen team to, quote, meet the needs of Chinese users. Basically, the deal right now is that if you are releasing an artificial intelligence product for the Chinese market, you have to work with a Chinese AI company. This is why, for example, Apple hasn't released even their basic Apple Intelligence features in the country because they've been working to finalize that set of partnerships.

Starting point is 00:02:33 Given the excitement around Manus right now, T. P. Huang captured a lot of the sentiment when they wrote, Alibaba Cloud will need a whole lot more compute. Speaking of Alibaba, that company has also released a new AI model they're calling R1 Omni, firmly in the line of great, memorable AI model names, that they claim can read human emotions. The team published demos that showed the functionality in interpreting video inputs. In the video, a man in a brown jacket stands in front of a vibrant mural. His facial expression is complex with wide eyes, slightly open mouth, raised eyebrows, and furrowed brows, revealing surprise and anger. Speech recognition technology suggests his voice contains words

Starting point is 00:03:09 like you, lower your voice and freaking out, indicating strong emotions and agitation. Overall, he displays an emotional state of confusion, anger, and excitement. While the specific use cases haven't been articulated for this, Bloomberg suggested it could be a way for Alibaba to keep up with OpenAI's GPT-4.5. On launch, OpenAI had said that their new model had, quote, a better understanding of what humans mean and interprets subtle cues or implicit expectations with greater nuance and EQ. Lastly, today, beleaguered Intel has announced a new CEO, renewing hopes, at least among some, that the struggling company could be revived. Three months ago, Pat Gelsinger was fired as CEO after a four-year stint. He was installed at the

Starting point is 00:03:45 head of the company in 2021 with a mandate to rationalize the business and turn things around. By the time he was ousted in December, however, it looked as though the once-great U.S. chipmaker was going to be sold off for parts. A few months went by with various merger and acquisition rumors. There were even reports that the Trump administration was pushing a shotgun arrangement with TSMC, who would take over chipmaking foundries. The board, though, has now named Lip-Bu Tan as the new CEO. Tan is a 40-year veteran tech investor and served on the board since 2022. He resigned from the board last year, reportedly due to disagreements on how to turn the company around. And when he did resign, that left the board with a sum total of zero members with any experience in the semiconductor industry.

Starting point is 00:04:24 Now at the helm, Tan will be allowed to put his recovery plan into action. In a statement, he wrote, Together, we will work hard to restore Intel's position as a world-class products company, establish ourselves as a world-class foundry, and delight our customers like never before. Following the appointment, though, news broke that the TSMC takeover plan is still alive. TSMC has pitched Nvidia, AMD, and Broadcom on taking shares in a joint venture that would operate Intel's foundries. TSMC would take the lead role in operating the business, but would not own more than 50% of the joint venture.

Starting point is 00:04:51 This would help ameliorate concerns from the Trump administration about a foreign company owning critical U.S.-based chipmaking facilities. According to Reuters, Intel board members have backed a deal and held negotiations with TSMC, while some executives are firmly opposed. We'll have to see if that goes through, but overall, Wall Street likes the deal, Wall Street likes the new appointment, with Intel stock up 11% in overnight trading. That's going to do it, however, for today's AI Daily Brief Headlines edition. Next up, the main episode.

Starting point is 00:05:19 We talk a lot about agents on this show. But if you've ever thought to yourself, I don't want to talk about agents anymore. I just want to actually build and deploy something. I'm really excited to share something special with you today. We've partnered with Lindy to offer companies that just want to dive into the deep end of agents a way to get their feet wet, a way to move fast and build something meaningful without breaking the budget. The first five companies that email me, NLW at besuper.ai, with Lindy in the title, will have access to work with Lindy to build an actual functional agent serving their specific needs for under $20,000.

Starting point is 00:05:56 Some of the agents you can build include a customer support agent, maybe automating responses on your website. You could build an SDR for generating or qualifying sales leads, or you could build an agent that's perfectly suited for your internal communications needs, be it note-taking, scheduling, or something else. Not only is Lindy structured to integrate with all of the places that you already keep data and information, it's also a fully extensible platform, which means as you hire more and

Starting point is 00:06:22 more agent employees and really build out your digital workforce, Lindy's going to enable those agents to be interoperable and basically be able to work together in a seamless way. So again, if you are interested in diving in all the way to agents, in a matter of weeks, not months, not years, email me, NLW at besuper.ai, put Lindy in the title, and let's get your first digital employee online. Today's episode is brought to you by Vanta.

Starting point is 00:06:49 Trust isn't just earned, it's demanded. Whether you're a startup founder navigating your first audit or a seasoned security professional scaling your GRC program, proving your commitment to security has never been more critical or more complex. That's where Vanta comes in. Businesses use Vanta to establish trust by automating compliance needs across over 35 frameworks like SOC 2 and ISO 27001. Centralize security workflows, complete questionnaires up to 5X faster, and proactively manage vendor risk. Vanta can help you start or scale up your security program by connecting you with auditors and experts to conduct your audit and set up your security program

Starting point is 00:07:26 quickly. Plus, with automation and AI throughout the platform, Vanta gives you time back, so you can focus on building your company. Join over 9,000 global companies like Atlassian, Quora, and Factory, who use Vanta to manage risk and prove security in real time. For a limited time, this audience gets $1,000 off Vanta at vanta.com slash NLW. That's V-A-N-T-A dot com slash NLW for $1,000 off. Hey listeners, are you tasked with the safe deployment and use of trustworthy AI? KPMG has a first of its kind AI Risk and Controls Guide, which provides a structured approach for organizations to begin identifying AI risks and design controls to mitigate threats.

Starting point is 00:08:09 What makes KPMG's AI Risks and Controls Guide different is that it outlines practical control considerations to help businesses manage risks and accelerate value. To learn more, go to www.kpmg.us slash AI Guide. That's www.kpmg.us slash AI Guide. Today we're going to do that thing where we take a bit of contemporary news and use that as a lens to look at a broader set of updates that have happened over the last few weeks. And as I mentioned, we are talking today about the intersection of AI and robotics.

Starting point is 00:08:42 Now, the specific catalyst for this conversation is that Google has released a new family of AI models that are specifically designed to drive humanoid robotics, meaning it's a good time to talk about embodied AI. This is a field that is moving extremely quickly, and a big part of that is driven by the advances in the AI models that actually power the robotics. It's less than six months since Elon Musk unveiled Tesla's Optimus robot at the big splashy Robotaxi event. And while those robots were visually impressive, it came out in the following days that the robots were largely being controlled remotely from behind the scenes. And as much as that was fodder for the Elon haters, it also reflected the fact that embodied AI is really hard,

Starting point is 00:09:22 especially when it comes to AI models that work for generalized tasks. Humanoid robots have so far required specific training for each action, with the AI models largely helping with edge cases and little deviations. For example, the Optimus robots could easily mix a drink during the demo, likely because they were trained to do that. However, they would have had difficulty if a patron asked to shake their hand without a human controlling them. That's the problem that Google DeepMind's new AI model is trying to solve.

Starting point is 00:09:46 Called Gemini Robotics, the new model is built on top of Gemini 2.0, inheriting Gemini's native multimodal functionality, meaning that the model can process visual, text, and audio inputs. In their announcement blog post, DeepMind wrote, to be useful and helpful to people, AI models for robotics need three principal qualities. They have to be general, meaning they're able to adapt to different situations, they have to be interactive, meaning they can understand and respond quickly to instructions or changes in their environment, and they have to be dexterous, meaning they can do the kind of things people generally do with their hands and fingers, like carefully manipulate objects.

Starting point is 00:10:18 DeepMind has actually built a pair of models to drive different parts of the functionality required for generalized robotics. The first is their advanced vision-language-action, or VLA, model, which is functionally similar to other multimodal LLMs, but includes physical actions as a new mode of output. The second is called Gemini Robotics ER, short for embodied reasoning. The model takes the premise behind reasoning models and applies it to physical environments. As DeepMind put it, the model has, quote, advanced spatial understanding.

Starting point is 00:10:42 Now, as an interesting note, this is similar to the way that the current generation of AI agents are being designed. Agent builders typically use a reasoning model for planning and analysis of the situation and then hand that off to a separate model for execution, meaning that it's not unreasonable to think of embodied AI as agents with eyes and hands. DeepMind says the Gemini Robotics model, quote, leverages Gemini's world understanding to generalize to novel situations and solve a wide variety of tasks out of the box, including tasks it has never seen before in training. As the model is built on top of an LLM, it has a general understanding of language inputs and can take instruction in natural language.

Starting point is 00:11:17 One of the demo videos shows a table with a variety of fruit and containers laid out. The embodied AI receives a voice command and deftly places the banana in the clear container without having any specific training on that task. Google also demonstrated a big step up in fine motor skills, with the embodied AI able to close a Ziploc bag and even make an origami crane. The reasoning model, Gemini Robotics ER, is added to help increase the robot's ability to plan for novel task execution. DeepMind writes, combining spatial reasoning and Gemini's coding abilities, Gemini Robotics ER can instantiate entirely new capabilities on the fly. For example, when shown a coffee mug, the model can intuit an appropriate two-finger grasp for picking it up by the handle and a safe trajectory for approaching it. Functionality from reasoning LLMs

Starting point is 00:11:58 also carries over into the real world, meaning the robots can do things like play tic-tac-toe or complete a word puzzle using Scrabble tiles. A key breakthrough here is that this system of models allows robots to move from a narrow range of specific tasks to much more generalized applications. Keerthana Gopalakrishnan, who works on the embodied AI team at DeepMind, posted, Gemini Robotics is out and is the most advanced VLA in the world. I'm especially blown away by the instruction following results. It's the first time where I've personally felt that building generic embodied intelligence is within reach, like a robot coming to life. Bloomberg's Mark Gurman pointed out that the implications are for much more than just Google

Starting point is 00:12:33 DeepMind. He said artificial intelligence is going to be at the core of everything, and really the ultimate hardware expression of AI is robotics, being able to understand how a human acts, artificially learn from data, and mimic a human. And that's what a robot is. Now, Google aren't the only ones that have been working on this form of embodied AI model. In early February, Figure AI ditched their partnership with OpenAI to use their own models developed in-house. A few weeks later, we got a look at what these models can do. The demo video showed a pair of robots working together to pack away a grocery delivery. The robots had never seen the items before, but were able to reason about where the ketchup

Starting point is 00:13:07 bottle should go in the fridge. If one's trying to make direct one-to-one comparisons, some might think that this demo wasn't as impressive as Google's demos from this week, with the robots acting much more slowly, seeming less dexterous, and promising a more limited range of tasks. But on the other hand, Figure AI has their own humanoid design in production, while Google were demonstrating their software on hardware sourced from other companies. Still, both companies seem to be working on the same basic system design of pairing a reasoning model with an execution model. When they dropped the OpenAI deal, Figure AI CEO Brett Adcock said, we found that to solve embodied AI at scale in the real world, you have to vertically integrate robot AI. We can't outsource AI for the same reason we can't outsource our

Starting point is 00:13:46 hardware. And Figure AI has begun deploying their robots in real-world settings. They have one pilot program currently underway in a BMW manufacturing plant in South Carolina, and a second undisclosed contract that the company says could potentially allow them to reach 100,000 robots shipped. The company indeed showed a video of robots sorting parcels, making many think that the client is one of the large U.S. shipping companies. These are both commercial clients, but much of the excitement and appetite, at least from an investor perspective, is for what seems to many the inevitable future of bringing humanoids into the household setting. Figure AI also seems to have demonstrated that humanoid companies are past the speculative phase, at least in terms of their valuations.

Starting point is 00:14:23 Last February, during their Series B, the company was valued at a very decent $2.6 billion, but last month, Bloomberg reported that they are in talks to raise their Series C at a valuation of $39.5 billion. Of course, we are now also living in the world of DeepSeek and Manus, and everyone is wondering what's going on in China. It feels like every day on X, you can see a video of some Chinese-produced robot carrying out some feat of dexterity. Earlier this month, one company called X-Robot went viral, with an extremely lifelike female robot with a good voice model behind it. Now, this video that you're watching here had the sci-fi factor turned all the way up, so who knows how real the product is.

Starting point is 00:15:03 Then again, with what we've seen out of Chinese AI in recent months, I certainly wouldn't count it out. One Chinese company that is definitely producing real products is Unitree. They had a huge range of robots and assorted form factors on display at CES in January. You also might have seen the company's latest viral video showing a Kung Fu robot kicking a stick out of a person's hand. Now, many of the videos from trade shows still have a human operator in control, which gets us exactly back to why potentially this Google model is such important news, as Google may have just demonstrated a path to fill in the blanks where Chinese embodied AI is lacking. Right now, Unitree is offering these G1 units starting at $16,000,

Starting point is 00:15:40 but you have to think those prices are going to come down precipitously in the years ahead. Another key player in embodied AI that's worth mentioning in this roundup is Nvidia. The chipmaker isn't working on robots per se, but they've definitely made some big advancements in the AI used to train them. In January, Nvidia released their Cosmos World Foundation model. The generative model can be used to create virtual simulations of real-world scenarios for robot training. Improvements in world models have been one of the big breakthroughs over the past few months, with several startups showing off their own versions of the tech in development. The idea is that a digital twin of a robot can be placed in a simulation, which allows synthetic

Starting point is 00:16:13 training data to be quickly generated. This doesn't help necessarily with the reasoning and generalization problem that Google is working on, but it does allow for big improvements in dexterity and specific movement training. The Cosmos reveal in January also came with some very bullish statements from Nvidia CEO Jensen Huang. He said the ChatGPT moment for general robotics is just around the corner. He also delivered his keynote address standing in front of a chart showing the AI sector going exponential. After agentic AI, the wave that we're currently in the middle of, the chart spiked even higher for physical AI, consisting of self-driving cars and general robotics. During the speech, Huang said that self-driving cars would likely be the, quote,

Starting point is 00:16:48 first multi-trillion dollar robotics industry. And while at this point, we haven't seen anything that looks close to a fully capable general purpose humanoid, Huang did mention that he expects Nvidia's products to power a billion humanoid robots over the coming years. So far, I've hit a lot of the biggies. But even beyond these companies, VCs are definitely sitting up and paying attention to the potential inflection point we're hitting with embodied AI. Earlier this week, Dexterity Inc. raised $95 million at a $1.65 billion valuation to build robots capable of human-like dexterity. The company's pitch is remarkably similar to the way Google described their criteria for generalized robotics. CEO Samir Menon described that his robots can touch and

Starting point is 00:17:25 recognize objects, are aware of, and respond appropriately to surroundings, and will move gracefully and adjust as needed. He added, the combination of those three is what we engineer and what we believe will drive the future of physical AI. Raviraj Jain, a partner at Lightspeed Ventures, said he was investing more money in the company because he believes we're reaching an inflection point for physical AI. Also, last month, a startup called Apptronik raised $350 million in Series A funding at an undisclosed valuation. The company is a spin-out from the University of Texas and has been working on humanoid robots for over a decade. The round included participation from Google with DeepMind partnering with the company to provide the AI to drive their robots. In fact,

Starting point is 00:18:01 you could see the Apptronik robots putting Google's embodied AI through its paces in the demo videos from this week. The raise was vastly more money than the $28 million the company had raised prior to this round, and CEO Jeff Cardenas commented that the mega round was necessary because his robots are almost production ready. He said, what 2025 is about for Apptronik and the humanoid industry is really demonstrating useful work in these applications with these initial early adopters and customers, and then true commercialization and scaling happening in 2026 and beyond. Explaining the Google partnership, Cardenas said it made far more sense than creating their own models,

Starting point is 00:18:33 adding, we believe that right now, Google is at the top of the game and building some of the best models in the world. So friends, that is a quick update on the state of embodied AI, the intersection of AI and robotics. And that is where we will wrap today's episode. Appreciate you listening as always. And until next time, peace.
