The AI Daily Brief: Artificial Intelligence News and Analysis - The Latest in AI + Humanoid Robots

Episode Date: August 8, 2024

NLW looks at the Figure 02 launch and the latest competition in humanoid robots. Plus rumors of a new OpenAI model. Concerned about being spied on? Tired of censored responses? AI Daily Brief listeners receive a 20% discount on Venice Pro. Visit https://venice.ai/nlw and enter the discount code NLWDAILYBRIEF. Learn how to use AI with the world's biggest library of fun and useful tutorials: https://besuper.ai/ Use code 'podcast' for 50% off your first month. The AI Daily Brief helps you understand the most important news and discussions in AI. Subscribe to the podcast version of The AI Daily Brief wherever you listen: https://pod.link/1680633614 Subscribe to the newsletter: https://aidailybrief.beehiiv.com/ Join our Discord: https://bit.ly/aibreakdown

Transcript
Starting point is 00:00:00 Today on the AI Daily Brief, the latest in AI and humanoid robotics. Before that in the headlines, are we getting a new OpenAI model soon? The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI. To join the conversation, follow the Discord link in our show notes. Welcome back to the AI Daily Brief Headlines edition, all the daily AI news you need in around five minutes. We kick off today with a fun one, something that has gotten the Twitter AI sphere chattering. A new model has appeared in the
Starting point is 00:00:36 LMSYS Arena that appears to be some new version of GPT-4. It was first noticed last night and is named Anonymous Chatbot. Now, you might remember that the last time that we got a new OpenAI model, it appeared first in this sort of semi-anonymous form on the LMSYS Arena, and so people are already speculating around what this might represent. When OpenAI leaker Jimmy Apples asked, what model are you? The anonymous chatbot responded, I'm based on OpenAI's GPT-4 architecture. Specifically, you're interacting with a version of GPT-4 that has been fine-tuned for chat-based interactions.
Starting point is 00:01:10 This model is designed to understand and generate human-like text based on the input it receives. Now, of course, that could be a hallucination. It's no guarantee that that actually is what this model represents, but that's at least what it's self-reporting. Speculation is that it might be something called Q*. As the whole Sam Altman controversy was going down last November, Reuters and others reported that Q* was a new, more advanced reasoning model, and there was lots of speculation that safety concerns around Q* were part of what the rift was inside the organization. Later, of course, all the parties involved would say that it wasn't about a safety issue, although there are still many who just don't believe that.
Starting point is 00:01:46 About a month ago, Reuters once again reported on the latest on Q*, although it was now codenamed Strawberry. Information came from an internal OpenAI document that was seen by Reuters in May. Quote, the document describes a project that uses Strawberry models with the aim of enabling the company's AI to not just generate answers to queries but to plan ahead enough to navigate the internet autonomously and reliably to perform what OpenAI terms deep research. Two other Reuters sources had viewed demos of what OpenAI staffers told them was Q*, with the main difference being that it was able to answer science and math questions outside of the capacity of models like GPT-4. So back to this new model. Another frequent AI leaker, Flowers, said anonymous chatbot was able to solve all my test puzzles on the first
Starting point is 00:02:28 attempt. Another X user, Golden Hawk, said it can finally solve the river puzzle without some crazy and wrong solution. The river puzzle asks, if a man and a dog are on one side of a river and have a boat that can fit a human and an animal, how do they get to the other side? This has caused some trouble for existing LLMs, but anonymous chatbot seemed to have no problem. For example, Llama 3.1 70B has this long, convoluted answer that ends up with the man taking two trips across the river, whereas anonymous chatbot answered, this puzzle is pretty straightforward since the boat can fit both the man and the dog at the same time. The man and the dog simply get into the boat together, and the man rows them across the river. Once they reach the other side, they both get out of the boat.
Starting point is 00:03:06 That's it. Flowers responded, wow, that's great. I just hope they didn't intentionally overfit it to the usual memes because they knew these were the first things we would test. The model isn't perfect. When Feltsim asked, a farmer is on one side of a river with a wolf, a goat, and a cabbage. He can only take one item at a time with him. The wolf will eat the goat if left alone together and the goat will eat the cabbage if left alone together. How can the farmer transport the goat across the river without it being eaten? It does ultimately get the right answer, but in a very convoluted way. Anyway, all of this is to say, and the most important part of this discussion, is that there appears to be another new OpenAI model being tested right now, which, as Andrew Curran points out,
Starting point is 00:03:42 suggests that something new is on the way. Next up, staying on OpenAI for a minute, the company has announced a highly requested feature from developers called Structured Outputs. They write, last year at DevDay, we introduced JSON mode, a useful building block for developers looking to build reliable applications with our models. While JSON mode improves model reliability for generating valid JSON outputs, it does not guarantee that the model's response will conform to a particular schema. Today, we're introducing Structured Outputs in the API, a new feature designed to ensure model-generated outputs will exactly match JSON schemas
Starting point is 00:04:14 provided by developers. They continue, generating structured data from unstructured inputs is one of the core use cases for AI in today's applications. Developers use the OpenAI API to build powerful assistants that have the ability to fetch data and answer questions via function calling, extract structured data for data entry, and build multi-step agentic workflows that allow LLMs to take action. Developers have long been working around the limitations of LLMs in this area via open-source tooling, prompting, and retrying requests repeatedly to ensure that model outputs match the formats
Starting point is 00:04:41 needed to interoperate with their systems. Structured Outputs solves this problem by constraining OpenAI models to match developer-supplied schemas and by training our models to better understand complicated schemas. Now, this is a feature that developers I've seen are very excited about. And interestingly, although it is a technology advance, in some ways it also matches the sort of product and user experience development that we're seeing happening elsewhere as well. A product feature designed to improve the product experience of using Claude.
Starting point is 00:05:08 It's not some big technological advance. It just makes it a better tool because it separates the output panel from the instruction panel in a way that makes it a lot easier to use. This is not exactly the same, but it's also not totally different. This basically takes a use case that OpenAI understands and modifies the existing experience just to better match it. Now, there is some technological advance here, so like I said, it's not exactly the same, but you get it. We're moving into an era where there is more product consideration for specific use cases for LLMs rather than just a big blinking chatbot window. Now, almost buried in that announcement, there was also a significant price drop.
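As a side note on the Structured Outputs announcement quoted above: the gap between "valid JSON" and "JSON that matches a schema" can be sketched in a few lines of Python. This is only an illustration under stated assumptions. The schema, the sample replies, and the helper names (EVENT_SCHEMA, conforms, check_reply) are hypothetical, the checker covers a tiny subset of JSON Schema, and none of it is OpenAI's implementation or API.

```python
import json

# Hypothetical schema for illustration only; not from the episode or OpenAI's docs.
EVENT_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "attendees": {"type": "array"},
    },
    "required": ["name", "attendees"],
    "additionalProperties": False,
}

# The few JSON Schema type names this sketch understands, mapped to Python types.
_TYPES = {"string": str, "array": list, "object": dict}


def conforms(value, schema):
    """Check a parsed JSON value against a tiny subset of JSON Schema."""
    if not isinstance(value, _TYPES[schema["type"]]):
        return False
    if schema["type"] != "object":
        return True
    props = schema.get("properties", {})
    # Every required key must be present.
    if any(key not in value for key in schema.get("required", [])):
        return False
    # Reject keys the schema does not declare when additionalProperties is false.
    if schema.get("additionalProperties") is False and any(k not in props for k in value):
        return False
    # Recurse into the declared properties that are present.
    return all(conforms(value[k], props[k]) for k in value if k in props)


def check_reply(raw, schema):
    """Classify a raw model reply: unparseable, parseable but wrong shape, or conforming."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return "invalid JSON"
    return "matches schema" if conforms(parsed, schema) else "valid JSON, wrong shape"


if __name__ == "__main__":
    print(check_reply('{"name": "Demo day", "attendees": ["Ana"]}', EVENT_SCHEMA))
    print(check_reply('{"title": "Demo day"}', EVENT_SCHEMA))
```

The second reply is the case the announcement describes: it parses fine, which is all JSON mode promises, but it does not have the shape the developer asked for. Checking after the fact and retrying is the workaround the announcement mentions; constraining generation to the schema removes the need for it.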
Starting point is 00:05:41 Steven Heidel from OpenAI said, oh yeah, nearly forgot to mention, the new 4o version with Structured Outputs is 50% cheaper for input, 33% cheaper for output, and is available immediately, no betas, previews, or waitlists. So this, of course, gets at two recent critiques of OpenAI. The perpetual question of cost, which, to be fair to them, has been coming down precipitously for some time now. And second, the fact that recently, they'd become one of those companies that announces things before they're available, leading to some chagrin among users. Interestingly, reports also came out today that back in 2017, Intel had a chance to buy a stake in OpenAI. Reuters writes that over a several-month period in 2017 and 2018,
Starting point is 00:06:19 executives at OpenAI and Intel had discussed a variety of different arrangements, including Intel buying a 15% stake for a billion dollars in cash. They also discussed apparently Intel getting an additional 15% if it made hardware available for OpenAI at cost. And here we get the problem with ROI-based thinking. Intel ultimately decided against the deal, partly because then-CEO Bob Swan did not think generative AI models would make it to market in the near future and thus repay the chipmaker's investment. Now, I'm no mathematician, but if you look at 30% of the roughly $80 to $90 billion valuation that OpenAI commands right now, you'll see the problem of having too short a time horizon when you're making decisions around
Starting point is 00:06:58 technology. But then again, one of the big stories that we keep coming back to is the tension between public market dynamics and private venture capital-style thinking. There is an incredible friction there that is leading to lots of weirdness in this space as it evolves. A couple more today before we get out of here. Audible is testing a new AI-powered search feature. Audible has a new personal recommendation expert they call Maven, through which a user can use natural language to enter queries. The example that TechCrunch gives is I'm looking for an uplifting fiction novel with a female protagonist. For me, it would probably be something like, I'm looking for a historical thriller that involves the Templars or the Vatican's entity.
Starting point is 00:07:36 One thing to watch out for when it comes to figuring out where in the hype cycle we are, when companies can no longer get news stories just for launching an AI feature, you will know we have entered a different phase. So far, we're not there yet. Case in point, Reddit is the latest to join the AI-powered summaries as part of search. During an earnings call on Tuesday, CEO Steve Huffman said that Reddit would begin testing AI-powered search results to summarize and recommend content. I am much less interested in the specifics for Reddit and much more interested in this broader trend of how search is being reorganized.
Starting point is 00:08:06 It's something we've been watching in the context of Perplexity, Google AI Overviews, and more recently SearchGPT, and seems to be at least for the moment a broader shift. For now, that is going to do it for today's AI Daily Brief Headlines edition. Next up, the main episode. Today's episode is brought to you by Venice. The leading AI companies store your entire conversation history and attach it to your identity forever. That's every question you ask, every answer you receive, every image you generate, every thought you share with the machine, it's all being spied on. If you trust all the companies, hackers, and NSA board members that will ever have access to your AI conversations, then rejoice.
Starting point is 00:08:40 For you are well served. For the rest of us, Venice is an alternative. Venice is a powerful AI app for text, image, and code generation that respects you as a sovereign individual and believes privacy and free speech are not only human rights, but necessary for civilizational advancement. Private, permissionless, and uncensored, you can try it for free without an account. AI Daily Brief listeners receive a 20% discount on Venice Pro. Visit venice.ai/nlw and enter the discount code NLWDAILYBRIEF. That's NLWDAILYBRIEF, all one word. Today's episode is brought to you by Superintelligent, the platform that
Starting point is 00:09:14 helps teams maximize AI. Super is, of course, the platform that we've built that pairs fun, fast, practical video tutorials with step-by-step instructions to get you actually using AI, and from there unlocks information about hundreds of use cases that show you how people are actually getting value out of AI right now. Now, we have just launched Super for Teams. This is a new add-on experience that allows teams to share more information about what AI they're using, how it's working, and how to get more value out of it. Whether your company is 25 or 2,500, Superintelligent is going to be the best platform for unlocking information about how to get the most out of AI right now. If you'd like to learn more about the Superintelligent teams offering, go to besuper.ai/partner
Starting point is 00:09:58 and send us a note so that our team can get right back to you. Once again, that's besuper.ai/partner. Welcome back to the AI Daily Brief. Today we are talking about the latest in AI-powered robots. Specifically, Figure has announced their latest robot, the Figure 02. Now, for us here at the AI Daily Brief, this is obviously a related but separate area from what we normally focus on. However, we are likely also living through the last period in which AI isn't fully integrated with humanoid robots, which could have as much of an impact on
Starting point is 00:10:32 work as anything we talk about here separately. Right now, two of the most discussed companies when it comes to humanoid robots are the Tesla Optimus and the Figure 02. So what is new about Figure 02? Figure CEO Brett Adcock writes, Figure 02 was a ground-up redesign to achieve six times the cameras, 50% more battery, fourth-generation hands, integrated wiring, exoskeleton structure, and speech-to-speech reasoning. Now, speech-to-speech reasoning is the big thing that has been called out here. Adcock writes, Figure 02 is capable of speech-to-speech conversations with humans, on-board mics and speakers connected to custom AI models trained in partnership with OpenAI. The default UI to our robot will be speech. This is something we've been
Starting point is 00:11:12 discussing a lot, both on this show, as well as within the Superintelligent community, of how much the default UI for our interactions with AI in general will be moving to speech and voice. Adcock shared a graphic that shows, on a very high level, how Figure works. When someone asks, for example, can I have something to eat? The Figure 02 takes advantage of an integrated system that includes OpenAI's model, which allows for common sense reasoning from images, neural network policies that enable fast, dexterous manipulation, and a body controller, to output the ability to say, sure thing, here's an apple,
Starting point is 00:11:43 and then hand someone an apple. Adcock also writes that Figure 02 has an onboard vision language model. This enables semantic grounding and fast common sense visual reasoning from robot cameras. Basically, the VLM is how the robot takes in visual stimuli and interacts with it. The big thing about the battery is that they're framing it in terms of what it can actually do from a work perspective. Adcock writes, we hope Figure 02 will be able to achieve upwards of around 20 hours of useful work per day, which is obviously totally transformative if the Figure can actually mirror things that are done by humans today. One of the big shifts from the Figure 01,
Starting point is 00:12:16 they say, is the exoskeleton structure. Adcock writes, in order to provide structural stiffness and protect against crash loads, Figure 02 was designed as an exoskeleton structure, similar to aircraft where the outer skin bears the load. Now, you might also remember that back in February, Figure raised a $675 million Series B. OpenAI was a partner and investor in that deal, and that partnership has clearly expanded in the time since. Alongside the announcement, the company also shared more details of its test with humanoid robots at a BMW plant in Spartanburg. BMW writes, during a trial run lasting several weeks at BMW Group Plant Spartanburg, the latest humanoid robot, Figure 02, successfully inserted sheet metal parts
Starting point is 00:12:56 into specific fixtures, which were then assembled as part of the chassis. The robot must be particularly dexterous to complete this production step. BMW says, using a robot can save employees from having to perform ergonomically awkward and tiring tasks. The takeaway for many is not so much that the Figure 02 is completely production ready, but that it is actually in the testing in a real-world environment stage. Another big topic of conversation on Twitter, at least, is the competition with Tesla's Optimus. I personally think that this is a little bit overstated, mostly because I think that the total addressable market for humanoid robots is going to be so massively immense that, while yes, there will be a first mover advantage to some extent, there is going to be a lot of market share
Starting point is 00:13:35 to go around. For Tesla investors, though, it might be seen more as a bellwether for how the company is doing in general. Danish on X writes, Tesla Optimus has serious competition now. As expected, Tesla is now under significant pressure from a more focused and better-performing competitor, Figure. I spoke about this last year, including the importance of visionary leadership and good governance. Well, here is the result. Figure is backed by Nvidia and OpenAI, who are betting that this will be the first real embodied agent. Now, one other interesting note in the hardware space. It's not exactly robotics but seems somewhat related. OpenAI has made a $60 million investment into Opal. In fact, the company is leading Opal's Series B.
Starting point is 00:14:15 Opal is a plug-and-play webcam, which promises quality similar to a DSLR. It's what you see when I'm on camera in these videos, or, for example, on the videos on Superintelligent. The Information writes, OpenAI's involvement in the funding round is surprising. Opal is best known for its $300 professional-grade webcams, not an obvious match for an LLM developer. But they write, Opal plans to develop other types of devices powered by OpenAI's AI models while it continues to sell its webcams. The three-year-old startup envisions developing devices that individuals can use as creative tools rather than AI-powered friends or companions.
Starting point is 00:14:46 Opal will be working closely with OpenAI researchers to prototype various device ideas. It is very clear from all of Altman and OpenAI's moves that they think that there is going to be an entire new generation of AI-powered hardware, AI-powered devices, and AI-powered robots, and it's a future which appears to be materializing rapidly. For now, though, that is going to do it for today's AI Daily Brief. Until next time, peace.
