The AI Daily Brief: Artificial Intelligence News and Analysis - OpenAI's Q* Reasoning AI is Now Code-Named "Strawberry"
Episode Date: July 16, 2024
Discover OpenAI’s latest breakthrough with the newly announced reasoning AI, code-named “Strawberry.” This episode examines the features and capabilities of “Strawberry,” its potential impact on the AI industry, and what this means for the future of artificial intelligence. Explore this exciting development and its implications for AI research and applications.
Concerned about being spied on? Tired of censored responses? AI Daily Brief listeners receive a 20% discount on Venice Pro. Visit https://venice.ai/nlw and enter the discount code NLWDAILYBRIEF.
Learn how to use AI with the world's biggest library of fun and useful tutorials: https://besuper.ai/ Use code 'podcast' for 50% off your first month.
The AI Daily Brief helps you understand the most important news and discussions in AI. Subscribe to the podcast version of The AI Daily Brief wherever you listen: https://pod.link/1680633614 Subscribe to the newsletter: https://aidailybrief.beehiiv.com/ Join our Discord: https://bit.ly/aibreakdown
Transcript
OpenAI's reasoning AI Q* has become Strawberry, and Meta seems set to release its biggest
Llama 3 model yet next week. The AI Daily Brief is a daily podcast and video about the most
important news and discussions in AI. To join the conversation, follow the Discord link in our show notes.
Welcome back to the AI Daily Brief Headlines edition, all the AI headlines you need in around five minutes.
Today is a very product-news-centric edition of the headlines, with the kickoff story being that
Meta is finally releasing its largest Llama 3 model next week, on July 23rd. This is according to a
Meta employee, as reported by The Information. Now, this model has already been announced. This is the 405-billion-parameter
Llama 3 model, and this one, in addition to being larger than the previous versions
we've gotten, will be multimodal. It will be able to understand and generate both images and text.
When Llama 3 was released back in April, it came as 8-billion and 70-billion parameter models,
which quickly became very commonly used among AI developers.
Back in April when Llama 3 70B was released,
Professor Ethan Mollick speculated that the 400-billion-plus-parameter model
would reach GPT-4 level.
Of course, since then, we've gotten GPT-4o and Claude 3.5 Sonnet,
and it will be a big question just how far off the state of the art
this newest open source release really is.
Not content to let Meta have all the fun, Google Gemini also has some upcoming features.
This is from a blog post on TestingCatalog.com.
The post is based on the fact that Google has scheduled five Gemini announcements for July 15th and July 18th,
and then goes through to rank what they're most likely about.
The big contender that people seem to be interested in is Gems.
Effectively, this is a version of custom GPTs, which people have been waiting on for some time now.
Other speculations include memory or personalized responses, as well as scheduled prompts, which could be
an interesting integration with Google's search capabilities,
for example, allowing people to ask Google to send them a curated set of daily news every morning.
There's evidence for voice recording and Google Photos integration,
and TestingCatalog also found a hidden button that suggests that we might get a prompt enhancer.
Given that we've seen Claude push really far into the "let our system figure out the right prompts based on your prompts" approach,
this is one that wouldn't be too surprising, even if it would be incredibly useful.
Some of the other speculations are around a Chrome extension, a real-time response toggle, and an updated Imagen model.
At the time of recording, we don't have any more information, but this is certainly something I'll be watching for this week.
Third today, Amazon's AI shopping assistant, Rufus, is now available to all U.S. customers in the Amazon shopping app.
Amazon writes,
"Rufus is designed to help customers save time and make more informed purchasing decisions by answering questions
on a variety of shopping needs and products right in the Amazon shopping app."
"We're pleased to announce," they write, "that Rufus is now available to all U.S. customers in the Amazon shopping app."
As part of the announcement, Amazon also shared some of what they've learned during the beta test.
They say that customers have already asked Rufus tens of millions of questions,
and so far they're using it for things like understanding product details and hearing what other customers say.
When Rufus gives an answer, it appears that it also suggests a set of follow-up questions,
which apparently customers are clicking on as well.
And then the other things that people are using it for are pretty much exactly what you'd expect:
getting contextual product recommendations, the example they give being a pool umbrella specifically for Florida.
People are using it to compare options, e.g., "What's the difference between gas and wood-fired pizza ovens?"
People are using it to get product updates, access current and past orders,
and even answer questions that are, quote, "not obviously related to shopping."
Amazon writes, "Because Rufus can answer a wide range of questions,
it can help customers at any stage of their shopping journey.
A customer interested in cookware may first ask,
'What do I need to make a souffle?'
Preparing for special occasions is also popular,
with shoppers asking questions like,
'What do I need for a summer party?'"
So far, I have not used Rufus,
but it feels to me like one of those applications of AI
that either will become completely default,
just the totally normal way that we interact with shopping,
or will be quietly removed from this application in about a year.
Given that this has been live with testers
and that Amazon is choosing to put it in their main shopping app
in the U.S., their biggest market,
it seems like they like the results they've had so far.
If you have had a chance to use it,
use the comments on either Spotify or YouTube
to share how Rufus has been for you.
For now, that is going to do it for our Headlines edition.
Next up, the main episode.
Today's episode is brought to you by Superintelligent,
the platform for fun, fast AI learning. Super has a ton of new things going on. We recently announced
our partnership with Spotify, through which users of that app can now access Superintelligent content
directly from their mobile apps. We've also just launched the AI learning feed. In addition to seeing
the tutorials that we're dropping, there are polls, news items with related lessons, and a chance for
people to show off the projects and use cases that are making AI come alive for them. We've also
just kicked off the Super Summer Challenge, where each week we'll share a new challenge
that you can use to discover new AI tools and use cases. Go to besuper.ai
and use code super fun for 50% off your first two months. That's besuper.ai.
Today's episode is brought
to you by Venice. The leading AI companies store your entire conversation history and attach it
to your identity forever. Every question you ask, every answer you receive, every image you generate,
every thought you share with the machine, it's all being spied on. If you trust all the
companies, hackers, and NSA board members that will ever have access to your AI conversations,
then rejoice, for you are well served. For the rest of us, Venice is an alternative.
Venice is a powerful AI app for text, image, and code generation that respects you as a sovereign
individual and believes privacy and free speech are not only human rights, but are necessary
for civilizational advancement. Private, permissionless, and uncensored. You can try it for free
without an account at venice.ai.
Welcome back to the AI Daily Brief. At the very end of last week,
we got news that OpenAI was working on a new, more advanced type of AI
that they have codenamed Strawberry.
And in fact, this is not the first time we've heard about this project.
However, it is the first time that it's had this name.
So what we're going to do today is give not only this new report about what OpenAI is working on,
but go back a little bit to the history of this particular project.
And for that, we actually have to go back to the days and weeks that followed the ouster
and then rehiring of CEO Sam Altman last November.
About a week after Altman was reinstated, The Information published a piece called "OpenAI
Made an AI Breakthrough Before Altman Firing, Stoking
Excitement and Concern." You might remember that during that whole time, as everyone was trying to figure out
just why Altman had been fired, probably the most popular working theory was that they had made some
big technical advance, and that there was internal disagreement around whether they should be
pushing it forward. This was, of course, despite the fact that the board was explicit
that this wasn't the case. However, that didn't stop this report from getting tons of traction.
Wrote The Information on November 22nd of last year, "One day before he was fired by OpenAI's board last week,
Sam Altman alluded to a recent technical advance the company had made that allowed it to 'push the veil of ignorance back and the frontier of discovery forward.'
The cryptic remarks at the APEC CEO Summit went largely unnoticed as the company descended into turmoil.
But some OpenAI employees believe Altman's comments referred to an innovation by the company's researchers earlier this year
that would allow them to develop far more powerful AI models.
The technical breakthrough, spearheaded by OpenAI chief scientist Ilya Sutskever, raised concerns among some staff that the company didn't have proper safeguards in place to commercialize such advanced AI models."
The information we got was that the model was called Q*.
The big thing that it was able to do that previous models couldn't was solve basic math problems.
The Information said that in the months following the breakthrough, Ilya himself appeared to have reservations.
Another data point from that article: "Ilya's breakthrough allowed OpenAI to overcome limitations on obtaining enough high-quality data to train new models, according to the person with knowledge.
The research involved using computer-generated rather than real-world data."
Reuters followed up and found their own sources, confirming the story.
They added the detail that, quote,
"though only performing math on the level of grade school students,
acing such tests made researchers very optimistic about Q*'s future success."
Reuters also dug up a letter that was sent to the board
from a number of staff researchers, warning, it seems, about the discovery.
Wrote Reuters, "Unlike a calculator that can solve a limited number of operations,
artificial general intelligence can generalize, learn, and comprehend.
In their letter to the board, researchers flagged AI's prowess and potential danger,"
although Reuters' source couldn't confirm that it was specifically Q*
that they were worried about. Separately, however, The Verge reported that the board never received a
letter about Q*, and that, quote, "the company's research progress didn't play a role in Altman's
sudden firing." Of course, lots of people wanted to know more. One of the most viewed discussions on the
OpenAI forums last November was, "What is Q* and when will we learn more?" No one really had
information on that thread. Many people were talking about it in the context of what it might have meant for
the firing, but then there were also a lot of responses represented by this one from Quirtle, which
said, "As someone who's done a fair amount of ML/AI research, I can tell you that it is very,
very easy to think you've discovered a breakthrough. There's a great deal of cognitive bias in
AI and you have to falsify very aggressively. I am deeply skeptical. It's also worth noting in the news
today that we found out that the $86 billion share sale is back on. I'm sure this quote-unquote
breakthrough will get investors quite interested." So obviously they are calling into question the
veracity of the claims and saying that perhaps they were being overstated for the sake of an investment.
In December, Timothy B. Lee wrote a post on UnderstandingAI.com
called "The Real Research Behind the Wild Rumors About OpenAI's Q* Project."
The piece departs from just trying to suss out the details of this supposed Q* breakthrough
and instead goes through two of OpenAI's published papers about its efforts to solve grade school math problems,
as well as some other research from outside of OpenAI in a similar area.
One thing he pointed to was a tweet from Meta's chief AI scientist, Yann LeCun, who wrote,
"Please ignore the deluge of complete nonsense about Q*.
One of the main challenges to improve LLM reliability is to replace auto-regressive token prediction with planning.
Pretty much every top lab, FAIR, DeepMind, OpenAI, etc., is working on that, and some have already published ideas and results.
It is likely that Q* is OpenAI's attempt at planning."
Now, earlier this year, Nimrod Kramer over at daily.dev published a piece called "OpenAI Q*,
Everything You Need to Know in One Place."
He adds to the discussion the point that, in addition to solving basic math, Q*, quote, "showcases reasoning abilities beyond current AI models."
"From what we've heard," he writes, "Project Q* can work out basic math problems and think symbolically better than other AI systems out there, understand ideas and make smart guesses about them,
and move past just recognizing patterns to actually think through problems step by step."
He speculates a little bit about how it might work.
He points to step-by-step reasoning, where he says, "Instead of just spitting out answers, Project Q* could explain how it got there by breaking the problem into smaller, easier parts, figuring out each part one by one, and making sure each part helps solve the big problem."
He also contended that, quote, "Project Q* probably uses some
self-supervised learning. It's a bit like how the game AlphaGo gets better at playing against itself.
The AI practices by solving problems against older versions of itself. This provides a way for the
AI to learn and get better without needing people to check its work. Just like AlphaGo, the AI teaches
itself, removing the need for outside help." Still, mostly, after that initial burst of interest,
we haven't gotten much information. Six months ago on the OpenAI subreddit, poster Echo Storm wrote,
"Just wondering what happened to Q*. I read that it was able to solve mathematical problems
faster and better than humans ever could, as well as bypass any encryption and improve itself.
If that's true, why is nobody talking about it? Was it false news? If so, why were the leak and the
board's reaction so believable? Personally, it doesn't seem to me to be a good publicity stunt
for a successful company like OpenAI to do this unless something about Q* is true."
And that gets us to last week, when we had two big stories that followed along these lines.
The first was that OpenAI had internally shared definitions for five levels of AGI, or at least
five levels of AI on the path to AGI. The levels were: one, chatbots, AI with conversational language.
That's where we are now. Two, reasoners, human-level problem solving, something that OpenAI argued
in this internal meeting that they were close to. Three, agents, systems that can take actions.
Four, innovators, AI that can aid in invention. Five, organizations, AI that can do the work of an
organization. Now, if you go check out the YouTube comments on any of my recent videos about this,
there is tons of debate around those specific definitions. But the relevant point for us today is that
these came out early last week. However, separately, but clearly relatedly, we got this Reuters
report, "OpenAI working on new reasoning technology under code name Strawberry." This came from
internal sources as well as internal documentation. The document was seen by Reuters in May, but not
reported until now. Reuters also said they couldn't ascertain the precise date of the document.
The document, quote, "details a plan for how OpenAI intends to use Strawberry to perform research."
Reuters' source also added that how Strawberry works is a tightly kept secret even within OpenAI.
Basically, this document describes a project that would use the Strawberry model
with the aim of allowing the AI to plan ahead enough to navigate the internet autonomously
to perform what OpenAI calls deep research.
According to this report, Strawberry is the new name for Q*.
According to Bloomberg, last Tuesday at an all-hands meeting,
OpenAI, quote, showed a demo of a research project that it claimed had new human-like reasoning skills.
This was the same meeting, I believe, where they introduced that five-level classification system.
While the information remains sparse, there were a few other things we got from this report.
Reuters writes,
"Strawberry includes a specialized way of what is known as post-training OpenAI's generative AI models,
or adapting the base models to hone their performance in specific ways
after they have already been trained on reams of generalized data.
Strawberry has similarities to a method developed at Stanford in 2022 called Self-Taught Reasoner, or STaR.
STaR enables AI models to bootstrap themselves into higher intelligence levels
via iteratively crafting their own training data,
and in theory could be used to get language models to transcend human-level
intelligence." Continuing, Reuters writes, "Among the capabilities OpenAI is aiming Strawberry at is
performing long-horizon tasks, referring to complex tasks that require a model to plan ahead and
perform a series of actions over an extended period of time. OpenAI specifically wants its
models to use these capabilities to conduct research by browsing the web autonomously with the
assistance of a CUA, or computer-using agent, that can take actions based on its findings.
OpenAI also plans to test its capabilities on doing the work of software and machine learning engineers."
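
As a rough illustration of the self-taught reasoner idea mentioned in the Reuters excerpt above (a model generating its own rationales, keeping only the ones that reach a known correct answer, and training on those), here is a minimal toy sketch in Python. Nothing here is OpenAI's or Stanford's actual code: `toy_model`, `star_round`, `bootstrap`, the `skill` parameter, and the "fine-tuning" update rule are all hypothetical stand-ins, and the published method includes details, such as a rationalization step, that are left out.

```python
import random


def toy_model(question, skill, rng):
    """Hypothetical stand-in for a language model: returns (rationale, answer).

    With probability `skill` it adds correctly; otherwise it is off by one.
    """
    a, b = question
    answer = a + b if rng.random() < skill else a + b + 1
    rationale = f"{a} plus {b} gives {answer}"
    return rationale, answer


def star_round(questions, skill, rng):
    """One bootstrap round: sample rationales, keep those with a correct answer."""
    kept = []
    for q in questions:
        rationale, answer = toy_model(q, skill, rng)
        if answer == q[0] + q[1]:  # filter against the known correct answer
            kept.append((q, rationale, answer))
    return kept


def bootstrap(questions, skill=0.3, rounds=5, seed=0):
    """Repeat rounds; 'fine-tuning' is faked by raising skill with the kept share."""
    rng = random.Random(seed)
    history = []
    for _ in range(rounds):
        kept = star_round(questions, skill, rng)
        # Assumed update rule, for illustration only: more self-generated
        # correct examples means a bigger improvement, capped at 1.0.
        skill = min(1.0, skill + 0.5 * len(kept) / len(questions))
        history.append(skill)
    return history


if __name__ == "__main__":
    questions = [(i, i + 1) for i in range(20)]
    print(bootstrap(questions))
```

The point of the sketch is only the shape of the loop: the training data for each round comes from the model's own filtered output rather than from human-written examples.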
So basically what we've got here is an update that confirms, one, that Q* has not gone away,
it's evolved into whatever Strawberry is; two, that the context that they're thinking about
deploying it in, or at least researching it in, is this deep research context; three, that it's
clearly a part of their plans to get to agentic AI; and four, that it's close enough that
they're talking about it widely within the company, even though you would have to think that they
would assume, or at least not be surprised, that some amount of this information would get out.
So far, there is not that much information out there beyond what I've
just shared with you, and there's not even all that much chatter. People are very clearly
interested, but without more details, we're just going to have to wait and see what evolves.
However, it seems likely that OpenAI's comparative quietness in this period might be coming
to an end. For now, that is going to do it for today's AI Daily Brief. Until next time,
peace.
