The AI Daily Brief: Artificial Intelligence News and Analysis - Can OpenAI's New GPT Training Model Solve Math and AI Alignment At the Same Time?
Episode Date: June 1, 2023
A look at the latest developments from OpenAI, including new features, a cybersecurity grant program, and their new process reward model for training. Before that on the Brief, Japan declines to enforce copyright around AI model training, Australia asks citizens if it should ban AI, and an AI camera without a lens. The AI Breakdown helps you understand the most important news and discussions in AI. Subscribe to The AI Breakdown newsletter: https://theaibreakdown.beehiiv.com/subscribe Subscribe to The AI Breakdown on YouTube: https://www.youtube.com/@TheAIBreakdown Join the community: bit.ly/aibreakdown Learn more: http://breakdown.network/
Transcript
On today's AI Breakdown, OpenAI's research team has released a new approach to training
that seems not only to improve math problem-solving but also to offer some promise for AI alignment.
Before that on the brief, AI policy from Japan and Australia and an AI camera without a lens.
The AI Breakdown is a daily podcast and video about the most important news and discussions in AI.
Like, subscribe, and share, and learn more at breakdown.network.
Welcome back to the AI Breakdown Brief.
All the AI headline news you need in five minutes or less.
Remember, as always, you can get this as a newsletter at theaibreakdown.beehiiv.com.
We kick off today with news out of Japan in what might be an important precedent.
Earlier this year, Getty Images sued Stability AI.
Basically, they alleged that the creator of Stable Diffusion had trained its model on some 12 million images owned by Getty.
You might remember that when this came out, what appeared to be the little Getty watermark showed up in some of the Stable Diffusion outputs.
Well, new thinking in Japan would basically nullify a claim like Getty's, at least within Japan.
In a recent public meeting, the Japanese Minister of Education, Culture, Sports, Science, and Technology said that the country would not enforce copyright law when it came to training AI models.
Now, reporting around this is somewhat sparse, but it seems as though Japan is concerned that worries about copyright have been holding it back relative to international competitors when it comes to the development of AI.
And they see this non-enforcement of copyright laws when it comes to training models as a way to get out ahead.
Now, could this create a global precedent?
It's hard to say,
but it's certainly an interesting evolution in a key area of law as it relates to AI.
Speaking of APAC countries and AI rules, Australia has just introduced two new research papers around AI,
as well as an eight-week consultation period designed to get feedback from Australian citizens.
Australia is no stranger to AI policy, first publishing voluntary ethics principles in 2018,
but now it seems like lawmakers are wondering if tougher laws are needed.
Industry and Science Minister Ed Husic said there is clearly a concern in the community
about whether or not the technology is getting ahead of itself.
Governments have got a clear role to play in recognizing the risk and putting curbs in place.
As part of the consultation, Australia is saying that its interventions could range all the way
from voluntary ethical principles and technical standards to bans, prohibitions, and moratoriums.
Next up, some really interesting research called GPT4Tools.
We've discussed on this show how multimodality is the future of LLMs.
In other words, AI models that are able to move between different modalities, from images to text and back
again, and even audio, video, and so on. GPT4Tools is an approach to teaching open-source large
language models, things like LLaMA, how to use tools to develop
multimodal capacities in a way that doesn't require huge amounts of computing power or data.
I used the new ChatGPT share feature as well as the XPapers plugin to provide a simplified
analysis of the paper, including a bullet-point overview of what it said, how the training was
conducted, what the low-rank adaptation (LoRA) technique is, and what the real-world use cases of this might be,
including customer service bots, content creation, education and training, accessibility tools,
data analysis, research and development, and more.
The TLDR for the sake of this brief is that multimodal continues to be the cutting edge when it comes
to LLM research, and more than ever people are thinking about not just how big companies that have
unlimited resources and huge amounts of data can train multimodal models, but how different
techniques can be used to train open source versions of those models as well.
Last up today, people are absolutely loving this new AI camera that has no lens.
Paragraphica is a different type of
camera that is designed for an AI world. Instead of using a lens to capture an image of what's
right in front of you, it uses location, weather, and time data to create a
prompt that then generates an AI image. For example, take a midday photo at Cliffordstraat,
Amsterdam: the weather is partly cloudy and 18 degrees, the date is Wednesday, the 24th of May,
2023, and nearby there is parking and a yoga studio. On the left is where he stood with the camera, and
on the right is the image that was produced. Now, on the viral Twitter thread where Bjørn Karmann introduced
Paragraphica, there was a little exchange that I thought was extremely reflective of the current
state of the AI discourse. Tariq Khan responded to Bjørn and said, what is the purpose behind this?
Do you think that this is helping anyone? Does this make our society better? Or does it make it
more inauthentic and synthetic? You ought to really consider this path, your motivations,
and more importantly, the effect it could have. Ratioing that comment was a response from PHABC,
who posted a version of the midwit meme where the middle of the bell curve says, no, you can't be creative,
have fun, and build things unless they move humanity forward. Meanwhile, the 0.1%
at the bottom say this is fun, and the 0.1% at the top say this is fun. Now, while I disagree with
Tariq on the specifics here, I do think it's important to have these discussions, so I'm not
discouraging that at all. You've just got to love a good meme. Anyways, guys, that is it for today's
AI Breakdown Brief. If you're enjoying it, please like, subscribe, and share, and I will be back
soon for the main AI Breakdown. Welcome back to the AI Breakdown. Today, we are talking about
some exciting research out of OpenAI with progress on both math and AI alignment. But
where we're going to start is just looking at the company's latest set of updates.
There's been an absolute flurry over the last couple of weeks.
From a product standpoint, OpenAI has been extremely busy.
Probably the biggest feature is, of course, the fact that ChatGPT launched its iOS app.
Surprising exactly no one, it has been number one, basically since it launched.
And while it was rolled out first just in the U.S., it quickly expanded to a dozen countries and now 152.
For my money, the most impressive thing about the ChatGPT iOS app
is the Whisper voice-to-text transcription.
I agree wholeheartedly with Jackson Doll, who writes,
The gap between new AI transcription tech, e.g. Whisper on ChatGPT mobile,
and Siri, purely transcription quality, not responses, is actually unbelievable.
I've been trained to assume that transcription just doesn't work at all,
and in fact, Siri is just an utterly bad product.
I'm so excited to embrace voice technology again after years in Apple's Middle Ages.
This has been completely my experience as well.
Although I will say that Logan, who runs developer relations at OpenAI,
actually responded to Jackson and said,
expect this will change on June 5th.
Lots of very competent people who have likely been working on this exact problem.
Apple is a sleeping giant.
Logan is, of course, referring to Apple's WWDC conference, which happens next Monday.
Speaking of Logan, here he announces another of ChatGPT's recent features, shared links.
For the first time since ChatGPT launched last November,
users can now share out links to the conversations they've had.
Without even trying to be really intentional about it, I found myself using this feature pretty frequently.
And then finally, credit where credit is due: I literally just did an episode called Are ChatGPT
Plugins Overhyped? And one of the things that I pointed to as an example of just how nascent they are is that there
wasn't even a search interface for them. Well, someone at OpenAI must have heard me, because less than 24
hours after I published that video, we now have a search field for plugins. So for example,
if I want to find all the plugins that allow me to interact with PDFs, I can search PDF.
Same for YouTube.
Now, while this is, of course, a huge improvement over the previous entire lack of search,
there are still kinks to be worked out.
So far as I can tell, it's literally just pulling from text either in the names or the description,
which means that some categories aren't being fully represented in search.
By my count, there are roughly a dozen or more plugins that relate to finance in some way,
but only those that actually use the word finance in their name or descriptions show up when you search for finance.
When you search money, none at all come up.
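To make the limitation concrete, here is a minimal sketch of the kind of search described above. It assumes the search is a plain case-insensitive substring match over plugin names and descriptions, which is only the episode's reading of the behavior, not OpenAI's actual implementation, and the plugin catalog is entirely hypothetical.

```python
# Hypothetical plugin catalog: these names and descriptions are invented for illustration.
PLUGINS = [
    {"name": "FinanceBot", "description": "Stock quotes and market data."},
    {"name": "BudgetBuddy", "description": "Track spending and personal budgets."},
    {"name": "PDF Reader", "description": "Chat with any PDF document."},
]


def keyword_search(query, plugins):
    """Return names of plugins whose name or description contains the query, ignoring case."""
    q = query.lower()
    return [
        p["name"]
        for p in plugins
        if q in p["name"].lower() or q in p["description"].lower()
    ]


print(keyword_search("finance", PLUGINS))  # ['FinanceBot']
print(keyword_search("money", PLUGINS))    # []
```

BudgetBuddy is clearly about money, yet neither query finds it, because pure text matching has no notion of categories or synonyms. Tagging plugins with categories, or using semantic search, would be the usual ways around this.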
But, as I said, still a huge improvement, and I think representative of exactly what I was saying,
which is that a lot of these features are going to be rolled out over time.
Now, somehow, despite all of that, OpenAI has been even more busy when it comes to the regulatory and policy side of things.
A couple of weeks ago, CEO Sam Altman was the star witness at the Senate's first hearing on AI regulation in the post-ChatGPT world.
During that hearing, Sam said that OpenAI would support a new dedicated agency for AI regulation,
and some sort of licensing regime that would have control over whether companies could release super-powerful models.
Now, there was enough of a dust up and a hullabaloo about whether this amounted to an attempt at regulatory capture that OpenAI actually released a blog post talking a little bit more in depth about what they were interested in on this front.
The short piece was called Governance of Superintelligence.
First, they said we need some degree of coordination among the leading development efforts.
Second, OpenAI argued, we're likely to eventually need something like an IAEA, the international nuclear regulatory body, for superintelligence efforts,
which would come with the ability for an international authority to inspect systems,
require audits, test for compliance, and even place restrictions on degrees of deployment and
levels of security. And third, we need more research on AI alignment and the technical
capability to make a superintelligence safe. Now, what we don't need, they said, is onerous
regulation on open-source models and models that are below a certain capability threshold.
This was their way of reinforcing the point that Sam was trying to make at that hearing,
that he was talking about extremely advanced models, think GPT-5 and beyond, not open-source
tinkering with LLaMA. Recognizing that, even outside of existential questions, there are big,
thorny issues that relate to AI in the public sphere, OpenAI also announced about a week ago
ten $100,000 grants to fund experiments in and around what they call democratic inputs to
AI. These are grants for people who have answers to questions like: under what conditions
should AI systems condemn or criticize public figures, given different opinions across groups
regarding those figures? What should the default persona for an AI system actually be? How
should it be represented? Their point is that no single individual, company, or even
country should dictate these decisions, and so they want to fund people with interesting ideas around
them. This was followed up today, just about a week after that Democratic Inputs grant
program, with a new cybersecurity grant program. This is once again a $1 million initiative,
with the focus they say to, quote, boost and quantify AI-powered cybersecurity capabilities
and to foster high-level AI and cybersecurity discourse. Our goal is to work with defenders across the globe
to change the power dynamic of cybersecurity through the application of AI and the coordination
of like-minded individuals working for our collective safety. Some of the project ideas that
their team has put forward as the type of thing they'd like to fund include identifying security
issues in source code, detecting and mitigating social engineering tactics, developing or
improving confidential compute on GPUs, creating honeypots and deception technology to misdirect
or trap attackers, and many, many more. Now, when it comes to government interest in AI safety,
and AI risk, a lot of the focus is not so much on the paperclip problem, but instead on exactly
this type of cybersecurity issue. So it's not surprising to see OpenAI taking a more active role here.
Now, OpenAI has also recently been pretty transparent about its forthcoming plans. Humanloop's
Raza Habib was one of around 20 developers who recently got to sit with Sam Altman, who, he said,
discussed in extensive detail what the company's near-term plans were. Raza then wrote up a few
of the key takeaways that he had from that conversation. First, reinforcing why NVIDIA
is now a trillion-dollar company, Altman said that OpenAI is heavily GPU-limited at the present time.
Raza said that this came up throughout the discussion, and that a lot of their short-term plans
are in fact delayed or dictated by their access to GPUs. Sam said that the biggest customer
complaint was about the reliability and speed of the API, and that most of that was directly the
result of GPU shortages. Other ways that these GPU shortages are impacting OpenAI right now
include, one, that their longer 32K context window can't be rolled out to more people. Right now,
ChatGPT is generally limited to an 8K token window, which of course limits the amount of data
that can be fed in without being chopped up. Sam also said that their current approaches to
fine-tuning are extremely compute-intensive, having not yet adopted efficient techniques like LoRA or adapters.
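Low-rank adaptation (LoRA), one of the efficient fine-tuning techniques referred to here, freezes a pretrained weight matrix W and trains only a low-rank update B @ A. The sketch below just counts trainable parameters for one square weight matrix to show why that is so much cheaper; the hidden size and rank are hypothetical values, not figures from any OpenAI model.

```python
def full_finetune_params(d_in, d_out):
    """Trainable parameters when every entry of a d_out x d_in weight matrix is updated."""
    return d_in * d_out


def lora_params(d_in, d_out, rank):
    """Trainable parameters for a LoRA update W + B @ A,
    with B of shape d_out x rank and A of shape rank x d_in (W stays frozen)."""
    return rank * (d_in + d_out)


# Hypothetical sizes chosen only for illustration.
d, r = 4096, 8
full = full_finetune_params(d, d)
lora = lora_params(d, d, r)
print(full, lora, f"{lora / full:.2%}")  # 16777216 65536 0.39%
```

Under these assumptions LoRA trains well under 1% of the parameters of full fine-tuning for that matrix, which is why it needs far less GPU memory and compute.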
And so that's being caught up in this GPU shortage. And then one that I found really interesting
was that multimodality, which was demoed as part of the GPT-4 release, can't be extended to everyone
until more GPUs come online. So if GPU access is a huge bottleneck right now, what are they focused
on in the short term? For 2023, it sounded like Sam and OpenAI's priorities included, one,
a cheaper and faster GPT-4, which they said was their top priority. They said that they want to drive
the, quote, cost of intelligence down as far as possible. Second, longer context windows. Again, this is
limited a little bit by GPU access, but is something they want to focus on. Three, the fine-tuning API
we just talked about, and four, a stateful API. The way that Raza describes it is, quote,
when you call the chat API today, you have to repeatedly pass through the same conversation history
and pay for the same tokens again and again.
In the future, there will be a version of the API that remembers the conversation history.
So that's what's coming up, at least on the API front, in the short term.
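The cost of re-sending history grows quickly, which is what a stateful API would address. Below is a rough sketch of the arithmetic, assuming every turn adds the same number of tokens and ignoring the split between input and output pricing; how a future stateful API would actually bill is unknown, so that side is purely hypothetical.

```python
def tokens_billed_stateless(turn_tokens):
    """Each call resends the whole conversation, so call k is billed for turns 1..k."""
    total = 0
    history = 0
    for tokens in turn_tokens:
        history += tokens   # the conversation grows by this turn
        total += history    # and the full history is sent (and billed) again
    return total


def tokens_billed_stateful(turn_tokens):
    """Hypothetical stateful API: the server keeps the history, so only new tokens are billed."""
    return sum(turn_tokens)


turns = [200] * 10  # ten hypothetical turns of 200 tokens each
print(tokens_billed_stateless(turns))  # 11000
print(tokens_billed_stateful(turns))   # 2000
```

Because the stateless total grows roughly with the square of the number of turns, long conversations pay mostly for repeated history rather than new content.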
Now, a couple of other interesting things from that conversation.
Apparently, a number of developers in the room were nervous about building on OpenAI's API,
given that they thought that OpenAI might end up releasing products that were competitive.
Sam, however, said that OpenAI was not focused on releasing products beyond ChatGPT.
He said that the vision for ChatGPT is to be a super smart assistant for work,
but there's going to be lots of other GPT use cases that they won't get
into. Altman reinforced the idea that open source is going to be an important part of the AI policy
future and said that they were considering open-sourcing GPT-3. When it comes to scaling, despite all of
the internet's assertions that the age of giant AI models is already over, Sam said that OpenAI
did believe that making models larger will continue to yield performance gains. Now lastly, one that I found
really interesting, especially in light of the discussion that I was having on the episode yesterday
about whether plugins are overhyped: Sam said that while developers are interested in getting access
to ChatGPT plugins via the API, he didn't think that they'd be releasing that anytime soon.
Right now, Altman said, in his view plugins don't have product-market fit outside of browsing.
The way that Raza put it was that Sam suggested that a lot of people thought they wanted their apps to be
inside ChatGPT, but what they really wanted was ChatGPT in their apps.
This gets back to exactly the interface questions that I was asking in that episode.
In other words, are we really going to move all of our activity to the ChatGPT interface, or are there
some types of experiences that still make sense in dedicated environments?
Anyways, a lot of really interesting stuff there about what the near-term future of OpenAI
and ChatGPT might look like.
But then just today, OpenAI released some new research that has really captured people's
attention.
The announcement blog post is called Improving Mathematical Reasoning with Process Supervision.
Dr. Jim Fan from Nvidia says the idea is so simple that it fits in one tweet.
For challenging step-by-step problems, give a reward at each step, instead of a single
reward at the end. The way that OpenAI describes it is, we've trained a model to achieve a new
state-of-the-art in mathematical problem-solving by rewarding each correct step of reasoning, i.e. process
supervision, instead of simply rewarding the correct final answer, which is outcome supervision.
In addition to boosting performance relative to outcome supervision, process supervision also
has an important alignment benefit. It directly trains the model to produce a chain of thought that is
endorsed by humans. Let's read a little bit more from the introduction. They write,
In recent years, large language models have greatly improved in their ability to perform complex
multi-step reasoning.
However, even state-of-the-art models still produce logical mistakes, often called hallucinations.
Mitigating hallucinations is a critical step towards building aligned AGI.
We can train reward models to detect hallucinations either using outcome supervision, which
provides feedback based on a final result, or process supervision, which provides feedback on each
individual step in a chain of thought.
We conducted a detailed comparison of these two methods using the MATH dataset as our
testbed. We find that process supervision leads to significantly better performance even when judged by
outcomes. So here's the simple chart for how these two different methodologies performed as it related to
solving math problems. The outcome-supervised approach, in which the reward was only for the right
outcome, correctly solved problems about 71% of the time, while the process-supervised approach
got the right answer 78% of the time. So there are a couple things to note here. One is that even if
there were no alignment benefits, teaching LLMs to solve math problems more accurately
is a valuable thing in its own right.
But second, it's not hard to understand
the potential benefits here when it comes to an AI alignment perspective.
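The difference between the two kinds of feedback can be shown with a toy labeling sketch. It assumes simple binary labels (real labeling schemes can be richer) and a hypothetical three-step solution whose middle step is wrong even though the final answer comes out right.

```python
def outcome_labels(final_answer_correct):
    """Outcome supervision: a single label for the whole solution, from the final answer alone."""
    return [1.0 if final_answer_correct else 0.0]


def process_labels(steps_correct):
    """Process supervision: one label per reasoning step."""
    return [1.0 if ok else 0.0 for ok in steps_correct]


# Hypothetical solution: step 2 is wrong, yet the final answer happens to be right.
steps_correct = [True, False, True]
print(outcome_labels(final_answer_correct=True))  # [1.0]
print(process_labels(steps_correct))              # [1.0, 0.0, 1.0]
```

Outcome supervision rewards this flawed solution in full, while process supervision pinpoints the bad step. That is the intuition behind "rewarding each correct step of reasoning" instead of only the final answer.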
Adipai on Twitter wrote a really good summary saying,
this OpenAI paper might as well have been titled
Moving Away from Paperclip Maxing.
They took a base GPT-4, fine-tuned it on a bit of math
so that it understood the language as well as the output format,
then no reinforcement learning.
Instead, they trained and compared two reward models.
One, outcome only, and two, process and outcome.
This is clearly a building block
to reducing the expense of human supervision for reinforcement learning.
The humans move up the value chain from supervising the model
to supervising the reward model that supervises the model.
The process reward model system is so human,
exactly the way teachers teach math in early grades.
Show your work or steps.
Process matters as much as outcome.
It's only applied to math right now,
but I can totally see a way to move this to teaching rules and laws of human society,
just like we do with kids.
They tested on AP Chem, physics, etc.,
and found the process model outperforming the objective model.
It's a step away from paperclip maximization, i.e., focusing on the objective goal by whatever means necessary.
Now briefly, this idea of paperclip maxing is one of the most oft-discussed AI safety or
AI risk scenarios. Basically, it's shorthand for the idea that if an AI has an objective,
one of the ways that things could go badly is if it determines that humans are in some way
the barrier to that objective. So if it has been programmed with the goal of making the most paperclips,
what happens if it decides that humans are getting in its way?
So what Aida Pi is pointing out is that this approach to training takes some emphasis off the end objective
and also rewards along the process of how an AI gets there.
OpenAI also points out that it has interpretability benefits.
They write,
Process supervision is also more likely to produce interpretable reasoning
since it encourages the model to follow a human-approved process.
In contrast, outcome supervision may reward an unaligned process
and it is generally harder to scrutinize.
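One way to see the interpretability point: a process reward model scores individual steps, so you can both rank whole solutions and see where a solution goes wrong. The sketch below assumes, as one common choice, that a solution's score is the product of its per-step correctness probabilities; the probabilities themselves are hypothetical, not outputs of OpenAI's model.

```python
import math
from typing import List, Optional


def solution_score(step_probs: List[float]) -> float:
    """Chance that every step is correct, treating step judgments as independent."""
    return math.prod(step_probs)


def first_suspect_step(step_probs: List[float], threshold: float = 0.5) -> Optional[int]:
    """Index of the first step the reward model rates as more likely wrong than right."""
    for i, p in enumerate(step_probs):
        if p < threshold:
            return i
    return None


# Hypothetical per-step correctness probabilities from a process reward model.
probs = [0.98, 0.95, 0.30, 0.90]
print(round(solution_score(probs), 4))  # 0.2514
print(first_suspect_step(probs))        # 2
```

An outcome-only reward model would hand back a single number for the whole solution, with nothing pointing at the third step as the likely problem.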
Now, as OpenAI points out,
there has been a sense in the past that safer methods
for AI systems can sometimes lead to reduced performance. This is sometimes known as an alignment
tax. And of course, alignment taxes may hinder the adoption of alignment methods. However, they say in
this case, when it comes to math, their process supervision model incurs a negative alignment tax,
i.e. a performance benefit over other approaches. This, they say, could increase the adoption
of process supervision, which we believe would have positive alignment side effects.
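The sign convention behind "negative alignment tax" is easy to get backwards, so here is the arithmetic using the rough accuracies quoted in the episode. Defining the tax as baseline accuracy minus aligned-method accuracy is an assumption made for this illustration.

```python
def alignment_tax(baseline_accuracy, aligned_accuracy):
    """Accuracy given up by the more aligned method; a negative value means it actually gains."""
    return baseline_accuracy - aligned_accuracy


# Rough accuracies quoted in the episode: outcome ~71%, process ~78% of problems solved.
tax = alignment_tax(baseline_accuracy=71.0, aligned_accuracy=78.0)
print(tax)  # -7.0
```

A tax of about minus 7 percentage points means the safer method is not a trade-off on this benchmark. It is simply better.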
So what's the problem with this? Well, as Dr. Jim Fan points out,
A caveat is that the process reward model does require a lot more human labeling.
For example, as part of this research, OpenAI released its human feedback dataset, PRM800K,
which was 800,000 step-level labels across 75,000 solutions to 12,000 math problems.
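A quick back-of-the-envelope calculation shows where the extra labeling cost comes from, using the rounded dataset figures quoted above. Outcome supervision needs roughly one judgment per solution; here each solution carries around eleven step-level judgments.

```python
# Rounded dataset figures as quoted in the episode.
step_labels, solutions, problems = 800_000, 75_000, 12_000

labels_per_solution = step_labels / solutions
solutions_per_problem = solutions / problems

print(f"{labels_per_solution:.1f} step labels per solution")  # 10.7 step labels per solution
print(f"{solutions_per_problem:.2f} solutions per problem")   # 6.25 solutions per problem
```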
Still, even with that caveat, it's not every day that we get something that shows both improved performance of AI models as well as better alignment.
And so maybe this moves some folks' p(doom) out there down just a little bit.
That's it for today's AI Breakdown.
If you're enjoying the show,
please like, subscribe, and share.
Check out the newsletter version at theaibreakdown.beehiiv.com
or go check out the AI Breakdown podcast.
Until next time, guys.
Peace.
