The AI Daily Brief: Artificial Intelligence News and Analysis - Which AI Model is the Most Transparent?
Episode Date: October 19, 2023
A group of researchers from Stanford, MIT and Princeton have come up with a new system for determining which AI foundation models are the most transparent. Watch to figure out who scores highest. Before that on the Brief: Anthropic is sued by Universal Music Group, the FBI says AI is helping with terrorist propaganda, and more.
ABOUT THE AI BREAKDOWN
The AI Breakdown helps you understand the most important news and discussions in AI.
Subscribe to The AI Breakdown newsletter: https://theaibreakdown.beehiiv.com/subscribe
Subscribe to The AI Breakdown on YouTube: https://www.youtube.com/@TheAIBreakdown
Join the community: bit.ly/aibreakdown
Learn more: http://breakdown.network/
Transcript
Today on the AI Breakdown, we're looking at a new methodology for determining which
AI model is the most transparent.
Before that on the brief, Universal Music Group has sued Anthropic around the infringement
of copyright.
The AI Breakdown is a daily podcast and video about the most important news and discussions
in AI.
Go to Breakdown.network for more information about our Discord channel, our newsletter, and
our YouTube.
Welcome back to the AI Breakdown Brief, all the AI headline news you need in around five minutes.
Today, we kick off with yet another lawsuit around the way that AI models have been trained.
This time, however, it's not a group of authors.
It is the viciously lawyered-up music industry, specifically Universal Music Group,
filing a $75 million lawsuit against Anthropic AI.
They argue that Anthropic has perpetrated mass copyright infringement in serving up their lyrics to people who ask.
All right, so let's get into some specifics.
This was actually a trio of music publishers: UMG,
Concord Music Group, and ABKCO.
And the suit was filed in Tennessee Federal Court.
The specific accusation was, quote, systemic and widespread infringement
by copying and distributing lyrics from at least 500 songs,
including Katy Perry, the Rolling Stones, and Beyoncé.
The complaint says Anthropic has neither sought nor secured publishers' permission
to use their valuable copyrighted works in this way.
Just as Anthropic does not want its code taken without its authorization,
neither do music publishers or any other copyright owners want their works
to be exploited without permission.
Now, in terms of what their evidence is,
they basically asked Claude what the lyrics to Katy Perry's Roar were, which it provided, not without a few errors, but mostly.
They say that this undercuts music lyric aggregators and other websites who have explicitly licensed those works.
And that by not licensing this content, Anthropic is, quote, depriving publishers and their songwriters of control over the copyrighted works and the hard-earned benefits of their creative endeavors.
They say it's also competing unfairly against those website developers that respect the copyright law and pay for licenses.
Now, I will, for the sake of this piece, bite my tongue about what I view as the utter stupidity
of having to license the ability to print song lyrics on the web.
But it feels to me like this lawsuit isn't really about that.
This is yet one more in the line of lawsuits that are frankly trying to get their way to the
Supreme Court to figure out how to deal with copyright in AI training in general.
Now, the Hollywood Reporter suggests that this specific case, in the way that Universal Music Group
has architected it, is meant to cut directly out what they would expect the defense to be,
which is, of course, fair use. The reason this might be harder to defend with fair use is that
since other websites are licensing lyrics to be able to print them, this actually potentially
hurts the business opportunities of the publisher. Now, so far, Anthropic hasn't responded to
requests for comment, but it's clear at this point that a major part of the AI labs' job is
going to be fighting these copyright battles until it gets actually sorted out, either in the
courts or through some sort of regulatory policy, but frankly, much more likely in the courts.
It was then interesting to see, however, that even as Anthropic was being sued by the labels,
YouTube is apparently trying to work with the labels to specifically get access to rights for
their songs to train a new AI-powered voice replication tool.
Writes Bloomberg, who broke the news, YouTube is developing a tool powered by artificial
intelligence that would let creators record audio using the voices of famous musicians.
The video site has approached music companies about obtaining the rights to songs it could use to
train the tool, although major record labels have yet to sign off on the deal. Now, if you've
listened to me with any regularity, you've heard me talk about exactly this before. I literally
cannot imagine that the way that this gets resolved isn't, in part, that we end up with a handful
of rights-approved venues or rights-approved models through which people who want to create
music that sounds like Drake or Sia or Bob Dylan or whoever can actually do so legally. One of the
things about the music industry is that they are extremely good at adapting to the threat of new technology
and then co-opting it in a way that reinforces their supremacy at the center of the industry. That was the
byproduct of Napster into the streaming era, and I would be shocked if anything different happened here.
That doesn't mean, of course, that there won't be bootlegged versions and people training their own
models on artists without their permission or without the rights approval. But the bet will be that the
legitimate use case that's willing to pay for the privilege in some way or another will be a heck of a lot
higher than the pirated use case, and that sort of will work out better for everyone. Meanwhile,
yet another story of lawsuits against AI training, former governor Mike Huckabee, along with a group
of other Christian authors, have filed a lawsuit in New York federal court, arguing that their
works were used unlawfully in the training of models from OpenAI and others. This particular
lawsuit targets Meta, Microsoft, and even Bloomberg LP. Interestingly, this threads a little bit of a line.
The plaintiffs here say, while using books as part of data sets is not inherently problematic,
using pirated or stolen books does not fairly compensate authors and publishers for their work.
So I guess if Meta and Microsoft had just bought a copy of each of these books, it would have been
fine with them. I'm not really sure, but it is an interesting wrinkle in this larger conversation.
Now let's move on to some very, very different topics. First, Amazon has announced two new
types of robots that are being integrated into their delivery and fulfillment systems.
This news comes directly from Amazon in a blog post written by Scott Dresser, the VP of Amazon
Robotics. The first robotic system is called Sequoia, and it's designed to improve how warehouses
fulfill customer orders. So far, it's operating at one of their fulfillment centers in Houston, Texas.
Dresser writes, Sequoia allows us to identify and store inventory we receive at our fulfillment
centers up to 75% faster than we can today. This means we can list items for sale on Amazon.com more
quickly, and when orders are placed, Sequoia also reduces the time it takes to process an order through
a fulfillment center by up to 25%, which improves our shipping predictability and increases the number of
goods we can offer for same-day or next-day shipping. Now, the other robot that they announced
was a bipedal, quote, mobile manipulator solution, in other words, a type of robot that can move
while also grasping and handling items. This one is called Digit and is a collaboration with Agility
Robotics. They argue that robots like these are going to be best used in collaboration with humans,
specifically on hyper-repetitive tasks. For example, they write, our initial use for this technology
will be to help employees with tote recycling, a highly repetitive process of picking up and moving
empty totes once inventory has been completely picked out of them. Moving on to another development
in the product space, there is a ton of innovation right now in and around voice cloning. ElevenLabs,
Wondercraft, and others have all come out with voice dubbing for YouTube and podcast content
recently. And now PlayHT has announced PlayHT2.0 Turbo. They call it the fastest conversational
AI text-to-speech model. I think this is an area that is going to have a lot of impact in how
content gets produced, and so I'm always interested to keep track of the updates in the space.
Now, closing out more on the policy and macro side of things, FBI director Christopher Wray
discussed AI earlier this week at a conference for the U.S. and its closest intelligence allies.
He said that AI has already been successfully used to amplify terrorist propaganda
and that terrorist groups are very focused on trying to get around AI safeguards.
Said his British counterpart Ken McCallum, quote,
If you are experienced in security, you would be unwise to rely on these controls to remain impregnable.
There is clear risk that some of these systems can be put
to uses that their makers did not intend. Relatedly, another New York Times piece on AI today
tells the story of a group of researchers who found that AI safeguards are not so safe at all.
The paper is called Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To.
And basically, they're pointing out that there are safety costs associated with the type
of customized fine-tuning that many models are going through right now as they go into production
for various real-world applications. They write,
Our red teaming studies find that the safety alignment of LLMs can be compromised by fine-tuning
with only a few adversarially designed training examples.
For instance, we jailbreak GPT-3.5 Turbo's safety guardrails by fine-tuning it on only 10
such examples at a cost of less than 20 cents via OpenAI's APIs, making the model responsive
to nearly any harmful instruction.
Disconcertingly, our research also reveals that even without malicious intent, simply
fine-tuning with benign and commonly used datasets can also inadvertently degrade the safety
alignment of LLMs, although to a lesser extent.
The concern was summed up by Scale AI's Riley Goodside, who said,
This is a very real concern for the future.
We do not know all the ways this can go wrong.
But for now, I will leave you to ponder all the ways that it can go wrong.
I, of course, appreciate you guys listening or watching as always.
Next up, the main AI Breakdown.
One of the notable and frankly often griped about features of the artificial intelligence space
is how opaque many of the leading models are.
Even the labs who work on these models don't really know exactly how they produce the
results they produce, and for people on the outside who don't have information about things like
what data the models were trained on, the lack of transparency can be an even greater concern.
Well, there is a new project coming out of Stanford, but with the participation of researchers
at MIT and Princeton as well, that is designed to create a system for actually capturing
and scoring the transparency of major AI foundation models.
What we are going to do today is look at that new system, see how different models score,
and then look at what some of the responses in the AI community have been.
The name of the project is the Foundation Model Transparency Index.
And the project was organized by the Stanford University Human-Centered Artificial Intelligence
Institute, or HAI, and specifically within it, the Center for Research on Foundation Models.
The motivation, says Rishi Bommasani from the CRFM,
is that companies in the space are becoming less, not more, transparent.
Think about how the name OpenAI looks a little bit ironic now,
given how they sit relative to others in the space.
In their announcement post, Stanford's HAI makes a bunch of different arguments about why it is important to have more transparency around these models. They write, less transparency makes it harder for other businesses to know if they can safely build applications that rely on commercial foundation models, for academics to rely on commercial foundation models for research, for policymakers to design meaningful policies to rein in this powerful technology, and for consumers to understand model limitations or seek redress for harms caused.
Now, of course, even if one has a lot of excitement about the goals of a project like this,
the devil can really be in the details.
So how was this group of academics going about trying to figure out which models were most transparent?
They ended up deciding on 100 different indicators that would be used to give a transparency score overall.
They linked to a PDF hosted on GitHub that has all 100 of those indicators that you can scroll through.
But the major dimensions that they list include data, labor, compute, methods, model basics, model access, capabilities, risks, mitigations, distribution, usage policy, feedback,
and impact. Now, in terms of the methodology, for the sake of this starting point, they picked
10 major foundation models, which of course leaves a ton of projects out, but which they thought
represented a meaningful slice of the industry. From there, they gathered all publicly available
information, which of course is the only information that matters, given that this is a study
about transparency. In fact, that's one of the things that I think makes the methodology the
best, or at least the most reliable, is that the whole point is to see what they've said publicly,
so it doesn't so much matter if there's information we don't have access to behind the scenes. That's
kind of exactly the point. From there, for each of these 10 projects, two researchers scored the 100
indicators, and then compared scores with their fellow researcher and discussed to resolve disagreements.
The last step was sharing the scores with leaders at the companies, which gave them a chance
to contest scores they disagreed with, which could then be factored into the final scores.
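To make that scoring process a bit more concrete, here is a minimal Python sketch of how two reviewers' marks on a set of indicators could be tallied. It is only an illustration under stated assumptions, not the researchers' actual tooling: it assumes each indicator is simply marked satisfied or not satisfied, and the names `IndicatorReview`, `disagreements`, `final_score`, and the sample indicator labels are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class IndicatorReview:
    """One indicator as marked by two independent reviewers (hypothetical structure)."""
    indicator: str
    reviewer_a: bool                 # True means reviewer A judged the indicator satisfied
    reviewer_b: bool                 # True means reviewer B judged the indicator satisfied
    resolved: Optional[bool] = None  # agreed value after discussion, if the two differed


def disagreements(reviews: List[IndicatorReview]) -> List[str]:
    """Return the indicators where the two reviewers gave different marks."""
    return [r.indicator for r in reviews if r.reviewer_a != r.reviewer_b]


def final_score(reviews: List[IndicatorReview]) -> int:
    """Count satisfied indicators, using the resolved value wherever reviewers differed."""
    total = 0
    for r in reviews:
        if r.reviewer_a == r.reviewer_b:
            total += int(r.reviewer_a)
        elif r.resolved is None:
            # A disagreement that was never discussed cannot be scored.
            raise ValueError(f"unresolved disagreement on {r.indicator!r}")
        else:
            total += int(r.resolved)
    return total


# Assumed sample data: three made-up indicators, one of which needed discussion.
sample = [
    IndicatorReview("data_sources_disclosed", True, True),
    IndicatorReview("compute_usage_disclosed", False, False),
    IndicatorReview("usage_policy_published", True, False, resolved=True),
]
print(disagreements(sample))
print(final_score(sample))
```

With 100 indicators per model, a tally like this would come out as a score out of 100, and the company feedback step would amount to revisiting individual marks before the final count.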
So let's jump to the high level first. First, the most transparent model in this index was, in fact,
Meta's Llama 2. Now, this will probably be gratifying to Meta, who of course have set out to be the
real open AI project, given their open-source
approach to how they're doing things. Other notable entrants on this list include OpenAI's GPT-4,
coming in at third, Stability AI's Stable Diffusion coming in fourth at 47, Google's PaLM 2
and Anthropic's Claude 2 coming in fifth and sixth, with scores of 40 and 36 respectively, and way down
near the bottom, Inflection coming in at ninth with a score of just 21, and Amazon's Titan Text
coming in at just 12. Now, even though Meta's Llama 2 was the highest score, it still scored only a 54%,
suggesting that there is a lot more room for transparency within this field.
Now, getting into the specific indicators, they divided those 100 indicators into upstream,
model, and downstream.
Upstream refers to, quote, the ingredients and processes involved in building a foundation model,
such as the computational resources, data, or labor used.
The model indicators, quote, specify the properties and function of the foundation model,
such as the model's architecture, capabilities, and risks,
and the downstream indicators, quote, specify how the foundation model is distributed and used,
such as the model's impact on users, any updates to the models, and the policies that govern its use.
I think in some ways more telling than just the overall score is breaking it down in terms of these
major dimensions of transparency. For example, one of the real standout low scores is that when
it came to information about the data that models were trained on, these 10 projects scored an
average of just 20%. The high was BLOOMZ at 60% and Llama 2 had 40%, but many had 0% to 20%.
The highest category on average was model basics, the basic information about what models can do.
The average score there was 63%.
Now, one of the things that one might expect is that there would be a fairly significant difference
between open developers versus closed developers, and indeed, that's what we saw.
Three of the top four projects were the open models, including Llama 2, BLOOMZ, and Stable
Diffusion, which scored respectively 54%, 53% and 47%.
OpenAI's GPT-4 came in third, again, just above Stable Diffusion 2.
The researchers also noted that
a lot of the disparity here had to do with that upstream category and a lack of transparency
around the data used to train the model, what labor was used to train the model, and how
much compute was used to build the model, which was a lot more clear with the open developers
than with the closed. Indeed, it's even more stark when you look at the average transparency
of the open versus closed developers on a dimension by dimension level. When it comes to what
data models were trained on, open developers scored 47%, whereas closed developers scored just 9%.
Labor and compute were similarly stark, with 43% versus 6% in both cases.
Methods, model basics, and model access also had huge disparity.
Open models scored a 92% on methods, for example, versus just a 29% for closed developers.
Now, there were a couple areas where closed developers did outscore open developers.
Those include capabilities, risks, and mitigations, where in each case, closed developers were
slightly ahead of their open counterparts.
Researchers speculate that this is because closed developers dedicate more
resources to actually controlling and shaping the way the model is used after it's released.
Usage policy, for example, is another area where closed way outscored open, 49% to 33%.
One of the researchers involved in the project, Sayash Kapoor, who also writes the AI Snake Oil
newsletter, wrote, developers of open foundation models scored higher in many axes of transparency
despite many of our indicators being easier to satisfy for closed models. For example, many indicators
assessed policies for downstream use. Since closed model developers often provide access only through
an API, they can share information related to downstream use more easily, whereas developers of
open models need to collaborate with the downstream deployers to satisfactorily provide such
information. In theory, this should mean a much higher score for closed models on these indicators,
but we find no substantive difference. Now, in terms of how they sum up their findings,
the researchers said, quote, no major foundation model developer is close to providing adequate
transparency, revealing a fundamental lack of transparency in the AI industry. However, and I thought
that this was a really interesting point, while the mean score was just
37%, 82 of the 100 indicators were satisfied by at least one developer, meaning that if these
developers simply adopted best practices from their competitors, it would significantly
improve transparency all on its own. Now, this study has been picked up by the news quite a bit.
I think putting on my meta-narrative analysis hat for a minute, it fits a bit of the general
skeptical bias that many news organizations have when it comes to technology in general right now
and AI more specifically. The New York Times piece about this was called Maybe We Will Finally
Learn More About How AI Works.
Author Kevin Roose definitely wants more, not less transparency.
As he put it, we can't have an AI revolution in the dark.
We need to see inside the black boxes of AI if we're going to let it transform our lives.
Now, some, like VC Vinod Khosla, thought that the approach was just ridiculous to start with.
He tweeted, Stanford's AI model openness ranking likely in reverse order of the competency
of the model.
Naive to ask private companies to disclose their secrets or investment will decline and
we will help China.
Would we disclose all details of the Manhattan Project?
Interestingly, Roose tries to peel apart more specific reasons why AI labs say that they're not more transparent.
The first category of answers he writes is lawsuits.
Basically, here lawyers at AI companies are worried that the more that they say about how their models were trained, the more it opens them up to lawsuits around that training.
Given that the brief today started with a story about Universal Music Group suing Anthropic and then continued with an extension of another writer's lawsuit, this is not an unreasonable concern.
The second response Roose hears is around competition.
Quote, most AI companies believe that their models work because
they possess some kind of secret sauce, a high-quality data set that other companies don't have,
a fine-tuning technique that produces better results, some optimization that gives them an edge.
If you force AI companies to disclose these recipes, they argue, you make them give away
hard-won wisdom to their rivals who can easily copy them. The last argument, Roose says,
is around safety, and certainly this is one you've heard from people like OpenAI's Sam Altman.
Roose says, basically the argument here is that the more information you give about models,
the faster progress around creating new models will accelerate, and the more likely it is that it
gets in the hands of the wrong people, or just creates an arms race from which we can't escape.
As Roose puts it, it would give society less time to regulate and slow down AI, which could put us
all in danger if AI becomes too capable too quickly.
Roose says these researchers don't buy it and neither does he. Quote, if AI executives are worried
about lawsuits, maybe they should fight for a fair use exemption that would protect their
ability to use copyrighted information to train their models, rather than hiding the evidence.
If they're worried about giving away trade secrets to rivals, they can disclose other types of
information or protect their ideas through patents. And if they're worried about
starting an AI arms race, well, aren't we already in one? Certainly when it comes to the lawsuits,
this is happening whether they hide the information or not, so it might not be an issue for much
longer. Now, the one other type of response that I saw that is worth noting is from folks who are broadly
supportive of this goal, but who weren't really sure about the specific results of this test. For example,
Clem, the co-founder and CEO of Hugging Face said, the scores and ranking look super weird and
inconsistent to me and a lot of interesting models are missing, but I love the message from
Stanford. More transparency equals more safety for AI.
Now, I think when push comes to shove, questions of transparency and more specifically
expectations around transparency are going to be dictated by policymakers rather than shaped by some
industry norms.
But I do think that this sort of attempt to try to articulate the dimensions of transparency
is going to be super useful in trying to actually make that policy more precise and more
targeted at what it's actually trying to achieve without negative unintended consequences.
I think in general it's a net asset to have this sort of information, even if it inherently
remains incomplete. And so good on these researchers for going about this work.
Another great topic for discussion in the AI Breakdown Discord. Again, that link is
bit.ly/aibreakdown. Come join us, chat about transparency and anything else. And until next time,
peace.
