The AI Daily Brief: Artificial Intelligence News and Analysis - Which AI Model is the Most Transparent?

Starting point is 00:00:00 Today on the AI Breakdown, we're looking at a new methodology for determining which AI model is the most transparent. Before that, on the Brief, Universal Music Group has sued Anthropic over copyright infringement. The AI Breakdown is a daily podcast and video about the most important news and discussions in AI. Go to Breakdown.network for more information about our Discord channel, our newsletter, and our YouTube.

Starting point is 00:00:25 Welcome back to the AI Breakdown Brief, all the AI headline news you need in around five minutes. Today, we kick off with yet another lawsuit over the way AI models have been trained. This time, however, it's not a group of authors. It is the viciously lawyered-up music industry, specifically Universal Music Group, filing a $75 million lawsuit against Anthropic. They argue that Anthropic has perpetrated mass copyright infringement by serving up their lyrics to people who ask. All right, let's get into some specifics. This was actually a trio of music publishers: UMG,

Starting point is 00:01:00 Concord Music Group, and ABKCO. The suit was filed in federal court in Tennessee. The specific accusation was, quote, systemic and widespread infringement by copying and distributing lyrics from at least 500 songs, including songs by Katy Perry, the Rolling Stones, and Beyoncé. The complaint says Anthropic has neither sought nor secured publishers' permission to use their valuable copyrighted works in this way. Just as Anthropic does not want its code taken without its authorization,

Starting point is 00:01:23 neither do music publishers or any other copyright owners want their works to be exploited without permission. Now, in terms of evidence, they basically asked Claude for the lyrics to Katy Perry's "Roar," which it provided, not without a few errors, but mostly accurately. They say that this undercuts music lyric aggregators and other websites that have explicitly licensed those works, and that by not licensing this content, Anthropic is, quote, depriving publishers and their songwriters of control over the copyrighted works and the hard-earned benefits of their creative endeavors. They say it's also competing unfairly against those website developers that respect the copyright law and pay for licenses. Now, I will, for the sake of this piece, bite my tongue about what I view as the utter stupidity

Starting point is 00:02:02 of having to license the ability to print song lyrics on the web. But it feels to me like this lawsuit isn't really about that. This is yet one more in a line of lawsuits that are frankly trying to work their way up to the Supreme Court to figure out how to deal with copyright in AI training in general. Now, the Hollywood Reporter suggests that this specific case, in the way that Universal Music Group has architected it, is meant to undercut what they expect the defense to be, which is, of course, fair use. The reason fair use might be harder to argue here is that, since other websites are licensing lyrics in order to print them, this actually potentially

Starting point is 00:02:41 hurts the business opportunities of the publishers. So far, Anthropic hasn't responded to requests for comment, but it's clear at this point that a major part of an AI lab's job is going to be fighting these copyright battles until the issue actually gets sorted out, either in the courts or through some sort of regulatory policy, but frankly, much more likely in the courts. It was interesting to see, however, that even as Anthropic was being sued by the labels, YouTube is apparently trying to work with the labels to get the rights to their songs to train a new AI-powered voice replication tool. Writes Bloomberg, which broke the news, YouTube is developing a tool powered by artificial

Starting point is 00:03:15 intelligence that would let creators record audio using the voices of famous musicians. The video site has approached music companies about obtaining the rights to songs it could use to train the tool, although major record labels have yet to sign off on the deal. Now, if you've listened to me with any regularity, you've heard me talk about exactly this before. I literally cannot imagine that part of the resolution here won't be a handful of rights-approved venues or rights-approved models through which people who want to create music that sounds like Drake or Sia or Bob Dylan or whoever can actually do so legally. One of the things about the music industry is that they are extremely good at adapting to the threat of new technology

Starting point is 00:03:54 and then co-opting it in a way that reinforces their supremacy at the center of the industry. That is what happened in the move from Napster to the streaming era, and I would be shocked if anything different happened here. That doesn't mean, of course, that there won't be bootlegged versions and people training their own models on artists without their permission or without the rights approval. But the bet will be that the legitimate market that's willing to pay for the privilege in some way or another will be a heck of a lot bigger than the pirated one, and that this will work out better for everyone. Meanwhile, in yet another lawsuit over AI training, former governor Mike Huckabee, along with a group of other Christian authors, has filed suit in New York federal court, arguing that their

Starting point is 00:04:31 works were used unlawfully in the training of models from OpenAI and others. This particular lawsuit targets Meta, Microsoft, and even Bloomberg LP. Interestingly, this threads a bit of a needle. The plaintiffs say that while using books as part of datasets is not inherently problematic, using pirated or stolen books does not fairly compensate authors and publishers for their work. So I guess if Meta and Microsoft had just bought a copy of each of these books, that would have been fine with them? I'm not really sure, but it is an interesting wrinkle in this larger conversation. Now let's move on to some very, very different topics. First, Amazon has announced two new types of robots that are being integrated into its delivery and fulfillment systems.

Starting point is 00:05:08 This news comes directly from Amazon in a blog post written by Scott Dresser, the VP of Amazon Robotics. The first robotic system is called Sequoia, and it's designed to improve how warehouses fulfill customer orders. So far, it's operating at one of Amazon's fulfillment centers in Houston, Texas. Dresser writes, Sequoia allows us to identify and store inventory we receive at our fulfillment centers up to 75% faster than we can today. This means we can list items for sale on Amazon.com more quickly, and when orders are placed, Sequoia also reduces the time it takes to process an order through a fulfillment center by up to 25%, which improves our shipping predictability and increases the number of goods we can offer for same-day or next-day shipping. Now, the other robot that they announced

Starting point is 00:05:45 was a bipedal, quote, mobile manipulator solution, in other words, a type of robot that can move while also grasping and handling items. This one is called Digit and is a collaboration with Agility Robotics. They argue that robots like these are going to be best used in collaboration with humans, specifically on hyper-repetitive tasks. For example, they write, our initial use for this technology will be to help employees with tote recycling, a highly repetitive process of picking up and moving empty totes once inventory has been completely picked out of them. Moving on to another development in the product space, there is a ton of innovation right now in and around voice cloning. ElevenLabs, Wondercraft, and others have all come out with voice dubbing for YouTube and podcast content

Starting point is 00:06:25 recently. And now PlayHT has announced PlayHT 2.0 Turbo, which they call the fastest conversational AI text-to-speech model. I think this is an area that is going to have a lot of impact on how content gets produced, so I'm always interested in keeping track of updates in the space. Now, closing out with more on the policy and macro side of things, FBI Director Christopher Wray discussed AI earlier this week at a conference of the U.S. and its closest intelligence allies. He said that AI has already been successfully used to amplify terrorist propaganda and that terrorist groups are very focused on trying to get around AI safeguards. His British counterpart, Ken McCallum, said, quote,

Starting point is 00:07:01 If you are experienced in security, you would be unwise to rely on these controls to remain impregnable. There is a clear risk that some of these systems can be put to uses that their makers did not intend. Relatedly, another New York Times piece on AI today tells the story of a group of researchers who found that AI safeguards are not so safe after all. The paper is called "Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To." Basically, the researchers point out that there are safety costs associated with the kind of custom fine-tuning that many models are going through right now as they go into production for various real-world applications. They write,

Starting point is 00:07:35 Our red-teaming studies find that the safety alignment of LLMs can be compromised by fine-tuning with only a few adversarially designed training examples. For instance, we jailbreak GPT-3.5 Turbo's safety guardrails by fine-tuning it on only 10 such examples at a cost of less than $0.20 via OpenAI's APIs, making the model responsive to nearly any harmful instructions. Disconcertingly, our research also reveals that, even without malicious intent, simply fine-tuning with benign and commonly used datasets can also inadvertently degrade the safety alignment of LLMs, although to a lesser extent.

Starting point is 00:08:05 The concern was summed up by Scale AI's Riley Goodside, who said, this is a very real concern for the future. We do not know all the ways this can go wrong. But for now, I will leave you to ponder all the ways that it can go wrong. I, of course, appreciate you guys listening or watching, as always. Next up, the main AI Breakdown. One of the notable and, frankly, often-griped-about features of the artificial intelligence space is how opaque many of the leading models are.

Starting point is 00:08:36 Even the labs that build these models don't know exactly how they produce the results they produce, and for people on the outside who don't have information about things like what data the models were trained on, the lack of transparency can be an even greater concern. Well, there is a new project coming out of Stanford, with participation from researchers at MIT and Princeton as well, that is designed to create a system for capturing and scoring the transparency of major AI foundation models. What we are going to do today is look at that new system, see how different models score, and then look at some of the responses from the AI community.

Starting point is 00:09:12 The name of the project is the Foundation Model Transparency Index. It was organized by the Stanford Institute for Human-Centered Artificial Intelligence, or HAI, and specifically by the Center for Research on Foundation Models (CRFM) within it. The motivation, says Rishi Bommasani from the CRFM, is that companies in the space are becoming less transparent, not more, to the point that the name OpenAI looks a little ironic now, given where OpenAI sits relative to others in the space. In their announcement post, Stanford HAI makes a number of arguments for why it is important to have more transparency around these models. They write, less transparency makes it harder for other businesses to know if they can safely build applications that rely on commercial foundation models, for academics to rely on commercial foundation models for research, for policymakers to design meaningful policies to rein in this powerful technology, and for consumers to understand model limitations or seek redress for harms caused. Now, of course, even if one has a lot of excitement about the goals of a project like this,

Starting point is 00:10:07 the devil can really be in the details. So how did this group of academics go about figuring out which models were most transparent? They settled on 100 different indicators that together produce an overall transparency score. They linked to a PDF hosted on GitHub that lists all 100 of those indicators, which you can scroll through. But the major dimensions they list include data, labor, compute, methods, model basics, model access, capabilities, risks, mitigations, distribution, usage policy, feedback, and impact. Now, in terms of methodology, for this first iteration they picked 10 major foundation models, which of course leaves a ton of projects out, but which they thought represented a meaningful slice of the industry. From there, they gathered all publicly available

Starting point is 00:10:50 information, which of course is the only information that matters, given that this is a study about transparency. In fact, one of the things that I think makes the methodology strong, or at least reliable, is that the whole point is to see what developers have said publicly, so it doesn't much matter if there's information we don't have access to behind the scenes. That's kind of exactly the point. From there, for each of these 10 models, two researchers scored the 100 indicators, compared their scores, and discussed any disagreements to resolve them. The last step was sharing the scores with leaders at the companies, which gave them a chance to contest scores they disagreed with; those objections could then be factored into the final scores.
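The scoring procedure described above can be sketched in a few lines of Python. This is a hedged illustration, not the researchers' actual code: it assumes, as a simplification, that each indicator is a yes/no judgment worth one point, and the indicator IDs, developer names, and helper functions (`find_disagreements`, `reconcile`, `total_score`, `mean_score`, `satisfied_by_someone`) are all hypothetical. The company-contest step is left out; it would simply overwrite individual entries in a developer's final ratings.

```python
from statistics import mean
from typing import Dict, List

# Hypothetical representation: indicator ID -> satisfied (True) or not (False).
Ratings = Dict[str, bool]


def find_disagreements(rater_a: Ratings, rater_b: Ratings) -> List[str]:
    """Indicators where two researchers' scores differ (same IDs assumed)."""
    return sorted(k for k in rater_a if rater_a[k] != rater_b[k])


def reconcile(rater_a: Ratings, rater_b: Ratings, resolved: Ratings) -> Ratings:
    """Keep agreed scores; take each disagreement's outcome from `resolved`.

    Assumes both raters scored the same indicator IDs.
    """
    final: Ratings = {}
    for k in rater_a:
        if rater_a[k] == rater_b[k]:
            final[k] = rater_a[k]
        elif k in resolved:
            final[k] = resolved[k]
        else:
            raise ValueError(f"unresolved disagreement on {k!r}")
    return final


def total_score(final: Ratings) -> int:
    """One point per satisfied indicator; out of 100 this reads as a percentage."""
    return sum(final.values())


def mean_score(by_developer: Dict[str, Ratings]) -> float:
    """Average total score across developers."""
    return mean(total_score(r) for r in by_developer.values())


def satisfied_by_someone(by_developer: Dict[str, Ratings]) -> List[str]:
    """Indicators that at least one developer satisfies."""
    ids = next(iter(by_developer.values()))
    return sorted(k for k in ids if any(r[k] for r in by_developer.values()))


# Toy example: three made-up indicators and two made-up developers.
a = {"data_sources": True, "compute_usage": False, "usage_policy": True}
b = {"data_sources": True, "compute_usage": True, "usage_policy": True}
print(find_disagreements(a, b))  # ['compute_usage']
dev_x = reconcile(a, b, resolved={"compute_usage": False})
print(total_score(dev_x))  # 2
dev_y = {"data_sources": False, "compute_usage": False, "usage_policy": True}
scores = {"dev_x": dev_x, "dev_y": dev_y}
print(mean_score(scores))  # 1.5
print(satisfied_by_someone(scores))  # ['data_sources', 'usage_policy']
```

Comparing `mean_score` with `len(satisfied_by_someone(...))` is one way to express how much room exists if developers simply adopted practices that some competitor already follows.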

Starting point is 00:11:22 So let's start with the high-level results. The most transparent model in this index was, in fact, Meta's Llama 2. This will probably be gratifying to Meta, which has of course set out to be the real "open AI" project, given its open-source approach. Other notable entrants on the list include OpenAI's GPT-4, coming in third; Stability AI's Stable Diffusion 2, coming in fourth at 47; Google's PaLM 2 and Anthropic's Claude 2, coming in fifth and sixth with scores of 40 and 36, respectively; and, way down near the bottom, Inflection, coming in ninth with a score of just 21, and Amazon's Titan Text, coming in at just 12. Now, even though Meta's Llama 2 had the highest score, it still scored only 54%,

Starting point is 00:12:03 suggesting that there is a lot more room for transparency in this field. Now, getting into the specific indicators, they divided those 100 indicators into three groups: upstream, model, and downstream. Upstream refers to, quote, the ingredients and processes involved in building a foundation model, such as the computational resources, data, or labor used. The model indicators, quote, specify the properties and function of the foundation model, such as the model's architecture, capabilities, and risks, and the downstream indicators, quote, specify how the foundation model is distributed and used,

Starting point is 00:12:32 such as the model's impact on users, any updates to the model, and the policies that govern its use. I think in some ways more telling than the overall score is the breakdown by these major dimensions of transparency. For example, one of the real standout low scores is that when it came to information about the data the models were trained on, these 10 models scored an average of just 20%. The high was BLOOMZ at 60%, and Llama 2 had 40%, but many scored between 0% and 20%. The highest category on average was model basics, the basic information about the models themselves, with an average score of 63%. Now, one of the things that one might expect is that there would be a fairly significant difference

Starting point is 00:13:10 between open and closed developers, and indeed, that's what we saw. Three of the top four were open models: Llama 2, BLOOMZ, and Stable Diffusion 2, which scored 54%, 53%, and 47%, respectively. OpenAI's GPT-4, again, came in third, just above Stable Diffusion 2. The researchers also noted that a lot of the disparity came from the upstream category: a lack of transparency around the data used to train the model, the labor used to train the model, and how much compute was used to build the model, all of which were much clearer with the open developers

Starting point is 00:13:42 than with the closed ones. Indeed, the gap is even starker when you look at the average transparency of open versus closed developers dimension by dimension. When it comes to what data models were trained on, open developers scored 47%, whereas closed developers scored just 9%. Labor and compute were similarly stark, at 43% versus 6% in both cases. Methods, model basics, and model access also showed huge disparities. On methods, for example, open developers scored 92% versus just 29% for closed developers. Now, there were a couple of areas where closed developers did outscore open developers. Those include capabilities, risks, and mitigations, where in each case, closed developers were

Starting point is 00:14:21 slightly ahead of their open counterparts. The researchers speculate that this is because closed developers dedicate more resources to controlling and shaping the way the model is used after it's released. Usage policy, for example, is another area where closed developers clearly outscored open ones, 49% to 33%. One of the researchers involved in the project, Sayash Kapoor, who also writes the AI Snake Oil newsletter, wrote, developers of open foundation models scored higher on many axes of transparency despite many of our indicators being easier to satisfy for closed models. For example, many indicators assessed policies for downstream use. Since closed model developers often provide access only through

Starting point is 00:14:56 an API, they can share information related to downstream use more easily, whereas developers of open models need to collaborate with the downstream deployers to satisfactorily provide such information. In theory, this should mean a much higher score for closed models on these indicators, but we find no substantive difference. Now, in terms of how they sum up their findings, the researchers said, quote, no major foundation model developer is close to providing adequate transparency, revealing a fundamental lack of transparency in the AI industry. However, and I thought that this was a really interesting point, while the mean score was just 37%, 82 of the 100 indicators were satisfied by at least one developer, meaning that if these

Starting point is 00:15:33 developers simply adopted best practices from their competitors, that alone would significantly improve transparency. Now, this study has been picked up by the news quite a bit. Putting on my meta-narrative analysis hat for a minute, I think it fits a bit of the general skeptical bias that many news organizations have toward technology in general right now, and AI more specifically. The New York Times piece about it was titled "Maybe We Will Finally Learn More About How AI Works." Its author, Kevin Roose, definitely wants more transparency, not less. As he put it, we can't have an AI revolution in the dark.

Starting point is 00:16:04 We need to see inside the black boxes of AI if we're going to let it transform our lives. Some, like VC Vinod Khosla, thought the approach was ridiculous from the start. He tweeted, Stanford's AI model openness ranking likely in reverse order of the competency of the model. Naive to ask private companies to disclose their secrets or investment will decline and we will help China. Would we disclose all details of the Manhattan Project? Interestingly, Roose tries to peel apart the more specific reasons AI labs give for not being more transparent.

Starting point is 00:16:31 The first category of answers, he writes, is lawsuits. Basically, lawyers at AI companies are worried that the more they say about how their models were trained, the more it opens them up to lawsuits over that training. Given that the Brief today started with a story about Universal Music Group suing Anthropic and then continued with yet another authors' lawsuit, this is not an unreasonable concern. The second response Roose hears is about competition. Quote, most AI companies believe that their models work because they possess some kind of secret sauce, a high-quality data set that other companies don't have, a fine-tuning technique that produces better results, some optimization that gives them an edge. If you force AI companies to disclose these recipes, they argue, you make them give away

Starting point is 00:17:07 hard-won wisdom to their rivals who can easily copy them. The last argument, Roose says, is about safety, and this is certainly one you've heard from people like OpenAI's Sam Altman. Basically, Roose says, the argument is that the more information you give about models, the faster progress on new models will accelerate, and the more likely it is that the technology gets into the hands of the wrong people, or just creates an arms race from which we can't escape. As Roose puts it, it would give society less time to regulate and slow down AI, which could put us all in danger if AI becomes too capable too quickly. Roose says these researchers don't buy it, and neither does he. Quote, if AI executives are worried

Starting point is 00:17:41 about lawsuits, maybe they should fight openly for a fair use exemption that would protect their ability to use copyrighted information to train their models, rather than hiding the evidence. If they're worried about giving away trade secrets to rivals, they can disclose other types of information or protect their ideas through patents. And if they're worried about starting an AI arms race, well, aren't we already in one? Certainly when it comes to the lawsuits, they are happening whether the companies hide the information or not, so that concern might not be an issue for much longer. Now, the one other type of response I saw that is worth noting came from folks who are broadly supportive of this goal but weren't really sure about the specific results of this index. For example,

Starting point is 00:18:13 Clem Delangue, the co-founder and CEO of Hugging Face, said, the scores and ranking look super weird and inconsistent to me and a lot of interesting models are missing, but I love the message from Stanford. More transparency equals more safety for AI. Now, I think when push comes to shove, questions of transparency, and more specifically expectations around transparency, are going to be dictated by policymakers rather than shaped by industry norms. But I do think that this sort of attempt to articulate the dimensions of transparency is going to be super useful in actually making that policy more precise and more

Starting point is 00:18:45 targeted at what it's actually trying to achieve without negative unintended consequences. I think in general it's a net positive to have this sort of information, even if it inherently remains incomplete. So good on these researchers for doing this work. This is another great topic for discussion in the AI Breakdown Discord. Again, that link is bit.ly slash AI breakdown. Come join us and chat about transparency and anything else. And until next time, peace.
