The AI Daily Brief: Artificial Intelligence News and Analysis - Which LLMs Hallucinate Least?
Episode Date: November 14, 2023
On today's episode, NLW looks at new research about LLM hallucination; Google suing AI scammers; and China and the US working towards an agreement not to use AI in nuclear device control systems. Today's Sponsor: Notion - Notion AI. Knowledge, answers, ideas. One click away. - https://notion.com/aibreakdown ABOUT THE AI BREAKDOWN The AI Breakdown helps you understand the most important news and discussions in AI. Subscribe to The AI Breakdown newsletter: https://theaibreakdown.beehiiv.com/subscribe Subscribe to The AI Breakdown on YouTube: https://www.youtube.com/@TheAIBreakdown Join the community: bit.ly/aibreakdown Learn more: http://breakdown.network/
Transcript
Today on The AI Breakdown, the U.S. and China are set to agree not to use AI in the control of nuclear weapon systems.
Before that, in the Brief: which LLMs hallucinate least?
The AI Breakdown is a daily podcast and video about the most important news and discussions in AI.
Go to breakdown.network for more information about our YouTube channel, our Discord, and our newsletter.
Welcome back to The AI Breakdown Brief, all the AI headline news you need in around five minutes.
One of the things that we have been talking about quite a bit here on The AI Breakdown
is the fact that we are moving from a period that was largely characterized by
first-blush experimentation with AI to a world in which
organizations and professionals are increasingly integrating generative AI tools into their
actual professional workflows. Now, one of the big barriers to that is, of course,
the fact that AI models still hallucinate. For some professions and roles, this isn't such a big
deal; it simply means that you have to double-check facts that come out of these LLMs.
But in other areas, think medical use cases, for example,
hallucinations can obviously have significant impacts.
Well, somehow, up until now, we haven't really had good information or research
around which models hallucinate more or less.
But we've just gotten a new report published in Nature called "Fabrication and errors in the bibliographic
citations generated by ChatGPT."
The abstract reads,
"Although chatbots such as ChatGPT can facilitate cost-effective text generation and editing,
factually incorrect responses (hallucinations) limit their utility." Now, this particular study focused on how often these models
hallucinated when citing specific works. And this is an area where there was a far
greater rate of hallucination than there was in general. For example, across all works,
GPT-3.5 hallucinated 55% of cited works, and even GPT-4 hallucinated 18% of cited works. And this is why,
as Professor Ethan Mollick points out, it's important to understand how the hallucination rate looks
not just in general, but for specific applications. Now, going back to the overall accuracy and the
general hallucination rate, separate research from Vectara puts GPT-4 at the top of the heap, hallucinating just
3% of the time, with GPT-3.5 at 3.5%. The Llama 2 models, depending on their size, hallucinate between
5.1% and 5.9% of the time, and Cohere's models hallucinate 7.5% and 8.5% of the time,
the same as Anthropic's Claude 2 at 8.5%. The hot new kid on the block, Mistral 7B, has a
9.4% hallucination rate. And then Google PaLM is all the way down at the bottom with 12.1%, and Google
PaLM Chat at 27.2%. Perhaps one more reason to be very eager for Google's forthcoming Gemini model to
finally get out into the wild. Now, speaking of Google, the company is going on the offensive
against scammers who are trying to take advantage of the hype and excitement around artificial
intelligence for their nefarious schemes. Basically, a group of individuals, and potentially
companies, in India and Vietnam (the subjects of the lawsuit aren't named in this case)
have been trying to trick small business owners into clicking Facebook ads
that claim to offer a downloadable version of Google's Bard chatbot for mobile. The problem
is that Bard is a web-based platform and isn't available for download, and so what these scammers
are actually doing is installing malware that steals social media credentials. Google's General
Counsel said that the lawsuit is the first such lawsuit aimed at protecting users of a major
tech company's flagship AI product. Now, while the overall size of the scam isn't clear,
Google says that they've filed over 300 takedown requests to have the ads removed.
According to Google, Facebook and others have been generally responsive to the takedown requests,
but with so many advertisers in the self-serve platform, there can be a bit of latency there,
and of course, that still creates risk for these users. What's notable to me is the fact that
Google is availing itself of the legal system for this. In general, the response to
these sorts of scams, especially when they're international, tends to focus on
pressuring the platform (in this case, Meta) to have better protections and, more broadly,
helping consumers get better educated about what is and isn't a scam. Now, speaking of things in the
Alphabet universe cracking down, YouTube is starting to implement policies around content that
features AI clones of musicians. As The Verge describes, YouTube really has two sets of content
guidelines when it comes to AI deepfakes. There is a looser general set of rules for the average
user, and then a very strict set of rules when it comes to protecting the platform's music
partners. Now, a blog post has just come out from the company, which shares how they are starting to think
about moderating AI-generated content. Part of it is pretty common sense. YouTube is going to require
creators to begin labeling what it determines to be realistic AI-generated content when they upload
videos, and those disclosure requirements are going to be more significant the more socially
impactful the topic of the video is, such as elections or ongoing conflicts. Now, YouTube hasn't
described what it means by realistic yet, but says there's going to be more detailed guidance when
those requirements start rolling out next year. Now, one thing The Verge points out is that while YouTube
has the ability to penalize content creators that don't label their AI-generated content, figuring out
if an unlabeled video was actually generated by AI could be something of a problem. YouTube says
that the platform is, quote, investing in the tools to help us detect and accurately determine if creators
have fulfilled their disclosure requirements when it comes to synthetic or altered content, but tools for
detecting AI-generated content simply don't work right now. Now, on top of that, there is going to be a
moderation process whereby people can request that videos that simulate them be taken down. However,
it's not a guarantee. YouTube says that it will consider, quote, a variety of factors when evaluating
these requests, including whether the content is parody or satire, and whether the individual is a
public official or well-known individual. Once again, the vagueness of the definition of parody and satire
could create a lot of problems when it comes to actually implementing these policies. However,
when it comes to AI-generated music content from YouTube partners, there is no exception for parody
and satire, and anything that, quote, mimics an artist's unique singing or rapping voice is subject
to take down. Now, one thing that is worth noting is that there won't be any automated detection
of that, but instead there will be a manual request form that partner labels will have to fill out
when they see violations of the policy. It also seems that YouTube is going to take a light hand
when it comes to punishing the creators, especially in the early days of these policies rolling out.
Now, part of why YouTube might be so concerned with its music industry partners and their
copyright protections is that its deals with those companies are
very important to the way it has set up its site, particularly the way it makes
music and sound available for YouTube Shorts. We've also heard that Google more broadly is in
conversations with the music labels to create some sort of apparatus through which people can
legitimately, in an above-board way, make new music using synthetic versions of existing
artists, in a way that is approved and cuts the artists in. Anyway, this will be an interesting case study to
watch in how copyright plays out, not just in the courts but in the business realm. Now, moving over
to the world of medicine, a couple of interesting stories there.
One is a new study from Oxford that suggests that AI analysis of cardiac CT scans could
accurately predict the risk of heart attacks up to 10 years in the future, even before
someone officially has heart disease. One of the doctors involved in the study said,
Our study found that some patients presenting in hospital with chest pain, who are often reassured
and sent back home, are at high risk of having a heart attack in the next decade even in the absence
of any signs of disease in their heart arteries. Here we demonstrate that providing an accurate
picture of risk to clinicians can alter and potentially improve the course of treatment for many
heart patients. Obviously, one of the big promises of AI is better preventative care when it comes to
medicine that allows doctors to get out ahead of issues that their patients are likely to face in the future.
Now, speaking of AI in medicine, Bloomberg also writes that in the race for the first drug to be
discovered by an AI, a key milestone is soon to be reached. Bloomberg writes,
The global push to use AI to find new medicines faces a crucial test as one front-runner starts
approaching late-stage trials for a drug discovered by algorithms. Insilico Medicine, which has
headquarters in Hong Kong and New York, used AI to develop an experimental drug for the incurable
lung disease idiopathic pulmonary fibrosis. The treatment is in mid-stage trials in the U.S.
and China, with some results expected early 2025. Now, the world is watching this one even more
closely than other drug trials, because it's the first fully AI-based pre-clinical candidate.
As Bloomberg writes, a string of other leading molecules that relied on AI have faced setbacks,
and Insilico could still fail in the process or take years to reach the market.
At the same time, the implications of any success would be huge,
opening the door for new and cheaper AI therapies that can save lives and cut costs for health systems.
Now, even as the medical world watches that closely,
moving over into markets, there are indications that Wall Street is falling in love with AI once again.
As the Wall Street Journal writes,
this year's hottest stock is regaining its momentum.
The report is about Nvidia, which has traded up for nine straight sessions,
and is up about 20% over that period.
Still, the big question will be what happens next week when the company shows off its third quarter results.
Obviously, we will keep you informed about all those developments, but that is going to do it for today's AI Breakdown Brief.
Next up, the main AI Breakdown.
And now a quick word from today's sponsor.
I am a huge Notion user.
We're talking multiple accounts for multiple projects.
I use it for everything from applicant tracking to note-taking to project management, to sharing public documents, to frantically capturing ideas I have
while out hiking or just driving around. Given that, and given the topic of The AI Breakdown,
I was excited to learn that they've launched a new AI tool called Q&A. It's like a personal assistant
that responds in seconds with exactly what you need. Notion AI can give you instant answers to your
questions using information from across your wiki, projects, docs, and meeting notes. For someone
like me who makes dozens of notes per day around a huge array of topics, having a built-in AI
tool to help recall that is incredibly useful. Now beyond that use case, think about this. Have an urgent question
you'd normally turn to a coworker to answer? Just ask Q&A instead. It'll search through thousands of documents
in seconds and answer your question in clear language, no matter how large or complex your workspace is.
Plus, you can trust your data is secure because Notion AI is designed to protect your information.
No AI models are trained with your information, the data is encrypted, and answers will never
use information from pages you don't have access to. With Notion AI, it's even easier to do your most
meaningful work. Try Notion AI for free when you go to notion.com/aibreakdown.
That's all lowercase letters, notion.com/aibreakdown, to try the powerful, easy-to-use
Notion AI today. And when you use our link, you're supporting the show. One more time,
that's notion.com/aibreakdown.
Welcome back to The AI Breakdown. When it comes to the geopolitics of artificial intelligence,
there is quite obviously no more significant relationship than that between the U.S. and China.
Now, we have had numerous contexts where this has been on display over the course of the last few months.
One is, of course, everything around the UK's AI Safety Summit.
Rishi Sunak's government made the controversial decision to have China participate in that AI safety summit,
in spite of the fact that it was dealing with an active Chinese spying scandal,
on the logic that if the world is really concerned
with mitigating the biggest risks of runaway artificial intelligence,
it needs the participation of everyone, not just some people.
Now, at that event, there was a declaration around AI's potential for catastrophic danger that was signed by both the U.S. and China, among other signatories, but it wasn't really about anything more than acknowledging the risk. The so-called Bletchley Declaration was intended to be the first time that the governments of the world got together to collectively agree that, as the declaration puts it, there is "potential for serious, even catastrophic, harm, either deliberate or unintentional, stemming from the most significant capabilities of these AI models."
As UK Technology Secretary Michelle Donelan put it,
"For the first time, we now have countries agreeing that we need to look not just independently
but collectively at the risks around frontier AI."
Now, of course, also happening recently is that the U.S. has been tightening its export controls
when it comes to AI chips.
The Biden administration first put some rules into practice last year around this time,
and this latest set of rules coming through the Commerce Department were effectively meant
to close loopholes that had been identified over the course of the last year.
This involved things like tighter restrictions even on lower-powered chips, as well as the inclusion of
foreign subsidiaries of Chinese companies among the firms prohibited
from getting access to these technologies.
Meanwhile, as all of this has been going on, for anyone paying close attention, there has been
a steady drumbeat of announcements and news and reports around both China and the US developing
further AI capabilities when it comes to military power.
Take, for example, this piece from Fox News on October 17th: "China, US race to unleash killer AI robot soldiers as military power hangs in balance.
AI technology is the new arms race pitting the world's powers against each other, experts
agree." Now, the details aren't super salient to the discussion that we're having today,
but suffice it to say that while everyone is talking metaphorically about an AI arms race
in the context of frontier models, there is an actual AI arms race happening between the
world's biggest military powers. Now, the US quite clearly is thinking about the military
implications of artificial intelligence, not just in terms of a blank slate that they can do whatever
they want with, but as something that needs to be managed on a global stage. This week, they
released the Political Declaration on Responsible Military Use of Artificial Intelligence and
Autonomy. This declaration was signed by 45 endorsing states and contains 10 what it calls
"concrete measures" to guide the responsible development and use of military applications of AI and
autonomy. So what does the declaration actually say? Well, this is from the latest version that has
been published on the State Department website, which is dated November 1st. It reads,
An increasing number of states are developing military AI capabilities, which may include using
AI to enable autonomous functions and systems. Military use of AI can and should be ethical,
responsible, and enhance international security. Military use of AI must be in compliance with
applicable international law, and in particular, use of AI in armed conflict must be in accord
with states' obligations under international humanitarian law. So then what the endorsing states agreed to
were things like the idea that military organizations should take appropriate steps to review their AI capabilities with respect to international law, particularly international humanitarian law,
that states should have systems for effective oversight of the development and deployment of military AI capabilities,
that they should take proactive steps to minimize unintended bias, that they should ensure that the development of these technologies is done in a transparent and auditable way,
that the personnel who approve and use this technology should be appropriately trained,
that capabilities should have explicit and well-defined uses, and that states should implement
appropriate safeguards to mitigate risks of failures. Now, as you can see, these are very
common-sense declarations, and they don't really limit what states can or can't develop. There's
nothing here that says, for example, you're not allowed to use AI in such-and-such a way
when it comes to military applications, beyond already established norms of international humanitarian
law. Ambassador Bonnie Denise Jenkins made statements around the launch event for the declaration
at the UN in New York saying,
we cannot predict how AI technologies will evolve
or what they might be capable of in a year or five years.
However, we know that there are steps states can take now
to put in place the necessary policies
and to build the technical capacities
to enable responsible development and use
no matter the technological advancements.
We need, therefore, to come together
as an international community
around a set of strong norms for responsible development and deployment,
norms that will enable nations to harness
the potential benefits of AI systems in the military domain,
while encouraging steps that avoid irresponsible, destabilizing,
and reckless behavior. Now, Jenkins' speech also noted that this was in many ways a foundation
for deeper conversations. In that same speech, she said, it provides a basis for a much more
concrete dialogue on what responsible means in practice. What does an effective testing and assurance
process look like? How do you exercise appropriate care in a range of practical applications?
She also said, we envision this collaboration among endorsers to be far more robust than simply
committing to high-level principles. The declaration is a foundation for collaboration and exchanges,
such as sharing best practices, expert-level exchanges, and capacity-building activities.
Finally, she said that the broad terms used in the declaration were deliberate, and that the U.S.
isn't trying to unilaterally decide for countries how to apply these principles, but simply to
provide a starting point in a shared space of international agreement.
Now, like I said, this declaration was signed by 45 countries, but the most notable absence
was, of course, China. According to Sam Bresnick of Georgetown, it wasn't really surprising that
Beijing declined to endorse this declaration.
Bresnick said, although Beijing likely supports many of the declaration's proposals, it is not
enthusiastic about signing on to a U.S.-led effort on responsible military AI.
Instead, he said, quote, China seems more interested now in engaging in multilateral discussions
surrounding the responsible development and use of AI, while unlikely to agree to binding
agreements that might limit its ability to develop and field AI-enabled military systems.
And indeed, that seems to be echoed in reports that President Biden and President
Xi will sign a deal this week focused on specific issues around
the use of AI in military applications, most notably keeping AI out of the control systems
for nuclear weapons. On Wednesday in San Francisco, at the Asia-Pacific Economic Cooperation Summit,
U.S. President Joe Biden and Chinese President Xi Jinping are set to meet. Two sources familiar with
the planned discussions say that among the top items on the agenda is the proliferation of AI in military
technologies. According to reports, the leaders will pledge a deal that limits the use of AI in autonomous
weapons such as drones, as well as in the systems that control and deploy nuclear warheads.
Indeed, one of the things that's interesting about this is that it appears that while AI is such a
wedge issue in so many other contexts, here these presidents are using common ground around AI
as part of an attempt to reduce tensions. Now, whether China agrees or not, it appears that the
position of this U.S. administration is that AI should not be involved in the deployment of nuclear
weapons. Secretary of State Antony Blinken was asked last week about this and said
he couldn't get into the specific issues that President Xi and President Biden would discuss in any such
meeting, but, quote, "I can say as a general proposition
for us that when it comes to artificial intelligence, we believe AI should not be in the loop
or making the decisions about how and when a nuclear weapon is used." Now, I kind of saw two
different categories of reactions to this. I didn't see anyone who is negative on it, but on the one
hand, you have people like Max Tegmark who wrote, I'm delighted to hear that the U.S. and China
plan to agree on not empowering AI to launch nukes. On the other hand, you had folks like
Matthew Pines who said, I know the bar is low, but a U.S.-China agreement to avoid automating
nuclear command and control systems is like the bare minimum of what functioning human beings
interested in collective survival should agree to. I don't think that there's necessarily
any mutual exclusivity between these two positions. Like Tegmark, I am very excited that this
seems to be an agreement that the U.S. and China can make. And like Matthew, I feel like this is a low
bar that we can all rally around. What I will say is that, especially in as tense an environment as we
have between the U.S. and China right now, getting to small agreements, even on issues that seem
like they should be incredibly obvious, is a really important part of the diplomatic process.
Getting to alignment between two nations, especially nations that compete as intensely as the
U.S. and China do, where there is as much mutual suspicion as there is between these two parties,
requires laying slow foundations of small alignments on top of one another, which have the potential
to become strong enough to handle future breaks and cracks in that relationship that come along.
History is littered with examples of countries that were extremely antagonistic or even outright at war,
who could still come to agreement on certain inalienable principles,
much to the good of the survival of the world.
So perhaps, yes, this is not something to fist-pump about in excitement,
but it's not nothing either.
Thanks for listening or watching as always, and until next time, peace.
