The AI Daily Brief: Artificial Intelligence News and Analysis - Which LLMs Hallucinate Least?

Starting point is 00:00:00 Today on the AI breakdown, the U.S. and China are set to agree not to use AI in the control of nuclear weapon systems. Before that, in the brief, which LLMs hallucinate least. The AI breakdown is a daily podcast and video about the most important news and discussions in AI. Go to breakdown.network for more information about our YouTube channel, our Discord, and our newsletter. Welcome back to the AI breakdown brief, all the AI headline news you need in around five minutes. One of the things that we have been talking about quite a bit here on the AI breakdown is the fact that we are moving from a period that was largely characterized by experimentation and sort of first blush efforts to experiment with AI to a world in which

Starting point is 00:00:45 organizations and professionals are increasingly integrating generative AI tools into their actual professional workflows. Now, in that, one of the big barriers is, of course, the fact that AI models still hallucinate. For some professions and roles, this isn't such a big deal, and it simply means that you have to double check facts that come through these LLMs. But for other areas, particularly think medical use cases, as a for example, obviously hallucinations can have significant impacts. Well, somehow, up until now, we haven't really had good information or research around which models hallucinate more or less.

Starting point is 00:01:18 But we've just gotten a new report published in Nature called Fabrication and Errors in the Bibliographic Citations Generated by ChatGPT. The abstract reads, although chatbots such as ChatGPT can facilitate cost-effective text generation and editing, factually incorrect responses, or hallucinations, limit their utility. Now, this particular study was focused on how often these different models hallucinated specifically when citing works. And this is an area where there was a far greater rate of hallucination than there was in general. For example, across all works,

Starting point is 00:01:48 GPT-3.5 hallucinated 55% of cited works, and even GPT-4 hallucinated 18% of cited works. And this is why, as Professor Ethan Mollick points out, it's important to understand how the hallucination rate looks not just in general, but for specific applications. Now, going back to the overall accuracy and the general hallucination rate, this research from Vectara has GPT-4 on the top of the heap, hallucinating just 3% of the time, GPT-3.5 3.5% of the time, the Llama 2 models, based on their size, hallucinate between 5.1 and 5.9% of the time, Cohere's models hallucinate 7.5 and 8.5% of the time, which is the same as Anthropic's Claude 2 at 8.5%. Hot new kid on the block, Mistral 7B, has a 9.4% hallucination rate. And then Google PaLM is all the way down at the bottom with 12.1% and Google

Starting point is 00:02:37 PaLM Chat at 27.2%. Perhaps one more reason to be very eager for their forthcoming Gemini model to finally get out into the wild. Now, speaking of Google, the company is going on the offensive against scammers who are trying to take advantage of hype and excitement around artificial intelligence for their nefarious schemes. Basically, what's happening is that there is a group of individuals and potentially companies in India and Vietnam (the subjects of the lawsuit aren't named in this case) who have been trying to trick small business owners into clicking Facebook ads that say that they will download a version of Google's Bard chatbot for mobile. The problem is that Bard is a web-based platform and isn't available for download, and so what these scammers

Starting point is 00:03:16 are actually doing is installing malware that steals social media credentials. Google's general counsel said that the lawsuit is the first such lawsuit aimed at protecting users of a major tech company's flagship AI product. Now, while the overall size of the scam isn't clear, Google says that they've filed over 300 takedown requests to have the ads removed. According to Google, Facebook and others have been generally responsive to the takedown requests, but with so many advertisers in the self-serve platform, there can be a bit of latency there, and of course, that still creates risk for these users. What's notable to me is the fact that Google is availing themselves of the legal system for this. In general, the response to

Starting point is 00:03:51 these sorts of scams, especially when they're international, tends to kind of just be focused on pressuring the platform, in this case Meta, to have better protections, and more broadly, just helping consumers get better educated about what is and isn't a scam. Now, speaking of things in the Alphabet universe cracking down, YouTube is starting to implement policies around content that features AI clones of musicians. As The Verge describes, YouTube really has two sets of content guidelines when it comes to AI deepfakes. There is a looser general set of rules for the average user, and then a very strict set of rules when it comes to protecting the platform's music partners. Now, a blog post has just come out from the company, which shares how they are starting to think

Starting point is 00:04:27 about moderating AI-generated content. Part of it is pretty common sense. YouTube is going to require that creators begin labeling what they determine as realistic AI-generated content when they upload videos, and those disclosure requirements are going to be more significant the more socially impactful the topic of the video is, such as elections or ongoing conflicts. Now, YouTube hasn't described what it thinks realistic means yet, but says there's going to be more detailed guidance when those requirements start rolling out next year. Now, one thing The Verge points out is that while YouTube has the ability to penalize content creators that don't label their AI-generated content, figuring out if an unlabeled video was actually generated by AI could be something of a problem. YouTube says

Starting point is 00:05:04 that the platform is, quote, investing in the tools to help us detect and accurately determine if creators have fulfilled their disclosure requirements when it comes to synthetic or altered content, but tools for detecting AI-generated content simply don't work right now. Now, on top of that, there is going to be a moderation process whereby people can request that videos that simulate them get taken down. However, it's not a guarantee. YouTube says that it will evaluate, quote, a variety of factors when evaluating these requests, including whether the content is parody or satire, and whether the individual is a public official or well-known individual. Once again, the vagueness of the definition of parody and satire could create a lot of problems when it comes to actually implementing these policies. However,

Starting point is 00:05:41 when it comes to AI-generated music content from YouTube partners, there is no exception for parody and satire, and anything that, quote, mimics an artist's unique singing or rapping voice is subject to takedown. Now, one thing that is worth noting is that there won't be any automated detection of that, but instead there will be a manual request form that partner labels will have to fill out when they see violations of the policy. It also seems that YouTube is going to take a light hand when it comes to punishing the creators, especially in the early days of these policies rolling out. Now, part of why YouTube might be so concerned with their music industry partners and the copyright protections therein is that their deals with those companies are

Starting point is 00:06:16 very important for the way that they've set up their site, and particularly the way they make music and sound available for YouTube Shorts. We've also heard that Google more broadly is in conversations with the music labels to create some sort of apparatus through which people can legitimately and in an above-board way make new music using synthetic versions of existing artists in a way that is approved and cuts artists in. Anyways, it will be an interesting case study to watch of how copyright plays out, not just in the courts, but in the business realm. Now, moving over to the world of medicine, a couple of interesting stories there. One is a new study from Oxford that suggests that AI analysis of cardiac CT scans could

Starting point is 00:06:52 accurately predict the risk of heart attacks even up to 10 years in the future even before someone officially has heart disease. One of the doctors involved in the study said, our study found that some patients presenting in hospital with chest pain, who are often reassured and sent back home, are at high risk of having a heart attack in the next decade even in the absence of any signs of disease in their heart arteries. Here we demonstrate that providing an accurate picture of risk to clinicians can alter and potentially improve the course of treatment for many heart patients. Obviously, one of the big promises of AI is better preventative care when it comes to medicine that allows doctors to get out ahead of issues that their patients are likely to face in the future.

Starting point is 00:07:28 Now, speaking of AI in medicine, Bloomberg also writes that in the race for the first drug to be discovered by an AI, a key milestone is soon to be reached. Bloomberg writes, The global push to use AI to find new medicines faces a crucial test as one front-runner starts approaching late-stage trials for a drug discovered by algorithms. Insilico Medicine, which has headquarters in Hong Kong and New York, used AI to develop an experimental drug for the incurable lung disease idiopathic pulmonary fibrosis. The treatment is in mid-stage trials in the U.S. and China, with some results expected early 2025. Now, the world is watching this one even more closely than other drug trials, because it's the first fully AI-based pre-clinical candidate.

Starting point is 00:08:05 As Bloomberg writes, a string of other leading molecules that relied on AI have faced setbacks, and Insilico's could still fail in the process or take years to reach the market. At the same time, the implications of any success would be huge, opening the door for new and cheaper AI therapies that can save lives and cut costs for health systems. Now, even as the medical world watches that closely, moving over into markets, there are indications that Wall Street is falling in love with AI once again. As the Wall Street Journal writes, this year's hottest stock is regaining its momentum.

Starting point is 00:08:33 The report is about Nvidia, which has traded up for nine straight sessions, and is up about 20% over that period. Still, the big question will be what happens next week when the company shows off its third quarter results. Obviously, we will keep you informed about all those developments, but that is going to do it for today's AI breakdown brief. Next up, the main AI breakdown. And now a quick word from today's sponsor. I am a huge Notion user. We're talking multiple accounts for multiple projects.

Starting point is 00:09:02 I use it for everything from applicant tracking to note taking to project management, to sharing public documents, to frantically capturing ideas I have while out hiking or just driving around. Given that and given the topic of the AI breakdown, I was excited to learn that they've launched a new AI tool called Q&A. It's like a personal assistant that responds in seconds with exactly what you need. Notion AI can give you instant answers to your questions using information from across your wiki, projects, docs, and meeting notes. For someone like me who makes dozens of notes per day around a huge array of topics, having a built-in AI tool to help recall that is incredibly useful. Now beyond that use case, think about this. Have an urgent question you normally turn to a coworker to answer? Just ask Q&A instead. It'll search through thousands of documents

Starting point is 00:09:43 in seconds and answer your question in clear language no matter how large or complex your workspace is. Plus, you can trust your data is secure because Notion AI is designed to protect your information. No AI models are trained with your information, the data is encrypted, and answers will never use information from pages you don't have access to. With Notion AI, it's even easier to do your most meaningful work. Try Notion AI for free when you go to notion.com slash AI breakdown. That's all lowercase letters, notion.com slash AI breakdown, to try the powerful, easy-to-use Notion AI today. And when you use our link, you're supporting the show. One more time, that's notion.com slash AI breakdown.

Starting point is 00:10:22 Welcome back to the AI breakdown. When it comes to the geopolitics of artificial intelligence, there is quite obviously no more significant relationship than that between the U.S. and China. Now, we have had numerous contexts where this has been on display over the course of the last few months. One is, of course, everything around the UK Safety Summit. Rishi Sunak's government made the controversial decision to have China participate in that AI safety summit, in spite of the fact that they were dealing with an active Chinese spying scandal, on the logic that if the world is really concerned with mitigating the biggest risks of runaway artificial intelligence,

Starting point is 00:10:59 it needs the participation of everyone, not just some people. Now, at that event, there was a declaration around AI's potential for catastrophic danger that was signed by both the U.S. and China, among other signatories, but it wasn't really about anything more than acknowledging the risk. The so-called Bletchley Declaration was intended to be the first time that the governments of the world got together to collectively agree that, as the declaration puts it, there is potential for serious, even catastrophic harm, either deliberate or unintentional, stemming from the most significant capabilities of these AI models. Said UK technology secretary Michelle Donelan, for the first time, we now have countries agreeing that we need to look not just independently but collectively at the risks around frontier AI. Now, of course, also happening recently is that the U.S. has been tightening its export controls when it comes to AI chips. The Biden administration first put some rules into practice last year around this time,

Starting point is 00:11:50 and this latest set of rules coming through the Commerce Department were effectively meant to close loopholes that had been identified over the course of the last year. This involved things like tighter restrictions even on lower-powered chips, as well as an inclusion of foreign subsidiaries that were owned by Chinese companies as part of the firms who were prohibited from getting access to these technologies. Meanwhile, as all of this has been going on, for anyone paying close attention, there has been a steady drumbeat of announcements and news and reports around both China and the US developing further AI capabilities when it comes to military power.

Starting point is 00:12:23 Take, for example, this piece from Fox News on October 17th, China, U.S. race to unleash killer AI robot soldiers as military power hangs in balance. AI technology is the new arms race pitting the world's powers against each other, experts agree. Now, the details aren't super salient to the discussion that we're having today, but suffice it to say that while everyone is talking metaphorically about an AI arms race in the context of frontier models, there is an actual AI arms race happening between the world's biggest military powers. Now, the US quite clearly is thinking about the military implications of artificial intelligence, not just in terms of a blank slate that they can do whatever

Starting point is 00:12:59 they want with, but as something that needs to be managed on a global stage. This week, they released the Political Declaration on Responsible Military Use of Artificial Intelligence and Autonomy. This declaration was signed by 45 endorsing states and contains 10 what they call concrete measures to guide the responsible development and use of military applications of AI and autonomy. So what did the actual declaration say? Well, this is from the latest version that has been published on the State Department website, which comes from November 1st. It reads, An increasing number of states are developing military AI capabilities, which may include using AI to enable autonomous functions and systems. Military use of AI can and should be ethical,

Starting point is 00:13:36 responsible, and enhance international security. Military use of AI must be in compliance with applicable international law, and in particular, use of AI in armed conflict must be in accord with states' obligations under international humanitarian law. So then what the endorsing states agreed to were things like the idea that military organizations should take appropriate steps to review their AI capabilities as relates to international and humanitarian law, that states should have systems for effective oversight of the development and deployment of military AI capabilities, that they should take proactive steps to minimize unintended bias, that they should ensure that the development of these technologies is done in a transparent and auditable way, that the personnel who approve and use this technology should be appropriately trained, that capabilities should have explicit and well-defined uses, and that states should implement

Starting point is 00:14:20 appropriate safeguards to mitigate risks of failures. Now, you see, these are very kind of common-sense declarations, and they don't really limit what states can or can't develop. There's nothing here that says, for example, you're not allowed to use AI in such-and-such a way when it comes to military applications outside of already established norms of international humanitarian law. Ambassador Bonnie Denise Jenkins made statements around the launch event for the declaration at the UN in New York saying, we cannot predict how AI technologies will evolve or what they might be capable of in a year or five years.

Starting point is 00:14:51 However, we know that there are steps states can take now to put in place the necessary policies and to build the technical capacities to enable responsible development and use no matter the technological advancements. We need, therefore, to come together as an international community around a set of strong norms for responsible development and deployment,

Starting point is 00:15:07 norms that will enable nations to harness the potential benefits of AI systems in the military domain, while encouraging steps that avoid irresponsible, destabilizing, and reckless behavior. Now, Jenkins' speech also noted that this was in many ways a foundation for deeper conversations. In that same speech, she said, it provides a basis for a much more concrete dialogue on what responsible means in practice. What does an effective testing and assurance process look like? How do you exercise appropriate care in a range of practical applications? She also said, we envision this collaboration among endorsers to be far more robust than simply

Starting point is 00:15:37 committing to high-level principles. The declaration is a foundation for collaboration and exchanges, such as sharing best practices, expert-level exchanges, and capacity-building activities. Finally, she said that the use of broad terms in the declaration was intentional, and that the U.S. isn't trying to unilaterally decide for countries how to apply these principles, but to simply provide a starting point in a shared space of international agreement. Now, like I said, this declaration was signed by 45 countries, but the most notable absence was, of course, China. Said Sam Bresnick of Georgetown, it wasn't really surprising that Beijing declined to endorse this declaration.

Starting point is 00:16:11 Bresnick said, although Beijing likely supports many of the declaration's proposals, it is not enthusiastic about signing on to a U.S.-led effort on responsible military AI. Instead, he said, quote, China seems more interested now in engaging in multilateral discussions surrounding the responsible development and use of AI, while unlikely to agree to binding agreements that might limit its ability to develop and field AI-enabled military systems. And indeed, that seems to be echoed in the fact that reports are that President Biden and President Xi will sign a deal this week, focused on specific issues around the use of AI in military applications, most notably, questions of keeping AI out of the control systems

Starting point is 00:16:46 for nuclear weapons. On Wednesday in San Francisco at the Asia-Pacific Economic Cooperation Summit, U.S. President Joe Biden and Chinese President Xi Jinping are set to meet. Two sources familiar with the planned discussions say among the top items on the agenda is the proliferation of AI and military technologies. According to reports, the leaders will pledge a deal that limits the use of AI in autonomous weapons such as drones, as well as in the systems that control and deploy nuclear warheads. Indeed, one of the things that's interesting about this is that it appears that while AI is such a wedge issue in so many other contexts, here these presidents are using common ground around AI as part of an attempt to reduce tensions. Now, whether China agrees or not, it appears that the

Starting point is 00:17:24 position of this U.S. administration is that AI should not be involved in the deployment of nuclear weapons. Secretary of State Antony Blinken was asked last week about this and said, I can't get into the specific issues that they would discuss in any such meeting between President Xi and President Biden, but, quote, I can say as a general precedent for us, that when it comes to artificial intelligence, we believe AI should not be in the loop or making the decisions about how and when a nuclear weapon is used. Now, I kind of saw two different categories of reactions to this. I didn't see anyone who is negative on it, but on the one hand, you have people like Max Tegmark who wrote, I'm delighted to hear that the U.S. and China

Starting point is 00:17:58 plan to agree on not empowering AI to launch nukes. On the other hand, you had folks like Matthew Pines who said, I know the bar is low, but a U.S.-China agreement to avoid automating nuclear command and control systems is like the bare minimum of what functioning human beings interested in collective survival should agree to. I don't think that there's actually necessarily any sort of mutual exclusiveness between these two positions. Like Tegmark, I am very excited that this seems to be an agreement that the U.S. and China can make. And like Matthew, I feel like this is a low bar that we can all rally around. What I will say is that especially in as tense an environment as we have between the U.S. and China right now, getting to small agreements even on issues that seem

Starting point is 00:18:36 like they should be incredibly obvious is a really important part of the diplomatic process. Getting to alignment between two nations, especially nations that compete as intensely as the U.S. and China do, where there is as much mutual suspicion as there is between these two parties, requires laying slow foundations of small alignments on top of one another, which have the potential to become strong enough to handle future breaks and cracks in that relationship that come along. History is littered with examples of countries that were extremely antagonistic or even outright at war, who could still come to agreement on certain inalienable principles, much to the good of the survival of the world.

Starting point is 00:19:11 So perhaps, yes, this is not something to fist-pump about in excitement, but it's not nothing either. Thanks for listening or watching as always, and until next time, peace.