The AI Daily Brief: Artificial Intelligence News and Analysis - Anthropic Begins to Unlock the Mystery of LLMs
Episode Date: May 24, 2024
Anthropic’s new research brings us closer to understanding the inner workings of LLMs. By identifying and manipulating patterns within their AI model, Claude 3, Anthropic sheds light on the internal mechanics of LLMs, offering potential solutions to bias, safety, and autonomy issues. Dive into the latest breakthroughs in AI interpretability and their implications for the future of artificial intelligence.
**
Check out the hit podcast from HBS, Managing the Future of Work: https://www.hbs.edu/managing-the-future-of-work/podcast/Pages/default.aspx
Join Superintelligent at https://besuper.ai/ -- Practical, useful, hands on AI education through tutorials and step-by-step how-tos. Use code podcast for 50% off your first month!
Check out https://useplumb.com/ to build complex AI pipelines simply.
**
ABOUT THE AI BREAKDOWN
The AI Breakdown helps you understand the most important news and discussions in AI.
Subscribe to The AI Breakdown newsletter: https://aidailybrief.beehiiv.com/
Subscribe to The AI Breakdown on YouTube: https://www.youtube.com/@AIDailyBrief
Join the community: bit.ly/aibreakdown
Transcript
Today on the AI Daily Brief, Anthropic makes a major breakthrough in interpretability.
Before that, in the headlines, Nvidia just continues to smash expectations.
The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI.
To join the conversation, check out the Discord linked in our show notes.
Welcome back to the AI Daily Brief Headlines edition, all the AI headline news you need in around five minutes.
We kick off today with an earnings report from Nvidia, and surprise, surprise, it just
keeps getting bigger. Wall Street Journal writes,
NVIDIA delivered a record quarter and signaled that the AI boom is still going strong,
driving its already meteoric stock up above $1,000 a share. Revenue last quarter more than
tripled year over year to $26 billion, and net profit was up more than sevenfold to $14.88 billion.
Both were quarterly records and both beat analyst expectations.
The stock market obviously loved this, with share prices up 6% in pre-market trading following
the report, and a single share surpassing $1,000.
One of the things that's also really interesting, though, is that while currently the big cloud
companies like Google, Microsoft, and Amazon account for around 45% of Nvidia's data center
revenue, they're clearly trying to move to a world where they're not just selling to data
centers, but also selling directly to companies. At Dell's big annual event this week,
for example, Nvidia and Dell talked about how they were trying to create the AI factories
of the future, where individual companies had more direct access to this sort of capacity.
CEO Jensen Huang said we are poised for our next wave of growth.
From Bloomberg, NVIDIA emphasized Wednesday that it wants to sell its technology to a wider market,
expanding beyond the giant cloud computing providers known as hyperscalers.
Huang said that AI is moving to consumer internet companies, carmakers, biotechnology,
and health care customers.
The large-scale deployment of NVIDIA chips by Elon Musk's Tesla is one sign of that expansion.
It continues to be a battle for NVIDIA to keep up with demand.
Said Huang, nobody has ever manufactured supercomputers at volume.
We're doing the best we can.
The one other interesting note, as called out by The Verge, was that NVIDIA will now make
new AI chips every single year. Said Huang, I can announce that after Blackwell, there's another chip.
We're on a one-year rhythm. The Verge points out that until now, Nvidia produced a new architecture
roughly once every two years, Ampere in 2020, Hopper in 2022, Blackwell in 2024, but that everything
is getting faster now. Next up in the headlines, you'll remember that recently Microsoft
inked a deal with UAE-based G42. Part of why I was so interested in the deal is that it seemed to
have been facilitated by the Department of Commerce and reflected geopolitics as much as business
considerations. Basically, G-42 had been right at the center of the U.S. China tension, and the U.S.
had been putting a ton of pressure on it to pick aside. Well, pick-aside it did, and Microsoft's
minority investment of $1.5 billion was part of that picking. Still, it's not without complications.
Wright's Reuters, Microsoft President Brad Smith said the tech company's high-profile deal with
the UAE-backed AI firm G-42 could eventually involve the transfer of sophisticated chips and tools,
a move that a senior Republican congressman warned could have national security implications, said Michael
call the Republican chairman of the Foreign Affairs Committee. Despite the significant national security
implications, Congress still has not received a comprehensive briefing from the executive branch about
this agreement. I am concerned the right guardrails are not in place to protect sensitive U.S. origin
technology from Chinese espionage given the CCP's interest in the UAE. To me, this sounds a little bit
like, if this really was coming from the Department of Commerce that was obviously a White House
facilitated deal, that Congress just doesn't have visibility into. Anyway, there's tons more
details here, but what's interesting to me continues to be just the geopolitical implications of AI and how
quickly they've become an ongoing concern. In the world of M&A, I'm wondering if we're not about
to see a bit of a wave of consolidation in the AI space. We've had a couple boom years of funding,
and now we're very naturally in the phase where companies are figuring out if there's enough
of a there there to raise a next round, or if it makes sense to try to join up with someone bigger.
One company seemingly going through that decision-making process is Adept, which was valued by
investors at more than a billion last year, and which The Information reports has held talks recently
around a possible sale or strategic partnership with large tech companies, most notably Meta.
Adept is in the much-vaunted AI agent space, and of course that means they're dealing with
not only intense competition from every angle in every direction, but also the fact that
AI agents are still at this point highly theoretical. There isn't existing consumer demand to tap
into. It's a new behavior in a new category that's being invented on the fly. When it comes to
something that challenging, they may decide that it makes sense to do that from within one of the
big giants that has the capital to actually pursue it to its full ends.
Speaking of big companies, in what will be a surprise to no one, Amazon is apparently planning to give Alexa an AI upgrade, as well as a monthly subscription fee.
Notably, this will not be included in Amazon Prime subscriptions.
Writes CNBC, Amazon will launch a more conversational version of Alexa later this year, potentially positioning it to better compete with new generative AI-powered chatbots from companies including Google and OpenAI.
Will the end of the year see us watching a souped-up Alexa compete with a souped-up Siri, both of them competing against some brand new OpenAI product?
Kind of seems like it.
Speaking of an OpenAI product,
the Washington Post reports that OpenAI didn't actually copy Scarlett Johansson's voice.
They write,
When OpenAI issued a casting call last May for a secret project to endow OpenAI's popular
ChatGPT with a human voice, the flyer had several requests.
The actors should be non-union.
They should sound between 25 and 45 years old,
and their voices should be warm, engaging and charismatic.
One thing the AI company didn't request,
according to interviews with multiple people involved in the process
and documents shared by OpenAI in response to questions from the Washington
Post: a clone of actress Scarlett Johansson. This of course gets to the conversation that's been
happening where Scarlett Johansson released a statement expressing concern that she had been asked
by OpenAI to use her voice. And when she said no, there ended up being a voice that kind of sounded
like her. As I mentioned in a previous episode, I think there are multiple things going on here.
There is the legal side of this, which at least this reporting from the Washington Post suggests
that there might not be a there there. There's also just a broader question of the look and the
public's trust or lack thereof in Sam Altman and OpenAI. For now, though, that is going
to do it for today's headlines. Stay tuned for the main episode. Today's podcast is brought to you by
Plumb. Are you a lean product team trying to rapidly develop and deploy AI features that deliver
real value to your users? Plumb empowers you to build complex AI pipelines, transform data, and leverage
validated JSON schema to create reliable high-quality AI features, accessible as API endpoints,
all in an intuitive low-code interface. Go from idea to MVP in hours, not days. Get your AI-powered
product in front of customers as soon as possible with Plumb.
Check out useplumb.com, that's Plumb with a B, for early access to the future of AI app development.
Hello, friends, before we get back to the episode, I want to tell you about something special I'm doing on Superintelligent this June.
Super is, of course, our platform for AI learning, and I've heard from a lot of you that you really want something for a true AI beginner, someone who's really just getting their feet wet with these tools.
So what I'm going to do is put together basically a course that sits on top of and uses Superintelligent tutorials and lessons, but where I hand-guide you through around
10 different lessons and how-tos that I think, once you complete them, will have you ahead of
80% of the other people who are just starting to use AI right now. If you are interested in this learning
experience, go to besuper.ai and sign up using code June. You'll get 25% off your first month,
and I'll automatically add you to that AI for beginners group. That's besuper.ai, discount code
June. See you there. Welcome back to the AI Daily Brief. One of the remarkable things about
LLMs, this technology that has taken the world by storm, that is changing how people work, how
people think about work, that is generating entirely new categories of interactions with
computers that has some people thinking that Terminator is going to become real, is that we genuinely
don't understand exactly how they work. They just sort of seem to. Indeed, this is part of the
reason for some researchers having concerns about the future state of these technologies. To wit,
if we don't understand how they work now, how do we think we're going to control them as they get
more powerful? Well, new research from Anthropic may be shedding some light that will help us with
that sort of understanding. The New York Times summed this up in a piece called AI's black boxes just got a
little less mysterious. Kevin Roose writes, one of the weirder, more unnerving things about today's
leading AI systems is that nobody, not even the people who build them, really knows how the systems
work. That's because LLMs are not programmed line by line by human engineers as conventional computer
programs are. Instead, these systems essentially learn on their own by ingesting vast amounts of data
and identifying patterns and relationships in language,
then using that knowledge to predict the next word in a sequence.
Again, this is one of the great dividing lines in terms of how people think about AI risk.
To some, this lack of understanding is precisely a cause for concern,
while for others, perhaps most notably, or at least most loudly,
Yann LeCun from Meta, LLMs built on the current approach of just predicting the next word in a sequence
are in his mind simply incapable of the types of things that some folks are worried about.
Holding aside any of the big long-term existential risk things, our lack of
understanding poses challenges right now.
As an example, the New York Times points out that right now, if a user types which American
city has the best food and a chatbot responds Tokyo, there's no way of understanding why
the model made that error, or why the next person who asks may receive a different answer.
So if you are a company building a chatbot trying to make it better, it's very hard to
improve things in any sort of linear or controllable way.
Of course, there is also the alignment side of this problem. As the New York Times'
Kevin Roose writes, when LLMs do misbehave or go off the rails, nobody can really explain why.
From there, the Times talks about the field of research that is trying to figure out how
these models work, which is called mechanistic interpretability.
Roose characterizes the work as slow going with progress being incremental.
This week, however, Anthropic announced what they're calling a major breakthrough, and
here's how Roose sums it up.
The researchers looked inside one of Anthropic's AI models, Claude 3 Sonnet, and used a technique
known as dictionary learning to uncover patterns in how combinations of neurons, the mathematical
units inside the AI model, were activated when Claude was prompted to talk about certain topics.
They identified roughly 10 million of these patterns, which they call features.
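To make the idea of "features" a little more concrete, here is a minimal sketch in Python. It covers only the sparse-coding half of dictionary learning: given a dictionary of feature directions, it explains one activation vector as a combination of just a few of them, using greedy matching pursuit. Everything in it is an assumption for illustration: the feature names, the four-dimensional vectors, and the `sparse_code` helper are made up, and the dictionary is written by hand rather than learned from a real model's activations as in Anthropic's research.

```python
# Hypothetical, hand-made "dictionary": each feature is a unit-length direction
# in a tiny 4-dimensional activation space. In real dictionary learning these
# directions are learned from a model's activations, not written by hand.
FEATURES = {
    "golden_gate_bridge": [1.0, 0.0, 0.0, 0.0],
    "python_function_call": [0.0, 1.0, 0.0, 0.0],
    "inner_conflict": [0.0, 0.0, 0.6, 0.8],
}


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def sparse_code(activation, features, max_active=2, tol=1e-9):
    """Greedy matching pursuit: explain `activation` using only a few features."""
    residual = list(activation)
    coefficients = {}
    for _ in range(max_active):
        # Pick the feature most aligned with whatever is still unexplained.
        name, direction = max(
            features.items(), key=lambda kv: abs(dot(residual, kv[1]))
        )
        strength = dot(residual, direction)
        if abs(strength) < tol:
            break  # nothing meaningful left to explain
        coefficients[name] = coefficients.get(name, 0.0) + strength
        # Subtract that feature's contribution from the residual.
        residual = [r - strength * d for r, d in zip(residual, direction)]
    return coefficients, residual


# A made-up activation: mostly "bridge" with a little "inner conflict" mixed in.
activation = [2.0, 0.0, 0.3, 0.4]
codes, residual = sparse_code(activation, FEATURES)
print(codes)
```

In this toy case the activation decomposes into two active features and the third stays off, which is the sparsity that makes each feature readable on its own.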
This research actually started previously. Anthropic in their announcement post writes,
In October 2023, we reported success applying dictionary learning to a very small toy language model
and found coherent features corresponding to concepts like uppercase text, DNA sequences,
surnames in citations, nouns in mathematics, or function arguments in Python code.
Now, however, they say, we've successfully extracted
millions of features from the middle layer of Claude 3 Sonnet, providing a rough conceptual map
of its internal states halfway through its computation. Whereas the features we found in the toy
language model were rather superficial, the features we found in Sonnet have a depth, breadth, and abstraction
reflecting Sonnet's advanced capabilities. We see features corresponding to a vast range of entities
like cities (San Francisco), atomic elements like lithium, scientific fields (immunology), and programming
syntax like function calls. These features are multimodal and multilingual, responding to images
of a given entity as well as its name or description in many languages.
At this point in the piece, they show the Golden Gate Bridge feature,
which activates around images of the Golden Gate Bridge
or around text containing the Golden Gate Bridge.
Anthropic goes on,
we were able to measure a kind of quote-unquote distance between features
based on what neurons appeared in their activation patterns.
This allowed us to look for features that are quote-unquote close to each other.
Looking near a Golden Gate Bridge feature,
we found features for Alcatraz Island,
Ghirardelli Square, the Golden State Warriors,
California Governor Gavin Newsom, the 1906 earthquake, and the San Francisco-set Alfred Hitchcock film Vertigo.
They continue, this holds at a higher level of conceptual abstraction.
Looking near a feature related to the concept of inner conflict, we find features related
to relationship breakups, conflicting allegiances, logical inconsistencies, as well as the phrase
catch-22.
This shows that the internal organization of concepts in the AI model corresponds at least somewhat
to our human notions of similarity.
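The notion of features being close to each other can be sketched with a standard similarity measure. The Python example below is an illustration under stated assumptions, not Anthropic's actual metric: each feature is represented by a made-up vector of how strongly four imaginary neurons participate in it, and closeness is measured with cosine similarity. The feature names, the numbers, and the `nearest_features` helper are all hypothetical.

```python
import math

# Hypothetical feature vectors: each number is how strongly a (made-up) neuron
# participates in that feature's activation pattern.
FEATURE_VECTORS = {
    "golden_gate_bridge": [0.9, 0.8, 0.1, 0.0],
    "alcatraz_island": [0.8, 0.9, 0.2, 0.0],
    "vertigo_film": [0.7, 0.5, 0.4, 0.1],
    "function_call": [0.0, 0.1, 0.2, 0.9],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def nearest_features(name, vectors, k=2):
    """Return the k features most similar to `name`, closest first."""
    target = vectors[name]
    scored = [
        (other, cosine_similarity(target, vec))
        for other, vec in vectors.items()
        if other != name
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


neighbors = nearest_features("golden_gate_bridge", FEATURE_VECTORS)
for other, score in neighbors:
    print(f"{other}: {score:.3f}")
```

With these invented numbers, the San Francisco-related features rank as the nearest neighbors of the bridge feature, while the programming feature sits far away, mirroring the kind of neighborhood Anthropic describes.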
Importantly, says Anthropic, they're not just able to identify these features but to manipulate
them. Quote, artificially amplifying or suppressing them to see how Claude's response changes.
Holding again with the example of the Golden Gate Bridge, they said when initially asked,
what is your physical form? Claude's usual kind of answer is, I have no physical form. I am
an AI model. But when amplifying the Golden Gate Bridge feature, Claude responded, I am the
Golden Gate Bridge. My physical form is the iconic bridge itself. Quote, altering the feature
had made Claude effectively obsessed with the bridge, bringing it up in answer to almost any query,
even in situations where it wasn't at all relevant. They continue, the fact that manipulating these
features causes corresponding changes to behavior validates that they aren't just correlated with the presence
of concepts in input text, but also causally shape the model's behavior. In other words, the features
are likely to be a faithful part of how the model internally represents the world and how it uses
these representations in its behavior. Said Chris Olah from Anthropic, who led this team,
we're discovering features that may shed light on concerns about bias, safety risks, and autonomy.
I'm feeling really excited that we might be able to turn these controversial questions that people
argue about into things we can actually have more productive discourse on. An associate professor of
computer science at MIT, Jacob Andreas, who reviewed Anthropic's research, called it a hopeful sign that
large-scale interpretability might be possible. He said, in the same way that understanding basic things
about how people work has helped us cure diseases, understanding how these models work will both let us
recognize when things are about to go wrong and let us build better tools for controlling them.
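As one illustration of what a tool for controlling a model could look like, the amplifying and suppressing of features described earlier can be sketched in a few lines of Python. This is a simplified picture under stated assumptions, not Anthropic's procedure, which the episode does not spell out: the three-dimensional activation, the `bridge_direction` vector, and the `steer` and `feature_strength` helpers are hypothetical.

```python
def feature_strength(activation, feature_direction):
    """How strongly a feature is present: projection onto its unit direction."""
    return sum(a * d for a, d in zip(activation, feature_direction))


def steer(activation, feature_direction, scale):
    """Return a copy of `activation` pushed along `feature_direction` by `scale`."""
    return [a + scale * d for a, d in zip(activation, feature_direction)]


# Hypothetical unit-length direction standing in for a "Golden Gate Bridge" feature.
bridge_direction = [1.0, 0.0, 0.0]

# A made-up internal activation in which the bridge feature is barely present.
activation = [0.1, 0.7, 0.2]

# Amplify: push the activation strongly along the feature direction.
amplified = steer(activation, bridge_direction, scale=10.0)

# Suppress: subtract exactly the amount of the feature that is present.
current = feature_strength(activation, bridge_direction)
suppressed = steer(activation, bridge_direction, scale=-current)

print(feature_strength(amplified, bridge_direction))
print(feature_strength(suppressed, bridge_direction))
```

The other coordinates of the activation are left untouched in both cases, which is the appeal of working with features: in principle one concept can be turned up or down without rewriting everything else.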
So obviously, this doesn't tell us everything about how LLMs work, but it does give us a pretty
strong jumping off point to go deeper in terms of this question of interpretability.
Science-y and dense though this may be, I think this is going to be an important part of how we
resolve some of these questions of risk and challenges as AI moves forward.
The longer we stay in the realm of theoretical debates, the harder it will be to actually put
policies in place, whereas the more specific and applied we get, the better able we might be
to actually solve some of the challenges. Super interesting stuff, great work from the Anthropic
team, but for now, that is going to do it for the AI Daily Brief. Appreciate you listening or
watching as always, and until next time, peace.
