The AI Daily Brief: Artificial Intelligence News and Analysis - ChatGPT's First True Competitor? Anthropic Releases Claude 2
Episode Date: July 12, 2023
Anthropic has released Claude 2. The model performs comparably to GPT-4 but offers a 100k context window at 4-5x lower cost. NLW does a roundup of the AI community's first impressions. Before that on the Brief: the OECD's 2023 employment report finds that more than a quarter of jobs in their member countries have the highest level of risk to be automated away by AI; OpenAI strikes a 6-year deal with Shutterstock and Senators reflect on yesterday's classified White House AI briefing. Today's Sponsor: Supermanage - AI for 1-on-1's - https://supermanage.ai/breakdown ABOUT THE AI BREAKDOWN The AI Breakdown helps you understand the most important news and discussions in AI. Subscribe to The AI Breakdown newsletter: https://theaibreakdown.beehiiv.com/subscribe Subscribe to The AI Breakdown on YouTube: https://www.youtube.com/@TheAIBreakdown Join the community: bit.ly/aibreakdown Learn more: http://breakdown.network/
Transcript
Today on the AI Breakdown, Anthropic releases Claude 2 and GPT4 finally has some real competition.
Before that on the Brief, the OECD says 27% of jobs are at high risk for disruption.
The AI Breakdown is a daily podcast and video about the most important news and discussions in AI.
Go to breakdown.network for more information.
Welcome back to the AI Breakdown Brief.
All the AI headline news you need in five-ish minutes or less.
Today we kick off with a new report from the OECD.
This is their annual employment report, and it found that 27% of jobs are at the highest level of risk from AI.
Now, the OECD is the Organisation for Economic Co-operation and Development.
It's a 38-member transnational organization, and this report is something they do every year.
The OECD surveyed 5,300 professionals across 2,000 firms in seven countries,
and the jobs that they said were at the highest risk were those that involved at least 25 of the 100 skills and abilities that AI
experts said could be easily automated away. They found that the jobs that have the highest risk of
being automated away comprised 27% of employment in the OECD bloc. And making matters worse, this survey
was collected before the rise of ChatGPT. Now, in addition to that headline number, the OECD
employment report also looked into how companies in different industries were going to handle these
changes. For example, when it came to finance, 64% of companies planned on retraining or upskilling their
internal workers. Fifty-three percent said that they would buy services from external companies.
Thirty-five percent said that they would be hiring new workers. And 17 percent anticipated layoffs because of redundancies.
Now, at the same time, there were some positive things here as well. Sixty-three percent of respondents
in the manufacturing and finance industries said that AI had improved their enjoyment of their job.
This would follow along from the idea that AI helps automate away rote tasks that are
frustrating but key parts of any given profession. Seventy-nine percent of finance workers and 80 percent
of manufacturing workers said it had improved their performance.
There were even mental health benefits with 54% in finance and 55% in manufacturing saying
that it had improved their mental health.
But there is still anxiety.
63% of professionals surveyed in finance are worried about job losses in the next 10 years due to AI,
along with 57% of people in manufacturing.
Next up today, the latest salvo in the battle around AI model training: Google has been hit
by a class action lawsuit accusing it of stealing millions
of users' data to train its various AI tools. Clarkson Law Firm brought the suit against Google,
Alphabet, and its subsidiary DeepMind, and alleged that Google had, quote, been secretly stealing
everything ever created and shared on the internet by hundreds of millions of Americans.
Now, Google, for its part, doesn't seem particularly concerned. Its general counsel called the suit
baseless and said, quote, we've been clear for years that we use data from public sources,
like information published to the open web and public data sets to train the AI models behind
services like Google Translate, responsibly, and in line with our AI principles.
Now, you'll remember that just last week there was an updated privacy policy that said Google had access to even more publicly available information, so clearly these things are top of mind.
Meanwhile, showing a slightly different path forward for how big models could access data, OpenAI and Shutterstock have come to an agreement to expand their deal around building generative AI tools.
As part of the deal, over the next six years OpenAI will have access to Shutterstock data that includes everything from images to videos to music and their associated metadata.
In return, Shutterstock will get what OpenAI calls priority access to its latest tech capabilities,
and it sounds like they're trying to integrate those tools directly into Shutterstock's content library as well.
Given how litigious Getty Images has been around Stability AI's use of its images in the training of Stable Diffusion,
this seems like a very different approach that OpenAI is going for.
Next, we follow up from one of our big stories from yesterday.
You'll remember that the White House held its first ever classified AI briefing for senators,
and after that meeting we got a lot of different responses.
I would sum them up in a few parts.
First, there's definitely a sense that this meeting reinforced the urgency of dealing with these issues.
Senator John Kennedy from Louisiana said,
AI has this extraordinary potential to make our lives better if it doesn't kill us first.
A second notable piece of the discourse coming out of this briefing is the extent to which people realize
that there is an inherent tension, a line that they have to walk,
between on the one hand wanting to provide some sort of guardrails and on the other,
not wanting to over-regulate an industry which is extremely geostrategically important.
Marco Rubio from Florida said,
one thing I'm certain of is,
I know of no technological advancement in human history you've been able to roll back.
It's going to happen.
The question is how do we build guardrails and practices around it
so that we can maximize its benefits and diminish its harm?
At the same time, Rubio said,
I just don't particularly know enough about AI yet
to even understand what it is we're trying to regulate.
There's probably some role to play in codifying
how government uses it in the defense realms and so forth,
but beyond that I'm not prepared to give you
an opinion because I think it's something we're still learning about. Now, that sentiment that we need to
be careful with regulation even though we know we need regulation is something that wasn't just
Republicans saying. In fact, Martin Heinrich, Democrat from New Mexico, said, one of the interesting
things about this space right now is it doesn't feel particularly partisan. So we have a moment
we should take advantage of. And the last takeaway for me is that it is definitely clear who the
big boogeyman is when it comes to AI, and it's China. For those of you who are watching this, you can
see it on your screen right here. The Fox News headline
is Senators leave classified AI briefing confident but wary of existential threat posed by China.
Eric Schmidt, Republican from Missouri, described China as, quote, playing for keeps.
Senator Joni Ernst from Iowa said,
We should always be concerned about China always and strive to do anything better and faster than China.
Tim Kaine from Virginia said, I think we're all very concerned about it.
So given all this, when it comes to what Congress and the Senate are expected to actually do,
it's not exactly clear yet, but Chuck Schumer, the Senate Majority Leader, is trying to give some sense of it, saying,
our timetable in terms of producing legislation is not years and not days and weeks, but months.
We can't rush too fast, but we can't go so slowly that either other governments that are authoritarian
or bad actors who are private sector actors get ahead of us.
So taking this all together, I think that we are very clearly going to hear a lot more about
AI policy in the months to come.
That's going to do it, however, for today's AI Breakdown Brief.
If you enjoyed it and you want to be part of a larger conversation, come check out the
AI Breakdown Discord.
The link is bit.ly/aibreakdown, and I'll be looking forward to seeing you there.
For now, thanks for watching or listening, and I'll be back soon with the main AI Breakdown.
Before we get into the main AI Breakdown, I want to tell you about today's sponsor, Supermanage.
If you work in a professional setting, you probably have some version of a one-on-one meeting,
either with the people that work for you or the people that you work with.
Unfortunately, all too often, those one-on-one meetings become glorified catch-up calls.
Don't you wish you could jump right to the stuff that really matters?
That's where Supermanage comes in.
Supermanage AI magically distills your team's public Slack channels into a real-time
brief on any employee, any time.
Catch up on contributions, work in progress, challenges they're facing, sentiment,
everything you need to show up ready for a truly meaningful conversation.
And it's completely free.
Visit supermanage.ai/breakdown today to start making the most of your
one-on-ones.
And thanks again to Supermanage for sponsoring the AI Breakdown.
The LLM Wars have ratcheted up another huge notch today, as Anthropic has released
Claude 2, and many feel it's the first time GPT4 has some real competition.
Welcome back to the AI Breakdown.
Today we are talking about what is a major moment in the history of the competition between
LLMs.
Since last November, with the launch of ChatGPT, really nothing has actively competed for
supremacy with ChatGPT. Sure, Google's Bard has at times shown flashes of how it might be able to
keep up or compete through integration with other Google products. And yes, Microsoft has obviously
made a ton of advances in LLMs integrated into their services as well. But of course, they were
connected to OpenAI in an integral way. And so in many ways, Microsoft's success also seems like
OpenAI's success. Anthropic, and specifically Claude 2, offers something a little bit different.
There have been two moments this year where Anthropic really hopped into public
attention. One was when they announced their constitutional AI model. The idea of Anthropic's
constitutional model concerns questions such as which requests an LLM will
engage with versus deem inappropriate, and what types of actions it will encourage versus discourage.
Instead of using a human feedback model for specific instances of those questions,
Anthropic would try to instill values in its model through what it calls constitutional AI.
After having identified that the human feedback version has a number of problems,
including difficulty scaling, requiring people to interact with disturbing outputs,
and of course the substantial time and resources it takes,
Anthropic has an approach where, quote,
the Constitution guides the model to take on the normative behavior described in the Constitution,
such as helping to avoid toxic or discriminatory outputs,
avoiding helping a human engage in illegal or unethical activities,
and broadly creating an AI system that is helpful, honest, and harmless.
The next thing that captured attention was when Anthropic introduced a 100K context window.
Context window refers to how much information a model can ingest at one time.
Right now, for some comparison, the GPT4 context window that you use when you're interacting
with ChatGPT is around 8,000 tokens.
So 100,000 tokens makes a significant difference.
When you're using ChatGPT, you have to use embeddings and other workarounds to get around that
context window, whereas this 100K context window corresponds to around 75,000 words.
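To make the token and word figures concrete, here is a minimal sketch of the arithmetic. The 0.75 words-per-token ratio is only the one implied by "100K tokens is around 75,000 words"; real tokenizers vary by text, and the function names are hypothetical.

```python
import math

# Assumption: about 0.75 words per token, the ratio implied by
# "100K tokens is around 75,000 words". Real tokenizers vary by text.
WORDS_PER_TOKEN = 0.75


def estimated_tokens(word_count, words_per_token=WORDS_PER_TOKEN):
    """Rough token count for a document of the given word count."""
    return math.ceil(word_count / words_per_token)


def chunks_needed(word_count, context_tokens):
    """How many pieces a document must be split into to fit the window.

    Simplification: ignores the tokens needed for instructions and the reply.
    """
    return math.ceil(estimated_tokens(word_count) / context_tokens)


document_words = 75_000
print(estimated_tokens(document_words))        # 100000
print(chunks_needed(document_words, 8_000))    # 13
print(chunks_needed(document_words, 100_000))  # 1
```

Under these assumptions, a 75,000-word document would need to be split into about 13 pieces for an 8,000-token window, but fits in a single pass at 100K.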
That's roughly the length of The Great Gatsby, meaning it's hundreds of pages
of material. This also means it can take in things like business 10-Ks, complete research reports,
and other dense information that would have had to be broken up in other models. Okay, so coming
into yesterday, we have two things where Anthropic is either leading or doing something different,
the 100K context window where they're leading, and the constitutional AI model where it's probably
too early to call them leading, but at least they were thinking about it in new and interesting
and novel ways. Well, yesterday they announced Claude 2. Claude 2 is their latest model, and it
is putting up some impressive results. Those include passing grades on the USMLE medical exam,
impressive results on the GRE on both verbal reasoning and analytical writing, significantly improved
coding performance, and more. Now, of course, what people are really interested in is how it compares
to ChatGPT, or more specifically, GPT4. Dr. Jim Fan from Nvidia did a comparison post. He writes,
On standard exams, it's not quite at GPT4 yet, but catching up fast compared to version 1.3.
On GRE verbal, it scored 165 versus GPT4's 169.
On GRE writing, Claude 2 scored a 5 versus GPT4's 4.
On the GRE quantitative, it got a 154 versus GPT4's 163.
On the USMLE, as we said, Claude got a passing grade at 67, but it was far behind GPT4 with an 85.
But on the bar, Claude 2 actually outperformed GPT4 very slightly.
Now, when it comes to reasoning benchmarks, on the HumanEval coding benchmark, Claude got a 71.2% versus
GPT4's 67%. On grade school math, Claude got an 88% versus GPT4's 92%. Now, as Jim points out, big caveat
on the standard exams. The prompting protocols may be very different and there's no error bars
in a large number of exams. The comparison may not be statistically significant. What Jim is saying
is that when it comes to that bar score of 76.5% for Claude versus 75.7% for GPT4, that's
functionally the same, but I think that for our purposes of understanding Anthropic's
Claude 2 as an actual real competitor for GPT4, the point remains the same. So what we have here
is a model that seems pretty similar in performance to GPT4, which is obviously the state of the
art. However, there are a number of other areas where Claude 2 offers something really different.
One of those comes with its knowledge cutoff. GPT4's is September 2021, whereas Claude 2's knowledge
cutoff is in early 2023. Now, of course, access to Browse with Bing and other internet-connected
versions of GPT4 probably reduces the importance of that just a little bit, but it's still worth noting.
A second huge difference is obviously that context window that we discussed.
Earlier this year, people were freaking out about the idea that ChatGPT would at some point
move up to a 32K context window, so having a 100K context window in a model this powerful is significantly
different. But when it comes to high-performance uses of this, maybe the most significant thing
is that Claude 2 is currently four to five times cheaper than GPT4 32K.
Jim writes, prompt tokens cost $11 versus $60 per million,
and completion costs $32 versus $120 per million,
assuming similar tokenization length.
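As a rough check on the "four to five times cheaper" figure, the quoted per-million prices can be plugged into a small calculation. This is a sketch under assumptions: the model labels and the example workload (a 30,000-token prompt with a 1,000-token reply) are illustrative, and it assumes, as Jim does, that both models tokenize the text to a similar length.

```python
# Per-million-token prices quoted above, in US dollars.
# The dictionary keys are illustrative labels, not official API model names.
PRICES = {
    "claude-2": {"prompt": 11.0, "completion": 32.0},
    "gpt-4-32k": {"prompt": 60.0, "completion": 120.0},
}


def request_cost(model, prompt_tokens, completion_tokens):
    """Dollar cost of one request at the quoted per-million prices."""
    price = PRICES[model]
    total = prompt_tokens * price["prompt"] + completion_tokens * price["completion"]
    return total / 1_000_000


def price_ratio(prompt_tokens, completion_tokens):
    """How many times more the same request costs on GPT4 32K than on Claude 2."""
    return (request_cost("gpt-4-32k", prompt_tokens, completion_tokens)
            / request_cost("claude-2", prompt_tokens, completion_tokens))


# Hypothetical summarization workload: long document in, short summary out.
print(f"{request_cost('claude-2', 30_000, 1_000):.3f}")   # 0.362
print(f"{request_cost('gpt-4-32k', 30_000, 1_000):.3f}")  # 1.920
print(f"{price_ratio(30_000, 1_000):.2f}")                # 5.30
```

The ratio depends on the mix: prompt-heavy requests approach 60/11 (about 5.5x), while completion-heavy requests approach 120/32 (3.75x), which is roughly where the four-to-five-times range comes from.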
So what you have here then is a model with comparable performance and capabilities,
with a much bigger context window, more recent knowledge natively, and much cheaper.
And at this point, you're probably getting why I called this video the first true
ChatGPT competitor. So let's talk about what people are finding Claude 2 useful for right out of the
gate. One thing is definitely around document summarization, which makes sense given that larger context
window. Sully Omar writes, Anthropic literally killed every chat PDF wrapper with Claude 2.
You can upload files now. I tried Tesla's latest Q1 update and asked, what are the key
takeaways from this Q1 update? Can you make any forecast for the price? It was able to answer it
flawlessly with sources. Now, interestingly, it isn't just single document summarization that's
valuable, but you can actually manage multiple documents at the same time. Given that, you can ask for
things like comparisons, common points, changes between them, which obviously opens up a whole
different set of use cases. Carlos E. Perez writes, oh my gosh, you can import several documents
into Claude 2 and ask the relationship between the concept found in each document. It's
conceptual blending on steroids. Karina Nguyen, who's technical staff at
Anthropic, said, after spending some time with Claude 2, I've uncovered some interesting use cases for
myself that might be useful for others too. One, she says, is UX writing. Two is prototyping.
Three is asking Claude to brainstorm conversation starters and topics for discussions with people based
on their background, uploading people's resumes as a way to get it prompted. And then she also
discusses data storytelling, i.e. understanding trends and data across multiple attachments, as well as
editorial feedback. Professor Ethan Mollick writes,
The new Claude 2 AI is quite good in early testing. Definitely much
closer to GPT4 in terms of quality, continues to be the most pleasant AI personality,
very good at summarizing documents, especially PDFs.
On the downside, don't use Claude for data.
It hallucinates answers.
And so far, that's definitely what I've seen from others as well,
that there is still more hallucination than perhaps there is with GPT4.
Now, the other big one is coding.
As you heard before, Claude 2 seemed to be doing as well or better than GPT4 on coding,
but is that the case in the real world?
AI startup founder David writes,
doing lots of tests between Claude 2 and GPT4, my initial observation is that Claude 2 actually
seems to be following a given JSON schema's description a lot better.
GPT4 sometimes gets a bit too creative, even at temperature zero.
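For anyone wanting to run this kind of comparison themselves, one way to measure "following a schema" is to check each model reply mechanically. The sketch below is an assumption about how such a test could look, not David's actual setup: the field names are hypothetical, and the check is a simple hand-rolled one using only Python's standard library rather than a full JSON Schema validator.

```python
import json

# Hypothetical schema for illustration: field name -> required Python type.
EXPECTED_FIELDS = {"title": str, "summary": str, "key_points": list}


def schema_violations(raw_output, expected=EXPECTED_FIELDS):
    """Return a list of problems found in a model's JSON reply (empty if none)."""
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not an object"]
    problems = []
    for name, kind in expected.items():
        if name not in data:
            problems.append(f"missing field: {name}")
        elif not isinstance(data[name], kind):
            problems.append(f"wrong type for {name}")
    # Extra fields are the "too creative" case: valid JSON, wrong shape.
    for name in data:
        if name not in expected:
            problems.append(f"unexpected field: {name}")
    return problems


faithful = '{"title": "Q1", "summary": "ok", "key_points": ["a"]}'
creative = '{"title": "Q1", "summary": "ok", "key_points": "a, b", "mood": "upbeat"}'
print(schema_violations(faithful))  # []
print(schema_violations(creative))  # ['wrong type for key_points', 'unexpected field: mood']
```

Counting violations over many prompts for each model would give a rough adherence rate to compare.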
Now, corresponding with this launch was a piece in the New York Times called Inside the White
Hot Center of AI Doomerism.
Anthropic, a safety-focused AI startup, is trying to compete with ChatGPT while preventing
an AI apocalypse.
It's been a little stressful.
Kevin Roose describes the culture at Anthropic
thusly, saying, Anthropic's employees aren't just worried that their app will break or that users
won't like it. They're scared at a deep existential level about the very idea of what they're doing,
building powerful AI models and releasing them into the hands of people who might use them to do
terrible and destructive things. Indeed, writes Roose, at Anthropic, the Doom Factor is turned up to
11. I spent weeks, he wrote, interviewing Anthropic executives, talking to engineers and researchers,
and sitting in on meetings with product teams ahead of Claude 2's launch. And while I initially thought
I might be shown a sunny, optimistic version of AI's potential, a world where polite chatbots
tutor students, make office workers more productive, and help scientists cure diseases, I soon
learned that rose-colored glasses weren't Anthropic's thing. They were more interested in
scaring me. In a series of long, candid conversations, Anthropic employees told me about the
harms they worried future AI systems could unleash, and some compared themselves to modern-day
Robert Oppenheimers, weighing moral choices about powerful new technology that could profoundly
alter the course of history. Now, it's a little out of scope for this particular video,
but I encourage you to go read this New York Times piece.
It's a really interesting look inside the culture at Anthropic.
And I think it also has ramifications for how we understand the way the people who are closest to the development of these new models are thinking about the risks.
Coming back to Claude 2 in practice, again quoting Sully Omar on Twitter,
Claude 2 is definitely going to force OpenAI's hand.
It's cheaper and quicker than GPT4.
Output isn't as good, but it's almost there for a lot of tasks.
I don't see myself using GPT4 as much anymore unless they
drop prices. And so again, as I said right at the beginning, I think what we have here in Claude
2 is the first realistic competitor for GPT4. The areas where it doesn't perform as well are clear,
but the benefits that it offers in terms of cost, in terms of context window, are pretty clear.
However, if there is one thing that could extend GPT4's dominance, I think it has to be Code
Interpreter. Many people have called Code Interpreter basically a sneaky GPT 4.5.
Developer and Latent Space podcast host swyx wrote,
Very interesting theory on why OpenAI cannot name Code Interpreter GPT 4.5 because of pause optics.
That was followed by a quote tweet from someone who said,
You didn't forget the open pause letter that had the whole world shaking, did you?
They just pushed GPT4.5 into the world.
The cover works so well most assume it's just a plug-in.
Mission accomplished.
And so we are left today with an LLM landscape that has more competition.
Dem DiPolowski even says,
waiting for the next move from Google.
So when it comes to the capabilities that we have access to as consumers and as businesses, this is nothing but good news.
On the other hand, to the extent that one thinks the never-ending quest for greater capabilities and the business AI arms race that it is inciting
could have deleterious impacts when it comes to AI risk and safety, then it's hard to say how good this or any other advancement really is.
But for now, the inexorable march of technology forward has continued, and Anthropic's Claude 2 is a major player.
That's going to do it for today's AI Breakdown.
If you're enjoying this, please hit the notification button below.
I want to make sure you see all of the AI Breakdown videos as soon as they come out.
And until next time, peace.
