The AI Daily Brief: Artificial Intelligence News and Analysis - ChatGPT's First True Competitor? Anthropic Releases Claude 2

Starting point is 00:00:00 Today on the AI Breakdown, Anthropic releases Claude 2 and GPT-4 finally has some real competition. Before that on the brief, the OECD says 27% of jobs are at high risk for disruption. The AI Breakdown is a daily podcast and video about the most important news and discussions in AI. Go to Breakdown.network for more information. Welcome back to the AI Breakdown Brief. All the AI headline news you need in five-ish minutes or less. Today we kick off with a new report from the OECD. This is their annual employment report, and it found that 27% of jobs are at the highest level of risk from AI.

Starting point is 00:00:40 Now, the OECD is the Organisation for Economic Co-operation and Development. It's a 38-member transnational organization, and this survey is something they do every year. The OECD surveyed 5,300 professionals across 2,000 firms in seven countries, and the jobs that they said were at the highest risk were those that had at least 25 of the 100 skills and abilities that AI experts said could be easily automated away. They found that the jobs that have the highest risk of being automated away comprised 27% of employment in the OECD bloc. And making matters worse, this survey was collected before the rise of ChatGPT. Now, in addition to that headline number, the OECD employment report also looked into how companies in different industries were going to handle these

Starting point is 00:01:24 changes. For example, when it came to finance, 64% of companies planned on retraining or upskilling their internal workers. 53% said that they would buy services from external companies. 35% said that they would be hiring new workers. And 17% anticipated layoffs because of redundancies. Now, at the same time, there were some positive things here as well. 63% of respondents in the manufacturing and finance industries said that AI had improved their enjoyment of their job. This would follow along from the idea that AI helps automate away rote tasks that are frustrating but key parts of any given profession. 79% of finance workers and 80% of manufacturing workers said it had improved their performance.

Starting point is 00:02:01 There were even mental health benefits, with 54% in finance and 55% in manufacturing saying that it had improved their mental health. But there is still anxiety. 63% of professionals surveyed in finance are worried about job losses in the next 10 years due to AI, along with 57% of people in manufacturing. Next up today, in the latest salvo in the battle around AI model training, Google has been hit by a class action lawsuit accusing it of stealing millions of users' data to train its various AI tools. Clarkson Law Firm brought the suit against Google,

Starting point is 00:02:33 Alphabet, and its subsidiary DeepMind, and alleged that Google had, quote, been secretly stealing everything ever created and shared on the internet by hundreds of millions of Americans. Now, Google, for its part, doesn't seem particularly concerned. Its general counsel called the suit baseless and said, quote, we've been clear for years that we use data from public sources, like information published to the open web and public data sets, to train the AI models behind services like Google Translate, responsibly, and in line with our AI principles. Now, you'll remember that just last week there was an updated privacy policy that said Google had access to even more publicly available information, so clearly these things are top of mind. Meanwhile, showing a slightly different path forward for how big models could access data, OpenAI and Shutterstock have come to an agreement to expand their deal around building generative AI tools.

Starting point is 00:03:18 As part of the deal, over the next six years OpenAI will have access to Shutterstock data that includes everything from images to videos to music and its associated metadata. In return, Shutterstock will get what OpenAI calls priority access to its latest tech capabilities, and it sounds like they're trying to integrate those tools directly into Shutterstock's content library as well. Given how litigious Getty Images has been around Stability AI's use of their images in the training of Stable Diffusion, this seems like a very different approach that OpenAI is going for. Next, we follow up on one of our big stories from yesterday. You'll remember that the White House held its first-ever classified AI briefing for senators, and after that meeting we got a lot of different responses.

Starting point is 00:03:58 I think the ways that I would sum it up are a couple different parts. First, there's definitely a sense that this meeting reinforced the urgency of dealing with these issues. Senator John Kennedy from Louisiana said, AI has this extraordinary potential to make our lives better if it doesn't kill us first. A second notable piece of the discourse coming out of this briefing is the extent to which people realize that there is an inherent tension, a line that they have to walk, between on the one hand wanting to provide some sort of guardrails and on the other, not wanting to over-regulate an industry which is extremely geostrategically important.

Starting point is 00:04:30 Marco Rubio from Florida said, one thing I'm certain of is, I know of no technological advancement in human history you've been able to roll back. It's going to happen. The question is how do we build guardrails and practices around it so that we can maximize its benefits and diminish its harm? At the same time, Rubio said, I just don't particularly know enough about AI yet

Starting point is 00:04:46 to even understand what it is we're trying to regulate. There's probably some role to play in codifying how government uses it in the defense realms and so forth, but beyond that I'm not prepared to give you an opinion because I think it's something we're still learning about. Now, that sentiment that we need to be careful with regulation even though we know we need regulation is something that wasn't just Republicans saying. In fact, Martin Heinrich, Democrat from New Mexico, said, one of the interesting things about this space right now is it doesn't feel particularly partisan. So we have a moment

Starting point is 00:05:12 we should take advantage of. And the last takeaway for me is that it is definitely clear who the big boogeyman is when it comes to AI, and it's China. For those of you who are watching this, you can see it on your screen right here. Fox News's headline is, Senators leave classified AI briefing confident but wary of existential threat posed by China. Eric Schmitt, Republican from Missouri, described China as, quote, playing for keeps. Senator Joni Ernst from Iowa said, We should always be concerned about China, always, and strive to do anything better and faster than China. Tim Kaine from Virginia said, I think we're all very concerned about it.

Starting point is 00:05:44 So given all this, when it comes to what Congress and the Senate are expected to actually do, it's not exactly clear yet, but Chuck Schumer, the Senate Majority Leader, is trying to give some sense of it, saying, our timetable in terms of producing legislation is not years and not days and weeks, but months. We can't rush too fast, but we can't go so slowly that either other governments that are authoritarian or bad actors who are private sector actors get ahead of us. So taking this all together, I think that we are very clearly going to hear a lot more about AI policy in the months to come. That's going to do it, however, for today's AI Breakdown Brief.

Starting point is 00:06:15 If you enjoyed it and you want to be part of a larger conversation, come check out the AI Breakdown Discord. The link is bit.ly slash AI breakdown, and I'll be looking forward to seeing you there. For now, thanks for watching or listening, and I'll be back soon with the main AI Breakdown. Before we get into the main AI Breakdown, I want to tell you about today's sponsor, Supermanage. If you work in a professional setting, you probably have some version of a one-on-one meeting, either with the people that work for you or the people that you work with. Unfortunately, all too often, those one-on-one meetings become glorified catch-up calls.

Starting point is 00:06:50 Don't you wish you could jump right to the stuff that really matters? That's where SuperManage comes in. Supermanage AI magically distills your team's public Slack channels into a real-time brief on any employee, any time. Catch up on contributions, work in progress, challenges they're facing, sentiment, everything you need to show up ready for a truly meaningful conversation. And it's completely free. Visit supermanage.a.i forward slash breakdown today to start making the most of your

Starting point is 00:07:16 one-on-ones. And thanks again to SuperManage for sponsoring the AI Breakdown. The LLM wars have ratcheted up another huge notch today, as Anthropic has released Claude 2, and many feel it's the first time GPT-4 has some real competition. Welcome back to the AI Breakdown. Today we are talking about what is a major moment in the history of the competition between LLMs. Since last November, with the launch of ChatGPT, really nothing has actively competed for

Starting point is 00:07:49 supremacy with ChatGPT. Sure, Google's Bard has at times shown flashes of how it might be able to keep up or compete through integration with other Google products. And yes, Microsoft has obviously made a ton of advances in LLMs integrated into their services as well. But of course, they were connected to OpenAI in an integral way. And so in many ways, Microsoft's success also seems like OpenAI's success. Anthropic, and specifically Claude 2, offers something a little bit different. There have been two moments this year where Anthropic really hopped into public attention. One was when they announced their constitutional AI model. The idea of Anthropic's constitutional model is that when it comes to answering questions such as which questions an LLM will

Starting point is 00:08:31 engage with versus deem inappropriate, or what types of actions it will encourage versus discourage, instead of using a human feedback model for specific instances of those questions, Anthropic would try to instill values in its model through what it calls constitutional AI. After having identified that the human feedback version has a number of problems, including difficulty scaling, requiring people to interact with disturbing outputs, and of course the substantial time and resources it takes, Anthropic has an approach where, quote, the Constitution guides the model to take on the normative behavior described in the Constitution,

Starting point is 00:09:03 such as helping to avoid toxic or discriminatory outputs, avoiding helping a human engage in illegal or unethical activities, and broadly creating an AI system that is helpful, honest, and harmless. The next thing that captured attention was when Anthropic introduced a 100K context window. Context window refers to how much information a model can ingest at one time. Right now, for some comparison, the GPT-4 context window that you use when you're interacting with ChatGPT is around 8,000 tokens. So 100,000 tokens makes a significant difference.
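To put those two window sizes on the same scale, here is a minimal sketch in Python. It assumes roughly 0.75 English words per token, which is a common rule of thumb rather than a figure published for either model; the real ratio depends on the tokenizer and the text. The names approx_words and WORDS_PER_TOKEN are illustrative only.

```python
# Rough scale comparison of the two context windows mentioned in the episode.
# ASSUMPTION: about 0.75 English words per token (a rule of thumb only;
# actual ratios vary by tokenizer and by the text being tokenized).
WORDS_PER_TOKEN = 0.75


def approx_words(tokens: int, words_per_token: float = WORDS_PER_TOKEN) -> int:
    """Estimate how many English words fit in a window of `tokens` tokens."""
    return round(tokens * words_per_token)


# Window sizes as quoted in the episode (approximate).
windows = {
    "GPT-4 in ChatGPT (about 8K tokens)": 8_000,
    "Claude (100K tokens)": 100_000,
}

for name, tokens in windows.items():
    print(f"{name}: roughly {approx_words(tokens):,} words")
```

Under that assumption, the smaller window holds a few thousand words while the larger one holds tens of thousands, which is why long documents need to be split up for one and not the other.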

Starting point is 00:09:34 When you're using ChatGPT, you have to use embeddings and other ways to get around that context window, whereas with this 100K context window, that corresponds to around 75,000 words. That's roughly the length of The Great Gatsby, meaning it's hundreds of pages of material. This also means it can take in things like business 10-Ks, complete research reports, and other dense information that would have had to be broken up in other models. Okay, so coming into yesterday, we have two things where Anthropic is either leading or doing something different: the 100K context window, where they're leading, and the constitutional AI model, where it's probably too early to call them leading, but at least they were thinking about it in new and interesting

Starting point is 00:10:11 and novel ways. Well, yesterday they announced Claude 2. Claude 2 is their latest model, and it is putting up some impressive results. Those include passing grades on the USMLE medical exam, impressive results on the GRE on both verbal reasoning and analytical writing, significantly improved coding performance, and more. Now, of course, what people are really interested in is how it compares to ChatGPT, or more specifically, GPT-4. Dr. Jim Fan from Nvidia did a comparison post. He writes, On standard exams, it's not quite at GPT-4 yet, but catching up fast compared to version 1.3. On GRE verbal, it scored 165 versus GPT-4's 169. On GRE writing, Claude 2 scored a 5 versus GPT-4's 4.

Starting point is 00:10:51 On the GRE quantitative, it got a 154 versus GPT-4's 163. On the USMLE, as we said, Claude got a passing grade at 67, but it was far behind GPT-4 with an 85. But on the bar, Claude 2 actually outperformed GPT-4 very slightly. Now, when it comes to reasoning benchmarks, on HumanEval coding, Claude got a 71.2% versus GPT-4's 67%. On grade school math, Claude got an 88% versus GPT-4's 92%. Now, as Jim points out, big caveat on the standard exams: the prompting protocols may be very different, and there are no error bars. In a large number of exams, the comparison may not be statistically significant. What Jim is saying is that when it comes to that bar score of 76.5% for Claude versus 75.7% for GPT-4, that's

Starting point is 00:11:37 functionally the same, but I think that for our purposes of understanding Anthropic's Claude 2 as an actual real competitor for GPT-4, the point remains the same. So what we have here is a model that seems pretty similar in performance to GPT-4, which is obviously the state of the art. However, there are a number of other areas where Claude 2 offers something really different. One of those comes with its knowledge cutoff. GPT-4's is September 2021, whereas Claude 2's knowledge cutoff is in early 2023. Now, of course, access to Browse with Bing and other internet-connected versions of GPT-4 probably reduces the importance of that just a little bit, but it's still worth noting. A second huge difference is obviously that context window that we discussed.

Starting point is 00:12:18 Earlier this year, people were freaking out about the idea that ChatGPT would at some point move up to a 32K context window, so having a 100K context window in a model this powerful is significantly different. But when it comes to high-performance uses of this, maybe the most significant thing is that Claude 2 is currently four to five times cheaper than GPT-4 32K. Jim writes, prompt tokens cost $11 versus $60 per million, and completion costs $32 versus $120 per million, assuming similar tokenization length. So what you have here then is a model with comparable performance and capabilities,

Starting point is 00:12:55 with a much bigger context window, more recent knowledge natively, and much cheaper. And at this point, you're probably getting why I called this video the first ChatGPT competitor. So let's talk about what people are finding Claude 2 useful for right out of the gate. One thing is definitely around document summarization, which makes sense given that larger context window. Sully Omar writes, Anthropic literally killed every chat PDF wrapper with Claude 2. You can upload files now. I tried Tesla's latest Q1 update and asked, what are the key takeaways from this Q1 update? Can you make any forecast for the price? It was able to answer it flawlessly with sources. Now, interestingly, it isn't just single document summarization that's

Starting point is 00:13:36 valuable, but you can actually manage multiple documents at the same time. Given that, you can ask for things like comparisons, common points, and changes between them, which obviously opens up a whole different set of use cases. Carlos E. Perez writes, oh my gosh, you can import several documents into Claude 2 and ask the relationship between the concepts found in each document. It's conceptual blending on steroids. Karina Nguyen, who's technical staff at Anthropic, said, after spending some time with Claude 2, I've uncovered some interesting use cases for myself that might be useful for others too. One, she says, is UX writing. Two is prototyping. Three is asking Claude to brainstorm conversation starters and topics for discussions with people based

Starting point is 00:14:15 on their background, uploading people's resumes as a way to get it prompted. And then she also discusses data storytelling, i.e. understanding trends and data across multiple attachments, as well as editorial feedback. Professor Ethan Mollick writes, The new Claude 2 AI is quite good in early testing. Definitely much closer to GPT-4 in terms of quality, continues to be the most pleasant AI personality, very good at summarizing documents, especially PDFs. On the downside, don't use Claude for data. It hallucinates answers.

Starting point is 00:14:42 And so far, that's definitely what I've seen from others as well, that there is still more hallucination than perhaps there is with GPT-4. Now, the other big one is coding. As you heard before, Claude 2 seemed to be doing as well or better than GPT-4 on coding, but is that the case in the real world? AI startup founder David writes, doing lots of tests between Claude 2 and GPT-4, my initial observation is that Claude 2 actually seems to be following a given JSON schema's description a lot better.

Starting point is 00:15:07 GPT-4 sometimes gets a bit too creative, even at temperature zero. Now, corresponding with this launch was a piece in the New York Times called Inside the White-Hot Center of AI Doomerism. Anthropic, a safety-focused AI startup, is trying to compete with ChatGPT while preventing an AI apocalypse. It's been a little stressful. Kevin Roose describes the culture at Anthropic thusly, saying, Anthropic's employees aren't just worried that their app will break or that users

Starting point is 00:15:32 won't like it. They're scared at a deep existential level about the very idea of what they're doing, building powerful AI models and releasing them into the hands of people who might use them to do terrible and destructive things. Indeed, writes Roose, at Anthropic, the doom factor is turned up to 11. I spent weeks, he wrote, interviewing Anthropic executives, talking to engineers and researchers, and sitting in on meetings with product teams ahead of Claude 2's launch. And while I initially thought I might be shown a sunny, optimistic version of AI's potential, a world where polite chatbots tutor students, make office workers more productive, and help scientists cure diseases, I soon learned that rose-colored glasses weren't Anthropic's thing. They were more interested in

Starting point is 00:16:09 scaring me. In a series of long, candid conversations, Anthropic employees told me about the harms they worried future AI systems could unleash, and some compared themselves to modern-day Robert Oppenheimers, weighing moral choices about powerful new technology that could profoundly alter the course of history. Now, it's a little out of scope for this particular video, but I encourage you to go read this New York Times piece. It's a really interesting look inside the culture at Anthropic, and I think it has ramifications also for how we understand the way the people who are closest to the development of these new models are thinking about the risks as well. Coming back to Claude 2 in practice, again quoting Sully Omar on Twitter,

Starting point is 00:16:43 Claude 2 is definitely going to force OpenAI's hand. It's cheaper and quicker than GPT-4. Output isn't as good, but it's almost there for a lot of tasks. I don't see myself using GPT-4 as much anymore unless they drop prices. And so again, as I said right at the beginning, I think what we have here in Claude 2 is the first realistic competitor for GPT-4. The areas where it doesn't perform as well are clear, but the benefits that it offers in terms of cost, in terms of context window, are pretty clear. However, if there is one thing that could extend GPT-4's dominance, I think it has to be Code

Starting point is 00:17:19 Interpreter. Many people have called Code Interpreter basically a sneaky GPT-4.5. Developer and Latent Space podcast host swyx wrote, Very interesting theory on why OpenAI cannot name Code Interpreter GPT-4.5 because of pause optics. That was followed by a quote tweet from someone who said, You didn't forget the open pause letter that had the whole world shaking, did you? They just pushed GPT-4.5 into the world. The cover works so well most assume it's just a plug-in. Mission accomplished.

Starting point is 00:17:47 And so we are left today with an LLM landscape that has more competition. Dem DiPolowski even says, waiting for the next move from Google. So when it comes to the capabilities that we have access to as consumers and as businesses, this is nothing but good news. On the other hand, to the extent that one thinks the never-ending quest for greater capabilities and the business AI arms race that it is inciting could have deleterious impacts when it comes to AI risk and safety, then it's hard to say how good this or any other advancement really is. But for now, the inexorable march of technology forward has continued, and Anthropic's Claude 2 is a major player. That's going to do it for today's AI Breakdown.

Starting point is 00:18:22 If you're enjoying this, please hit the notification button below. I want to make sure you see all of the AI Breakdown videos as soon as they come out. And until next time, peace.
