The AI Daily Brief: Artificial Intelligence News and Analysis - Early Uses for Anthropic's Claude 3.5 and Artifacts

Episode Date: June 21, 2024

Anthropic has launched its latest model, Claude 3.5 Sonnet, along with a new feature called Artifacts. Claude 3.5 Sonnet outperforms GPT-4o on several benchmarks and introduces a new interface for generating and interacting with documents, code, diagrams, and more. Discover the early use cases, performance improvements, and the exciting possibilities this new release brings to the AI landscape. Learn how to use AI with the world's biggest library of fun and useful tutorials: https://besuper.ai/ Use code 'youtube' for 50% off your first month. The AI Daily Brief helps you understand the most important news and discussions in AI. Subscribe to the podcast version of The AI Daily Brief wherever you listen: https://pod.link/1680633614 Subscribe to the newsletter: https://aidailybrief.beehiiv.com/ Join our Discord: https://bit.ly/aibreakdown

Transcript
Starting point is 00:00:00 Today on the AI Daily Brief, Anthropic releases its latest model. Before that, in the headlines: is AI becoming a presidential issue? The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI. To join the conversation, follow the Discord link in our show notes. Welcome back to the AI Daily Brief Headlines Edition, all the daily AI news you need in around five minutes. We kick off today with some comments from former President Donald Trump, who is of course also running for president right now. He recently appeared on the All-In podcast, part of his campaign's courtship of the technology industry, and spent a bit of time talking about artificial intelligence as it relates to energy. Let's quickly listen to this one-minute clip.
Starting point is 00:00:49 We have a phenomena coming up right now, and I was talking about it the other day to David, and that's AI, little things, simple, two little simple letters, but it's big. And I realized the other day, more than anything, when we were at David's house, talking to a lot of geniuses from Silicon Valley and other places, they need electricity at levels that nobody's ever experienced before to be successful, to be a leader in AI. The amount of electricity that needs to double what we have right now and even triple what we have right now, it's incredible how much they need to be the leader. And we're going to have to be able to do that. And a windmill turning with its blade knocking out the birds and everything else is not going to be able to make us competitive.
Starting point is 00:01:39 You'll have China. What about nuclear, Mr. President? Yeah. So let me just give you a statistic on this. China's building... So what's interesting to me about this is not only that, like him or loathe him, when Trump speaks it does set the tone of the conversation to come, but more that we're not just seeing a superficial, surface-level conversation around AI competitiveness and AI as it relates to a great global geostrategic battle with China, but a specific awareness of the relationship between AI and energy. This is something that Sam Altman has talked about a lot before as well. When he's been asked about whether he's concerned about the environmental impact of artificial intelligence,
Starting point is 00:02:14 he's basically said that we're going to have to innovate new energy solutions for the sake of AI because we simply can't get enough with what we have. I'll be watching closely to see if AI starts appearing in this type of political discourse more. But for now, speaking of the government and AI: in a recent interview, OpenAI's Mira Murati said that they give the government early access to new AI models. How do you minimize risk? Right. And providing people the tools to do that. And in the case of government, for example, it's very important to bring them along and give them early access to things, educate them. As you'll hear more in the main episode, today, as part of its announcement of its new model, Claude 3.5 Sonnet, Anthropic shared more
Starting point is 00:03:04 details about how they gave access to the UK AI Safety Institute. So again, maybe a bit of a shifting of the tide in terms of how these frontier labs think about their relationship with the government, even in advance of any sort of comprehensive legislation. Next up today, we've got a set of different funding announcements. HeyGen, the video avatar and visual storytelling company, has announced a $60 million Series A. They write that in just over a year, they've grown from $1 million in ARR to over $35 million, and have been profitable since Q2 of last year. What's more, they say, more than 40,000 companies are currently using the tools. These numbers are pretty impressive to me, especially from the standpoint that we're only
Starting point is 00:03:42 just beginning to scratch the surface on avatar use cases in the workplace. I happen to think that this is going to be an incredibly rich area of experimentation, where we're going to see everything from company onboarding to change sales processes, and $60 million more in the war chest will definitely give HeyGen the ability to compete at the front of that pack. Another French company has also raised a big round. Poolside, according to TechCrunch, is raising $400 million at a $2 billion valuation. This is not yet an announced deal. TechCrunch writes that Bain Capital Ventures and DST are in talks to co-lead the round, with BCV being a previous investor and DST being a new investor. You might remember this company from when they raised a $126 million
Starting point is 00:04:21 seed round last year. Poolside is focused on using AI to accelerate and augment human software developers. TechCrunch points out that France is pumping out some very high-profile AI companies: not one but two others have also raised nine-figure seed rounds. Mistral raised $113 million, and H raised $220 million. TechCrunch writes, The City of Light might need to be renamed the City of AI at this rate. AI language tutor Speak announced that it had reached a half-billion-dollar valuation, raising $20 million in a Series B-3 financing.
Starting point is 00:04:55 The company writes, Speak's AI language tutor builds a path towards spoken fluency without reliance on human conversation. Users gain fluency by learning speaking patterns and practicing repetition in crafted lessons rather than memorizing vocabulary and grammar. An impressive statistic from this press release: learners speak a thousand times on average in their first week. Finally today, on the other end of the financing spectrum, The Information reports that chip company Cerebras has quietly filed for an IPO.
Starting point is 00:05:22 The Information writes, The IPO plans show the eight-year-old company wants to ride a wave of investor enthusiasm over AI hardware sales that have made Nvidia the world's most valuable company and boosted scores of other stocks. The startup's financial results couldn't be learned. It said in a blog post in December that it had recently reached cash-flow break-even, without elaborating. Still, The Information writes that a new share authorization suggests Cerebras is valuing itself at around $2.5 billion. So, friends, that is the news from here, and that's going to do it for today's headlines. Next up, the main episode.
Starting point is 00:05:51 A quick note before we get back to the show: today's episode is brought to you by Superintelligent. Super is, of course, the platform that we built and released a couple months ago to help people learn how to use AI. It's built around fun, fast tutorials that get you actually using AI in minutes, not hours, and certainly not days. The learning all happens in the context of an engaged community, and we've got a bunch of exciting features rolling out in the weeks to come. One of those is a new teams version of the platform, which includes a custom curated playlist, as well as a showcase where people on your team can share their use cases and projects in AI across the organization. If you are interested in being a part of the Super for Teams beta, go to
Starting point is 00:06:31 besuper.ai/partner so you can learn about the program. Welcome back to the AI Daily Brief. Today we have one of my favorite types of stories, which is when a very cool new tool gets released and people get all excited to go try it out. Of course, this tool coming from one of the biggest labs in the space, Anthropic, and beating out GPT-4o on a number of different metrics means even more excitement. First, let's talk about what was actually released. It is called Claude 3.5 Sonnet. Anthropic writes, We're launching Claude 3.5 Sonnet, our first release in the forthcoming
Starting point is 00:07:02 Claude 3.5 model family. So basically, this is an update to their mid-tier model, Sonnet, which previously sat below their top-tier model, Opus. Claude 3.5 Sonnet is more intelligent than Opus based on benchmark scores, but still costs less per million tokens. The 200K-token context window has remained the same, but the benchmarks appear very impressive. One highlight they call out: in an internal agentic coding evaluation, Claude 3.5 Sonnet solved 64% of problems, outperforming Claude 3 Opus, which solved 38%. On graduate-level reasoning, 3.5 Sonnet scored 59.4% compared to GPT-4o's 53.6%.
Starting point is 00:07:39 On the MMLU, its five-shot score of 88.7% was the same as GPT-4o's zero-shot score. Claude's zero-shot score was just a little bit behind, at 88.3%. And so on and so forth; you get the idea. Basically, on most of these tests, with the exception of one particular math benchmark, Claude 3.5 Sonnet was outperforming not only Claude 3 Opus but GPT-4o as well. Anthropic also calls out Claude for having state-of-the-art vision. They write, Claude 3.5 Sonnet is our strongest vision model yet, surpassing Claude 3 Opus on standard vision benchmarks.
Starting point is 00:08:11 These step-change improvements are most notable for tasks that require visual reasoning, like interpreting charts and graphs. Claude 3.5 Sonnet can also accurately transcribe text from imperfect images, a core capability for retail, logistics, and financial services. Another interesting note on the safety side of things is that they say they provided Claude 3.5 Sonnet to the UK's Artificial Intelligence Safety Institute for pre-deployment safety evaluation. So basically, the model testing wasn't just internal. Okay, so part one of this is that we have a model upgrade. The Verge writes, Claude 3.5 Sonnet is apparently Anthropic's smartest and most
Starting point is 00:08:44 personable model yet. The Verge sums up, For now the model is the big news, and the pace of improvement here is wild to watch. Anthropic launched Claude 3 Opus in March, proudly saying it was as good as GPT-4 and Gemini 1.0, before OpenAI and Google released better versions of their models. Now Anthropic has made its next move, and it surely won't be long before its competition does so too. Claude doesn't get talked about as much as Gemini or ChatGPT, but it's very much in the race. Another reason it remains in the race is that this model update also comes with a pretty meaningful interface update. Mike Krieger, the company's chief product officer, who was previously the co-founder of Instagram, wrote, We're also launching a preview of Artifacts on claude.ai. You can ask
Starting point is 00:09:25 Claude to generate docs, code, Mermaid diagrams, vector graphics, or even simple games. Artifacts appear next to your chat, letting you see, iterate, and build on your creations in real time. I've used it to work on writing, coding, and diagram projects. Yesterday for Superintelligent, I did a tutorial about four early use cases for Claude and Artifacts. The first was writing an SOP document for a remote tech company. If you're watching this, you can see that an artifact is basically a panel on the right side of the chatbot that's all about previewing the output of the instructions you're giving on the left side. This is in many ways primarily a user experience update. Separating the section where you are providing instructions and interacting with the chatbot
Starting point is 00:10:05 from the output is arguably just a better way to set up the interface. For example, after I got that SOP document, I asked it to add a tongue-in-cheek section about taking advantage of the fun parts of working from home, including dress code and family time, which it did as a second document preview that I could see alongside the first version. A couple of other use cases I shared: I took a screenshot of growth statistics for our Superintelligent YouTube channel that I got from Social Blade and asked Claude to turn it into a line chart, which it dutifully did. I had it write the copy and the code for an AI consultancy to help small businesses. One interesting note here is that after I got the first version, I asked it to update the name to Mighty AI Consulting and explained that I wanted the name to reference
Starting point is 00:10:45 the idea of these businesses as small but mighty, which it then took as a prompt to update the tagline as well, not just the name. If you've spent any time on Twitter or X since Claude 3.5 came out, you might have seen people coding up complete games. Allie Miller, for example, provided a screenshot of the instructions for the classic game Mancala and asked Claude 3.5 to read the instructions, code the game, and then preview it so she could test and play. It was able to successfully do this in a matter of seconds. There are literally dozens of other examples of this type of game popping up to show what Claude 3.5 Sonnet can do.
Starting point is 00:11:20 Beyond the capabilities, people took note of the speed. Perplexity writes, Claude 3.5 Sonnet is now available on Perplexity. With 2x faster speed than Opus, Claude 3.5 Sonnet unlocks new possibilities for complex AI applications across reasoning, knowledge, and coding tasks. Dan Shipper, who just put out a new product called Spiral, writes, The really fun thing about building Spiral is the product just got way better and cheaper to run, and we did nothing. Anthropic just dropped a new model that's smarter and lower cost. All we have to do is change one line of code to take advantage. Pretty crazy world.
Starting point is 00:11:50 While a lot of the focus was on the Artifacts interface, some, like Maratka and Koilin, also tested it for its reasoning capabilities and found it much improved as well. So what about any critiques or skepticism? One thing I did see a little bit of from the safety crowd was summed up by Mikhail Samin, who wrote, As a reminder, Dario, the CEO of Anthropic, told multiple people Anthropic
Starting point is 00:12:09 won't release models that push the frontier of AI capabilities. He then shared a couple of screenshots seemingly referencing that point. Given that Claude 3.5 Sonnet does seem to be ahead of GPT-4o, is this a betrayal of that? The take that I've seen most often in response is that while 3.5 Sonnet might be slightly better across a number of different dimensions, it's not some phase-shift, revolutionary jump. And so the spirit of what Dario said is not necessarily undermined by releasing a model slightly more advanced than the state of the art. Also, the information about him telling people that they won't release models that push the frontier is all secondhand, so it may not even be true.
Starting point is 00:12:45 Wired continued their pattern of getting more and more negative about AI, with a piece titled We're Still Waiting for the Next Big Leap in AI. Anthropic's latest Claude AI model pulls ahead of rivals from OpenAI and Google, but advances in machine learning have lately been more incremental than revolutionary. This is functionally true. It's exactly the argument that I was just giving about Dario saying that they wouldn't push the state of the art.
Starting point is 00:13:04 At the same time, this sort of critique from the media rings a little hollow to me. One, as I said, I think Wired is now in a category of publications that is just looking for things to not like about artificial intelligence. And second, I think it inherently fails to recognize how much additional value, in terms of productivity, time saved, and new opportunities, even quote-unquote incremental advances really represent. One of my bully pulpits here on this show is the idea that, functionally, the way AI is impacting the world right now is not that we wake up and entire categories of jobs are gone, but that people are winning back their lives 20 minutes at a time. Ultimately, I think the big advance here, even though the updated model is
Starting point is 00:13:43 great, is really the Artifacts interface. Professor Ethan Mollick writes, The thing Anthropic is nailing is making their systems fun to use. ChatGPT has a lot of key features Claude is missing: web access, full Code Interpreter, voice, GPTs. But it requires some trial and error to figure out since it isn't obvious. Even Gemini feels more complicated. For us, the consumers, it's basically nothing but upside. So if you haven't tried it out yet, go check out the latest Claude. It's available right now at claude.ai. And if you create anything really cool, please tag me on Twitter/X so we can see what you did.
Starting point is 00:14:13 Like I said, we're dropping a new tutorial about this on Superintelligent today, so if you are a member, you will get access to that. All right, guys, thanks as always for listening or watching the show. And until next time, peace.
