The AI Daily Brief: Artificial Intelligence News and Analysis - What AI Is Capable Of Today

Starting point is 00:00:00 Today on the AI Breakdown, we're examining what tasks AI can actually do well. The AI Breakdown is a daily podcast and video about the most important news and discussions in AI. Go to Breakdown Network for more information about our YouTube or Discord and our newsletter. Hello friends. Happy weekend. And being a weekend, of course, this is a long reads episode. Two shills to get out of the way before we dive in. The first is, of course, for Superintelligent. It's the new AI learning platform we just launched.

Starting point is 00:00:37 It's totally focused on useful, practical AI video tutorials. There are about 300 of them on the site now. We're adding 30 to 50 a week. If you are interested, go check it out at besuper.ai. The second shill actually relates to the content. We are going to be basing our episode today off of an essay by Professor Ethan Mollick, who just published a great new book called Co-Intelligence: Living and Working with AI.

Starting point is 00:01:00 So if you like what you hear in this episode, definitely go pick that book up. All right. So the piece that we are working off of is one published by Ethan this week called What Just Happened, What is Happening Next. And really what it is, is a look at where we are from a capability standpoint. Now, part of the context for this that I think is interesting is that we have finally entered the era of GPT-4 class models. It's not just GPT-4 anymore. There are now models from Google and Anthropic that are firmly in that range. And really, up until OpenAI did its GPT-4 Turbo update this week,

Starting point is 00:01:31 Anthropic's Claude 3 Opus was to some even a better model. But regardless of preference here and there, the point is that we're no longer at GPT-3.5 level for everyone; we're increasingly at GPT-4 level for everyone. I think that makes it an interesting time to see what those types of models can do, and that's exactly the point of this piece. Ethan writes, The current best estimates of the rate of improvement in large language models show capabilities doubling every 5 to 14 months. Then Ethan points out that he had that in mind when he was trying to finish his book, knowing that by the time it came out, it could easily be out of date. So then, in many ways, this essay represents an attempt to update his understanding of

Starting point is 00:02:06 what models can do currently. The first section of the essay is called, We Still Don't Know the Full Capabilities of Current Frontier Models. And here's the big point from it. He writes, while the release of the first models to clearly surpass GPT-4 is expected to occur in the coming months, one thing we have learned is that we have not come close to exhausting the abilities of even existing AIs. In fact, he writes, it is often very hard to know what these models can't do because most people stop experimenting when an approach doesn't work. Yet careful prompting can often make an AI do something that seemed impossible. He pointed to an example of a programmer who made a $10,000 bet that GPTs would never be able to solve a certain type of problem, and within a day,

Starting point is 00:02:44 multiple people figured out how to get the AI to solve the problem. The point, Ethan says, is that this doesn't mean that systems are capable of every human task, but merely that it is hard to prove that they don't do something well from simple experiments. At the same time, there is no single best way to prompt AIs, making trying to show that AI can do something dependent on a mix of art, skill, and motivation. And this doesn't even include the added element of tool use. When you give AI access to things like Google search, the system can actually outperform humans at fact-checking, an area where AIs without tools are notoriously weak. The next section of the essay is perhaps even more interesting, and right in the conversation for those of you who have been spending any time on AI Twitter.

Starting point is 00:03:20 It's called More Signs of Superhuman Performance at Human Tasks. Again, Ethan writes, Over the past few months, we have begun to see more careful papers indicating that AIs can exceed human performance at very human tasks, a phenomenon some researchers call superhuman performance. To be fair, it isn't clear what superhuman performance is where AIs are concerned. Is it better than the average human, better than the best human? But it at least suggests capabilities that begin to have radical impacts. Famously, Sam Altman mentioned that AI may be capable of superhuman persuasion in the near future. New research suggests he might be right already. Ethan then talks about the study that we talked about on this show, where

Starting point is 00:03:54 researchers found that in a three-round debate, with GPT-4 arguing the opposite side of a person's opinions, AI was able to lower conspiracy theory beliefs in a significant and sustainable way. At the time, we discussed how, on the one hand, this was exciting and potentially a valuable use case of AI, but at the same time, also an indicator of just how persuasive it was, which is, of course, a neutral capability that can be used for good or ill. As Ethan writes, lowering conspiratorial beliefs might be a good use of AI persuasion, but the high levels of AI persuasiveness also suggest some concerns about what AIs might be able to talk humans into doing. As Altman wrote, highly persuasive bots, quote, may lead to some very strange outcomes,

Starting point is 00:04:30 which might be an understatement. I suspect major changes in the world of marketing at a minimum. Next, Ethan describes how AI, quote, seems to do all sorts of specialized tasks that it was not necessarily trained for at very high levels. He pointed to a test which was designed to be particularly challenging for AIs, where PhDs with access to the internet and unlimited time got 34% of questions right outside their specialty, and 65% to 75% on questions in their field. Claude 3 got 60% overall. Another study found that AI was better than physicians in processing medical data and doing clinical reasoning on real patient cases.

Starting point is 00:05:03 The authors of that study do not suggest that this means AI can replace doctors, but rather that it shows that, quote, LLMs are capable of mimicking some of the most powerful processes that we use to make diagnoses, processes that, until basically last year, we physicians thought were unique to us. Ethan then writes, pay attention to that conclusion, because you are going to see it in many places. Things that we thought were uniquely human a year ago are going to be done by machines in the coming years, often at a superhuman level. We need to be ready for that to occur while also watching out for the

Starting point is 00:05:30 biases and weaknesses of these systems that might be hidden by their seemingly impressive abilities. Professions that begin to prepare for this future will do better than those that ignore the coming change. Finally, Ethan examines the state of agents. For a year now, agents have been the hotness in AI, the thing that seemed perpetually just on the other side of the future. So what's the actual status of them? Well, Ethan writes, agents like Devin, the software developer, and open source alternatives, suggest that we are close to a new paradigm of AI use. He then makes this interesting argument about how agents might be the way that AI comes

Starting point is 00:06:02 into organizations. He writes, I increasingly suspect that agents may also be a key to integrating AIs with organizations, at least in the short term. Organizations are often reluctant to empower their workers to use AI, but agents fit more naturally into the existing structure of organizations. Because you assign them tasks like humans, they can work as AI contract workers who are delegated jobs. Agents also represent the first break away from the chatbot and co-pilot models for interacting

Starting point is 00:06:25 with AI. There is something compelling about assigning a task to an agent like Devin, and then not worrying about it again because you know it is on it and will message you if it has any questions. Ultimately, this is how Ethan concludes his piece. One of the biggest questions about the current state of AI is whether the next generation of models are going to significantly improve over the existing one.
