The AI Daily Brief: Artificial Intelligence News and Analysis - What AI Is Capable Of Today
Episode Date: April 13, 2024
A discussion inspired by https://www.oneusefulthing.org/p/what-just-happened-what-is-happening
** CHECK OUT THE JUST-LAUNCHED SUPERINTELLIGENT PLATFORM - 300+ AI video tutorials https://besuper.ai/ **...
ABOUT THE AI BREAKDOWN
The AI Breakdown helps you understand the most important news and discussions in AI.
Subscribe to The AI Breakdown newsletter: https://theaibreakdown.beehiiv.com/subscribe
Subscribe to The AI Breakdown on YouTube: https://www.youtube.com/@TheAIBreakdown
Join the community: bit.ly/aibreakdown
Learn more: http://breakdown.network/
Transcript
Today on the AI Breakdown, we're examining what tasks AI can actually do well.
The AI Breakdown is a daily podcast and video about the most important news and discussions in AI.
Go to breakdown.network for more information about our YouTube, our Discord, and our newsletter.
Hello friends. Happy weekend.
And being a weekend, of course, this is a long reads episode.
Two shills to get out of the way before we dive in.
The first is, of course, for Superintelligent.
It's the new AI learning platform we just launched.
It's totally focused on useful, practical AI video tutorials.
There are about 300 of them on the site now.
We're adding 30 to 50 a week.
If you are interested, go check it out at besuper.ai.
The second shill actually relates to the content.
We are going to be basing our episode today off of an essay by Professor Ethan Mollick,
who just published a great new book called Co-Intelligence:
Living and Working with AI.
So if you like what you hear in this episode, definitely go pick that book up.
All right.
So the piece that we are working off of is one
published by Ethan this week called What Just Happened, What Is Happening Next.
And really what it is, is a look at where we are from a capability standpoint.
Now, part of the context for this that I think is interesting is that we have finally entered the era of
GPT-4 class models. It's not just GPT-4 anymore. There are now models from Google and Anthropic
that are firmly in that range. And really, up until OpenAI did its GPT-4 Turbo update this week,
Anthropic's Claude 3 Opus was, to some, even a better model. But regardless of preference here and there,
the point is that we're no longer at GPT-3.5 level for everyone; we're increasingly at GPT-4 level for
everyone. I think that makes it an interesting time to see what those types of models can do,
and that's exactly the point of this piece. Ethan writes,
The current best estimates of the rate of improvement in large language models show capabilities
doubling every 5 to 14 months. Then Ethan points out that he had that in mind when he was trying
to finish his book, knowing that by the time it came out, it could easily be out of date.
So then, in many ways, this essay represents an attempt to update his understanding of
what models can do currently. The first section of the essay is called We Still Don't Know the
Full Capabilities of Current Frontier Models. And here's the big point from it. He writes,
While the release of the first models to clearly surpass GPT-4 is expected to occur in the coming
months, one thing we have learned is that we have not come close to exhausting the abilities of even
existing AIs. In fact, he writes, it is often very hard to know what these models can't do
because most people stop experimenting when an approach doesn't work. Yet careful prompting can often make an
AI do something that seemed impossible. He pointed to an example of a programmer who made a $10,000
bet that GPTs would never be able to solve a certain type of problem, and within a day,
multiple people figured out how to get the AI to solve the problem. The point, Ethan says, is that
this doesn't mean that systems are capable of every human task, but merely that it is hard
to prove that they don't do something well from simple experiments. At the same time, there is no
single best way to prompt AIs, making trying to show that AI can do something dependent on a mix of art,
skill, and motivation. And this doesn't even include the added element of tool use. When you give AI
access to things like Google search, the system can actually outperform humans at fact-checking,
an area where AIs without tools are notoriously weak. The next section of the essay is perhaps even more
interesting, and right in the conversation for those of you who have been spending any time on AI Twitter.
It's called More Signs of Superhuman Performance at Human Tasks. Again, Ethan writes,
Over the past few months, we have begun to see more careful papers indicating that AIs can
exceed human performance at very human tasks, a phenomenon some researchers call superhuman performance.
To be fair, it isn't clear what superhuman performance is where AIs are concerned.
Is it better than the average human, better than the best human? But it at least suggests
capabilities that begin to have radical impacts. Famously, Sam Altman mentioned that AI may
be capable of superhuman persuasion in the near future. New research suggests he might be right
already. Ethan then talks about the study that we talked about on this show, where
researchers found that in a three-round debate, with GPT-4 arguing the opposite side of a person's
opinions, AI was able to lower conspiracy theory beliefs in a significant and sustainable way.
At the time, we discussed how, on the one hand, this was exciting and potentially a valuable
use case of AI, but at the same time, also an indicator of just how persuasive it was,
which is, of course, a neutral capability that can be used for good or ill. As Ethan writes,
Lowering conspiratorial beliefs might be a good use of AI persuasion, but the high levels
of AI persuasiveness also suggest some concerns about what AIs might be able to talk humans into doing.
As Altman wrote, highly persuasive bots, quote, may lead to some very strange outcomes,
which might be an understatement. I suspect major changes in the world of marketing at a minimum.
Next, Ethan describes how AI, quote, seems to do all sorts of specialized tasks that it was not
necessarily trained for at very high levels. He pointed to a test which was designed to be
particularly challenging for AIs, where PhDs with access to the internet and unlimited time
got 34% of questions right outside their specialty, and 65% to 75% on questions in their field.
Claude 3 got 60% overall.
Another study found that AI was better than physicians in processing medical data and doing
clinical reasoning on real patient cases.
The authors of that study do not suggest that this means AI can replace doctors,
but they say it shows that, quote, LLMs are capable of mimicking some of the most powerful processes
that we use to make diagnoses, processes that, until basically last year, we physicians thought
were unique to us.
Ethan then writes,
Pay attention to that conclusion, because you are going to see it in many places. Things that
we thought were uniquely human a year ago are going to be done by machines in the coming years,
often at a superhuman level. We need to be ready for that to occur while also watching out for the
biases and weaknesses of these systems that might be hidden by their seemingly impressive abilities.
Professions that begin to prepare for this future will do better than those that ignore the coming
change. Finally, Ethan examines the state of agents. For a year now, agents have been the hotness
in AI, the thing that seemed perpetually just on the other side of the future.
So what's the actual status of them?
Well, Ethan writes, agents like Devin, the software developer, and open source alternatives,
suggest that we are close to a new paradigm of AI use.
He then makes this interesting argument about how agents might be the way that AI comes
into organizations.
He writes, I increasingly suspect that agents may also be a key to integrating AIs
with organizations, at least in the short term.
Organizations are often reluctant to empower their workers to use AI, but agents fit more naturally
into the existing structure of organizations.
Because you assign them tasks like humans, they can work as AI
contract workers who are delegated jobs.
Agents also represent the first break away from the chatbot and co-pilot models for interacting
with AI.
There is something compelling about assigning a task to an agent like Devin, and then not worrying
about it again because you know it is on it and will message you if it has any questions.
Ultimately, this is how Ethan concludes his piece.
One of the biggest questions about the current state of AI is whether the next generation
of models are going to significantly improve over the existing one.
