The AI Daily Brief: Artificial Intelligence News and Analysis - GPT-4 More Creative Than the Average Person According to Recent Study
Episode Date: September 17, 2023
On this research recap, NLW looks at AI creativity https://www.nature.com/articles/s41598-023-40858-3 BrainLM https://t.co/MUobqXULfb RETFound retinal model https://www.nature.com/articles/d41586-...023-02881-2 Any-to-Any Multimodal https://huggingface.co/papers/2309.05519 AI that can smell: https://twitter.com/DrJimFan/status/1701611251376497046 TAKE OUR SURVEY ON EDUCATIONAL AND LEARNING RESOURCE CONTENT: https://bit.ly/aibreakdownsurvey ABOUT THE AI BREAKDOWN The AI Breakdown helps you understand the most important news and discussions in AI. Subscribe to The AI Breakdown newsletter: https://theaibreakdown.beehiiv.com/subscribe Subscribe to The AI Breakdown on YouTube: https://www.youtube.com/@TheAIBreakdown Join the community: bit.ly/aibreakdown Learn more: http://breakdown.network/
Transcript
Today on the AI Breakdown, we're looking at some of the latest research in artificial intelligence,
including at least one test that suggests that GPT-4 is more creative than the average human.
The AI Breakdown is a daily podcast and video about the most important news and discussions in AI.
Go to breakdown.network for more information about our Discord, our newsletter, and our YouTube channel.
Welcome back to the AI Breakdown.
Today we are doing a little bit of a research roundup.
This isn't something that we've done for a little while now.
And there has been, especially over the last week, a lot of research that I've made note of,
I've bookmarked on Twitter. And so I wanted to share some of the things that I thought were the
most interesting new papers to come out. I also thought that this would be good for a weekend
episode when there's a little bit less news. So let's begin with a piece that I think has some
pretty significant implications. Now, the title of the Nature article summing up this research
is "Best humans still outperform artificial intelligence in a creative divergent thinking task."
What I will call your attention to, though, is the use of that word "best," because this article
could very easily have been framed very differently. Here's the key line: "On average, the AI chatbots
outperformed human participants." Outperformed them on what, you ask? Well, this was a study about
creativity. As the authors write, "Creativity has traditionally been considered an ability
exclusive to human beings. However, the rapid development of AI has resulted in generative AI
chatbots that can produce high-quality artworks, raising questions about the differences between human
and machine creativity. In this study, we compared the creativity of humans with that of three AI
chatbots using the alternative uses task (AUT), which is the most used divergent thinking task.
Participants were asked to generate uncommon and creative uses for everyday objects." And then
that's where we get back to this line: "On average, the AI chatbots outperformed human participants.
While human responses included poor-quality ideas, the chatbots
generally produced more creative responses. However, the best human ideas still matched or exceeded
those of the chatbots." Now, this paper is super interesting. Obviously, I'm going to include a link to
all of these in the show notes for this episode. But for creativity geeks out there, this gets deep
into different theories of creativity. This is, of course, one of those neural processes that we don't
necessarily have the best understanding of. We have an intuitive sense as humans what is creative
when we see it and what people mean when they say they're creative or being creative,
but it's not a term that has a strict scientific meaning, which of course makes it harder to test
in a laboratory-style environment. So let's talk a little bit more about the methodology they used.
Again, they write that the most used test of divergent thinking is the alternative uses task,
in which participants are asked to produce uncommon creative uses for everyday objects.
As they point out, divergent thinking has traditionally been assessed by tests requiring open-ended
responses. In this test, there were a little over 250 participants, which included 108 males,
145 females, two other, and one who preferred not to identify their gender. The average age of the
participants was 30.4 years, ranging between 19 and 40 years. 142 were employed full-time, 37 were
employed part-time, 30 were unemployed, and 42 were homemaker, other, retired, or disabled.
The participants came predominantly from the UK and the U.S., and they were tested against three
different chatbots, basically representing GPT-3, GPT-3.5, and GPT-4. For each of four objects,
participants were prompted with, "For the next task, you'll be asked to come up with original
and creative uses for an object. The goal is to come up with creative ideas, which are ideas that
strike people as clever, unusual, interesting, uncommon, humorous, innovative, or different.
Your ideas don't have to be practical or realistic. They can be silly or strange even, so long as
they are creative uses rather than ordinary uses." The objects tested included rope, box, pencil, and
candle. Now, there are two ways the responses were judged. One was an attempt to systematize and
mathematicize it. They write, "The originality of divergent thinking was operationalized as semantic
distance between the object name and the AUT response." Basically, how uncommonly are the words
written in the response associated with the word from the prompt? Now, on top of that,
they also collected subjective creativity and originality ratings from six humans who had themselves
been trained to rate creativity and originality in a similar way. They were asked to rate
each response on a five-point scale, with one being not at all creative and five being very creative,
and they were not told that some of the responses were generated by AI. In their conclusion, the authors
write, "The results suggest that AI has reached at least the same level or even surpassed the
average human's ability to generate ideas in the most typical test of creative thinking. Although
AI chatbots on average outperform humans, the best humans can still compete with them. However,
the AI technology is rapidly developing and the results may be different after half a year.
On basis of the present study, the clearest weakness in humans' performance lies in the
relatively high proportion of poor-quality ideas, which were absent in chatbot responses.
This weakness may be due to normal variations in human performance, including failures
in associative and executive processes, as well as motivational factors.
It should be noted that creativity is a multifaceted phenomenon, and we have focused here only
on performance in the most used task measuring divergent thinking."
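To make the semantic-distance scoring mentioned earlier a little more concrete, here is a minimal sketch in Python. It assumes that semantic distance is computed as one minus the cosine similarity between word vectors, which is a common choice but is not something stated in this episode. The three-dimensional vectors and the names TOY_VECTORS and semantic_distance are invented purely for illustration; a real scoring tool would use embeddings learned from large text corpora.

```python
import math

# Toy "embeddings": the numbers are invented for illustration only.
# Real systems use vectors with hundreds of dimensions learned from text.
TOY_VECTORS = {
    "rope": [0.9, 0.1, 0.0],
    "tie": [0.8, 0.2, 0.1],
    "knot": [0.85, 0.15, 0.05],
    "jewelry": [0.1, 0.9, 0.2],
    "sculpture": [0.0, 0.6, 0.8],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def semantic_distance(object_word, response_words, vectors=TOY_VECTORS):
    """Mean of (1 - cosine similarity) between the object and each response word.

    Assumption: averaging over response words is one simple way to score a
    multi-word response; other aggregation choices are possible.
    """
    distances = [
        1.0 - cosine_similarity(vectors[object_word], vectors[word])
        for word in response_words
    ]
    return sum(distances) / len(distances)


# An ordinary use of a rope versus a more unusual one.
common = semantic_distance("rope", ["tie", "knot"])
unusual = semantic_distance("rope", ["jewelry", "sculpture"])
print(f"common use: {common:.3f}, unusual use: {unusual:.3f}")
```

With these toy vectors, the unusual response scores a much larger distance than the ordinary one, which is the property an originality score of this kind relies on.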
So, of course, a couple caveats to this study.
As the authors themselves point out, there is a whole lot more to creativity than this one
type of test, so big claims and big implications should be taken with a grain of salt.
Second, even within the context of a single test, this is one sample group.
It's not at all to say that a different group of 250 humans wouldn't have a very different
set of responses.
And yet, in spite of all those caveats, it's hard to ignore entirely.
AI writer Andrew Curran says, "I've been saying this for some time.
The average human was surpassed in March by GPT-4.
It seems to me that there should have been more fanfare.
Some of the bridges that are about to be crossed by the next iteration will be transformative
for human society."
Next up, let's move to something that is a little bit more about the future of LLMs.
A paper called NExT-GPT offers an any-to-any multimodal LLM.
Now, multimodal has been one of the big themes and one of the clearest signs of what's coming next in
the world of chatbots and LLMs.
Humans exist in a multimodal world in which we get our inputs from lots of different sources,
be it text, images, audio, video, smells, or something else, and likewise we output things in a variety of
different ways. Now, of course, the LLMs that we use today are much simpler. They're usually
text in and something out, be it text to text like ChatGPT, text to image like Midjourney, or
text to video like Pika Labs and Runway. It's been quite clear for some time, however, that
adding multimodality and the ability to use different types of inputs was always going to be a
major part of the next generation of LLMs. We've seen little nibbles at that goal, with some more
recent chatbots allowing, for example, image-based inputs, but that's quite different from what
this research paper is promising, which they call any-to-any multimodal. The researchers write,
"As we humans always perceive the world and communicate with people through various modalities,
developing any-to-any MM-LLMs capable of accepting and delivering content in any modality becomes
essential to human-level AI. To fill the gap, we present an end-to-end general-purpose any-to-any
MM-LLM system, NExT-GPT. We connect an LLM with multimodal adapters and different diffusion decoders,
enabling NExT-GPT to perceive inputs and generate outputs in arbitrary combinations of text, images,
videos, and audio. Overall, our research showcases the promising possibility of building a
unified AI agent capable of modeling universal modalities, paving the way for more human-like
AI research in the community." Again, there's a link down in the show notes to this paper.
Go check it out. I think it's going to be
especially relevant in the context of some of the upcoming releases that we're likely to see over
the next few months. We've got Google's Gemini, which is expected to be a multimodal model,
and OpenAI is also teasing its developer event in November. Among the various
speculations about what they'll announce, some are thinking that multimodality might be a part of it.
Our next set of research papers has to do with human-like health and body functions.
Dr. Jim Fan from Nvidia introduces an AI that can smell. He writes,
"A neural network can smell like humans do for the first time. Digital smell is a modality that the
AI community has long ignored, but maybe one day useful for robot chefs. Here's how to do smell to text.
One: collect 5,000 molecules and ask humans to label them creamy, chocolate, alcoholic, beefy, spicy, citrus, etc.
This data set is one of a kind and a huge contribution from the paper. Two: train a graph neural
network to map the molecule to each label. Each molecule is a graph of atoms described by valence,
degree, hydrogen count, hybridization, formal charge, atomic number, etc. The GNN predictions
match well with human experts on novel smells. The embeddings give us a 'principal odor map'
(POM) that faithfully represents hierarchies and distances among odorants." And now, Jim did a pretty good
job summing it up, but you can also check out the Nature piece for a little further elucidation.
Effectively, this is exactly what he described. It's a neural network that mapped different
molecular compositions against human-labeled smells, and then can use that to provide descriptions
for novel smells, including some that aren't from nature, and in the process creates a map
where we see how related or unrelated different smells are based on their chemical composition.
I think Jim's right to note that this is an area that hasn't maybe had quite as much development,
but to his point, to the extent that there are robot chefs in our future, this could be
extremely important.
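For anyone curious what "a molecule is a graph of atoms" looks like in practice, here is a rough sketch in Python. It is not the architecture from the paper. It only shows the general shape of the idea: atoms become nodes with a few numeric features, bonds become edges, one round of message passing mixes each atom's features with its neighbors', and a sum over atoms gives a single vector for the molecule. The choice of ethanol, the three features used, and the function names message_pass and readout are all assumptions made for illustration. A trained graph neural network would also apply learned weights at each step, which this sketch leaves out.

```python
# Ethanol (CH3-CH2-OH), heavy atoms only. Each atom is described by
# [atomic_number, degree, hydrogen_count]; this small feature set is an
# illustrative subset of the descriptors listed in the quote above.
atom_features = [
    [6, 1, 3],  # carbon bonded to one heavy atom, three hydrogens
    [6, 2, 2],  # carbon bonded to two heavy atoms, two hydrogens
    [8, 1, 1],  # oxygen bonded to one heavy atom, one hydrogen
]
bonds = [(0, 1), (1, 2)]  # undirected edges between atom indices


def message_pass(features, edges):
    """One round of message passing without learned weights.

    Each atom's new features are its own features plus the mean of its
    neighbors' features. A real GNN would apply trained transformations here.
    """
    neighbors = {i: [] for i in range(len(features))}
    for a, b in edges:
        neighbors[a].append(b)
        neighbors[b].append(a)

    updated = []
    for i, own in enumerate(features):
        nbrs = neighbors[i]
        if nbrs:
            mean = [
                sum(features[j][k] for j in nbrs) / len(nbrs)
                for k in range(len(own))
            ]
        else:
            mean = [0.0] * len(own)
        updated.append([o + m for o, m in zip(own, mean)])
    return updated


def readout(features):
    """Sum atom features into one fixed-size vector for the whole molecule."""
    return [sum(column) for column in zip(*features)]


molecule_embedding = readout(message_pass(atom_features, bonds))
print(molecule_embedding)
```

In a full model, a vector like this would feed a classifier that outputs a score for each smell label, and the learned vectors are what would place molecules on something like an odor map.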
Another foundational model relating to the human body that was announced this week was
RETFound. RETFound is a foundational model for ophthalmology. It is a retinal image model that was
trained using self-supervised learning. As Nature writes, that means that the researchers did not have
to analyze each of the 1.6 million retinal images used for training and label them as normal or not
normal, for instance. Instead, the scientists used a method similar to the one used to train large
language models such as ChatGPT. That AI tool harnesses myriad examples of human-generated text
to learn how to predict the next word in a sentence from the context of the preceding words. In the
same kind of way, RETFound uses a multitude of retinal photos to learn how to predict what missing
portions of images should look like. Says Pearse Keane, an ophthalmologist at Moorfields Eye Hospital
NHS Foundation Trust in London, "Over the course of millions of images, the model somehow learns
what a retina looks like and what all the features of a retina are." Now, there are a bunch of
reasons that this is valuable. One is, obviously, it can just understand problems with the retina
in the eye itself, but because of the retina's role in the human body, it can also be a good
diagnostic tool for other problems. For example, Keane says, "If you have some systemic
cardiovascular disease like hypertension, which is affecting potentially every blood vessel in your body,
we can directly visualize that in retinal images." Retinal images can also be used to evaluate neural
tissue, which gives the model the ability to predict brain diseases such as Parkinson's.
As Keane points out, the need for expert human labeling is a significant barrier to AI-enabled
health care. By being much more label efficient, RETFound opens the possibility of applying
AI in rare diseases. So really, when it comes down to it, this research is relevant not only for what
it can do specifically, but also as an example of how future health-based models might be trained.
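The "predict the missing portions" idea can be sketched in a few lines of Python. This is only an illustration of the self-supervised setup, not RETFound's actual training code: the image is a tiny grid of invented numbers, and the "model" is a deliberately trivial stand-in that fills the hidden patch with the average of the visible pixels. The function names mask_patch, predict_missing, and reconstruction_loss are hypothetical. The point is that the training target comes from the image itself, so nobody has to label anything as normal or not normal.

```python
# A toy 4x4 grayscale "image"; the values are invented for illustration.
image = [
    [1, 1, 1, 1],
    [1, 1, 1, 1],
    [1, 1, 5, 5],
    [1, 1, 5, 5],
]


def mask_patch(img, top, left, size):
    """Return a copy of img with a size x size patch hidden (set to None),
    plus the list of hidden (row, col) positions."""
    masked = [row[:] for row in img]
    hidden = []
    for r in range(top, top + size):
        for c in range(left, left + size):
            masked[r][c] = None
            hidden.append((r, c))
    return masked, hidden


def predict_missing(masked):
    """Stand-in for a learned model: fill hidden pixels with the mean of the
    visible ones. A real model would be a trained neural network."""
    visible = [v for row in masked for v in row if v is not None]
    mean = sum(visible) / len(visible)
    return [[mean if v is None else v for v in row] for row in masked]


def reconstruction_loss(img, predicted, hidden):
    """Mean squared error measured only on the hidden pixels.
    The original image is the label, so no human annotation is needed."""
    return sum((img[r][c] - predicted[r][c]) ** 2 for r, c in hidden) / len(hidden)


masked, hidden = mask_patch(image, top=2, left=2, size=2)
loss = reconstruction_loss(image, predict_missing(masked), hidden)
print(loss)
```

Training would repeat this over many images and many random patches, adjusting the model to push the loss down. The naive averaging baseline here does badly on the bright patch, which is exactly the kind of error a learned model is trained to reduce.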
And lastly, and not unrelated, is BrainLM. The van Dijk Lab writes,
"Introducing BrainLM, the first foundation model for fMRI analysis trained on 6,700 hours of brain
activity data." The abstract reads, basically this is another foundational model based on a
particular part of the human body, in this case the brain, that can be used for a variety
of purposes. Now, the implications for this one are extremely numerous, but for the purposes of
this show, what I wanted to point out is just how much developments in one area of AI can influence
how other areas of AI evolve as well. We're starting to see the techniques that were used to
train generalist models, such as GPT, be used for these highly specific medical and healthcare uses
and many other uses as well. This is creating a pretty powerful feedback loop that is, I believe,
accelerating the way in which artificial intelligence is flowing to and shaping other fields of
study. Hopefully today gave you a better sense of some of the latest and most interesting
research in the field. And so until next time, peace.
