The AI Daily Brief: Artificial Intelligence News and Analysis - GPT-4 More Creative Than the Average Person According to Recent Study

Starting point is 00:00:00 Today on the AI breakdown, we're looking at some of the latest research in artificial intelligence, including one test at least that suggests that GPT4 is more creative than the average human. The AI breakdown is a daily podcast and video about the most important news and discussions in AI. Go to Breakdown.network for more information about our Discord, our newsletter, and our YouTube channel. Welcome back to the AI breakdown. Today we are doing a little bit of a research roundup. This isn't something that we've done for a little while now. And there has been, especially over the last week, a lot of research that I've made note of,

Starting point is 00:00:36 I've bookmarked on Twitter. And so I wanted to share some of the things that I thought were the most interesting new papers to come out. I also thought that this would be good for a weekend episode when there's a little bit less news. So let's begin with a piece that I think has some pretty significant implications. Now, the title in the Nature article summing up this research is "Best humans still outperform artificial intelligence in a creative divergent thinking task." What I will call your attention to, though, is the use of that word best, because this article could very easily have been framed very differently. Here's the key line. On average, the AI chatbots outperformed human participants. Outperformed them on what, you ask? Well, this was a study about

Starting point is 00:01:20 creativity. As the authors write, creativity has traditionally been considered an ability exclusive to human beings. However, the rapid development of AI has resulted in generative AI chatbots that can produce high-quality artworks, raising questions about the differences between human and machine creativity. In this study, we compared the creativity of humans with that of three AI chatbots using the alternative uses task, AUT, which is the most used divergent thinking task. Participants were asked to generate uncommon and creative uses for everyday objects. And then that's where we get back to this line, on average AI chatbots outperformed human participants. While human responses included poor quality ideas, the chatbots

Starting point is 00:02:00 generally produced more creative responses. However, the best human ideas still matched or exceeded those of the chatbots. Now, this paper is super interesting. Obviously, I'm going to include a link to all of these in the show notes for this episode. But for creativity geeks out there, this gets deep into different theories of creativity. This is, of course, one of those neural processes that we don't necessarily have the best understanding of. We have an intuitive sense as humans what is creative when we see it and what people mean when they say they're creative or being creative, but it's not a term that has a strict scientific meaning, which of course makes it harder to test in a laboratory-style environment. So let's talk a little bit more about the methodology they used.

Starting point is 00:02:38 Again, they write that the most used test of divergent thinking is the alternative uses task, in which participants are asked to produce uncommon creative uses for everyday objects. As they point out, divergent thinking has traditionally been assessed by tests requiring open-ended responses. In this test, there were a little over 250 participants, which included 108 males, 145 females, two other and one that preferred not to identify their gender. The average age of the participants was 30.4 years, ranging between 19 and 40 years. 142 were employed full-time, 37 were employed part-time, 30 were unemployed, and 42 were homemakers, retired, disabled, or other. The participants came predominantly from the UK and the U.S., and they were tested against three

Starting point is 00:03:19 different chatbots, basically representing GPT-3, GPT-3.5, and GPT-4. For each of four objects, participants were prompted with, for the next task, you'll be asked to come up with original and creative uses for an object. The goal is to come up with creative ideas, which are ideas that strike people as clever, unusual, interesting, uncommon, humorous, innovative, or different. Your ideas don't have to be practical or realistic. They can be silly or strange even so long as they are creative uses rather than ordinary uses. The objects tested included rope, box, pencil, and candle. Now, there are two ways the responses were judged. One was an attempt to systematize and mathematicize it. They write, the originality of divergent thinking was operationalized as semantic

Starting point is 00:03:59 distance between the object name and the AUT response. Basically, how uncommonly were the words written in the response associated with the word from the prompt. Now, on top of that, they also collected subjective creativity and originality ratings from six humans who had themselves been trained to rank creativity and originality in this case in a similar way. They were asked to rate each response on a five-point scale, with one being not at all creative and five being very creative, and they were not told that some of the responses were generated by AI. In the conclusion, the authors write, the results suggest that AI has reached at least the same level or even surpassed the average human's ability to generate ideas in the most typical test of creative thinking. Although

Starting point is 00:04:39 AI chatbots on average outperform humans, the best humans can still compete with them. However, the AI technology is rapidly developing and the results may be different after half a year. On the basis of the present study, the clearest weakness in humans' performance lies in the relatively high proportion of poor quality ideas, which were absent in chatbots' responses. This weakness may be due to normal variations in human performance, including failures in associative and executive processes, as well as motivational factors. It should be noted that creativity is a multifaceted phenomenon, and we have focused here only on performance in the most used task measuring divergent thinking.
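
To make the automated scoring a little more concrete, here is a rough Python sketch of what "semantic distance between the object name and the AUT response" could look like. This is only an illustration, not the study's pipeline: it assumes cosine distance over word vectors, and the vectors in TOY_VECTORS and the function names are invented for the example.

```python
import math

# Hypothetical 3-dimensional word vectors, invented for illustration only.
# A real system would use embeddings learned from a large text corpus.
TOY_VECTORS = {
    "rope": [0.9, 0.1, 0.0],
    "tie": [0.8, 0.2, 0.1],
    "knot": [0.85, 0.15, 0.05],
    "jewelry": [0.1, 0.9, 0.2],
    "necklace": [0.2, 0.8, 0.3],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def semantic_distance(object_name, response_words):
    """Mean cosine distance between the object and each known response word."""
    obj = TOY_VECTORS[object_name]
    known = [w for w in response_words if w in TOY_VECTORS]
    if not known:
        raise ValueError("no response word has a vector")
    distances = [1.0 - cosine_similarity(obj, TOY_VECTORS[w]) for w in known]
    return sum(distances) / len(distances)


# An ordinary use of a rope versus a less expected one.
common = semantic_distance("rope", ["tie", "knot"])
unusual = semantic_distance("rope", ["jewelry", "necklace"])
```

With these made-up vectors, the less expected use lands farther from "rope" than the ordinary one, which is the intuition behind treating distance as a stand-in for originality.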

Starting point is 00:05:12 So, of course, a couple caveats to this study. As the authors themselves point out, there is a whole lot more to creativity than this one type of test, so big claims and big implications should be taken with a grain of salt. Second, even within the context of a single test, this is one sample group. It's not at all to say that a different group of 250 humans wouldn't have a very different set of responses. And yet, in spite of all those caveats, it's hard to ignore entirely. AI writer Andrew Curran says, I've been saying this for some time.

Starting point is 00:05:39 The average human was surpassed in March by GPT4. It seems to me that there should have been more fanfare. Some of the bridges that are about to be crossed by the next iteration will be transformative for human society. Next up, let's move to something that is a little bit more about the future of LLMs. A paper called NExT-GPT offers an any-to-any multimodal LLM. Now, multimodal has been one of the big themes and one of the clear "what's coming next" trends in the world of chatbots and LLMs.

Starting point is 00:06:09 Humans exist in a multimodal world in which we get our inputs from lots of different sources, be it text, images, audio, video, smells, something else, and likewise we output things in a variety of different ways. Now, of course, the LLMs that we use today are much simpler. They're usually text in and something out, be it text to text like ChatGPT, text to image like Midjourney, or text to video like Pika Labs and Runway. It's been quite clear for some time, however, that adding multimodality and the ability to use different types of inputs was always going to be a major part of the next generation of LLMs. We've seen little nibbles against that goal, with some more recent chatbots allowing, for example, image-based inputs, but that's quite different than what

Starting point is 00:06:49 this research paper is promising, which they call any-to-any multimodal. The researchers write, as we humans always perceive the world and communicate with people through various modalities, developing any-to-any MM-LLMs capable of accepting and delivering content in any modality becomes essential to human-level AI. To fill the gap, we present an end-to-end general purpose any-to-any MM-LLM system, NExT-GPT. We connect an LLM with multimodal adapters and different diffusion decoders, enabling NExT-GPT to perceive inputs and generate outputs in arbitrary combinations of text, images, videos, and audio. Overall, our research showcases the promising possibility of building a unified AI agent capable of modeling universal modalities, paving the way for more human-like

Starting point is 00:07:28 AI research in the community. Again, there's a link down in the show notes to this paper. Go check it out. I think it's going to be especially relevant in the context of some of the upcoming releases that we're likely to see over the next few months. We've got Google's Gemini, which is expected to be a multimodal model, and OpenAI is also teasing its developer event in November, and among the various speculations about what they'll announce, some are thinking that multimodality might be a part of it. Our next set of research papers have to do with human-like health and body functions. Dr. Jim Fan from Nvidia introduces an AI that can smell. He writes,

Starting point is 00:08:00 a neural network can smell like humans do for the first time. Digital smell is a modality that the AI community has long ignored, but may be one day useful for robot chefs. Here's how to do smell to text. One, collect 5,000 molecules and ask humans to label creamy, chocolate, alcoholic, beefy, spicy, citrus, etc. This data set is one of a kind and a huge contribution from the paper. Two, train a graph neural network to map the molecule to each label. Each molecule is a graph of atoms described by valence, degree, hydrogen count, hybridization, formal charge, atomic number, etc. The GNN predictions match well with human experts on novel smells. The embeddings give us a, quote, principal odor map, POM, that faithfully represents hierarchies and distances among odorants. And now Jim did a pretty good

Starting point is 00:08:42 job summing it up, but you can also check out the Nature piece for a little further elucidation. Effectively, this is exactly what he described. It's a neural network that mapped different molecular compositions against human-labeled smells, and then can use that to provide descriptions for novel smells, including some that aren't from nature, and in the process creates a map where we see how related or unrelated different smells are based on their chemical composition. I think Jim's right to note that this is an area that hasn't maybe had quite as much development, but to his point, to the extent that there are robot chefs in our future, this could be extremely important.
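
For anyone who wants a feel for the "molecule as a graph" idea, here is a very small Python sketch. It is a toy under stated assumptions, not the paper's model: it uses only three of the atom features mentioned above, a single round of neighbor summing in place of a trained graph neural network, and WEIGHTS and BIASES that are invented and untrained, so the output scores carry no chemical meaning.

```python
import math

# Ethanol (CH3-CH2-OH) heavy atoms as a toy graph. Each atom is described by
# [atomic number, degree (heavy-atom neighbors), attached hydrogen count].
ATOM_FEATURES = [
    [6.0, 1.0, 3.0],  # carbon of CH3
    [6.0, 2.0, 2.0],  # carbon of CH2
    [8.0, 1.0, 1.0],  # oxygen of OH
]
BONDS = [(0, 1), (1, 2)]

LABELS = ["alcoholic", "citrus"]
# Hypothetical, untrained weights (one row per label) and biases.
WEIGHTS = [
    [0.02, -0.01, 0.03],
    [-0.03, 0.02, -0.01],
]
BIASES = [0.0, 0.0]


def message_pass(features, bonds):
    """One round: each atom adds the feature vectors of its bonded neighbors."""
    updated = [list(f) for f in features]
    for a, b in bonds:
        for k in range(len(features[a])):
            updated[a][k] += features[b][k]
            updated[b][k] += features[a][k]
    return updated


def readout(features):
    """Sum atom vectors into a single molecule-level embedding."""
    return [sum(col) for col in zip(*features)]


def predict_labels(embedding):
    """Independent sigmoid per label, so several labels can apply at once."""
    scores = {}
    for label, row, bias in zip(LABELS, WEIGHTS, BIASES):
        logit = sum(w * x for w, x in zip(row, embedding)) + bias
        scores[label] = 1.0 / (1.0 + math.exp(-logit))
    return scores


embedding = readout(message_pass(ATOM_FEATURES, BONDS))
scores = predict_labels(embedding)
```

The molecule-level embedding is the part that matters for the odor map idea: in a trained model, molecules that smell alike would end up with nearby embeddings.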

Starting point is 00:09:16 Another foundational model relating to the human body that was announced this week was RETFound. RETFound is a foundational model for ophthalmology. RETFound is a retinal image model that was trained using self-supervised learning. As Nature writes, that means that the researchers did not have to analyze each of the 1.6 million retinal images used for training and label them as normal or not normal, for instance. Instead, the scientists used a method similar to the one used to train large language models such as ChatGPT. The AI tool harnesses myriad examples of human-generated text to learn how to predict the next word in a sentence from the context of the preceding words. In the same kind of way, RETFound uses a multitude of retinal photos to learn how to predict what missing

Starting point is 00:09:53 portions of images should look like, says Pearse Keane, an ophthalmologist at Moorfields Eye Hospital NHS Foundation Trust in London. Over the course of millions of images, the model somehow learns what a retina looks like and what all the features of a retina are. Now, there are a bunch of reasons that this is valuable. One is, obviously, it can just understand problems with the retina in the eye itself, but because of the retina's role in the human body, it can also be a good diagnostic tool for other problems. For example, Keane says, if you have some systemic cardiovascular disease like hypertension, which is affecting potentially every blood vessel in your body, we can directly visualize that in retinal images. Retinal images can also be used to evaluate neural

Starting point is 00:10:27 tissue, which gives the model the ability to predict brain diseases such as Parkinson's. As Keane points out, the need for expert human labeling is a significant barrier to AI-enabled health care. By being much more label efficient, RETFound opens the possibility of applying AI in rare disease. So really when it comes down to it, this research is relevant not only for what it can do specifically, but also as an example for how future health-based models might be trained. And lastly, and not unrelated, is BrainLM. The van Dijk Lab writes, introducing BrainLM, the first foundation model for fMRI analysis trained on 6,700 hours of brain activity data. The abstract reads, basically this is another foundational model based on a

Starting point is 00:11:06 particular part of the human body, in this case the brain, that can be used for a variety of purposes. Now, the implications for this one are extremely numerous, but for the purposes of this show, what I wanted to point out is just how much developments in one area of AI can influence how other areas of AI evolve as well. We're starting to see the techniques that were used to train generalist models, such as GPT, be used for these highly specific medical and healthcare uses and many other uses as well. This is creating a pretty powerful feedback loop that is, I believe, accelerating the way in which artificial intelligence is flowing to and shaping other fields of study. Hopefully today gave you a better sense of some of the latest and most interesting

Starting point is 00:11:45 research in the field. And so until next time, peace.
