Instant Genius - AI’s fight to understand creativity, with Ahmed Elgammal

Episode Date: May 11, 2023

Artificial intelligence has seeped into the art world, creating incredible paintings, winning art competitions, and turning amateurs into Picasso. But how does it work, and can it really replace artists? We spoke to Ahmed Elgammal, a professor of computer science at Rutgers University, to find out.

Transcript
Starting point is 00:01:00 From BBC Science Focus magazine, this is Instant Genius, a bite-sized masterclass in podcast form. I'm Alex Hughes, staff writer at BBC Science Focus magazine. This week, we're talking about AI art. In the past year, we've seen the internet flooded with a new type of art, but it isn't made by humans. Thanks to AI-powered software, anyone can create images based on just a worded prompt.
Starting point is 00:01:37 Want a cartoon of a cat fighting a dinosaur? Easy. Need to see two teddy bears in Paris in the style of a rom-com poster? It's just one click away. But how does this all work? And what does it mean for the future of art and the artists creating it? I'm joined by Ahmed Elgammal to discuss this topic. He's the founder of the AI art tool Playform
Starting point is 00:01:59 and a professor of computer science at Rutgers University. In this episode, he explores everything from how AI art platforms are trained through to their effect on the art world and their lack of truly creative understanding. So I imagine by now a lot of people have tried out an AI image generator in some form or seen how they work. But what is, I guess, the actual process of AI creating an image? When I enter a prompt, what's actually happening? I would like to start with a little bit of history behind how AI generates images. About five or six years ago, there was an advancement in AI called generative adversarial networks, GANs, where you can give it some images and it can try to generate more
Starting point is 00:02:49 images similar to what you gave it. So give it images of cats, and it can give you images of more cats. That started a revolution in using artificial intelligence for image generation, and many artists and creatives took notice and started using it. However, there was the issue of how you can control the AI generation. So here came another generation of image generators that use text to generate images. We can debate whether this is a good choice or not. We can discuss this. But here is how this works.
Starting point is 00:03:23 Basically, these models are trained on lots of images and their text captions. So the model tries to understand how the text caption relates to the image. When you have an image of a bird on a tree, it tries to guess what is in the image, where the tree is and where the bird is. And doing that over billions of images, it tries to figure out which part of the image relates to which words.
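[Editor's illustration, not part of the spoken interview] The caption-to-image matching described above can be sketched with a toy example. The speaker does not name a specific training method, so the contrastive matching objective used here is an assumption, and the three-number "embeddings", the `cosine` and `matching_loss` helpers, and the `temperature` value are all hypothetical stand-ins for what a real model would learn from billions of pairs.

```python
import math

def cosine(u, v):
    """Cosine similarity between two plain-list vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def matching_loss(image_vecs, caption_vecs, temperature=0.1):
    """Average cross-entropy of picking caption i for image i.

    Assumed objective: each image should score highest against its
    own caption and lower against every other caption in the batch.
    """
    total = 0.0
    for i, img in enumerate(image_vecs):
        scores = [cosine(img, cap) / temperature for cap in caption_vecs]
        log_norm = math.log(sum(math.exp(s) for s in scores))
        total += log_norm - scores[i]  # -log softmax of the true pair
    return total / len(image_vecs)

# Hypothetical toy embeddings: image 0 ~ "bird", image 1 ~ "tree".
images = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.2]]
aligned = [[1.0, 0.0, 0.1], [0.0, 1.0, 0.1]]   # captions in matching order
shuffled = [aligned[1], aligned[0]]             # captions swapped

loss_aligned = matching_loss(images, aligned)
loss_shuffled = matching_loss(images, shuffled)
print(loss_aligned < loss_shuffled)
```

Correctly paired captions give a lower loss than swapped ones, which is the signal that, under this assumed objective, would push a model to associate words with the right visual content.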
Starting point is 00:03:50 So now, after training these models on billions of images for hours and days and weeks, if we give it a text, it will first analyze the text in the same way ChatGPT and other language models analyze text, to come up with a mathematical representation of the sentence you gave it. And then it tries to correlate every word with where in the image
Starting point is 00:04:12 it relates to, based on what it has seen in all the data it was trained on. And it will try to generate an image based on this sentence. And I think people that have seen the work that comes from AI, a lot of it looks incredible. It looks incredibly realistic. But I'm interested in what it is that stops it achieving, I guess, more complicated shapes. There are a lot of times where, with hands, there'll be an extra finger, or there'll be a nose flying out of someone's ear or something. I'm curious why there are certain parts that it really struggles with. Definitely.
Starting point is 00:04:47 These models still struggle with small details. Anything that has small details, they will have a hard time generating. Because basically, the way these models are trained, they are trained to optimize what's called a loss function, which is basically a criterion that the model tries to optimize. And usually this criterion is computed over the whole image. So it tries to get most of the image correct. So when you try to get most of the image correct,
Starting point is 00:05:12 you kind of neglect small details. It will try to get most of the image correct, but there are some small details that we as humans are very tuned to catch, like a hand with four fingers or a three-legged person or a face that has strange or asymmetric features. We are very good at catching that. However, the AI really doesn't know the difference between these small details and any other small details in the background or in an irrelevant area
Starting point is 00:05:41 that we as humans would not notice. So the current generation of these models struggles with that because this is the way they were trained. But I'm sure in the next few months we will come up with new models that have been trained to take into consideration these small details that are important to humans, like where the hands are, the pose, the face. All these are important, and the model can add that into the optimization, so the criterion will now capture these issues. And we're talking a lot about the training here, which, as you mentioned, is based on, you know,
Starting point is 00:06:18 millions and millions of images. I'm curious what that's like in terms of energy consumption. Is most of the energy for these used in the training process, and then when you're actually doing searches it decreases heavily from that point on, or is it a constant, similar output? Definitely, training these AI models takes a lot of energy. You need to run this on GPUs for days and weeks over billions of images, and probably you have to rerun it many, many times to optimize the process and get satisfactory results. However, even after training, to be able to generate an image these models need to be running on a GPU, what's called a graphics processing unit,
Starting point is 00:07:04 which is a specialized piece of hardware typically used for gaming. It has thousands of small processors. And these are very energy-consuming devices. Running this model to generate requires that you run these GPUs. And if you are serving thousands of users, you need to have multiple GPUs running 24 hours a day. And that definitely has significant energy consumption and environmental impact. And when we're doing these training processes, I think something that OpenAI discussed quite openly when they were working on their DALL-E project
Starting point is 00:07:43 was the issue of, I guess, biases or incorrect information, or a tendency to lean into certain cultures, that would leak in from the information they were collecting from. Is there a better solution to that problem? Is it a case of needing a list of, I guess, approved content that's been looked through or that covers a base of different cultures and beliefs? That's a very important issue: how can we control the data given to the AI? And that's very critical because there are different opinions in the world about everything: politics, religion, lifestyle, everything. So we cannot censor the data that is given to the AI to weigh certain voices more or less. That's very unfair. I think
Starting point is 00:08:33 AI naturally has to reflect the different opinions in the world, different viewpoints, different cultures, different religions, different political views. And AI has to learn from all of that. We cannot really censor that. This will come with a lot of the misinformation that we live with. That's part of our life. The same way you look at your feed on a social media platform and you can filter out or kind of guess that this is false information or this is true information, we have to be trained that AI can generate false information, because this is how it works. It really digests what we give it and renders it in its own language,
Starting point is 00:09:13 so we should not take whatever AI gives us as proof. It's basically a rendering of what's out there in a new way. It can be valid, it can be invalid. This is how we should look at AI generation. AI right now has no way to tell fact from fiction. For it, everything is just words. And once we start talking about facts, that's a big problem. What are the facts and what are the opinions?
Starting point is 00:09:41 That's really a harder thing to do. I think one of the, I guess, other big issues that's been raised a lot is around the rules and regulations of how this technology can be used, you know, in terms of who owns the copyright, who owns the images that were put into it originally. And I guess it's the use of it when it's monetized. It's, I guess you could say, a sort of Wild West situation right now. Do you think this is something that laws will clamp down on over time, or is it kind of at its peak when it's unregulated? The copyright issue is a new issue that comes with the current generation of image generators that are mainly trained on billions of images.
Starting point is 00:10:24 However, this issue was not the case a couple of years ago when artists used AI through certain models like GANs, where you can actually train your own model using your own images. So the copyright issue was not that big at the time. The copyright issue comes now with the fact that you're training this model on billions of images taken from the internet without the consent of the artists or the people who made the images. And now you can regenerate these images or generate a pastiche of these images, a mix of these images. And that is very problematic because in one sense it's unethical. In the other sense, it is not violating copyright law, because what you're generating is usually
Starting point is 00:11:08 a transformative version of the image, not a direct derivative. So under any copyright law, usually this will not be a problem. But it's still unethical. So this is a big problem because now the copyright issue is a three-party problem. You have the person generating the image, who might be violating the copyright of some other artist that he doesn't know about. And then the third party in this is the developer of the AI, the company that developed the AI and trained the system on these images, and how that causes a problem.
Starting point is 00:11:46 So it's a three-way copyright problem now, which is a very new situation. It's a big problem, and I don't think copyright law right now is capable of doing anything about it. And there are some initiatives to clean up the datasets that are used for training and use artists' content and things like that, but I hardly see how this will work. We're talking about billions of images. The latest datasets that have been used to train these models are like 5 billion images, and probably soon we're going to see more than that. There are some incentives to ignore this problem for the big companies who train these models
Starting point is 00:12:26 and for some users of these models, who are very happy about being able to generate images in the style of their favorite artist, and they cannot get caught. So it's causing a mess, basically, at this point. However, I have to remind everybody that this is not the way it's supposed to be. You can use AI to generate images by training it using your own data and your own images without having to violate anything. And I even have concerns about text prompting as a way to communicate with an AI system:
Starting point is 00:12:58 is it the best way to generate an image, to tell the AI what you want using text? Is that what the artist wants? Maybe a consumer would like to do that. Somebody who's not an artist, who would like to generate an image, needs to write down a text describing what they want. But an artist wouldn't do that.
Starting point is 00:13:17 Artists are visual thinkers. Artists prefer to think visually, not through text. Having to describe what the art will look like is very foreign to an artist. That's not how artists work. So controlling the generation using text has many problems.
Starting point is 00:13:36 Copyright is one of them. But as a creative process, it's also very unnatural and very restrictive for many artists. I see it as a consumer-facing technology more than an artist-facing technology. And these problems with copyright are really fundamental.
Starting point is 00:15:09 So do you think the products that are, I guess, best known currently are all designed for consumers? Is there an option or a chance to also make a product that is more for artists
Starting point is 00:15:53 and is better aimed at artists? There are really important tools that are aimed at artists. Playform AI, for example, is a platform that we developed back in 2019, before the current generation of AI text-prompting generators. It allows artists to train their own AI based on their own data without having any copyright issues, without being restricted by text prompting or thinking through a text pipeline. So this platform has been around.
Starting point is 00:16:24 Now, I think the new generation of text-to-image platforms is very popular because it's more consumer-facing, and it makes AI more easily accessible to the masses, who can just write a text and generate an image. But artists have been using AI for the last five years, even before these platforms, and generating amazing things that have been shown in exhibitions and museums. And one thing I want to also mention
Starting point is 00:16:52 is that the use of AI in making art was welcomed before in the art market, in exhibitions, in galleries. But now with this new generation of text-to-image generators, you have seen that there is a big debate, and many artists have started to ban AI images from their platforms, like on Reddit and Discord and others. This issue was not there in the last five years. Many artists have been using AI without a problem, and it has been welcomed because they were using AI based on their own images, their own data. It's a creative process, it's a conceptual art process, and that has been welcomed. But now, when you're dealing with a big model
Starting point is 00:17:33 that has been trained on billions of images, basically your job as an artist is to reverse engineer the system to generate an image by plugging in certain keywords to manage the system. That's not a very creative process.
Starting point is 00:17:48 The creativity here really becomes a process of reverse engineering the system, not visual thinking or a natural way of working for artists. We've, I think, mostly touched today on AI and its ability to make images, but it's been proven to create, I guess, text, music,
Starting point is 00:18:08 a bunch of different forms of art. Is there, in your eyes, a form of, I guess, digital creation that it would struggle to create or struggle to match humans in? I think for most of the content that we can digitize, whether it's text, music, images, videos, anything that we can put into a digital format, AI can be trained to generate such content. Does that mean that AI is good at generating it?
Starting point is 00:18:38 It can be, it can be not. Anything that requires a higher level of semantics, AI might struggle with. For example, can AI create poetry? Yes, we can try ChatGPT and it can generate some nice poetry, but the level of sophistication of these poems generated by AI is really very limited. Maybe better than me and you writing poems, but if you read them, they're very naive, not really thoughtful. That's where human creativity
Starting point is 00:19:08 still surpasses AI and will for a long time. Writing a novel? Yes, AI can write a novel, but it will be nothing compared to a great novel by a great writer. Because there is one very important thing we have to realize: AI doesn't generate art. AI generates images, or music, or notes, but not art. Making an image doesn't make you an artist. What makes an image art is the artist behind the scenes who uses AI to make that image. That's when it's called art. So there is a human behind the scenes who will use this machine, this power, to generate images and iterate and keep going and select and curate from the outcome. This whole process is what art making is. The AI is just filling in one
Starting point is 00:20:00 part of that, which is a tool that can generate this content. The same thing applies if you are a creative writer. Yes, you might use AI to write certain paragraphs for you, but in the end, you are the one who has the idea in your head, and you are the one who wants to create this whole novel or this whole story. AI is going to be just a way to help you. One thing we also have to remember is what art is. Art and literature are ways to communicate between people. In the end, art is communication between humans. So AI can generate things, but AI doesn't have a message to communicate. In the end, the message that is communicated comes again from the human behind the scenes who writes the novel or writes the story or writes the music or makes the art.
Starting point is 00:20:51 And is that something that AI could learn to replicate? Can it learn to experience that creativity and that flair? Or is it always going to just be replicating a prompt as accurately as it can and being used more as a tool for someone else's view of creativity? I believe the current generation of AI is limited to creating content that imitates human content at large, and it has to be controlled largely by humans to make it something useful. It can be a great tool, but it is not yet there, where it has its
Starting point is 00:21:30 own consciousness about the world. This is really a big issue. To be able to be an artist or a writer, you have to have a consciousness about what's happening in the world, and you have an opinion about what's happening in the world, and your own voice. And you use art and literature to communicate these ideas. The AI doesn't have that. As long as AI doesn't have a consciousness, it cannot be an artist. This is basically
Starting point is 00:21:55 as simple as that. It can be a great tool for artists to use. But I cannot call it an artist or a musician. A couple of years ago, we used AI in a project
Starting point is 00:22:06 to generate Beethoven's 10th Symphony. And by the way, this was using the same kind of language models that are used behind ChatGPT and others. So we trained this
Starting point is 00:22:17 AI on lots of classical music prior to Beethoven and on Beethoven's body of work. And then the AI looked at the sketches that Beethoven left for a 10th symphony and tried to generate completions of these notes
Starting point is 00:22:32 and harmonization and orchestration. And that project was a great example of how humans use AI in making something interesting. So in the end, AI was a tool in the process. The composer and
Starting point is 00:22:48 the team tell it what to do. They tell it, here's a couple of melodies, complete this for me, and then they take that and fit it into the movement where it fits. And then they iterate back and go to the AI again and ask it to orchestrate certain pieces or combine certain pieces. So it was an elaborate process of back and forth between the human and the AI until the work was done. I see this as a great model of what's happening now in the world. Basically, everything happening now in terms of using AI is a cycle, a creative process where mainly the human is in charge, and AI is following the rules to generate content for that human to fit into his or her project. In the future, do you think we're going to see AI developing into something that's, I guess,
Starting point is 00:23:41 more advanced, but less creative? So OpenAI was working on a project called Point-E, which was a method of generating 3D shapes from prompts in the same way that ChatGPT or DALL-E works. Is there room for us to start seeing an AI generator that's creating full 3D environments or entire worlds that could be explored? Yes, and this will be really the interesting thing to see. Because these are tasks that are very time-consuming. Generating a 3D environment in graphics for gaming or for a virtual environment is very time-consuming,
Starting point is 00:24:17 and if we have the AI power and the machine power to generate that for us based on our instructions, that will be really great. Another example is when you're creating a video. When you create a video showing certain things, it takes a lot of time and effort to shoot these videos and edit and montage them as a human. So any tool that can really create video clips for you showing what you want will be really great. For all these massive tasks that humans do and that take a lot of time, I think AI will play a big role in helping humans. But in the end, in the creative process, the human is going to be like the creative director. And AI tools will be more of a kind of people, it's not people, basically more of a slave, digital slaves working for that human, that creative director, to
Starting point is 00:25:14 create whatever they want, whether it's music or art or literature. And fortunately, the AI is not conscious, so having a digital slave in that sense is totally fine. There's nothing unethical about that. I think a good thing about that also is that it makes it possible for artists who don't have the means to have a studio assistant or somebody to work with them in a workshop to have access to this technology, so they really can grow and do what they want. In the past,
Starting point is 00:25:48 I mean, we have seen artists like Andy Warhol and others have big workshops of creators working for them and doing what they want. But an emerging artist doesn't have this ability. So you can use AI as an emerging artist to really
Starting point is 00:26:04 help you create things at scale. And a couple of years ago, we did a study where we asked artists who were using AI at the time, that was like four or five years ago when AI was very limited compared to what it is now, why they were using AI and what they found valuable in using it. And we found that there are mainly two things artists like in AI. The first thing is the fact that it creates novel ideas that they didn't think about.
Starting point is 00:26:32 The AI looks at the world with different eyes from our human eyes and can give them new ideas. They like that very much. The other thing is the volume. The AI really can create lots and lots of assets and images for them that they can use in their projects. And this is something that takes a lot of time, and they sometimes have to hire assistants to do that job. So the fact that this can be done very fast using AI is very valuable. These are the two value propositions in using AI in the creative process: that it can give you new ideas out of the box, and that it can give you creative volume, lots of data.
Starting point is 00:27:12 However, looking at what's happening now with text prompting as a way to generate, I think we are losing the first one. We are losing the fact that AI has been giving us ideas out of the box or out of the ordinary, because now AI is constrained by our language. It looks at the world through
Starting point is 00:27:29 the lens of our own language. So we added a constraint that limits the AI's ability to be imaginative or to generate interesting concepts visually. However, that's very useful in other contexts, because if you are using AI to generate something linguistic, text, or something very structured like music, it's
Starting point is 00:27:53 very important to have language in the process. So I think we have a long way to go in terms of how AI can fit the creative process for different artists, and what we see now is still just the early stages of what is possible. When you look back through history, you know, we've had a huge row of art movements throughout time: cubism, impressionism,
Starting point is 00:28:16 realism, all these different stages throughout history. Is there, I guess, an argument that AI is simply the latest
Starting point is 00:28:23 art movement, or is it something more than that? I think in the last five years, we have seen this, and I think it has ended already,
Starting point is 00:28:31 where the early artists who have been using AI in the last five years, especially using GANs, have specific aesthetics in their work, because AI has been in what's called the uncanny valley.
Starting point is 00:28:45 It creates these uncanny-looking images. And if you look at most of the art that was generated by many artists in that period, it has this look, it has this style, it has this feeling and aesthetic. I think AI is now becoming more photorealistic, very good at generating photorealistic images, very good at generating graphics,
Starting point is 00:29:03 very good at imitating styles. And it lost this ability to be surprising and uncanny and have these surreal effects. So these last five years, some people even gave it a name. I remember some people called it GANism. It was GANs. But I think this is gone.
Starting point is 00:29:23 It is already a period in our history of using technology in making art. It was a really amazing period and it's already gone. Now AI is becoming more of a tool for everybody to generate a photorealistic image, a graphic design, a logo, but it lost the unique aesthetics it used to have. Thank you for listening to this episode of Instant Genius. That was Ahmed Elgammal talking about AI art.
Starting point is 00:29:53 The Instant Genius podcast is brought to you by the team behind BBC Science Focus magazine, which you can find on sale now in supermarkets and newsagents, as well as on your preferred app store. Alternatively, you can come and find us online at sciencefocus.com.
